Additional Pipeline Results

More examples of fine-grained context classification and prediction

COinCO · CVPR 2026

We provide additional results from our proposed pipeline, along with further examples of fine-grained context classification and objects-from-context prediction.

More Pipeline Results

Fine-Grained Context Classification

We provide concrete examples that highlight how context reasoning differs before and after SFT of Qwen2.5-VL-3B across our three criteria: location, size, and co-occurrence. These examples demonstrate that SFT substantially improves both the accuracy of context classification and the plausibility of the model’s contextual reasoning, as illustrated in Section 5.1.

Fine-Grained Context Example 1

Figure 1: Location Context Reasoning for the Sink (Before vs. After SFT). Example image from COCO.

Fine-Grained Context Example 2

Figure 2: Size Context Reasoning for the Dining Table (Before vs. After SFT). Example image from COinCO.

Fine-Grained Context Example 3

Figure 3: Co-occurrence Context Reasoning for the Toilet (Before vs. After SFT). Example image from COinCO.

Objects-from-Context Prediction

Here, we provide additional examples for the instance-level Objects-from-Context prediction task. Our trained multilayer perceptron is given an inpainted image and infers which object was present before the image was altered. Using the top predicted class, we then reapply Stable Diffusion inpainting to reconstruct the original object within the mask region.
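The prediction step described above can be sketched in a few lines. This is only an illustrative sketch, not our released implementation: the class list, the two-dimensional feature vector, the toy weights, and the prompt template are all hypothetical, and the feature extractor that turns an inpainted image into a vector is left out entirely. It shows a one-hidden-layer perceptron scoring the classes, selection of the top predicted class, and construction of a text prompt that could then be passed to an inpainting model together with the image and mask.

```python
import math

# Hypothetical label set for illustration only.
CLASSES = ["sink", "dining table", "toilet"]


def mlp_forward(features, w1, b1, w2, b2):
    """One hidden layer with ReLU; returns raw class scores (logits)."""
    hidden = [
        max(0.0, sum(f * w for f, w in zip(features, row)) + b)
        for row, b in zip(w1, b1)
    ]
    return [
        sum(h * w for h, w in zip(hidden, row)) + b
        for row, b in zip(w2, b2)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(features, params):
    """Return the most probable class name and its probability."""
    probs = softmax(mlp_forward(features, *params))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


def inpainting_prompt(class_name):
    """Assumed prompt template; the real prompt wording is not specified."""
    return f"a photo of a {class_name}"


# Toy parameters (assumptions): 2 input features, 2 hidden units, 3 classes.
W1 = [[1.0, 0.0], [0.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
B2 = [0.0, 0.0, 0.0]
PARAMS = (W1, B1, W2, B2)

# Stand-in for features extracted from an inpainted image.
label, confidence = top_prediction([2.0, 0.5], PARAMS)
prompt = inpainting_prompt(label)
# The prompt, the inpainted image, and the mask would then go to an
# inpainting model to regenerate the object inside the mask region.
```

In this toy setup the hidden activations equal the inputs, so the first class receives the highest score and the prompt is built from it. With a real model, only `top_prediction` and the prompt template would need to change.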

Inpainting Success Original Object Classes