Additional Pipeline Results

More examples of fine-grained context classification and prediction

COinCO · CVPR 2026

We provide additional results from our proposed pipeline, along with further examples of fine-grained context classification and objects-from-context prediction.

More Pipeline Results

Inpainting, fake detection, and objects-from-context prediction results. Context reasoning responses are color-coded by location, size, and co-occurrence. Inpainted objects: backpack, carrot, broccoli, elephant, oven, zebra, and horse.

Inpainting, fake detection, and objects-from-context prediction results. Context reasoning responses are color-coded by location, size, and co-occurrence. Inpainted objects: chair, truck, fire hydrant, cake, train, traffic light, and cat.

Fine-Grained Context Classification

We provide concrete examples that highlight the differences in context reasoning before and after supervised fine-tuning (SFT) of Qwen2.5-VL-3B across our three criteria: location, size, and co-occurrence. These examples show that SFT substantially improves both the accuracy of context classification and the plausibility of the model's contextual reasoning, as illustrated in Section 5.1.

Figure 1: Location Context Reasoning for the Sink (Before vs. After SFT). Example image from COCO.

Figure 2: Size Context Reasoning for the Dining Table (Before vs. After SFT). Example image from COinCO.

Figure 3: Co-occurrence Context Reasoning for the Toilet (Before vs. After SFT). Example image from COinCO.

Objects-from-Context Prediction

Here, we provide additional examples for the instance-level Objects-from-Context prediction task. Our trained multilayer perceptron is given an inpainted image and infers which object was present before the image was altered. Using the top predicted class, we reapply Stable Diffusion inpainting to reconstruct the original object within the mask region.
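The prediction step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class list, the feature vector, the layer sizes, the randomly initialized (untrained) weights, and the prompt template are all hypothetical placeholders. In practice the features would come from an image encoder applied to the inpainted image, and the resulting prompt would be passed, together with the image and mask, to a Stable Diffusion inpainting model.

```python
import math
import random

# Hypothetical label set; the actual class vocabulary is not specified here.
CLASSES = ["backpack", "carrot", "elephant", "zebra", "horse"]


def linear(x, weights, bias):
    """Dense layer: weights holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Build a two-layer MLP with random weights (a stand-in for trained ones)."""
    rng = random.Random(seed)

    def layer(n_out, n_in):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        return weights, [0.0] * n_out

    return layer(hidden_dim, in_dim), layer(out_dim, hidden_dim)


def predict_top_k(features, mlp, k=1):
    """Return the k most probable (class, probability) pairs, best first."""
    (w1, b1), (w2, b2) = mlp
    hidden = relu(linear(features, w1, b1))
    probs = softmax(linear(hidden, w2, b2))
    ranked = sorted(zip(CLASSES, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]


def inpainting_prompt(class_name):
    """Hypothetical prompt template for the re-inpainting step."""
    return f"{class_name}, realistic photo"


if __name__ == "__main__":
    # Stand-in for features extracted from the inpainted image.
    features = [0.2, -0.1, 0.7, 0.4]
    mlp = init_mlp(len(features), hidden_dim=8, out_dim=len(CLASSES))
    top_class, top_prob = predict_top_k(features, mlp, k=1)[0]
    print(top_class, round(top_prob, 3), "->", inpainting_prompt(top_class))
```

Only the top-ranked class is used to build the prompt here, mirroring the description above; returning the top k instead would make it easy to inspect how close the runner-up classes are.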

Inpainting Success Original Object Classes