UNOBench tutorial

Data, starter kit, and baseline reproduction

This page is a compact guide for getting started with UNOBench. It explains which repository to use for each task and links to the detailed documentation maintained in the method, challenge, and dataset repositories.

Dataset

UNOBench on Hugging Face

Download RGB images, Set-of-Mark images, annotations, challenge queries, and metadata.

Open dataset README

Starter kit

UNOBench Challenge example

Use the minimal runnable examples to verify query files, prediction format, and local evaluators.

Open challenge README

Baseline

UnoGrasp reproduction code

Run UnoGrasp inference and evaluation on the released synthetic small split with public checkpoints.

Open method README

Which repo should I use?

Goal	Use this repo	Go to details
Download or inspect UNOBench files and metadata.	Hugging Face dataset repo	Dataset structure
Check challenge input and output format before submitting.	Challenge starter kit	Challenge quick start
Reproduce UnoGrasp inference and evaluation.	UnoGrasp method repo	Method quick start

Recommended workflow

1. Download the dataset

Start from the Hugging Face dataset repo. It contains the full file list, download commands, archive extraction instructions, and metadata descriptions.

hf download FBK-TeV/UNOBench \
  --repo-type dataset \
  --local-dir ./UNOBench/UNOBenchSyn

See full download instructions

2. Choose an evaluation setting

UNOBench supports two evaluation settings: Set-of-Mark (SoM) and Natural Language (NLP). Use the challenge starter kit to check the exact query and prediction formats for each.

Setting	Input	Expected output	Details
SoM	Set-of-Mark image + query object ID	Obstructing object IDs	Track 1
NLP	RGB image + natural-language description of query object	Point coordinates on the obstructing objects	Track 2

3. Run either the starter kit or the full method

To sanity-check your file formats, run the challenge starter kit evaluators. To reproduce the released UnoGrasp results, use the method repo with the released checkpoints and the synthetic small split.

Dataset at a glance

Both evaluation settings can be used by VLM-based methods. The Set-of-Mark setting also supports non-language classical methods, graph-based reasoning, and modular robotic pipelines.

RGB / NLP input

Query object: red and orange toy drill

Expected target output: [(x1, y1)]

Note: Any point on the yellow detergent bottle is accepted.
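
The acceptance rule in the note above can be pictured as a point-in-region test. The following is a minimal sketch, not the official evaluator: it assumes a binary mask of the target object is available as a 2D grid (rows are y, columns are x), and the helper name `point_hits_mask` is hypothetical. Check the challenge starter kit for the actual evaluation logic.

```python
def point_hits_mask(point, mask):
    """Return True if the (x, y) point lands on a foreground pixel of the mask.

    Assumption: `mask` is a 2D list indexed as mask[row][col], i.e. mask[y][x],
    where a truthy value marks pixels belonging to the target object.
    """
    x, y = point
    row, col = int(round(y)), int(round(x))
    # Points outside the image can never hit the object.
    if not (0 <= row < len(mask) and 0 <= col < len(mask[0])):
        return False
    return bool(mask[row][col])


# Toy 3x4 mask standing in for the obstructing object (hypothetical data).
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

inside = point_hits_mask((1, 2), toy_mask)      # x=1, y=2 is a foreground pixel
outside = point_hits_mask((3, 0), toy_mask)     # x=3, y=0 is background
off_image = point_hits_mask((10, 10), toy_mask)  # beyond the image bounds
```

Under this reading, any predicted point that falls inside the object region counts as a hit, which is why a single coordinate pair is enough per obstructing object.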

Set-of-Mark input

Query object ID: 2

Expected target IDs: [3]
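
Because the Set-of-Mark output is a list of object IDs, a prediction can be compared with the expected IDs as two sets. The sketch below is illustrative only: the function name `id_set_scores` and the choice of precision, recall, and F1 are assumptions, not the metrics defined by the challenge. Use the starter kit evaluators for official scoring.

```python
def id_set_scores(predicted_ids, expected_ids):
    """Compare predicted and expected object IDs as sets (illustrative metric)."""
    predicted, expected = set(predicted_ids), set(expected_ids)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# The example above: query object 2 is obstructed by object 3.
exact = id_set_scores([3], [3])
# A prediction that adds a spurious ID 5 keeps full recall but loses precision.
partial = id_set_scores([3, 5], [3])
```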

Metadata

{
  "index": 1992,
  "image_path": "images/image_000578.png",
  "som_image_path": "images_som/image_000578.png",
  "image_id": 578,
  "query_object": {
    "obj_id": 2,
    "object_name": "red and orange toy drill"
  },
  "target_objects": [
    {
      "obj_id": 3,
      "object_name": "yellow detergent bottle"
    }
  ],
  "occlusion_paths": [[3, 2]],
  "difficulty": "Easy",
  "k_min": 1,
  "num_paths": 1,
  "only_som": 0
}
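
As a quick way to inspect such a record, the sketch below parses the example above with Python's standard `json` module and extracts the query and target IDs. The two consistency checks reflect what this single example shows (each occlusion path ends at the query object, and `num_paths` equals the number of listed paths); treating them as general properties of the dataset is an assumption, so consult the dataset README for the authoritative field descriptions.

```python
import json

# The metadata record from the example above, embedded as a string.
record_json = """
{
  "index": 1992,
  "image_path": "images/image_000578.png",
  "som_image_path": "images_som/image_000578.png",
  "image_id": 578,
  "query_object": {"obj_id": 2, "object_name": "red and orange toy drill"},
  "target_objects": [{"obj_id": 3, "object_name": "yellow detergent bottle"}],
  "occlusion_paths": [[3, 2]],
  "difficulty": "Easy",
  "k_min": 1,
  "num_paths": 1,
  "only_som": 0
}
"""

record = json.loads(record_json)

query_id = record["query_object"]["obj_id"]
target_ids = [target["obj_id"] for target in record["target_objects"]]

# Assumption drawn from this one example: every path ends at the query object.
paths_end_at_query = all(
    path[-1] == query_id for path in record["occlusion_paths"]
)
# Assumption drawn from this one example: num_paths counts the listed paths.
num_paths_matches = record["num_paths"] == len(record["occlusion_paths"])
```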

Adapting Metadata to Your Method

You can use the provided UNOBench metadata to generate datasets tailored to your own method. For example, UnoGrasp uses a prompt-generation script to convert UNOBench metadata into VQA-style data with human instruction prompts and the UnoGrasp system prompt for training and evaluation.
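
To give a rough idea of what such a conversion might look like, here is a minimal sketch that turns one metadata record into a question-answer sample. It is not the UnoGrasp prompt-generation script: the function name `build_vqa_sample`, the question wording, and the output keys are all hypothetical. For the natural-language setting the sketch uses object names as a placeholder answer, since point coordinates are not part of the metadata fields shown on this page.

```python
def build_vqa_sample(record, setting="som"):
    """Convert one UNOBench metadata record into a VQA-style dict (hypothetical format)."""
    query = record["query_object"]
    targets = record["target_objects"]
    if setting == "som":
        image = record["som_image_path"]
        question = f"Which marked objects obstruct object {query['obj_id']}?"
        answer = [target["obj_id"] for target in targets]
    elif setting == "nlp":
        image = record["image_path"]
        question = f"Which objects obstruct the {query['object_name']}?"
        # Placeholder answer: names instead of point coordinates.
        answer = [target["object_name"] for target in targets]
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    return {"image": image, "question": question, "answer": answer}


# A subset of the fields from the metadata example on this page.
record = {
    "image_path": "images/image_000578.png",
    "som_image_path": "images_som/image_000578.png",
    "query_object": {"obj_id": 2, "object_name": "red and orange toy drill"},
    "target_objects": [{"obj_id": 3, "object_name": "yellow detergent bottle"}],
}

som_sample = build_vqa_sample(record, setting="som")
nlp_sample = build_vqa_sample(record, setting="nlp")
```

Keeping the conversion in one small function makes it easy to swap in your own prompt template or system prompt without touching the metadata files.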

Reference

This challenge and dataset are based on the work presented at CVPR 2026:

@InProceedings{Jiao_2026_CVPR,
    author    = {Jiao, Runyu and Bortolon, Matteo and Giuliari, Francesco and Fasoli, Alice and Povoli, Sergio and Mei, Guofeng and Wang, Yiming and Poiesi, Fabio},
    title     = {Obstruction Reasoning for Robotic Grasping},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {20755-20764}
}