UnoBench is a benchmark for obstruction-aware robotic grasping in cluttered scenes.
It is designed to evaluate whether a method can perform target-centric obstruction reasoning before grasping, either from natural-language instructions or from object-indexed visual representations.
UnoBench supports both vision-language models and traditional or modular robotic reasoning algorithms through two complementary settings: a natural-language (NLP) setting and a Set-of-Mark (SoM) setting.
Unlike conventional grasping datasets that mainly focus on direct object detection or grasp pose prediction, UnoBench emphasizes which objects obstruct the target object and how such obstruction relationships can be represented, measured, and reasoned about.
UnoBench contains synthetic cluttered-scene data with RGB images, Set-of-Mark images, annotations, and obstruction-related metadata.
The benchmark focuses on target-centric obstruction reasoning. Given a target object, the method is expected to identify the objects that block or constrain access to the target before grasping.
The dataset provides samples with the following fields:
{
  "index": 1992,
  "image_path": "images/image_000578.png",
  "som_image_path": "images_som/image_000578.png",
  "image_id": 578,
  "query_object": {
    "obj_id": 2,
    "object_name": "red and orange toy drill"
  },
  "target_objects": [
    {
      "obj_id": 3,
      "object_name": "yellow detergent bottle"
    }
  ],
  "occlusion_paths": [
    [
      3,
      2
    ]
  ],
  "difficulty": "Easy",
  "k_min": 1,
  "num_paths": 1,
  "only_som": 0
}
UnoBench/
├── images.zip
├── images_som.zip
├── annotations.zip
├── test_GT_small.json
├── test_nlp_small.jsonl
├── test_som_small.jsonl
└── meta_data/
    ├── Synthetic_train.json
    ├── image_id_scene_view_id_mapping.json
    ├── name_for_all.json
    └── occ_info/
        ├── obs_information.json
        └── masks.zip
| File | Description |
|---|---|
| `images.zip` | RGB images of synthetic cluttered scenes. |
| `images_som.zip` | Set-of-Mark images with object IDs / visual prompts. |
| `annotations.zip` | Instance segmentation masks associated with each image. The SoM ID of each object can be found here. |

| File | Description |
|---|---|
| `test_GT_small.json` | Ground-truth obstruction annotations for the small test split. |
| `test_nlp_small.jsonl` | Evaluation samples for the NLP setting. |
| `test_som_small.jsonl` | Evaluation samples for the SoM setting. |

| File | Description |
|---|---|
| `meta_data/Synthetic_train.json` | Metadata for the synthetic training split. |
| `meta_data/image_id_scene_view_id_mapping.json` | Mapping between image IDs, scene IDs, and view IDs. |
| `meta_data/name_for_all.json` | Human-annotated descriptions for each object used in the dataset. |
| `meta_data/occ_info/obs_information.json` | Obstruction / occlusion relationship information for each pair, including obstruction ratio, contact point, and obstruction degree. |
| `meta_data/occ_info/masks.zip` | Instance mask for each obstruction pair. |
You can download the full dataset with:
hf download rjiao/UnoBench \
  --repo-type dataset \
  --local-dir ./UnoBench
Or download individual files:
hf download rjiao/UnoBench images.zip \
  --repo-type dataset \
  --local-dir ./UnoBench

hf download rjiao/UnoBench images_som.zip \
  --repo-type dataset \
  --local-dir ./UnoBench

hf download rjiao/UnoBench annotations.zip \
  --repo-type dataset \
  --local-dir ./UnoBench
After downloading, unzip the main archives:
cd UnoBench
unzip images.zip
unzip images_som.zip
unzip annotations.zip
unzip meta_data/occ_info/masks.zip -d meta_data/occ_info/
After extraction, the dataset should contain RGB images, Set-of-Mark images, annotation files, and obstruction-related metadata.
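One way to sanity-check the extracted layout is sketched below. The file list is taken from the repository tree above. The folder names `images/` and `images_som/` are an assumption inferred from the paths in the sample record, and `find_missing` is a hypothetical helper, not part of UnoBench.

```python
from pathlib import Path

# Files listed in the repository tree, relative to the dataset root.
EXPECTED_FILES = [
    "test_GT_small.json",
    "test_nlp_small.jsonl",
    "test_som_small.jsonl",
    "meta_data/Synthetic_train.json",
    "meta_data/image_id_scene_view_id_mapping.json",
    "meta_data/name_for_all.json",
    "meta_data/occ_info/obs_information.json",
]

# ASSUMPTION: the image archives unpack into these folders, as suggested by
# the "images/..." and "images_som/..." paths in the sample record.
EXPECTED_DIRS = ["images", "images_som"]


def find_missing(root):
    """Return the expected files and folders that are absent under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


# An empty list means every expected entry was found.
print(find_missing("./UnoBench"))
```

If the archives unpack under different folder names, adjust `EXPECTED_DIRS` accordingly.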
UnoBench provides two complementary evaluation settings to support different types of methods.
In the NLP setting, each sample provides a natural-language instruction and the corresponding RGB image.
The method needs to identify the target object from language and visual input, then reason about which objects obstruct the target before grasping.
This setting is suitable for evaluating methods that take language instructions as input, such as vision-language models.
Relevant file:
test_nlp_small.jsonl
In the SoM setting, each sample provides a Set-of-Mark image where objects are explicitly indexed with IDs.
This setting does not require natural-language input. Instead, the target and candidate objects can be represented through object IDs or structured object-level information.
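This makes the SoM setting suitable for evaluating methods that operate on indexed objects, such as traditional or modular robotic reasoning algorithms.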
Relevant file:
test_som_small.jsonl
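Both evaluation files use the `.jsonl` extension, so a minimal loader might look like the sketch below. It assumes the usual JSON Lines convention of one JSON object per line and makes no assumption about the fields inside each object. The names `read_jsonl` and `load_samples` are hypothetical helpers.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_samples(root, setting):
    """Load the samples for one setting ("nlp" or "som") from the dataset root."""
    if setting not in ("nlp", "som"):
        raise ValueError("setting must be 'nlp' or 'som'")
    return list(read_jsonl(Path(root) / f"test_{setting}_small.jsonl"))


# Example usage, once the dataset has been downloaded:
# nlp_samples = load_samples("./UnoBench", "nlp")
# som_samples = load_samples("./UnoBench", "som")
```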
The ground-truth obstruction annotations used for evaluation are provided in:
test_GT_small.json
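How predictions are scored against this file is not specified in this section. Purely as an illustration, and not as UnoBench's official metric, a predicted set of obstructing object IDs could be compared with a ground-truth set using set-level precision, recall, and F1. The function name `set_prf` and the treatment of empty sets are assumptions of this sketch.

```python
def set_prf(predicted_ids, gt_ids):
    """Set-level precision, recall, and F1 between two collections of object IDs.

    ASSUMPTION: an empty prediction against an empty ground truth counts as a
    perfect match; the benchmark itself may define this case differently.
    """
    pred, gold = set(predicted_ids), set(gt_ids)
    true_pos = len(pred & gold)
    if pred:
        precision = true_pos / len(pred)
    else:
        precision = 1.0 if not gold else 0.0
    recall = true_pos / len(gold) if gold else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# One correct obstructor plus one spurious prediction.
print(set_prf([3, 5], [3]))
```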
The metadata files provide scene-level and object-level information, including image IDs, scene IDs, view IDs, object names, and obstruction relationships.
Typical fields may include `scene_id`, `view_id`, `target_object`, `occlusion_paths`, `top_objects`, `depends_on`, `num_paths`, and `difficulty`.
The obstruction information is organized in a target-centric manner. For each target object, the dataset describes the objects that obstruct it and the corresponding obstruction paths.
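As a minimal sketch of working with obstruction paths, the snippet below collects every object ID that appears on a path other than the object to be reached. It uses fields copied from the sample record shown earlier in this README. It assumes that each path is a list of object IDs that includes the object to be reached, and it does not rely on the order of IDs within a path. The helper name `objects_on_paths` is hypothetical.

```python
import json

# Fields copied from the sample record shown earlier in this README.
RECORD = json.loads("""
{
  "query_object": {"obj_id": 2, "object_name": "red and orange toy drill"},
  "target_objects": [{"obj_id": 3, "object_name": "yellow detergent bottle"}],
  "occlusion_paths": [[3, 2]],
  "num_paths": 1
}
""")


def objects_on_paths(occlusion_paths, goal_id):
    """Return the sorted IDs that appear on any path, excluding goal_id.

    ASSUMPTION: each path is a list of object IDs containing the object to be
    reached; the ordering of IDs within a path is not used here.
    """
    others = set()
    for path in occlusion_paths:
        others.update(obj_id for obj_id in path if obj_id != goal_id)
    return sorted(others)


goal_id = RECORD["query_object"]["obj_id"]
blockers = objects_on_paths(RECORD["occlusion_paths"], goal_id)
print(blockers)  # [3]
```

For this record the result matches the single ID listed under `target_objects`, and the number of paths equals `num_paths`. Whether that holds for every sample is not stated here.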
UnoBench is designed for research on target-centric obstruction reasoning and obstruction-aware grasping in cluttered scenes.
UnoBench focuses on high-level obstruction reasoning before grasping. It is not limited to a specific low-level grasp planner or robot controller.
Researchers can use UnoBench to evaluate whether a method can correctly identify obstructing objects before executing a grasp. The benchmark can therefore be used together with different downstream grasping pipelines, including both learning-based and classical robotic systems.