UNOBench Challenge

On-device Embodied World Models Workshop @ ECCV 2026

News

[May 19, 2026] Challenge page schema and structural evaluation engine guidelines have been deployed!
[July 1, 2026] The challenge has officially launched, and the submission portal is online!

Introduction

Autonomous robotic grasping in highly cluttered environments, such as industrial bins, warehouse logistics totes, or chaotic domestic spaces, remains a fundamental bottleneck in robotics. While modern vision systems excel at identifying isolated, clearly visible surface-level objects, achieving true operational autonomy requires models to process complex semantic and geometric physical hierarchies.

To grasp a buried or partially obscured item safely, an intelligent agent cannot merely rely on standard vision-language grounding. Instead, it must implicitly map the environment, reason systematically about obstructions and physical occlusions, and determine the exact sequence of clearing actions required to safely unlock an unobstructed trajectory to the target object.

Goal of the Challenge

The core objective of the UNOBench Challenge is to accelerate the development of deep learning frameworks, specifically Vision-Language Models (VLMs) and multi-modal embodied network architectures, capable of executing visually grounded obstruction reasoning.

Participants will train and evaluate their approaches using the newly curated UNOBench dataset. Built on top of MetaGraspNetV2, UNOBench provides a rigorous benchmark for obstruction reasoning across scenarios of varying complexity.

Winners of the challenge will be able to present their work at the On-device Embodied World Models (ODEWM) Workshop at ECCV 2026.

Key Objectives & Evaluation Pillars

Submissions will be evaluated and ranked based on their ability to identify all the objects that should be removed first in order to grasp a particular object (i.e., the last object in every occluding chain).
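To make this target concrete, here is a minimal Python sketch. It assumes the occlusion relations of a scene are available as a mapping from each object ID to the IDs of the objects directly obstructing it. This representation, the function name `first_to_remove`, and the example scene are illustrative assumptions and not the UNOBench data format. The sketch follows every occluding chain starting at the query object and collects the objects at the ends of those chains, that is, the ones nothing else obstructs.

```python
def first_to_remove(obstructed_by, target):
    """Return the objects at the end of every occluding chain from `target`.

    `obstructed_by` maps an object ID to the IDs directly obstructing it.
    This is a hypothetical representation and is assumed to be acyclic.
    """
    result = set()
    seen = set()
    stack = [target]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue  # already expanded via another chain
        seen.add(obj)
        blockers = obstructed_by.get(obj, [])
        if not blockers and obj != target:
            # Nothing obstructs this object, so it can be removed first.
            result.add(obj)
        stack.extend(blockers)
    return sorted(result)


# Hypothetical scene: object 2 lies under objects 3 and 7; object 3 lies under 5.
scene = {2: [3, 7], 3: [5]}
print(first_to_remove(scene, 2))  # [5, 7]
```

In this example, object 3 is not part of the answer even though it obstructs the query object, because object 5 must be cleared before object 3 can be moved.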

The challenge offers two main tracks:

Set-of-Marks (SoM): In this track, each object in the image is overlaid with an ID that identifies it. The query object is also identified by its ID (e.g., "object 2"). The goal is to identify all objects that should be removed first and return them as a list of object IDs (e.g., "[object 5, object 7]").
Natural Language (NL): In this track, the objects in the image do not have overlaid IDs. The query object is instead identified by a description (e.g., "the red and orange drill toy"). The goal is to identify all objects that should be removed first and return them as a list of object coordinates (e.g., "[(x1,y1), (x2,y2)]").
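As an illustration of the two answer shapes, the following Python sketch parses strings modeled on the examples above. The helper names `parse_som_answer` and `parse_nl_answer` are hypothetical, and the sketch makes no assumption about whether NL coordinates are pixel or normalized values. The authoritative submission format is the one described on the data and evaluation pages.

```python
import re


def parse_som_answer(text):
    """Extract object IDs from a string such as "[object 5, object 7]"."""
    return [int(n) for n in re.findall(r"object\s+(\d+)", text)]


def parse_nl_answer(text):
    """Extract (x, y) points from a string such as "[(12, 40), (88.5, 19)]"."""
    number = r"(-?\d+(?:\.\d+)?)"
    pattern = r"\(\s*" + number + r"\s*,\s*" + number + r"\s*\)"
    return [(float(x), float(y)) for x, y in re.findall(pattern, text)]


print(parse_som_answer("[object 5, object 7]"))   # [5, 7]
print(parse_nl_answer("[(12, 40), (88.5, 19)]"))  # [(12.0, 40.0), (88.5, 19.0)]
```

Both helpers return an empty list when the model output contains no recognizable answer, which makes malformed responses easy to detect before submission.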

Additional details regarding the images, data, and evaluation metrics are available on their respective pages.

Timelines

All deadlines are strictly enforced at 23:59 UTC on the dates listed below:

Phase / Milestone	Date	Details
Challenge & Data Launch	June 13, 2026	UNOBench challenge data and baseline model (UNOGrasp) code released.
Evaluation Phase	July 1, 2026	The evaluation server opens. A live public leaderboard tracks participants' validation submissions.
Final Test Submissions	August 30, 2026	Final submission deadline.
Winner Announcement	September 1, 2026	Top-performing architectures finalized following verification against cheating and overfitting.
Workshop Day	September 8, 2026	Winning teams present technical talks alongside invited keynotes at our official conference venue.

Pick Your Challenge

Two tracks, one winner. Choose your path.

Set-of-Marks track

Test your foundational knowledge in this introductory challenge.

Natural Language track

A much harder test for those who have mastered the basics.

Organizers

This challenge has been organized by the Technologies of Vision unit at Fondazione Bruno Kessler.

Reference

The UNOBench challenge and dataset are based on the work presented at CVPR 2026:

@InProceedings{Jiao_2026_CVPR,
    author    = {Jiao, Runyu and Bortolon, Matteo and Giuliari, Francesco and Fasoli, Alice and Povoli, Sergio and Mei, Guofeng and Wang, Yiming and Poiesi, Fabio},
    title     = {Obstruction Reasoning for Robotic Grasping},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {20755-20764}
}