UnoBench Challenge 2026

News

  • [May 19, 2026] Challenge page schema and structural evaluation engine guidelines have been deployed!
  • [Upcoming] Check this section for the registration link to the official evaluation platform (Codabench/Kaggle) and for the release of our GitHub repository.

Introduction

Autonomous robotic grasping in highly cluttered environments, such as industrial bins, warehouse logistics totes, or chaotic domestic spaces, remains a fundamental bottleneck in robotics. While modern vision systems excel at identifying isolated, clearly visible objects on the surface of a scene, true operational autonomy requires models to reason about the complex semantic, geometric, and physical hierarchies among objects.

To grasp a buried or partially obscured item safely, an intelligent agent cannot merely rely on standard vision-language grounding. Instead, it must implicitly map the environment, reason systematically about obstructions and physical occlusions, and determine the exact sequence of clearing actions required to safely unlock an unobstructed trajectory to the target object.


Goal of the Challenge

The core objective of this workshop challenge is to accelerate the development of deep learning models, in particular Vision-Language Models (VLMs) and multimodal embodied architectures, capable of performing visually grounded obstruction reasoning.

Participants will train and evaluate their approaches on the newly curated UNOBench dataset. Built on top of MetaGraspNetV2, UNOBench provides a rigorous benchmark for obstruction reasoning across scenes of varying complexity.


Key Objectives & Evaluation Pillars

Submissions will be evaluated and ranked on their ability to identify all the objects that must be removed first in order to grasp a given target object (i.e., the last object in each occlusion chain leading to the target, which is itself unobstructed).
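To make the target set concrete, here is a minimal sketch that assumes a scene's obstructions are encoded as a mapping from each object to the objects that directly obstruct it. This encoding, the function name `objects_to_remove_first`, and the example scene are hypothetical illustrations, not the official UNOBench annotation format or evaluation code. The sketch walks outward from the query object along every occlusion chain and keeps the objects that are not obstructed by anything, i.e., the last object of each chain.

```python
from typing import Dict, List, Set


def objects_to_remove_first(occluded_by: Dict[int, List[int]], target: int) -> Set[int]:
    """Return the unobstructed objects at the end of every occlusion chain
    leading to `target`.

    `occluded_by[x]` lists the objects that directly obstruct object x
    (hypothetical encoding, assumed for illustration only).
    """
    result: Set[int] = set()
    visited: Set[int] = {target}
    stack: List[int] = list(occluded_by.get(target, []))
    while stack:
        obj = stack.pop()
        if obj in visited:
            continue  # already explored; also guards against cyclic annotations
        visited.add(obj)
        blockers = occluded_by.get(obj, [])
        if not blockers:
            # Nothing obstructs this object, so it can be removed right away.
            result.add(obj)
        else:
            # Keep following the chain toward its topmost obstructors.
            stack.extend(blockers)
    return result


# Hypothetical scene: object 2 is covered by objects 3 and 5,
# and object 3 is in turn covered by object 7.
scene = {2: [3, 5], 3: [7]}
print(sorted(objects_to_remove_first(scene, 2)))  # [5, 7]
```

Note that object 3 is not part of the answer: it obstructs the target, but it cannot be removed until object 7 is cleared. A target with no obstructors yields an empty set.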

The challenge offers two main tracks:

  • Set-of-Marks (SoM): In this track, each object in the image is overlaid with an ID mark, and the query object is also referred to by its ID (e.g., "object 2"). The goal is to identify all objects that should be removed first and return them as a list of object IDs (e.g., "[object 5, object 7]").
  • Natural language (NL): In this track, the images have no overlay, and the query object is referred to by a natural-language description (e.g., "the red and orange drill toy"). The goal is to identify all objects that should be removed first and return them as a list of object coordinates (e.g., "[(x1,y1), (x2,y2)]").

Additional details on the images, data, and evaluation metrics are provided on their respective pages.
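Both tracks expect a list-valued text answer. As a hedged sketch, assuming a prediction arrives as a raw string in exactly the formats illustrated in the track descriptions, the hypothetical helpers `parse_som_answer` and `parse_nl_answer` below extract object IDs and (x, y) points with regular expressions. The official parsing rules and scoring are defined on the evaluation page; this is not the challenge's evaluation code.

```python
import re
from typing import List, Set, Tuple

# Assumed SoM answer token, e.g. "object 5" (case-insensitive).
_SOM_ID = re.compile(r"object\s+(\d+)", re.IGNORECASE)
# Assumed NL answer token, e.g. "(120, 45)" with integer or decimal values.
_POINT = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_som_answer(text: str) -> Set[int]:
    """Extract object IDs from an answer such as "[object 5, object 7]".

    Returned as a set, assuming the order of the listed objects does not matter.
    """
    return {int(obj_id) for obj_id in _SOM_ID.findall(text)}


def parse_nl_answer(text: str) -> List[Tuple[float, float]]:
    """Extract (x, y) points from an answer such as "[(120, 45), (300, 210)]"."""
    return [(float(x), float(y)) for x, y in _POINT.findall(text)]


print(sorted(parse_som_answer("[object 5, object 7]")))  # [5, 7]
print(parse_nl_answer("[(120, 45), (300.5, 210)]"))  # [(120.0, 45.0), (300.5, 210.0)]
```

Malformed answers simply yield an empty result here; how the official evaluator treats such cases is not specified in this sketch.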

Timelines

All deadlines fall at 23:59 UTC on the dates listed below and are strictly enforced:

  • Challenge & Data Launch (June 13, 2026): UNOBench challenge data and the code for the baseline model (UNOGrasp) are released.
  • Evaluation Phase (July 1, 2026): The evaluation server opens, and a live public leaderboard tracks participants' validation submissions.
  • Final Test Submissions (TBD): Deadline for final submissions.
  • Winner Announcement (TBD): Top-performing architectures are finalized after verification against cheating and overfitting.
  • Workshop Day (TBD): Winning teams present technical talks alongside invited keynotes at the official conference venue.

Pick Your Challenge

Two challenges, one winner. Select below.

Set-of-Marks track

Test your foundational knowledge in this introductory challenge.

Natural Language track

A much harder test for those who have mastered the basics.

Organizers: