Project Overview
We organized the IROS 2025 Challenge, which has two tracks: Manipulation and Navigation. The challenge was also featured at the IROS 2025 Workshop. I helped maintain the Manipulation track, Vision-Language Manipulation in Open Tabletop Environments. The online submission deadline is September 30, and everyone is welcome to participate!
Update: Congratulations on the successful conclusion of the IROS 2025 on-site final in Hangzhou! I took part in on-site debugging of the real robots and served as one of the chief judges. After a comprehensive evaluation, we are pleased to congratulate Team HonorEmbodiment on winning the Manipulation track championship.
Contributions
As the evaluation lead for the Manipulation track, I primarily participated in and completed the following work:
- Built the GenManip benchmark evaluation environment
- Developed the system framework from scratch based on InternUtopia
- Implemented the evaluation pipeline and protocol in InternManip
- Conducted real-robot debugging during the on-site final and served as one of the chief judges and technical advisors
Environment Composition
Tasks
- A total of 10 manipulation tasks, divided into two categories:
  - Seen (objects that appear in the training set)
  - Unseen (novel objects not in the training set)
- Each task in each category contains 10 episodes (000 to 009), each with a USD scene file (scene.usd) and metadata (meta_info.pkl)
Dataset link: Hugging Face dataset
validation
├── IROS_C_V3_Aloha_seen
│ ├── collect_three_glues/
│ │ ├── 000/
│ │ │ ├── meta_info.pkl
│ │ │ ├── scene.usd
│ │ │ └── SubUSDs -> ../SubUSDs
│ │ ├── 001/
│ │ ├── ...
│ │ └── 009/
│ ├── collect_two_alarm_clocks/
│ ├── collect_two_shoes/
│ ├── gather_three_teaboxes/
│ ├── make_sandwich/
│ ├── oil_painting_recognition/
│ ├── organize_colorful_cups/
│ ├── purchase_gift_box/
│ ├── put_drink_on_basket/
│ └── sort_waste/
└── IROS_C_V3_Aloha_unseen
  └── ...
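Given the layout above, a short script can index the validation split before anything is loaded in Isaac Sim. The sketch below uses hypothetical helpers that are not part of GenManip or InternManip; it assumes only the `<split>/<task>/<episode>/` structure with `scene.usd` and `meta_info.pkl` shown in the tree, and it treats the contents of `meta_info.pkl` as opaque.

```python
import pickle
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical helpers (not part of GenManip or InternManip); they rely only on
# the directory layout shown above.


def list_episodes(validation_root: str) -> Dict[str, Dict[str, List[Path]]]:
    """Return {split: {task: [episode_dir, ...]}} for every complete episode.

    An episode directory counts as complete when it holds both scene.usd and
    meta_info.pkl; anything else (e.g. a shared SubUSDs folder) is skipped.
    """
    index: Dict[str, Dict[str, List[Path]]] = {}
    for split_dir in sorted(p for p in Path(validation_root).iterdir() if p.is_dir()):
        tasks: Dict[str, List[Path]] = {}
        for task_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            episodes = [
                ep
                for ep in sorted(task_dir.iterdir())
                if ep.is_dir()
                and (ep / "scene.usd").is_file()
                and (ep / "meta_info.pkl").is_file()
            ]
            if episodes:
                tasks[task_dir.name] = episodes
        if tasks:
            index[split_dir.name] = tasks
    return index


def load_meta(episode_dir: Path) -> Any:
    # Unpickling may require the classes the metadata was saved with to be importable.
    with open(episode_dir / "meta_info.pkl", "rb") as f:
        return pickle.load(f)
```

On a complete download, `list_episodes("validation")` should report 10 episodes per task in each split; a lower count points to a missing or partially downloaded episode.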
Robots
Controllers
- Joint position control
- Inverse kinematics solver
Observation & Action Space
The following example uses the Franka robot. For detailed specifications, refer to the I/O specification documentation.
Observation Space (Franka)
observations: List[Dict] = [
    {
        "robot": {
            "robot_pose": (position, orientation),
            "joints_state": {
                "positions": array,
                "velocities": array
            },
            "eef_pose": (position, orientation),
            "sensors": {
                "realsense": {
                    "rgb": (480, 640, 3),
                    "depth": (480, 640)
                },
                "obs_camera": {...},
                "obs_camera_2": {...}
            },
            "instruction": str,
            "metric": {
                "task_name": str,
                "episode_name": str,
                "episode_sr": int,
                "first_success_step": int,
                "episode_step": int
            },
            "step": int,
            "render": bool
        }
    }
]
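Because every field is nested under the "robot" key, policy code usually starts by pulling out the few entries it needs. The following is a minimal sketch with a hypothetical `extract_policy_inputs` helper; it assumes the Franka key names above and that the outer list holds one entry per environment instance, which should be confirmed against the I/O specification.

```python
from typing import Any, Dict, List

# Hypothetical helper; key names follow the Franka observation layout above,
# where every field sits under the "robot" key.


def extract_policy_inputs(observations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten each observation entry into the fields a policy typically consumes."""
    inputs = []
    for obs in observations:  # assumed: one entry per environment instance
        robot = obs["robot"]
        camera = robot["sensors"]["realsense"]
        inputs.append(
            {
                "instruction": robot["instruction"],
                "rgb": camera["rgb"],  # (480, 640, 3) image
                "depth": camera["depth"],  # (480, 640) depth map
                "joint_positions": robot["joints_state"]["positions"],
                "eef_pose": robot["eef_pose"],
                "step": robot["step"],
            }
        )
    return inputs
```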
Action Space (Franka)
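Three formats are supported:
- A flat list of floats:
    List[float]
- Arm and gripper commands given separately:
    {
        'arm_action': List[float],
        'gripper_action': Union[List[float], int]
    }
- An end-effector pose target plus a gripper command:
    {
        'eef_position': List[float],
        'eef_orientation': List[float],
        'gripper_action': Union[List[float], int]
    }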
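The three formats can be told apart purely by their type and keys. The sketch below builds each one and classifies an incoming action; all function names are hypothetical. It is an assumption (suggested by the Controllers list, not stated in the spec) that the list and `arm_action` formats are executed by joint position control while the `eef_*` format goes through the inverse kinematics solver; vector lengths, quaternion ordering, and the meaning of an integer `gripper_action` should be taken from the I/O specification.

```python
from typing import Dict, List, Union

Gripper = Union[List[float], int]
Action = Union[List[float], Dict[str, object]]

# Hypothetical builders for the three supported action formats.


def joint_action(joint_positions: List[float]) -> Action:
    # Format 1: a flat list of joint position targets.
    return list(joint_positions)


def arm_gripper_action(arm: List[float], gripper: Gripper) -> Action:
    # Format 2: arm joint targets and gripper command given separately.
    return {"arm_action": list(arm), "gripper_action": gripper}


def eef_action(position: List[float], orientation: List[float], gripper: Gripper) -> Action:
    # Format 3: end-effector pose target (assumed to be resolved by the IK solver).
    return {
        "eef_position": list(position),
        "eef_orientation": list(orientation),
        "gripper_action": gripper,
    }


def action_format(action: Action) -> str:
    """Classify an action into one of the three supported formats by type and keys."""
    if isinstance(action, list):
        return "joint"
    if isinstance(action, dict):
        keys = set(action)
        if keys == {"arm_action", "gripper_action"}:
            return "arm_gripper"
        if keys == {"eef_position", "eef_orientation", "gripper_action"}:
            return "eef"
    raise ValueError(f"unrecognized action format: {action!r}")
```

Checking the format up front lets an agent wrapper reject a malformed action with a clear error before it reaches the simulator.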
Sensors
- Franka: front-facing view / gripper first-person view / rear-side view
- Aloha: head view / left gripper first-person view / right gripper first-person view
Metrics
The primary metric is success rate, reported in two variants:
- Soft success: partially completed tasks earn partial credit (used in the competition)
- Hard success: an episode counts as successful only if all of its subtasks are completed
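The difference between the two variants is easiest to see on a small example. This sketch assumes an episode is scored from a list of per-subtask completion flags and that soft success weights every subtask equally; the competition's exact partial-credit rule may differ, and all function names are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical scoring helpers; equal subtask weighting is an assumption.


def hard_success(subtasks_done: Sequence[bool]) -> float:
    """1.0 only if every subtask is completed, else 0.0."""
    return 1.0 if subtasks_done and all(subtasks_done) else 0.0


def soft_success(subtasks_done: Sequence[bool]) -> float:
    """Fraction of subtasks completed (partial credit)."""
    if not subtasks_done:
        return 0.0
    return sum(subtasks_done) / len(subtasks_done)


def success_rates(episodes: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Average soft and hard success over a set of episodes."""
    if not episodes:
        raise ValueError("no episodes to score")
    n = len(episodes)
    soft = sum(soft_success(e) for e in episodes) / n
    hard = sum(hard_success(e) for e in episodes) / n
    return soft, hard
```

For three episodes that complete 3/3, 1/3 and 0/3 subtasks, the soft success rate is (1 + 1/3 + 0) / 3 = 4/9 ≈ 0.44, while the hard success rate is only 1/3.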
Additional Components
- Developed a custom recorder for asynchronously capturing image frames and logging both state and image data at each timestep, improving runtime efficiency.
- Implemented support for batch evaluation across multiple environments or parallel instances of Isaac Sim.
- A full list of configurable parameters and additional features can be found in the official documentation.
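The asynchronous recorder follows a standard producer/consumer pattern: the simulation loop only enqueues each step's state and image, and a background thread performs the slow disk writes. The class below is a hypothetical, dependency-free sketch of that pattern, not the recorder used in GenManip; it serializes frames as JSON purely for illustration, where a real recorder would encode images (for example as PNG or video).

```python
import json
import queue
import threading
from pathlib import Path
from typing import Any, Dict, Optional


class AsyncRecorder:
    """Hypothetical background recorder: the sim loop enqueues, a worker thread writes."""

    def __init__(self, out_dir: str, max_queue: int = 256) -> None:
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        # Bounded queue: record() blocks if the writer falls too far behind.
        self._queue: "queue.Queue[Optional[Dict[str, Any]]]" = queue.Queue(maxsize=max_queue)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, step: int, state: Dict[str, Any], image: Any) -> None:
        # Called from the simulation loop; only enqueues, no disk I/O here.
        self._queue.put({"step": step, "state": state, "image": image})

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the worker
                break
            path = self.out_dir / f"step_{item['step']:06d}.json"
            path.write_text(json.dumps(item))  # image must be JSON-serializable in this sketch

    def close(self) -> None:
        # Enqueue the sentinel after all pending frames, then wait for the writer.
        self._queue.put(None)
        self._worker.join()
```

Because file I/O releases Python's GIL, the writer thread can make progress while the main thread steps the simulator, and the bounded queue applies back-pressure instead of letting memory grow without limit.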
InternManip Integration
The above evaluation environment is integrated as a benchmark module in InternManip.
The main components include:
- wrapper environment
- evaluator
- Ray-based parallel evaluation
- agent / model integration interface
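These components fit together roughly as follows: a wrapper environment exposes `reset`/`step`, an agent maps observations to actions, and the evaluator shards episodes across parallel workers. This is a minimal sketch with hypothetical interfaces; it swaps Ray for Python's standard `concurrent.futures.ThreadPoolExecutor` so it runs anywhere, and it uses a toy environment instead of Isaac Sim.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Protocol, Tuple

# Hypothetical interfaces; InternManip's real wrapper and agent classes may differ.


class Agent(Protocol):
    def reset(self) -> None: ...
    def step(self, observation: Dict[str, Any]) -> Any: ...


class EnvWrapper(Protocol):
    def reset(self, episode: str) -> Dict[str, Any]: ...
    def step(self, action: Any) -> Tuple[Dict[str, Any], bool, Dict[str, Any]]: ...


def run_episode(env: EnvWrapper, agent: Agent, episode: str, max_steps: int) -> Dict[str, Any]:
    """Roll out one episode and return the final info dict tagged with its name."""
    obs = env.reset(episode)
    agent.reset()
    info: Dict[str, Any] = {}
    for _ in range(max_steps):
        obs, done, info = env.step(agent.step(obs))
        if done:
            break
    return {"episode": episode, **info}


def evaluate(
    make_env: Callable[[], EnvWrapper],
    make_agent: Callable[[], Agent],
    episodes: List[str],
    max_steps: int = 500,
    workers: int = 4,
) -> List[Dict[str, Any]]:
    """Shard episodes across workers; each worker owns one env and one agent.

    ThreadPoolExecutor stands in for Ray here.
    """
    shards = [episodes[i::workers] for i in range(workers)]

    def run_shard(shard: List[str]) -> List[Dict[str, Any]]:
        env, agent = make_env(), make_agent()
        return [run_episode(env, agent, ep, max_steps) for ep in shard]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = [r for shard_results in pool.map(run_shard, shards) for r in shard_results]
    return sorted(results, key=lambda r: r["episode"])


# Toy stand-ins so the sketch runs without a simulator.
class CountdownEnv:
    def reset(self, episode: str) -> Dict[str, Any]:
        self.t = 0
        return {"t": 0}

    def step(self, action: Any) -> Tuple[Dict[str, Any], bool, Dict[str, Any]]:
        self.t += 1
        done = self.t >= 3  # the toy task "succeeds" after three steps
        return {"t": self.t}, done, {"steps": self.t, "success": done}


class NoOpAgent:
    def reset(self) -> None:
        pass

    def step(self, observation: Dict[str, Any]) -> Any:
        return None
```

Sharding episodes per worker, rather than building a new environment per episode, mirrors the idea of keeping one long-lived simulator per worker; with Ray, `run_shard` would naturally become a method on an actor class decorated with `@ray.remote`.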
The evaluation functionality in InternManip has since been refactored into InternManip-Eval, which is described on a separate project page.