Project Overview

This is a standalone evaluation framework that I extracted and refactored from the InternManip project.

It is a universal, user-friendly, and efficient evaluation framework for embodied manipulation tasks across multiple benchmarks. It currently supports the Calvin, SimplerEnv, and GenManip benchmarks, and also includes ARX LIFT2 real-robot control (from the IROS offline competition; the dependencies needed to start the robot service are not included).

Highlights

Unified
- Integrates evaluation pipelines for different benchmarks into a single framework.
- Easily extensible to support additional benchmarks.
Easy to Use
- All evaluation settings are centralized in a single configuration file.
- One-command installation of Python dependencies.
- Uses a client-server (C-S) evaluation architecture that decouples agents from environments, so each side can keep its own dependency stack. The C-S setup needs no extra work from you: the codebase starts client-server evaluation automatically out of the box, and it also supports manual deployment (particularly useful when deploying agents on different nodes or assigning custom ports to agent servers).
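
To make the client-server decoupling concrete, here is a minimal sketch using only the Python standard library. It is not this framework's actual protocol or API: `AgentHandler`, `start_agent_server`, `query_agent`, `run_episode`, the JSON message shape, and the 7-dimensional zero action are all hypothetical names and values chosen for illustration. The point is only that the environment side talks to the agent through a network endpoint, so the two sides never need to import each other's dependencies.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class AgentHandler(BaseHTTPRequestHandler):
    """Hypothetical agent server: maps one observation to one action."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        obs = json.loads(self.rfile.read(length))
        # Placeholder policy (assumption): echo the step and return a zero action.
        reply = {"step": obs["step"], "action": [0.0] * 7}
        body = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this example


def start_agent_server(port=0):
    """Start the agent server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Bypass any system proxy so requests to 127.0.0.1 go straight to the server.
_opener = build_opener(ProxyHandler({}))


def query_agent(url, obs):
    """Environment-side client: send one observation, receive one action."""
    req = Request(
        url,
        data=json.dumps(obs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with _opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())


def run_episode(url, num_steps=3):
    """Stand-in for an environment loop that reaches the agent only over HTTP."""
    return [query_agent(url, {"step": t, "state": [0.0, 0.0]}) for t in range(num_steps)]


server = start_agent_server()
url = f"http://127.0.0.1:{server.server_address[1]}"
actions = run_episode(url)
server.shutdown()
server.server_close()
```

In this sketch, automatic startup corresponds to calling `start_agent_server()` from the same script, while manual deployment would correspond to running the server on another node and passing its address and port as `url`.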
Efficient
- Accelerates evaluation through distributed execution, while hiding the underlying implementation details from users.
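
The general idea behind distributed evaluation can be sketched as splitting the episode list into shards, evaluating the shards in parallel, and merging the results. The example below is an assumption-laden illustration, not the framework's implementation: `shard_episodes`, `evaluate_shard`, and `evaluate_distributed` are hypothetical names, a thread pool stands in for whatever distributed backend is actually used, and the success rule is a dummy.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_episodes(episode_ids, num_workers):
    """Round-robin split so each worker gets a near-equal share of episodes."""
    return [episode_ids[i::num_workers] for i in range(num_workers)]


def evaluate_shard(episode_ids):
    """Hypothetical worker: a dummy rule marks even-numbered episodes as successes."""
    return {ep: ep % 2 == 0 for ep in episode_ids}


def evaluate_distributed(episode_ids, num_workers=4):
    """Evaluate shards in parallel and merge per-episode results."""
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(evaluate_shard, shard_episodes(episode_ids, num_workers)):
            results.update(partial)
    success_rate = sum(results.values()) / len(results)
    return results, success_rate


results, success_rate = evaluate_distributed(list(range(10)), num_workers=3)
```

From the caller's side only the episode list and the worker count are visible; how shards are scheduled stays inside `evaluate_distributed`.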

Framework Structure

InternManip_Eval
