Jiantong Chen | AI Engineer

Intro

Hello! I’m Jiantong Chen, an AI engineer at the Embodied Intelligence Center of the Shanghai Artificial Intelligence Laboratory.

I build evaluation and learning systems for embodied intelligence, focusing on how robotic agents can be systematically measured, compared, and improved through scalable and reproducible frameworks. I view evaluation as a fundamental component of intelligence itself, where infrastructure is designed to reveal what embodied systems can perceive, reason about, and execute under diverse and open-ended conditions.

More broadly, I work on multimodal and agentic AI systems, spanning computer vision, large language models, and graph-based learning, with an emphasis on translating research ideas into deployable AI products.

Across domains ranging from robotic manipulation to real-world AI products, I focus on building AI that is not only intelligent in principle, but measurable, controllable, and deployable in practice.

News

[2026.05] Blink-Call is an assistive calling system designed for ALS patients. It recognizes user-defined blink patterns from camera input and triggers visible and audio call alerts without requiring speech, hand movement, or physical touch.

[2026.04] EBench and Challenge are here! EBench is an indoor VLA benchmark built on NVIDIA Isaac Sim, featuring long-horizon, dexterous, and mobile manipulation tasks for fair and efficient evaluation of embodied model capabilities.

[2025.11] We launched InternManip-Eval, a cleanly separated evaluation module extracted from the InternManip framework. It supports Calvin, SimplerEnv, GenManip, and the ARX LIFT2 real-world robot control used in the IROS offline challenge. It adopts a unified client–server evaluation paradigm with concurrent execution for efficient, scalable, and consistent embodied AI benchmarking.

[2025.10] Congratulations on the successful completion of the IROS 2025 offline finals in Hangzhou! I had the honor of participating in on-site real-robot debugging and serving as a chief judge. After a comprehensive evaluation, Team HonorEmbodiment won the championship in the manipulation track. Congratulations to the team!

[2025.08] We are hosting the Challenge on Multimodal Robot Learning with two tracks. The Vision-Language Manipulation in Open Tabletop Environments challenge will be featured at the IROS 2025 Workshop, with a submission deadline of September 30th. I will continue to develop InternManip to support competitions.

Experience

Shanghai Artificial Intelligence Laboratory

2022.11 — Present

Embodied Intelligence Center | AI Engineer

2024.11 — Present

Deeply involved in the EBench project: contributed to the development of the VLA benchmark (EBench), and designed and implemented VR + teleoperation toolchains for 7 dexterous manipulation tasks. Built the pipeline for collecting and processing teleoperation trajectory data, and set up a data feedback loop with the model training team to iteratively improve the benchmark. Contributed to the design and development of the evaluation infrastructure and maintained engineering quality.
Deeply involved in the InternManip / InternManip-Eval projects: responsible for designing and developing embodied manipulation benchmark evaluation infrastructure. Built a Ray-based distributed evaluation system supporting multi-machine and multi-GPU scaling to improve evaluation throughput and scalability. Designed a Client–Server decoupled architecture separating agents and simulators, supporting multiple experimental and deployment modes. Established benchmark standardization mechanisms with unified configuration and one-click execution entry, standardizing benchmark development and evaluation workflows across different simulators. Developed and maintained the IROS Challenge: Vision-Language Manipulation in Open Tabletop Environments online evaluation system, and participated in offline competition debugging, participant support, and judging work.
Deeply involved in the InternUtopia project: responsible for developing a low-cost teleoperation toolchain based on gesture interaction. Participated in code reviews and open-source community building, including the “Play With InternUtopia” livestream event, as well as handling and responding to GitHub issues.
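The client–server evaluation pattern mentioned in the InternManip item above can be illustrated with a small sketch. This is not the InternManip-Eval code: the class names (AgentServer, SimulatorClient), the toy one-dimensional "simulator", and the use of a standard-library thread pool in place of Ray are all assumptions made purely for illustration. It only shows the idea of keeping the agent and the simulator behind separate interfaces and running episodes concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import random


@dataclass
class EpisodeResult:
    episode_id: int
    success: bool
    steps: int


class AgentServer:
    """Hypothetical stand-in for a policy served behind an endpoint."""

    def act(self, observation: float) -> float:
        # Toy policy: move the state halfway toward zero.
        return -0.5 * observation


class SimulatorClient:
    """Runs toy simulator episodes and queries the agent for each action."""

    def __init__(self, agent: AgentServer, max_steps: int = 50):
        self.agent = agent
        self.max_steps = max_steps

    def run_episode(self, episode_id: int) -> EpisodeResult:
        # Seed per episode so repeated evaluations give the same results.
        rng = random.Random(episode_id)
        state = rng.uniform(-1.0, 1.0)
        for step in range(1, self.max_steps + 1):
            state += self.agent.act(state)
            if abs(state) < 1e-3:  # toy success criterion
                return EpisodeResult(episode_id, True, step)
        return EpisodeResult(episode_id, False, self.max_steps)


def evaluate(num_episodes: int, workers: int = 4) -> float:
    """Run episodes concurrently and return the success rate."""
    client = SimulatorClient(AgentServer())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(client.run_episode, range(num_episodes)))
    return sum(r.success for r in results) / num_episodes


if __name__ == "__main__":
    print(f"success rate: {evaluate(8):.2f}")
```

Because the client only calls agent.act, the agent could be swapped for a remote endpoint, and the thread pool for a distributed scheduler, without touching the episode loop.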

Applied Research Division | Algorithm Engineer

2022.11 — 2024.11

Continued the Smart Care Project originally developed at SenseTime: promoted product iteration to version 2.0, upgraded both software and hardware architecture, and improved system stability. Collaborated with hospitals, associations, and NGOs across Beijing, Shanghai, Xi’an, and Taiwan to continuously organize large-scale public welfare initiatives.
Built an embodied intelligent agent system with memory mechanisms, visual environment perception, map-based navigation, and natural language control of the robot body. Integrated image embeddings and continuously collected user profiles to enable identity retrieval and long-term memory-enhanced dialogue.
Developed an intelligent AR navigation and guide application: based on Function Calling, ReAct / ReWOO, and Reflection frameworks, built an agent system capable of autonomous task decomposition and execution, supporting navigation, pluggable knowledge retrieval, and action generation, and delivered a digital human AR guiding system.
Worked on SLM fine-tuning (SFT): improved the coding capabilities of InternLM-base-7B in resource-constrained environments and successfully applied it to AirSim drone control, achieving natural language command success rates of 82%–96% depending on task complexity.
Developed a multimodal RAG production system: designed and implemented a multimodal RAG control platform enabling knowledge base construction, retrieval, and modular workflow orchestration. Built a rare disease QA system covering data collection, knowledge base construction, indexing strategies, and safety mechanisms, and provided free services to ALS patients and families via both web and WeChat channels.

SenseTime Xi’an

2021.11 — 2022.11

Applied Research Lab | Algorithm Engineer (Project Lead)

Led the end-to-end delivery of a home-based care product for severely disabled users from 0-to-1. Responsibilities included: 1) project management including scheduling, cross-team communication, and risk estimation; 2) product design, including multi-channel user requirement collection, hardware adaptation, feature design and prioritization, APP UX design, and documentation writing; achieved a user negative feedback rate of only 1.6%; 3) organizing, recording, and iterating testing processes, including bug classification and resolution tracking; 4) algorithm development and on-site testing for an ambassador-facing smart care project; 5) leading public welfare activities with hospitals and associations across Beijing, Shanghai, Xi’an, and Taiwan, and supporting PR-related work after anonymization; the project received 8 domestic and international awards.
Algorithm development, covering the full pipeline of data collection, annotation, training strategy design, model development, and system integration: 1) developed a near-infrared face and facial landmark detection model (98 landmarks) under extreme pose angles, dual-frequency lighting, and illumination interference, using a self-built dataset of over 1.06M images, achieving bbox mAP50=99% and NME[60,72]=0.0748, with inference speed >20 FPS on RV1126 edge devices; 2) developed eye-state sequence recognition under challenging conditions, achieving ACC>96% on a dataset of over 670K images, with robustness to head poses of |yaw| > 35°; 3) developed reflective point motion trajectory tracking algorithms supporting multi-point tracking, achieving SR>93%; 4) developed acoustic event detection for alarm sounds with Recall>98%.
Operations: 1) led 4 rounds of PoC deployments, covering product promotion, user onboarding, and operational support, achieving deployment in 130+ households with over 81% active usage by the end of 2023; 2) coordinated the 6.21 World ALS Day campaign with local associations in Xi’an and Shanghai, produced PR materials, wrote press releases, and organized media interviews (including The Paper), achieving 10M+ global media impressions within one week.
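For readers unfamiliar with the NME figure quoted in the algorithm development item above, the following is a minimal sketch of how a normalized mean error is commonly computed for facial landmarks. It assumes that "[60,72]" denotes normalization by the distance between ground-truth landmarks 60 and 72 (the usual inter-ocular convention for 98-point layouts); that reading, the function name nme, and the synthetic points are assumptions, not details taken from the project.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nme(pred: Sequence[Point], gt: Sequence[Point],
        left: int = 60, right: int = 72) -> float:
    """Mean landmark error divided by a reference distance.

    The reference distance is measured between ground-truth landmarks
    `left` and `right` (assumed here to be the outer eye corners).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of landmarks")
    norm = math.dist(gt[left], gt[right])
    if norm == 0:
        raise ValueError("reference landmarks coincide; cannot normalize")
    mean_err = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    return mean_err / norm


if __name__ == "__main__":
    # Synthetic 98-point example: every prediction is shifted by 1.2 px.
    gt = [(float(i), 0.0) for i in range(98)]
    pred = [(x, y + 1.2) for x, y in gt]
    print(round(nme(pred, gt), 4))
```

Dividing by a face-relative distance makes the score comparable across faces of different sizes in the image, which is why the normalizing landmark pair is usually reported alongside the number.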

RedNote

2020.06 — 2021.10

Ecological Security Department | Algorithm Engineer

Built a near real-time anti-fraud system for community traffic and interaction scenarios using Flink, Kafka, and Nebula graph database.
Developed and deployed graph mining and clustering algorithms for black/gray market detection, including GCN, Fraudar+, Louvain, KMeans, and KPrototypes.
Improved community content traffic anti-fraud interception by 60% and eliminated approximately 80% of black-market traffic operations, significantly compressing the arbitrage space for fraudulent traffic.
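As a rough illustration of the dense-subgraph idea behind Fraudar-style detection mentioned above, here is a minimal sketch of unweighted greedy peeling: repeatedly remove the lowest-degree node and keep the node set with the highest edges-per-node density. It is not the production algorithm or the Fraudar+ variant named above; the function name densest_subgraph, the account identifiers, and the plain account-to-account graph are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def densest_subgraph(edges: Iterable[Tuple[str, str]]) -> Tuple[Set[str], float]:
    """Greedy peeling: return the densest node set seen and its density."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    nodes = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes = set(nodes)
    best_density = num_edges / len(nodes) if nodes else 0.0

    while len(nodes) > 1:
        # Remove the node with the smallest degree (ties broken by name).
        # A heap would be faster; a linear scan keeps the sketch short.
        victim = min(nodes, key=lambda n: (len(adj[n]), n))
        for nbr in adj[victim]:
            adj[nbr].discard(victim)
        num_edges -= len(adj[victim])
        nodes.remove(victim)
        del adj[victim]

        density = num_edges / len(nodes)
        if density > best_density:
            best_nodes, best_density = set(nodes), density

    return best_nodes, best_density


if __name__ == "__main__":
    # Four tightly linked accounts plus a sparse tail of ordinary ones.
    ring = [("a", "b"), ("a", "c"), ("a", "d"),
            ("b", "c"), ("b", "d"), ("c", "d")]
    tail = [("d", "e"), ("e", "f")]
    print(densest_subgraph(ring + tail))
```

Fraudar itself works on a weighted bipartite graph and adds suspiciousness scores that resist camouflage; the peeling loop is the part this sketch tries to convey.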

Publications

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

arXiv, 2026

Preprint | EBench | VLA | Mobile Manipulation

Toward Motion Robustness: A Masked Attention Regularization Framework in Remote Photoplethysmography

CVPR 2024 Workshop, 2024

Conference Paper | rPPG | Motion Robustness

Dual-graph Convolutional Network Based on Band Attention and Sparse Constraint for Hyperspectral Band Selection

Knowledge-Based Systems, 2021

Journal Article | GCN | Hyperspectral Imaging | Band Selection

Convolutional Neural Network Based on Bandwise-independent Convolution and Hard Thresholding for Hyperspectral Band Selection

IEEE Transactions on Cybernetics, 2020

Journal Article | BHCNN | Hyperspectral Imaging | Band Selection

Generative Adversarial Networks Based on Collaborative Learning and Attention Mechanism for Hyperspectral Image Classification

Remote Sensing, 2020

Journal Article | CA-GAN | Hyperspectral Imaging | Data Generation

Projects

The following projects and information are selectively presented, with sensitive and confidential details appropriately redacted, and only partial work content disclosed.

Blink Call

A blink-based assistive calling system for ALS patients

EBench

An Indoor VLA Manipulation Benchmark Built on NVIDIA Isaac Sim

InternManip_Eval

A unified evaluation framework that supports multiple embodied manipulation benchmarks.

IROS 2025 Challenge

The Vision-Language Manipulation in Open Tabletop Environments challenge.

Gesture-Based Teleoperation for Manipulation

Real-time gesture-based teleoperation system for robotic arm control

Novel Generator

A modular, multi-agent, cross-platform system for automated long-form fiction generation.

Skills

What I do

Embodied AI Systems & Evaluation
Large Language Model Agent Systems
Applied AI Systems & Products

Robotics & Embodied AI

Teleoperation Systems
Data Collection Pipelines
Simulation Platforms
Scalable Evaluation Systems
Benchmark Design

LLM & Agent Stack

Agent Systems
Tool Use & Planning
Function Calling & APIs
RAG Systems
Memory-Augmented LLMs
Reasoning Frameworks (ReAct/ReWOO)
SLM Training & Deployment
Prompt Engineering

Computer Vision

General Computer Vision Algorithms
Diverse Vision Tasks and Application Scenarios

Machine Learning & Deep Learning

CNN
GNN
RNN
Attention Mechanisms
Representation Learning
Feature Selection
Optimization Methods

Languages & Frameworks

Python
PyTorch
Ray
OpenCV
FastAPI
PyQt
Nuitka
Git
Conda
Linux

Product & Leadership

Technical Leadership
Project Management
AI Product Design
Cross-functional Collaboration
Open-source Development
Production Testing & Operations

About me

I don’t treat research and engineering as separate tracks.

In my experience, the most impactful AI systems emerge when research ideas are tightly coupled with engineering constraints, such as data availability, latency, scalability, and deployment environments.

So I tend to think in terms of systems and business outcomes rather than papers or models alone.
