Intro

Hello! I’m Jiantong Chen, an AI engineer at the Embodied Intelligence Center of the Shanghai Artificial Intelligence Laboratory.

I build evaluation and learning systems for embodied intelligence, focusing on scalable, reproducible frameworks for systematically measuring, comparing, and improving robotic agents. I see evaluation as a fundamental component of intelligence itself: good infrastructure should reveal what embodied systems can perceive, reason about, and execute under diverse, open-ended conditions.

More broadly, I work on multimodal and agentic AI systems, spanning computer vision, large language models, and graph-based learning, with an emphasis on translating research ideas into deployable AI products.

Across domains ranging from robotic manipulation to real-world AI products, I focus on building AI that is not only intelligent in principle but also measurable, controllable, and deployable in practice.

News

[2026.05] Blink-Call is an assistive calling system designed for ALS patients. It recognizes user-defined blink patterns from camera input and triggers visual and audible call alerts without requiring speech, hand movement, or physical touch.

[2026.04] EBench and its challenge are here! EBench (Elemental Mobile Manipulation Benchmark) is an indoor VLA benchmark built on NVIDIA Isaac Sim. It features long-horizon, dexterous, and mobile manipulation tasks for fair and efficient evaluation of embodied model capabilities.

[2025.11] We launched InternManip-Eval, a cleanly separated evaluation module extracted from the InternManip framework. It supports Calvin, SimplerEnv, and GenManip, as well as ARX LIFT2 real-world robot control in the IROS offline challenge. It adopts a unified client–server evaluation paradigm with concurrent execution for efficient, scalable, and consistent embodied AI benchmarking.

[2025.10] The IROS 2025 offline finals in Hangzhou have concluded successfully! I had the honor of taking part in on-site real-robot debugging and serving as a chief judge. After a comprehensive evaluation, Team HonorEmbodiment won the championship in the manipulation track. Congratulations! Related links: Link1, Link2.

[2025.08] We are hosting the Challenge on Multimodal Robot Learning with two tracks. The Vision-Language Manipulation in Open Tabletop Environments challenge will be featured at the IROS 2025 Workshop, with a submission deadline of September 30. I will continue developing InternManip to support the competition.

Experience

Shanghai Artificial Intelligence Laboratory

Embodied Intelligence Center, AI Engineer

2024.11 — Present

  • Deeply involved in the EBench project: contributed to building EBench, a VLA benchmark, and was responsible for designing and implementing VR and teleoperation toolchains for 7 dexterous manipulation tasks. Built data collection and trajectory processing pipelines, including raw trajectory processing and data augmentation such as background randomization. Established a data feedback loop with the model training team to keep iterating on the benchmark and improving the quality of the evaluation framework.
  • Deeply involved in the InternManip / InternManip-Eval projects, responsible for designing and developing infrastructure for embodied manipulation benchmarking. Built a Ray-based distributed evaluation system that scales across multiple machines and GPUs, significantly improving throughput and scalability. Designed a client–server architecture that decouples agents from simulators, allowing flexible adaptation across experimental and deployment settings. Standardized benchmarks through unified configuration and one-click execution, giving different simulators a common development and evaluation workflow. Developed and maintained the evaluation system for the IROS Challenge: Vision-Language Manipulation in Open Tabletop Environments, and took part in real-robot debugging, participant support, and judging for the offline challenge.
  • In the InternUtopia project, developed teleoperation data collection toolchains, conducted code reviews, and contributed to open-source community development, including participating in the “Play With InternUtopia” livestream and handling GitHub issues and discussions.

Applied Research Division, Algorithm Engineer

2022.11 — 2024.11

  • Continued development of the Assistive Care System, a project that originated at SenseTime. Upgraded the product to version 2.0 and collaborated with associations in Beijing, Shanghai, Xi’an, and Taiwan to organize large-scale public welfare initiatives.
  • Built a personalized dialogue system based on RAG and memory mechanisms. Integrated image embeddings and user profiling for identity recognition and long-term memory. Introduced the ReAct and ReWOO frameworks for task planning and for decoupling planning from tool use.
  • Designed a multimodal agent platform and an AR navigation system. Built a multi-tool orchestration framework based on function calling (navigation, knowledge retrieval, and action generation) that automatically decomposes and executes complex instructions.
  • Worked on large model training and deployment: independently trained InternLM-base-7B under limited compute resources, improved its coding capabilities, and applied it to AirSim drone control. Natural-language instruction success rates ranged from 96% to 82% depending on task complexity. Built a Text-RAG-based medical QA system covering data collection, knowledge base construction, indexing strategies, and safety-oriented response filtering.
  • Designed and developed a multimodal RAG control platform for modular construction of multimodal knowledge bases and workflow orchestration. Lowered the barrier to application development through frontend componentization, supporting rapid prototyping and scalable deployment of customized RAG applications.

SenseTime

Applied Research Lab, Algorithm Engineer (Project Lead)

  • Delivered a home-based care product for people with severe disabilities, starting from scratch. Responsibilities included:
    (1) project management, including scheduling, cross-team communication, and risk assessment;
    (2) collecting user requirements through multiple channels, adapting product hardware, designing and prioritizing features, designing the app UX, and writing documentation, achieving a user negative feedback rate of only 1.6%;
    (3) organizing, recording, and iterating on testing processes, including bug classification and resolution tracking.
  • Algorithm development: (1) developed a facial landmark detection model (98 landmarks) under extreme pose variations, challenging lighting conditions, and near-infrared sensor inputs, using a self-built dataset of over 1.06M images, achieving bbox mAP50=99% and NME[60,72]=0.0748, with >20 FPS on RV1126 edge devices; (2) developed eye-state sequence recognition models under similar conditions, achieving >96% accuracy on a dataset of 670K images; (3) developed reflective point motion trajectory tracking algorithms supporting multi-point tracking, achieving >93% accuracy; (4) developed an alarm sound event detection system with recall >98%. Covered the full pipeline including data collection, annotation, training strategy design, model selection, and product integration.
  • Operations: (1) led four PoC deployments covering product promotion, user guidance, and operational support, reaching 130+ households with over 81% active usage by the end of 2023; (2) coordinated the World ALS Day (June 21) campaign with local associations in Xi’an and Shanghai, authored press releases, organized media interviews (including The Paper), and achieved over 10 million cumulative media impressions globally.

RedNote

Ecological Security Department, Algorithm Engineer

  • Built a near real-time anti-fraud system for community traffic and interaction scenarios based on Flink, Kafka, and the Nebula Graph database.
  • Developed and deployed graph mining and clustering algorithms, including GCN, Fraudar+, Louvain, KMeans, and KPrototypes, for fraud and black-market detection.
  • Improved real-time traffic interception rates by 60%, significantly curbing large-scale fraudulent traffic operations and narrowing arbitrage opportunities in the black-market ecosystem.

Publications

Toward Motion Robustness: A Masked Attention Regularization Framework in Remote Photoplethysmography

CVPR 2024 Workshop, 2024

Conference Paper · rPPG · Motion Robustness

Dual-graph Convolutional Network Based on Band Attention and Sparse Constraint for Hyperspectral Band Selection

Knowledge-Based Systems, 2021

Journal Article · GCN · Hyperspectral Imaging · Band Selection

Convolutional Neural Network Based on Bandwise-independent Convolution and Hard Thresholding for Hyperspectral Band Selection

IEEE Transactions on Cybernetics, 2020

Journal Article · BHCNN · Hyperspectral Imaging · Band Selection

Generative Adversarial Networks Based on Collaborative Learning and Attention Mechanism for Hyperspectral Image Classification

Remote Sensing, 2020

Journal Article · CA-GAN · Hyperspectral Imaging · Data Generation

CNN-based Multilayer Spatial-Spectral Feature Fusion and Sample Augmentation with Local and Nonlocal Constraints for Hyperspectral Image Classification

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019

Journal Article · MSLN-CNN · Hyperspectral Imaging · Spatial-Spectral Feature Fusion · Sample Augmentation

Skills

What I do

  • Embodied AI Systems & Evaluation
  • Large Language Model Agent Systems
  • Applied AI Systems & Products

Robotics & Embodied AI

  • Teleoperation Systems
  • Data Collection Pipelines
  • Simulation Platforms
  • Scalable Evaluation Systems
  • Benchmark Design

LLM & Agent Stack

  • Agent Systems
  • Tool Use & Planning
  • Function Calling & APIs
  • RAG Systems
  • Memory-Augmented LLMs
  • Reasoning Frameworks (ReAct/ReWOO)
  • LLM Training & Deployment
  • Prompt Engineering

Languages & Frameworks

  • Python
  • PyTorch
  • Ray
  • OpenCV
  • FastAPI
  • PyQt
  • Nuitka
  • Git
  • Conda
  • Linux

Product & Leadership

  • Technical Leadership
  • Project Management
  • AI Product Design
  • Cross-functional Collaboration
  • Open-source Development
  • Production Testing & Operations

About me

How do I think about research vs. engineering?

I don’t treat research and engineering as separate tracks.

In my experience, the most impactful AI systems emerge when research ideas are tightly coupled with engineering constraints, such as data availability, latency, scalability, and deployment environments.

So I tend to think in terms of systems and business impact rather than papers or models alone.