Intro

Hello! I’m Jiantong Chen, an AI engineer at the Embodied Intelligence Center of the Shanghai Artificial Intelligence Laboratory.

I build evaluation and learning systems for embodied intelligence, focusing on how robotic agents can be systematically measured, compared, and improved through scalable and reproducible frameworks. I view evaluation as a fundamental component of intelligence itself, where infrastructure is designed to reveal what embodied systems can perceive, reason about, and execute under diverse and open-ended conditions.

More broadly, I work on multimodal and agentic AI systems, spanning computer vision, large language models, and graph-based learning, with an emphasis on translating research ideas into deployable AI products.

Across domains ranging from robotic manipulation to real-world AI products, I focus on building AI that is not only intelligent in principle, but also measurable, controllable, and deployable in practice.

News

[2026.05] Blink-Call is an assistive calling system designed for ALS patients. It recognizes user-defined blink patterns from camera input and triggers visible and audio call alerts without requiring speech, hand movement, or physical touch.

[2026.04] EBench and Challenge are here! EBench is an indoor VLA benchmark built on NVIDIA Isaac Sim, featuring long-horizon, dexterous, and mobile manipulation tasks for fair and efficient evaluation of embodied model capabilities.

[2025.11] We launched InternManip-Eval, a cleanly separated evaluation module extracted from the InternManip framework. It supports Calvin, SimplerEnv, GenManip, and ARX LIFT2 real-world robot control in the IROS offline challenge. It adopts a unified client–server evaluation paradigm with concurrent execution for efficient, scalable, and consistent embodied AI benchmarking.

[2025.10] Congratulations on the successful completion of the IROS 2025 offline finals in Hangzhou! I had the honor of participating in on-site real-robot debugging and serving as a chief judge. After comprehensive evaluation, we are pleased to congratulate Team HonorEmbodiment on winning the championship in the manipulation track. Related links: Link1, Link2.

[2025.08] We are hosting the Challenge on Multimodal Robot Learning with two tracks. The Vision-Language Manipulation in Open Tabletop Environments challenge will be featured at the IROS 2025 Workshop, with a submission deadline of September 30th. I will continue to develop InternManip to support competitions.

Experience

Shanghai Artificial Intelligence Laboratory

Embodied Intelligence Center, AI Engineer

2024.11 — Present

  • EBench: contributed to the development of the VLA benchmark (EBench), designing and implementing VR + teleoperation toolchains for 7 dexterous manipulation tasks. Completed teleoperation trajectory data collection and built the processing pipeline; set up a data feedback loop with the model training team to iteratively improve the benchmark. Contributed to evaluation infrastructure design and development, and maintained engineering quality.
  • InternManip / InternManip-Eval: designed and developed the evaluation infrastructure for embodied manipulation benchmarks. Built a Ray-based distributed evaluation system that scales across multiple machines and GPUs to improve evaluation throughput. Designed a decoupled client–server architecture that separates agents from simulators and supports multiple experimental and deployment modes. Established benchmark standardization through unified configuration and a one-click execution entry point, standardizing benchmark development and evaluation workflows across different simulators. Developed and maintained the online evaluation system for the IROS Challenge: Vision-Language Manipulation in Open Tabletop Environments, and participated in offline competition debugging, participant support, and judging.
  • InternUtopia: developed a low-cost teleoperation toolchain based on gesture interaction. Participated in code reviews and open-source community building, including the “Play With InternUtopia” livestream event and responding to GitHub issues.
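
As an illustration of the decoupled client–server idea described above, here is a minimal sketch. It is not the InternManip-Eval code: `PolicyClient`, `ToySimulator`, `IncrementPolicy`, `run_episode`, and `evaluate` are hypothetical names, the "simulator" is a toy counter, and a standard-library thread pool stands in for a distributed backend such as Ray. The point is only that the evaluation loop talks to the agent through a narrow interface, so episodes can run concurrently and the agent can be swapped without touching the simulator side.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Protocol


class PolicyClient(Protocol):
    """Hypothetical agent-side interface; a real client might wrap a network call."""

    def act(self, observation: dict) -> int: ...


@dataclass
class ToySimulator:
    """Hypothetical stand-in for a simulator: success once the counter reaches the goal."""

    goal: int
    state: int = 0

    def observe(self) -> dict:
        return {"state": self.state, "goal": self.goal}

    def step(self, action: int) -> bool:
        self.state += action
        return self.state >= self.goal


class IncrementPolicy:
    """Trivial policy that always moves the counter forward by one."""

    def act(self, observation: dict) -> int:
        return 1


def run_episode(policy: PolicyClient, sim: ToySimulator, max_steps: int) -> bool:
    """Run one episode; the simulator only ever sees actions, never the policy internals."""
    for _ in range(max_steps):
        if sim.step(policy.act(sim.observe())):
            return True
    return False


def evaluate(policy: PolicyClient, goals: list, max_steps: int = 5, workers: int = 4) -> float:
    """Run one episode per goal concurrently and return the success rate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda g: run_episode(policy, ToySimulator(goal=g), max_steps), goals)
        )
    return sum(results) / len(results)
```

With these toy definitions, `evaluate(IncrementPolicy(), [1, 3, 5, 7])` returns 0.75, since only the goal of 7 is out of reach within five steps.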

Applied Research Division, Algorithm Engineer

2022.11 — 2024.11

  • Continued the Smart Care Project originally developed at SenseTime: promoted product iteration to version 2.0, upgraded both software and hardware architecture, and improved system stability. Collaborated with hospitals, associations, and NGOs across Beijing, Shanghai, Xi’an, and Taiwan to continuously organize large-scale public welfare initiatives.
  • Built an embodied intelligent agent system with memory mechanisms, visual environment perception, map-based navigation, and natural language control of the robot body. Integrated image embeddings and continuously collected user profiles to enable identity retrieval and long-term memory-enhanced dialogue.
  • Developed an intelligent AR navigation and guide application: based on Function Calling, ReAct / ReWOO, and Reflection frameworks, built an agent system capable of autonomous task decomposition and execution, supporting navigation, pluggable knowledge retrieval, and action generation, and delivered a digital human AR guiding system.
  • Worked on SLM fine-tuning (SFT): improved the coding capabilities of InternLM-base-7B in resource-constrained environments and successfully applied it to AirSim drone control, achieving natural language command success rates of 82%–96% depending on task complexity.
  • Developed a multimodal RAG production system: designed and implemented a multimodal RAG control platform enabling knowledge base construction, retrieval, and modular workflow orchestration. Built a rare disease QA system covering data collection, knowledge base construction, indexing strategies, and safety mechanisms, and provided free services to ALS patients and families via both web and WeChat channels.

SenseTime Xi’an

Applied Research Lab, Algorithm Engineer (Project Lead)

  • Led the end-to-end delivery of a home-based care product for severely disabled users from 0 to 1. Responsibilities included: 1) project management, including scheduling, cross-team communication, and risk estimation; 2) product design, including multi-channel user requirement collection, hardware adaptation, feature design and prioritization, APP UX design, and documentation writing, achieving a user negative feedback rate of only 1.6%; 3) organizing, recording, and iterating testing processes, including bug classification and resolution tracking; 4) algorithm development and on-site testing for an ambassador-facing smart care project; 5) leading public welfare activities with hospitals and associations across Beijing, Shanghai, Xi’an, and Taiwan, and supporting PR-related work after anonymization. The project received 8 domestic and international awards.
  • Algorithm development across the full pipeline, including data collection, annotation, training strategy design, model development, and system integration: 1) developed a near-infrared face and facial landmark detection model (98 landmarks) under extreme pose angles, dual-frequency lighting, and illumination interference, using a self-built dataset of over 1.06M images, achieving bbox mAP50=99% and NME[60,72]=0.0748, with inference speed >20 FPS on RV1126 edge devices; 2) developed eye-state sequence recognition under challenging conditions, achieving ACC>96% on a dataset of over 670K images, with robustness to head poses of |yaw| > 35°; 3) developed reflective point motion trajectory tracking algorithms supporting multi-point tracking, achieving SR>93%; 4) developed acoustic event detection for alarm sounds with Recall>98%.
  • Operations: 1) led 4 rounds of PoC deployments, covering product promotion, user onboarding, and operational support, achieving deployment in 130+ households with over 81% active usage by the end of 2023; 2) coordinated the 6.21 World ALS Day campaign with local associations in Xi’an and Shanghai, produced PR materials, wrote press releases, and organized media interviews (including The Paper), achieving over 10M global media impressions within one week.
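
For readers unfamiliar with the landmark metric quoted above, the following is a minimal sketch of a normalized mean error (NME) computation. It rests on an assumption: that "NME[60,72]" means the mean point-to-point error is divided by the distance between ground-truth landmarks 60 and 72 (commonly the outer eye corners in 98-landmark annotation schemes). The function name `nme` and its parameters are illustrative and not taken from the project.

```python
import math


def nme(pred, truth, left_idx=60, right_idx=72):
    """Mean landmark error normalized by the distance between two reference landmarks.

    pred and truth are equal-length sequences of (x, y) points. The default
    indices 60 and 72 are an assumed reading of the "NME[60,72]" notation.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of landmarks")
    norm = math.dist(truth[left_idx], truth[right_idx])
    if norm == 0:
        raise ValueError("reference landmarks coincide; cannot normalize")
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(errors) / (len(errors) * norm)
```

Under this reading, a value of 0.0748 would mean the average landmark error is about 7.5% of the distance between the two reference points.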

RedNote

Ecological Security Department, Algorithm Engineer

  • Built a near real-time anti-fraud system for community traffic and interaction scenarios using Flink, Kafka, and Nebula graph database.
  • Developed and deployed graph mining and clustering algorithms for black/gray market detection, including GCN, Fraudar+, Louvain, KMeans, and KPrototypes.
  • Improved community content traffic anti-fraud interception by 60% and eliminated approximately 80% of black-market traffic operations, significantly compressing the arbitrage space for fraudulent traffic.

Publications

Toward Motion Robustness: A Masked Attention Regularization Framework in Remote Photoplethysmography

CVPR 2024 Workshop, 2024

Conference Paper, rPPG, Motion Robustness

Dual-graph Convolutional Network Based on Band Attention and Sparse Constraint for Hyperspectral Band Selection

Knowledge-Based Systems, 2021

Journal Article, GCN, Hyperspectral Imaging, Band Selection

Convolutional Neural Network Based on Bandwise-independent Convolution and Hard Thresholding for Hyperspectral Band Selection

IEEE Transactions on Cybernetics, 2020

Journal Article, BHCNN, Hyperspectral Imaging, Band Selection

Generative Adversarial Networks Based on Collaborative Learning and Attention Mechanism for Hyperspectral Image Classification

Remote Sensing, 2020

Journal Article, CA-GAN, Hyperspectral Imaging, Data Generation

CNN-based Multilayer Spatial-Spectral Feature Fusion and Sample Augmentation with Local and Nonlocal Constraints for Hyperspectral Image Classification

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019

Journal Article, MSLN-CNN, Hyperspectral Imaging, Spatial-Spectral Feature Fusion, Sample Augmentation

Skills

What I do

  • Embodied AI Systems & Evaluation
  • Large Language Model Agent Systems
  • Applied AI Systems & Products

Robotics & Embodied AI

  • Teleoperation Systems
  • Data Collection Pipelines
  • Simulation Platforms
  • Scalable Evaluation Systems
  • Benchmark Design

LLM & Agent Stack

  • Agent Systems
  • Tool Use & Planning
  • Function Calling & APIs
  • RAG Systems
  • Memory-Augmented LLMs
  • Reasoning Frameworks (ReAct/ReWOO)
  • SLM Training & Deployment
  • Prompt Engineering

Computer Vision

  • General Computer Vision Algorithms
  • Diverse Vision Tasks and Application Scenarios

Machine Learning & Deep Learning

  • CNN
  • GNN
  • RNN
  • Attention Mechanisms
  • Representation Learning
  • Feature Selection
  • Optimization Methods

Languages & Frameworks

  • Python
  • PyTorch
  • Ray
  • OpenCV
  • FastAPI
  • PyQt
  • Nuitka
  • Git
  • Conda
  • Linux

Product & Leadership

  • Technical Leadership
  • Project Management
  • AI Product Design
  • Cross-functional Collaboration
  • Open-source Development
  • Production Testing & Operations

About me

I don’t treat research and engineering as separate tracks.

In my experience, the most impactful AI systems emerge when research ideas are tightly coupled with engineering constraints, such as data availability, latency, scalability, and deployment environments.

So I tend to think in terms of systems and business rather than papers or models alone.