News
[2025/05]🎉 WorldSimBench is accepted to ICML 2025 and is also selected as an Oral Presentation at CVPR 2025 @ WorldModelBench! You are welcome to evaluate your world models here!
[2025/02]🎉 Code-as-Monitor is accepted at CVPR 2025! Come and see our demos!
[2024/12]🎉 AGFSync gets accepted to AAAI 2025! See you in Philadelphia!
[2024/05]🎉 Honored to organize two workshop challenges (TiFA, MFM-EAI) at ICML 2024!
[2024/02]🎉 MP5 is accepted at CVPR 2024! Please check out the demos on our webpage!
Publications (*, †, and ‡ indicate equal contribution, corresponding author, and project leader, respectively.)
Currently, my interest lies in Embodied Agents, which sit at the intersection of Multimodal Large Language Models and Embodied AI.
I am particularly interested in high-level planning and low-level control with spatio-temporal intelligence,
working towards a generalist agent in complex real-world environments.
Representative works are highlighted.
|
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Enshen Zhou*, Jingkun An*, Cheng Chi*‡, Yi Han, Shanyu Rong, Chi Zhang, Pengwei Wang, Zhongyuan Wang, Tiejun Huang, Lu Sheng†, Shanghang Zhang†
Paper / Project / Code
TL;DR: From words to exactly where you mean using RoboRefer!
arXiv 2025
|
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Enshen Zhou*, Qi Su*, Cheng Chi*†, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng†, He Wang†
Paper / Project / Bilibili Video
TL;DR: Enjoy open-world failure detection with real-time speed and high precision!
CVPR 2025
|
WorldSimBench: Towards Video Generation Models as World Simulators
Yinran Qin*, Zhelun Shi*, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin‡, Xihui Liu, Lu Sheng, Jing Shao†, Lei Bai, Wanli Ouyang, Ruimao Zhang†
TL;DR: Evaluate Video Generation Models as World Simulators!
Paper / Project
ICML 2025
CVPR 2025 @ WorldModelBench, Oral Presentation
|
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
Jingkun An*, Yinghao Zhu*, Zongjian Li*, Enshen Zhou, Haoran Feng, Xijie Huang, Bohua Chen, Yemin Shi, Chengwei Pan†
Paper / Project / Code
TL;DR: Train a T2I diffusion model with AI-generated feedback for DPO!
AAAI 2025
|
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
Enshen Zhou*, Yinran Qin*, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang†, Lu Sheng†, Yu Qiao, Jing Shao‡
Paper / Project / Code
TL;DR: Use imagination to guide the agent on how to act, step by step!
NeurIPS 2024 @ OWA
|
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yinran Qin*, Enshen Zhou*, Qichang Liu*, Zhenfei Yin, Lu Sheng†, Ruimao Zhang†, Yu Qiao, Jing Shao‡
Paper / Project / Bilibili Video / Code
TL;DR: A multi-agent system can solve endless open-ended, long-horizon tasks!
CVPR 2024
|
Services
Workshop Challenge Organizer:
- Trustworthy Multi-modal Foundation Models and AI Agents (TiFA) at ICML 2024.
- Multi-modal Foundation Model meets Embodied AI (MFM-EAI) at ICML 2024.
Reviewer: CVPR
|
Selected Awards and Honors
2024: Outstanding Graduate of Beihang University.
2023: Special Prize (Top 1) in "Challenge Cup" Competition of Science Achievement in China.
2017: Ranked 1st/68k in the National High School Entrance Examination in Shenzhen.
|