2024

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
CVPR (The IEEE/CVF Conference on Computer Vision and Pattern Recognition), 2024

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
ICLR (The International Conference on Learning Representations), 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Technical Report, 2024

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
arXiv, 2024

Assessment of Multimodal Large Language Models in Alignment with Human Values
arXiv, 2024

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
arXiv, 2024