2024
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
CVPR (The IEEE/CVF Conference on Computer Vision and Pattern Recognition), 2024
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
ICLR (The International Conference on Learning Representations), 2024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Technical Report, 2024
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
arXiv, 2024
Assessment of Multimodal Large Language Models in Alignment with Human Values
arXiv, 2024
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
arXiv, 2024