LAMM (pronounced /læm/, as in "cute lamb", a nod to LLaMA) is a growing open-source community aimed at helping researchers and developers quickly train and evaluate Multi-modal Large Language Models (MLLMs), and further build multi-modal AI agents capable of bridging the gap between ideas and execution, enabling seamless interaction between humans and AI machines.
As one of the very first open-source endeavors in the MLLM field, our goal is to create an ecosystem where every researcher and developer can apply, study, and even contribute. We work on various aspects including MLLM datasets, frameworks, benchmarks, optimizations, and applications as AI Agents. As a fully transparent open-source community, any form of collaboration is welcome!
Tutorial
Learn how to prepare the datasets, models, and environment, and how to start training and evaluation.
Datasets
Download the datasets.
Models
Use LAMM Models.
Leaderboards
View the leaderboards of multimodal large language models.
News
📆 [2024-01]
1. Octavius is accepted by ICLR 2024!
2. From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities is released on arXiv!
📆 [2023-12]
1. DepictQA: Depicted Image Quality Assessment based on Multi-modal Language Models is released on arXiv!
2. MP5: A Multi-modal LLM-based Open-ended Embodied System in Minecraft is released on arXiv!
📆 [2023-11]
1. ChEF: A comprehensive evaluation framework for MLLMs is released on arXiv!
2. Octavius: Mitigating Task Interference in MLLMs by combining Mixture-of-Experts (MoEs) with LoRAs is released on arXiv!
3. The camera-ready version of LAMM is available on arXiv.
📆 [2023-10]
1. LAMM is accepted by the NeurIPS 2023 Datasets & Benchmarks Track! See you in December!
📆 [2023-09]
1. A lightweight training framework for V100 or RTX 3090 is available! LLaMA2-based fine-tuning is also online.
2. Our demo has moved to OpenXLab.
📆 [2023-07]
1. Checkpoints & leaderboard of LAMM on Hugging Face are updated for the new code base.
2. Evaluation code for both 2D and 3D tasks is ready.
3. Command-line demo tools are updated.
📆 [2023-06]
1. LAMM: 2D & 3D dataset & benchmark for MLLM
2. Watch the demo video for LAMM on YouTube or Bilibili!
3. The full paper with appendix is available on arXiv.
4. The LAMM dataset is released on Hugging Face & OpenDataLab for the research community!
5. LAMM code is available for the research community!
Publications
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhenfei Yin*, Jiong Wang*, JianJian Cao*, Zhelun Shi*, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai†, Xiaoshui Huang, Zhiyong Wang, Jing Shao†, Wanli Ouyang
NeurIPS, 2023, Datasets and Benchmarks Track
Octavius: Mitigating Task Interference in MLLMs via MoE
Zeren Chen*, Ziqin Wang*, Zhen Wang*, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng†, Wanli Ouyang, Yu Qiao, Jing Shao†
ICLR, 2024
Preprints
ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models
Zhelun Shi*, Zhipin Wang*, Hongxing Fan*, Zhenfei Yin, Lu Sheng†, Yu Qiao, Jing Shao†
arXiv, 2023
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yiran Qin*, Enshen Zhou*, Qichang Liu*, Zhenfei Yin, Lu Sheng†, Ruimao Zhang†, Yu Qiao, Jing Shao
arXiv, 2023
DepictQA: Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You*, Zheyuan Li*, Jinjin Gu*, Zhenfei Yin, Tianfan Xue+, Chao Dong+
arXiv, 2023
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang
arXiv, 2024