Default Benchmark


LAMM Benchmark

Notes: LAMM-Benchmark has now been fully implemented in ChEF, and we highly recommend using the ChEF evaluation pipeline for benchmarking in your work. ChEF supports evaluation of LAMM's common 2D and 3D tasks as well as its locating tasks. Please note that the GPT-rank metric from LAMM is no longer applicable.


To evaluate LAMM/Octavius on the LAMM-Benchmark 2D common tasks, use a pre-defined model config (src/config/ChEF/models/lamm.yaml or src/config/ChEF/models/octavius_2d+3d.yaml) together with one of the pre-defined recipe configs under src/config/ChEF/scenario_recipes/LAMM/. For example, to evaluate LAMM on ScienceQA:


python eval.py --model_cfg config/ChEF/models/lamm.yaml  --recipe_cfg config/ChEF/scenario_recipes/LAMM/ScienceQA.yaml
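
To evaluate Octavius on the same task, only the model config changes; a minimal sketch, assuming the Octavius config pairs with the same recipe:

python eval.py --model_cfg config/ChEF/models/octavius_2d+3d.yaml --recipe_cfg config/ChEF/scenario_recipes/LAMM/ScienceQA.yaml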

To run all the evaluations automatically in sequence, use:


sh tools/LAMM/eval_lamm2d.sh

sh tools/LAMM/eval_lamm3d.sh
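
These scripts simply iterate over the pre-defined recipes. A minimal hand-rolled equivalent for the 2D recipes, assuming every YAML under the recipe directory is a valid recipe config, would be:

# Sketch only: run each LAMM 2D recipe in sequence with the LAMM model config.
for recipe in config/ChEF/scenario_recipes/LAMM/*.yaml; do
    python eval.py --model_cfg config/ChEF/models/lamm.yaml --recipe_cfg "$recipe"
done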

To evaluate Octavius on ScanNet Detection, run:


sh tools/Octavius/octavius_ChEF.sh

ChEF


ChEF Benchmark


Download Evaluated MLLMs


| MLLM             | Vision Encoder | Language Model | Link                           |
|------------------|----------------|----------------|--------------------------------|
| InstructBLIP     | EVA-G          | Vicuna 7B      | instruct_blip_vicuna7b_trimmed |
| Kosmos2          | CLIP ViT-L/14  | Decoder 1.3B   | kosmos-2.pt                    |
| LAMM             | CLIP ViT-L/14  | Vicuna 13B     | lamm_13b_lora32_186k           |
| LLaMA-Adapter-v2 | CLIP ViT-L/14  | LLaMA 7B       | LORA-BIAS-7B                   |
| LLaVA            | CLIP ViT-L/14  | MPT 7B         | LLaVA-Lightning-MPT-7B         |
| MiniGPT-4        | EVA-G          | Vicuna 7B      | MiniGPT-4                      |
| mPLUG-Owl        | CLIP ViT-L/14  | LLaMA 7B       | mplug-owl-llama-7b             |
| Otter            | CLIP ViT-L/14  | LLaMA 7B       | OTTER-9B-LA-InContext          |
| Shikra           | CLIP ViT-L/14  | LLaMA 7B       | shikra-7b                      |
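
Each entry in the Link column refers to that model's released checkpoint. As an illustration only, a checkpoint hosted on Hugging Face can be fetched with git-lfs; the <org>/<repo> path below is a placeholder, not a real repository id, so substitute the actual link from the table:

# <org>/<repo> is a placeholder; use the repository from the Link column above.
git lfs install
git clone https://huggingface.co/<org>/<repo> ckpt/<repo>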

Organize them as follows:


ckpt
├── epcl_vit-L_256tokens
├── …
│   ├── lamm_2d   # saved checkpoints in training
│   └── …
└── …
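
Once the checkpoints are in place, a quick sanity check is to list the first two directory levels and compare them against the layout above:

find ckpt -maxdepth 2 -type d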