Default Benchmark
LAMM-Benchmark
Notes: LAMM-Benchmark has now been fully re-implemented with ChEF, and we strongly recommend using the latest ChEF evaluation pipeline for benchmarking in your work. ChEF supports evaluation of LAMM's common 2D and 3D tasks as well as its locating tasks. Please note that the GPT-rank metric in LAMM is no longer supported.
To evaluate LAMM/Octavius on the 2D common tasks in LAMM-Benchmark, use a pre-defined model config (src/config/ChEF/models/lamm.yaml or src/config/ChEF/models/octavius_2d+3d.yaml) together with the pre-defined recipe configs in src/config/ChEF/scenario_recipes/LAMM/.
For example, to evaluate LAMM on ScienceQA, run:
```shell
python eval.py --model_cfg config/ChEF/models/lamm.yaml --recipe_cfg config/ChEF/scenario_recipes/LAMM/ScienceQA.yaml
```
To run all the evaluations automatically and sequentially, use:

```shell
sh tools/LAMM/eval_lamm2d.sh
sh tools/LAMM/eval_lamm3d.sh
```
To evaluate Octavius on ScanNet Detection, run:
```shell
sh tools/Octavius/octavius_ChEF.sh
```
ChEF
Download Evaluated MLLMs
| MLLM | Vision Encoder | Language Model | Link |
|---|---|---|---|
| InstructBLIP | EVA-G | Vicuna 7B | instruct_blip_vicuna7b_trimmed |
| Kosmos2 | CLIP ViT-L/14 | Decoder 1.3B | kosmos-2.pt |
| LAMM | CLIP ViT-L/14 | Vicuna 13B | lamm_13b_lora32_186k |
| LLaMA-Adapter-v2 | CLIP ViT-L/14 | LLaMA 7B | LORA-BIAS-7B |
| LLaVA | CLIP ViT-L/14 | MPT 7B | LLaVA-Lightning-MPT-7B |
| MiniGPT-4 | EVA-G | Vicuna 7B | MiniGPT-4 |
| mPLUG-Owl | CLIP ViT-L/14 | LLaMA 7B | mplug-owl-llama-7b |
| Otter | CLIP ViT-L/14 | LLaMA 7B | OTTER-9B-LA-InContext |
| Shikra | CLIP ViT-L/14 | LLaMA 7B | shikra-7b |
Organize them as below:

```
...
ckpt
├── epcl_vit-L_256tokens
├── ...
│   ├── lamm_2d   # saved checkpoints in training
│   └── ...
└── ...
```
Visual Performance Evaluation
We provide several recipe and model configs in src/config/ChEF.
For example, to evaluate LAMM on CIFAR10 using the default recipe, run:
```shell
python tools/eval.py --model_cfg config/ChEF/models/lamm.yaml --recipe_cfg config/ChEF/scenario_recipes/CIFAR10/default.yaml
```
Besides, if you would like to conduct evaluation with your custom model, dataset, or metric, please refer to Custom ChEF Evaluation.
Desiderata
ChEF sets up several new evaluations to quantify the desiderata (desired capabilities) that a competent MLLM should possess as a reliable agent capable of real-world multimodal interaction.
Calibration
Calibration evaluates how well the uncertainty about each MLLM's prediction aligns with its accuracy, as highlighted by HELM. ChEF provides calibration evaluation on MMBench (src/config/ChEF/desiderata_recipes/Calibration/MMBench.yaml) and ScienceQA (src/config/ChEF/desiderata_recipes/Calibration/ScienceQA.yaml).
```shell
python tools/ChEF/eval_calibration.py --model_cfg model_cfg --recipe_cfg recipe_cfg
```
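Calibration is commonly summarized by an expected calibration error (ECE)-style statistic: bin predictions by stated confidence and compare per-bin confidence with per-bin accuracy. A minimal sketch of that idea (illustrative only; the metric ChEF actually reports may differ):

```python
# Minimal Expected Calibration Error (ECE) sketch: bin predictions by
# confidence and average the |accuracy - confidence| gap per bin,
# weighted by bin size. Illustrative, not ChEF's exact implementation.

def ece(confidences, correct, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        err += (len(b) / total) * abs(acc - avg_conf)
    return err

# Perfectly calibrated toy example: 80% confidence, 80% accuracy.
print(round(ece([0.8] * 10, [True] * 8 + [False] * 2), 4))  # 0.0
```

A perfectly calibrated model scores 0; an overconfident one (e.g. 100% confidence at 50% accuracy) scores the full gap.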
In-context Learning
This evaluation measures the crucial in-context learning (ICL) ability of an MLLM. ChEF provides ICL evaluation on MMBench (src/config/ChEF/desiderata_recipes/ICL/MMBench.yaml) and ScienceQA (src/config/ChEF/desiderata_recipes/ICL/ScienceQA.yaml).
```shell
python tools/ChEF/eval_icl.py --model_cfg model_cfg --recipe_cfg recipe_cfg
```
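ICL evaluation hinges on prepending k in-context exemplars to each query. A toy k-shot prompt builder (the template below is an assumption for illustration, not ChEF's actual prompt format, which is defined by its recipe configs):

```python
# Toy k-shot prompt builder: prepend k (question, answer) exemplars to
# the test question. The template is hypothetical and for illustration.

def build_icl_prompt(exemplars, query, k=2):
    shots = exemplars[:k]
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in shots]
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)

demo = [("What color is the sky?", "Blue"),
        ("How many legs does a cat have?", "Four")]
prompt = build_icl_prompt(demo, "What shape is a ball?")
print(prompt)
```

Comparing accuracy at k = 0 versus k > 0 then indicates whether the model actually benefits from the exemplars.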
Instruction Following
Instruction following evaluates how closely the MLLM adheres to the given instructions. ChEF provides instruction-following evaluation on MMBench (src/config/ChEF/desiderata_recipes/Insfollow/MMBench.yaml) and ScienceQA (src/config/ChEF/desiderata_recipes/Insfollow/ScienceQA.yaml).
```shell
python tools/ChEF/eval_insfollow.py --model_cfg model_cfg --recipe_cfg recipe_cfg
```
Language Performance
Language performance evaluates the quality of the generated sentences, using a GPT-based metric. Before evaluating language performance, please first finish inference on MMBench and ScienceQA using the default recipes: MMBench_recipe (src/config/ChEF/scenario_recipes/MMBench/default.yaml) and ScienceQA_recipe (src/config/ChEF/scenario_recipes/ScienceQA/default.yaml).
```shell
python tools/desiderata/eval_langperf.py --base-data-path dataset_path --answer-path results_path --response-dir output_path
```
Robustness
Robustness measures how robust an MLLM is to corruptions in the multimodal inputs. ChEF provides robustness evaluation on MMBench (src/config/ChEF/desiderata_recipes/Robust/MMBench.yaml) and ScienceQA (src/config/ChEF/desiderata_recipes/Robust/ScienceQA.yaml).
```shell
python tools/ChEF/eval_robust.py --model_cfg model_cfg --recipe_cfg recipe_cfg
```
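A common way to summarize robustness is the relative accuracy drop between clean and corrupted inputs. A minimal sketch of that summary statistic (illustrative; ChEF's reported robustness score may be defined differently):

```python
# Sketch: relative accuracy drop under input corruption, normalized by
# clean accuracy. 0 means fully robust; 1 means accuracy collapsed.
# Illustrative only, not ChEF's exact metric.

def relative_drop(acc_clean, acc_corrupt):
    return (acc_clean - acc_corrupt) / acc_clean

# Toy numbers: 80% clean accuracy falling to 60% under corruption.
print(round(relative_drop(0.80, 0.60), 3))  # 0.25
```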
Hallucination
Hallucination evaluates how well an MLLM avoids mentioning visual objects that do not exist in the image. ChEF uses POPE (src/config/ChEF/desiderata_recipes/Hallucination) for hallucination evaluation.
```shell
python tools/ChEF/eval_hallucination.py --model_cfg model_cfg --recipe_cfg recipe_cfg
```
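POPE frames hallucination as binary object-presence questions ("Is there a &lt;object&gt; in the image?") and scores the yes/no answers. A sketch of the standard POPE-style statistics (illustrative; the exact numbers ChEF reports may differ):

```python
# Sketch of POPE-style scoring: accuracy, precision, recall, F1, and the
# "yes" ratio over binary object-presence answers. A high yes ratio with
# low precision indicates the model hallucinates absent objects.

def pope_scores(preds, labels):
    tp = sum(p == "yes" and gt == "yes" for p, gt in zip(preds, labels))
    fp = sum(p == "yes" and gt == "no" for p, gt in zip(preds, labels))
    fn = sum(p == "no" and gt == "yes" for p, gt in zip(preds, labels))
    tn = sum(p == "no" and gt == "no" for p, gt in zip(preds, labels))
    acc = (tp + tn) / len(preds)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"acc": acc, "precision": prec, "recall": rec,
            "f1": f1, "yes_ratio": (tp + fp) / len(preds)}

# Toy answers vs. ground truth for four presence questions.
scores = pope_scores(["yes", "yes", "no", "no"],
                     ["yes", "no", "no", "yes"])
print(scores)
```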