
Benchmarking

We provide 2D/3D/ChEF benchmarking datasets for downstream evaluation.

2D Benchmarking Datasets

2D benchmarking datasets are built on the Flickr30k, CIFAR-10, FSC147, CelebA, UCMerced, LSP, PASCAL VOC, SVT, AI2D, and ScienceQA datasets. You can download them from here.

The corresponding meta files are listed below:

| Meta file name | Size | Data file name | Size |
| --- | --- | --- | --- |
| Caption_flickr30k.json | 598K | flickr30k_images.zip | 559M |
| Classification_CIFAR10.json | 2.6M | cifar10_images.zip | 8.9M |
| Counting_FSC147.json | 7.3M | fsc147_images.zip | 44M |
| Detection_VOC2012.json | 6.4M | voc2012_images.zip | 196M |
| Facial_Classification_CelebA(Hair).json | 2.4M | celeba_images.zip | 566M |
| Facial_Classification_CelebA(Smile).json | 3.7M | celeba_images.zip | 566M |
| Fine-grained_Classification_UCMerced.json | 676K | ucmerced_images.zip | 317M |
| Keypoints_Dectection_LSP.json | 3.9M | lsp_images.zip | 44M |
| Locating_FSC147.json | 7.5M | fsc147_images.zip | 44M |
| Locating_LSP.json | 3.9M | lsp_images.zip | 9.9M |
| Locating_VOC2012.json | 6.0M | voc2012_images.zip | 196M |
| OCR_SVT.json | 68K | svt_images.zip | 82M |
| VQA_AI2D.json | 2.1M | ai2d_images.zip | 559M |
| VQA_SQAimage.json | 3.6M | sqaimage_images.zip | 127M |
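
Each meta file pairs with the image archive in the same row. As a minimal sketch of inspecting such a pair (this assumes each meta file is a JSON list of sample dicts that reference images in the unzipped archive; the `"image"` key below is hypothetical, so print one entry to confirm the real schema):

```python
# A minimal sketch of pairing a 2D meta file with its unzipped image archive.
# The "image" key is an illustrative assumption, not a documented field.
import json
from pathlib import Path

with open("Caption_flickr30k.json") as f:
    meta = json.load(f)                    # assumed: a JSON list of samples

images_dir = Path("flickr30k_images")      # unzipped from flickr30k_images.zip

print(meta[0])                             # inspect the real field names first
# img_path = images_dir / meta[0]["image"] # hypothetical "image" key
```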

3D Benchmarking Datasets

We provide two 3D benchmarking datasets, "Scan2Inst-benchmark" and "LAMM3D-Dataset-benchmark".

Scan2Inst-benchmark

If your MLLM was trained on "Scan2Inst", use "Scan2Inst-benchmark" for evaluation.

We provide NR3D and ShapeNet for zero-shot evaluation, and ScanNet for finetuning evaluation. You can download the processed pickle files from here.

The corresponding meta files are listed below:

| Meta file name | Size | Data file name | Size |
| --- | --- | --- | --- |
| Caption_nr3d.json | 2.28M | Caption_nr3d.pickle | 25.41M |
| Caption_scannet.json | 239.43K | Caption_scannet.pickle | 7.29M |
| Classification_scannet.json | 249.80K | Classification_scannet.pickle | 7.38M |
| Classification_shapenet.json | 1.09M | Classification_shapenet.pickle | 21.45M |
| VQA_scannet.json | 231.64K | VQA_scannet.pickle | 4.82M |
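
Since the point clouds ship as processed pickle files, loading a benchmark pair is a plain pickle/JSON read. A minimal sketch, assuming nothing about the pickle's internal layout:

```python
# A minimal sketch of loading a processed pickle and its meta file.
# The pickle's internal structure is not documented here, so only
# generic inspection is shown; adapt once you see the actual contents.
import json
import pickle

with open("Caption_scannet.pickle", "rb") as f:
    point_data = pickle.load(f)

with open("Caption_scannet.json") as f:
    meta = json.load(f)                 # assumed: a JSON list of samples

print(type(point_data), len(meta))      # inspect before building a loader
```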

LAMM3D-Dataset-benchmark

If your MLLM was trained on "LAMM3D-Dataset", use "LAMM3D-Dataset-benchmark" for evaluation.

LAMM3D-Dataset-benchmark is built on ScanNet. You can download it from here.

The corresponding meta files are listed below:

| Meta file name | Size | Data file name | Size |
| --- | --- | --- | --- |
| Detection_ScanNet.json | 1.7M | scannet_pcls.zip | 246M |
| VG_ScanRefer.json | 3.7M | scannet_pcls.zip | 246M |
| VQA_ScanQA_multiplechoice.json | 859K | scannet_pcls.zip | 246M |
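
Note that all three meta files share the same scannet_pcls.zip, so the archive only needs to be downloaded and unzipped once. A small sketch (the file naming inside the archive is an assumption, so list it first):

```python
# A minimal sketch: the three LAMM3D meta files share one point-cloud
# archive, so unzip scannet_pcls.zip once and reuse it for every task.
import json
from pathlib import Path

pcl_dir = Path("scannet_pcls")                        # unzipped from scannet_pcls.zip
print(sorted(p.name for p in pcl_dir.iterdir())[:5])  # confirm archive contents

with open("VG_ScanRefer.json") as f:
    meta = json.load(f)                               # assumed: a JSON list of samples
print(len(meta))
```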

ChEF Benchmarking Dataset

Omnibenchmark

Download Omnibenchmark for the fine-grained classification dataset and the Bamboo Label System for hierarchical category labels.

We sampled and labeled Omnibenchmark meticulously using a hierarchical chain of categories, facilitated by the Bamboo label system. To process the data, run:

python ChEF/data_process/Omnibenchmark.py

You can also directly download the labeled Omnibenchmark dataset from OpenXLab.

MMBench, MME and SEEDBench

Refer to MMBench, MME, and SEEDBench for the datasets and more details.

POPE

POPE is a specially labeled COCO dataset for hallucination evaluation, built on the validation set of COCO 2014. Download COCO and POPE.
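
The three POPE splits (random, popular, adversarial) are JSON Lines files of yes/no probes. A short sketch for inspecting one split (the `"label"` and `"text"` keys follow the upstream POPE release; verify against your download):

```python
# A short sketch of reading one POPE split and checking its label balance.
import json
from collections import Counter

with open("data/coco_pope/coco_pope_random.json") as f:
    probes = [json.loads(line) for line in f]

print(Counter(p["label"] for p in probes))  # expected: "yes"/"no" counts
print(probes[0]["text"])                    # an example hallucination probe
```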

MMBench_C and ScienceQA_C

MMBench_C and ScienceQA_C are datasets with image and text corruptions for robustness evaluation. You can download the MMBench_C and ScienceQA_C datasets directly from OpenXLab.
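
To check which corruption was applied to each sample, the `*_Corruptions_info.json` files carry that metadata. A brief, schema-agnostic sketch (the keys are whatever the release ships, so inspect before use):

```python
# A brief sketch (schema unverified here) of inspecting the corruption
# metadata shipped with MMBench_C; confirm the keys before relying on them.
import json

with open("data/ChEF/MMBench_C/Image_Corruptions_info.json") as f:
    img_info = json.load(f)
with open("data/ChEF/MMBench_C/Text_Corruptions_info.json") as f:
    txt_info = json.load(f)

print(type(img_info), type(txt_info))   # confirm container types first
```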

Directory Structure

data
├── ChEF
│   ├── Omnibenchmark_Bamboo
│   │   ├── meta_file
│   │   └── omnibenchmark_images
│   ├── MMBench_C
│   │   ├── images
│   │   ├── Image_Corruptions_info.json
│   │   ├── Text_Corruptions_info.json
│   │   └── MMBench_C.json
│   └── ScienceQA_C
│       ├── sqaimage_images
│       ├── Image_Corruptions_info.json
│       ├── Text_Corruptions_info.json
│       └── VQA_ScienceQA_C.json
├── Bamboo
│   └── sensexo_visual_add_academic_add_state_V4.visual.json
├── MMBench
│   ├── mmbench_dev_20230712.tsv
│   └── mmbench_test_20230712.tsv
├── MME_Benchmark_release_version
├── SEED-Bench
├── coco_pope
│   ├── val2014
│   ├── coco_pope_adversarial.json
│   ├── coco_pope_popular.json
│   └── coco_pope_random.json
└── ...
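
Before running evaluation, it can help to verify that the layout above is in place. A quick sanity-check sketch (not part of the toolkit; the paths come straight from the tree above):

```python
# A quick sanity check that the expected directory layout is in place.
from pathlib import Path

root = Path("data")
expected = [
    "ChEF/Omnibenchmark_Bamboo/meta_file",
    "ChEF/MMBench_C/MMBench_C.json",
    "ChEF/ScienceQA_C/VQA_ScienceQA_C.json",
    "Bamboo/sensexo_visual_add_academic_add_state_V4.visual.json",
    "MMBench/mmbench_dev_20230712.tsv",
    "coco_pope/coco_pope_random.json",
]
for rel in expected:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:7s} {rel}")
```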
