Training
Prepare Required Checkpoints
We provide pretrained weights for the visual encoders and the LLM. Download them from the table below and put them in the model_zoo
directory.
Model Name | Link |
---|---|
Vicuna | Link |
clip-vit-large-patch14-336 | Link |
epcl_vit-L_256tokens | Link |
Organize the pretrained weights as below:
model_zoo
├── vicuna_ckpt
│ ├── 13b_v0
│ └── 7b_v0
└── epcl_vit-L_256tokens
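Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `check_model_zoo` is a hypothetical helper name, and the path list simply mirrors the tree shown above (trim it if you only downloaded one Vicuna size).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the repository): report which of the
# paths from the tree above are missing under the given root directory.
check_model_zoo() {
    root="${1:-model_zoo}"
    missing=0
    # Paths copied from the directory tree above.
    for sub in vicuna_ckpt/13b_v0 vicuna_ckpt/7b_v0 epcl_vit-L_256tokens; do
        if [ ! -e "$root/$sub" ]; then
            echo "missing: $root/$sub"
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

After sourcing the function, run `check_model_zoo model_zoo` from the directory that contains model_zoo. It prints one line per missing path and returns a non-zero status if anything is absent.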
LAMM
2D Models Training
cd src
sh tools/LAMM/train_lamm2d.sh lamm_2d
# or
sh tools/LAMM/train_lamm2d_slurm.sh <YOUR_PARTITION> lamm_2d
3D Models Training
cd src
sh tools/LAMM/train_lamm3d.sh lamm_3d
# or
sh tools/LAMM/train_lamm3d_slurm.sh <YOUR_PARTITION> lamm_3d
For reference, GPU memory consumption for the different models is shown below:
Model Size | Sample Num/GPU | GPU Memory |
---|---|---|
Vicuna_v0_7B | 1 | ~30GB |
Vicuna_v0_7B | 2 | ~46GB |
Vicuna_v0_13B | 1 | ~53GB |
Vicuna_v0_13B | 2 | ~70GB |
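The table above can be read as a simple lookup: given a model size and the memory available per GPU, choose the largest sample count that fits. The sketch below encodes that lookup. `pick_samples_per_gpu` is a hypothetical helper, the thresholds are the approximate figures from the table, and how the per-GPU sample count is actually set in the training scripts is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print the largest samples-per-GPU value from the table
# above that fits in the given memory, or 0 if even one sample does not fit.
pick_samples_per_gpu() {
    model="$1"    # "7b" or "13b" (Vicuna_v0_7B / Vicuna_v0_13B)
    mem_gb="$2"   # available memory per GPU in GB, as an integer
    case "$model" in
        7b)  one=30; two=46 ;;   # ~30GB for 1 sample, ~46GB for 2
        13b) one=53; two=70 ;;   # ~53GB for 1 sample, ~70GB for 2
        *)   echo "unknown model: $model" >&2; return 2 ;;
    esac
    if [ "$mem_gb" -ge "$two" ]; then
        echo 2
    elif [ "$mem_gb" -ge "$one" ]; then
        echo 1
    else
        echo 0
    fi
}
```

For example, `pick_samples_per_gpu 7b 40` prints `1`. Because the table values are approximate, treat a result near a threshold with caution and leave some headroom.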
Octavius
Image modality only
cd src
sh tools/Octavius/train_octavius_slurm.sh <YOUR_PARTITION> <NUM_GPU> \
config/Octavius/octavius_2d_e4_bs64.yaml octavius_2d_e4_bs64
Point cloud modality only
cd src
sh tools/Octavius/train_octavius_slurm.sh <YOUR_PARTITION> <NUM_GPU> \
config/Octavius/octavius_3d_e3_bs64.yaml octavius_3d_e3_bs64
Image & point cloud modality joint
cd src
sh tools/Octavius/train_octavius_slurm.sh <YOUR_PARTITION> <NUM_GPU> \
config/Octavius/octavius_2d+3d_e6_bs64.yaml octavius_2d+3d_e6_bs64
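The three Octavius commands above differ only in the config file and experiment name. As a convenience, the sketch below maps a modality keyword to those names and prints the matching command rather than running it. `octavius_config` and `octavius_train_cmd` are hypothetical helpers, and the keywords `2d`, `3d`, and `2d+3d` are labels chosen here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a modality keyword to the config/experiment name
# used in the commands above.
octavius_config() {
    case "$1" in
        2d)    echo "octavius_2d_e4_bs64" ;;
        3d)    echo "octavius_3d_e3_bs64" ;;
        2d+3d) echo "octavius_2d+3d_e6_bs64" ;;
        *)     echo "unknown modality: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print (do not execute) the training command.
# Arguments: modality, Slurm partition, number of GPUs.
octavius_train_cmd() {
    name=$(octavius_config "$1") || return 1
    echo "sh tools/Octavius/train_octavius_slurm.sh $2 $3 config/Octavius/$name.yaml $name"
}
```

Calling `octavius_train_cmd 3d <YOUR_PARTITION> <NUM_GPU>` from src prints the point cloud command shown above, which you can inspect before running it.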
Model Zoo
We provide several pretrained LAMM/Octavius checkpoints here:
LAMM Model Zoo
# Training Samples | Vision Encoder | LLM | Training Data | LoRA Rank | Link |
---|---|---|---|---|---|
98K | CLIP-ViT-L | Vicuna_v0_7B | LAMM-2D daily dialogue & description | 32 | Checkpoints |
186K | CLIP-ViT-L | Vicuna_v0_7B | LAMM-2D Instruction Data | 32 | Checkpoints |
186K | CLIP-ViT-L | LLaMA2_chat_7B | LAMM-2D Instruction Data | 32 | Checkpoints |
98K | CLIP-ViT-L | Vicuna_v0_13B | LAMM-2D daily dialogue & description | 32 | Checkpoints |
186K | CLIP-ViT-L | Vicuna_v0_13B | LAMM-2D Instruction Data | 32 | Checkpoints |
10K | EPCL-ViT-L | Vicuna_v0_13B | LAMM-3D Instruction Data | 32 | Checkpoints |
Octavius Model Zoo
# Training Samples | Vision Encoder | LLM | Training Data | Link |
---|---|---|---|---|
286K | CLIP-ViT-L | Vicuna_v0_13B | LAMM-2D Instruction Data & COCO-Detection | ckpt |
90K | Obj-As-Scene | Vicuna_v0_13B | Scan2Inst | ckpt |
376K | CLIP-ViT-L & Obj-As-Scene | LLaMA2_chat_13B | LAMM-2D Instruction Data & COCO-Detection & Scan2Inst | ckpt |
Download them and place them in the ckpt
directory for fast evaluation.