A multimodal material estimation project that utilizes both audio and visual information for material classification.
Pull the Docker image from Docker Hub:
```bash
docker pull timttu/multimodal-material-estimation:latest
```

Alternatively, you can download the image from the Docker Hub repository (see links below).
After pulling the image, run the container with GPU support:
```bash
docker run -it --gpus all timttu/multimodal-material-estimation:latest /bin/bash
```

Once inside the Docker container, you can choose from different checkpoint options:
If you want to use the original checkpoint from the initial training, you can find it at the following path:
/MultiModalMaterialEstimation/ckpt.pth
This checkpoint achieves approximately 90% accuracy.
For better performance, you can use the optimized checkpoint that has been fine-tuned with different weight configurations:
/workspace/checkpoints/model_ckpt_finetune.pth
This checkpoint achieves approximately 92% accuracy through weight optimization and fine-tuning.
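The README does not show how either checkpoint is consumed, so the snippet below is only a sketch of the standard PyTorch checkpoint-loading pattern; the stand-in `nn.Linear` model and the demo file name are assumptions, not the project's real architecture:

```python
import torch
import torch.nn as nn

# Stand-in model: the real architecture lives in the repository code,
# not in this README.
model = nn.Linear(512, 10)

# Save a demo checkpoint, then reload it the same way ckpt.pth or
# model_ckpt_finetune.pth would typically be loaded.
torch.save(model.state_dict(), "ckpt_demo.pth")
state = torch.load("ckpt_demo.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()
print(sorted(state.keys()))  # ['bias', 'weight']
```

`map_location="cpu"` lets the checkpoint load on machines without a GPU; the actual loading code in `test.py` may differ.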
To train the model:
```bash
python train.py --config config.json
```

To test the model with a specific checkpoint:

```bash
python test.py --config config_test.json --ckpt_path [checkpoint_path]
```

- train.py: Model training script
- test.py: Model testing script
- dataset_utils.py: Dataset processing utilities
- config.json: Training configuration file
- config_test.json: Testing configuration file
The project runs in a Docker container with all necessary dependencies pre-installed:
- PyTorch 1.10.0
- CUDA 11.3
- Transformers
- OpenAI Whisper
- CLIP
- Other related dependencies
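The Whisper and CLIP dependencies suggest that audio and visual embeddings are fused before classification. The README does not specify the fusion scheme, so the following late-fusion sketch is purely illustrative; every dimension, layer, and name in it is an assumption:

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Illustrative late-fusion head: concatenate audio and visual
    embeddings, then classify. Dimensions are assumed, not taken
    from the project."""

    def __init__(self, audio_dim=512, visual_dim=512, num_materials=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_materials),
        )

    def forward(self, audio_emb, visual_emb):
        # Fuse the two modalities by concatenation along the feature axis.
        fused = torch.cat([audio_emb, visual_emb], dim=-1)
        return self.head(fused)

clf = LateFusionClassifier()
logits = clf(torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 10])
```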
- Docker installed on your system
- NVIDIA Docker runtime (for GPU support)
- Use the --gpus all flag when running the container to enable GPU support
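Whether the --gpus all flag actually took effect can be checked from inside the container; assuming the preinstalled PyTorch, a quick sanity check looks like:

```python
import torch

# Confirm PyTorch inside the container can see the GPU passed through
# by --gpus all; on a CPU-only host this prints False.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```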
- Docker Image Repository: Docker Hub - MultiModal Material Estimation
- Original Checkpoint Path: /MultiModalMaterialEstimation/ckpt.pth (approximately 90% accuracy)
- Optimized Checkpoint Path: /workspace/checkpoints/model_ckpt_finetune.pth (approximately 92% accuracy)