Video captioning in PyTorch, based on hobincar/SA-LSTM.
- MSVD and MSR-VTT dataset EDA (see dataset_eda/dataeda.ipynb)
- 2D Feature extraction
- 3D Feature extraction (follow this issue)
- BUTD Feature extraction
- Temporal augmentation
- Joint-Hierarchical Attention Model
- Full pretrained models (CIDEr 50.3 on MSR-VTT, 97.1 on MSVD)
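
To give a rough idea of what temporal augmentation can look like, here is a minimal sketch of segment-based random frame sampling. It is an assumption made for illustration, not necessarily the scheme this repository implements, and `sample_frame_indices` is a hypothetical helper name. The clip is split into equal-width segments and one frame index is drawn at random from each, so every epoch sees a slightly different but still ordered subset of frames.

```python
import random
from typing import List, Optional


def sample_frame_indices(
    num_frames: int,
    num_samples: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Draw one random frame index from each of `num_samples` equal-width segments.

    Hypothetical helper for illustration; the repository's own augmentation may differ.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    rng = rng or random.Random()

    indices = []
    for i in range(num_samples):
        # Integer segment boundaries [start, end) over the clip.
        start = (i * num_frames) // num_samples
        end = ((i + 1) * num_frames) // num_samples
        # If the clip has fewer frames than samples, a segment can be empty;
        # widen it to one frame so the same frame is simply reused.
        end = max(end, start + 1)
        indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # Pass a seeded generator for reproducible sampling.
    print(sample_frame_indices(num_frames=100, num_samples=8, rng=random.Random(0)))
```

The returned indices are non-decreasing, so they can be used to index a per-frame feature array (for example the extracted 2D features) without disturbing temporal order.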