Feature Extraction and Unsupervised Clustering of Histopathological Images of Pancreatic Cancer Using Information Maximization
This repository shows the code implementation and figures for the conference paper titled Feature Extraction and Unsupervised Clustering of Histopathological Images of Pancreatic Cancer Using Information Maximization.
Let's walk through this repository:
- kpc16 folder:
part_1_5000e_16c_training.ipynb
: This file has the code for training the unsupervised model using the 16-cluster set.part_2_5000e_16c_clustering.ipynb
: This file offers the code for clustering the training images using the trained unsupervised model with the 16-cluster set. We only showed the samples using the HE staining due to the page limitations during the paper submission.part_3_5000e_16c_umap.ipynb
: This file shows the code for plotting UMAP for the 16-cluster set.part_4_5000e_16c_NLL.ipynb
: This file includes the source code to compute the nagative log-likelihood for the 16-cluster set.part_5_save_clusters.ipynb
: This file contains the code for saving clusters individually for the 16-cluster set.
HHH16 & C16 & D16.csv
file has the information regarding the latent space.
kpc16 folder has two subfolders: clus16 and models16.
clus16 has the individual clusters in the form of
.csv
files. We obtained these files after executing the code shown inpart_5_save_clusters.ipynb
script.models16 has 3
.ckpt
files:model_en_202203211747_5000.ckpt
,model_cl_202203211747_5000.ckpt
, andmodel_de_202203211747_5000.ckpt
. These three files form our unsupervised model.hist_modelS_202203211747_5000.tsv
file stores diiferent metrics obtained during the training of our model.
The files present in the kpc16 folder can also be obtained for cluster sets of 8, 12, and 20. We haven't shown them here to avoid repetitiveness. In the paper, we showed the UMAP
and clustering
results for the 8-cluster set. These can be obtained after training the model using part_1_5000e_8c_training.ipynb
file for the 8-cluster set. After that, running part_2_5000e_8c_clustering.ipynb
and part_3_5000e_8c_umap.ipynb
will provide the desired results. The users will have to do these by themselves.
optimum_NLL.ipynb
file plots a line graph utilizing the negative log-likelihoods for the chosen cluster sets (8, 12, 16, and 20).
Finally, The folder
Figures
has 5.png
files in it:Fig_1.png
,Fig_2.png
,Fig_3.png
,Fig_4.png
, andFig_5.png
. These are the same figures that can be seen in the conference paper.