rayolddog/PreprocessingForML

PreprocessingForML

Preprocessing programs I use to prepare medical images for machine learning. I use a variety of languages, including Mathematica, MATLAB, and Python 3.7. The initial programs are written in Mathematica.

Out of the more than 200,000 images I downloaded, slightly over 200 appeared to be mangled by a computer algorithm (or to have other problems, such as a double-exposed image). In the majority of the bad images, rows of the image were shifted right or left, distorting the image. Many of these training images can be more or less corrected.

Both programs build a list of training files. Walking the directory tree of training images takes some time, so to speed up execution I write out a text file containing the list of files, from patient number down to projection. The first time you use a program, uncomment the lines of code that walk the directory tree, edit the directory (the disk drive and directory where the train directory is stored), and comment out the line that reads the list from the file. On subsequent runs you can read the saved text list instead by switching which lines are commented.
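For readers working in Python rather than Mathematica, the caching idea can be sketched roughly as follows. This is not the code from the notebooks: the function names, the `.jpg` filter, and the cache file path are assumptions made for illustration, and the sketch checks whether the cache file exists instead of relying on commented-out lines.

```python
import os


def build_file_list(train_dir, extension=".jpg"):
    """Walk train_dir and return a sorted list of image paths (the slow step)."""
    paths = []
    for root, _dirs, files in os.walk(train_dir):
        for name in files:
            if name.lower().endswith(extension):
                paths.append(os.path.join(root, name))
    return sorted(paths)


def load_or_build_file_list(train_dir, cache_path):
    """Read the cached list if it exists; otherwise walk the tree and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as handle:
            return [line.rstrip("\n") for line in handle if line.strip()]
    paths = build_file_list(train_dir)
    with open(cache_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(paths))
    return paths
```

With this approach, deleting the cache file forces a fresh walk of the directory tree, for example after the images are moved to a different drive.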

The first program takes the list of cases I identified as scrambled and displays reduced-resolution images. Its purpose is to verify that the scrambled images I identified in my downloaded dataset are in fact present in your dataset. To use the program, you will have to modify the directories to point to where you have stored the CheXpert train images. If you don't see scrambled images, then the issues I noted are not present in your dataset. You may want to check both the small dataset and the full-size images.
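As a rough illustration of what "reduced resolution" can mean here, the sketch below shrinks a grayscale image by averaging square blocks of pixels. It is an assumption about one reasonable way to do this, not the method the Mathematica notebook uses; the `downsample` function and the nested-list image representation are hypothetical and chosen to keep the example dependency-free.

```python
def downsample(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks.

    image is a list of equal-length rows of grayscale values. Rows and
    columns that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out_height = len(image) // factor
    out_width = (len(image[0]) // factor) if image else 0
    reduced = []
    for block_row in range(out_height):
        rows = image[block_row * factor:(block_row + 1) * factor]
        new_row = []
        for block_col in range(out_width):
            start = block_col * factor
            total = sum(sum(row[start:start + factor]) for row in rows)
            new_row.append(total / (factor * factor))
        reduced.append(new_row)
    return reduced
```

Row shifts of more than a few pixels generally remain visible after this kind of reduction, which is why a small preview is usually enough to confirm whether an image is scrambled.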

The second program, Rectifylistimagetest.nb, attempts to unscramble the images I found among those downloaded for the CheXpert challenge by shifting the altered rows back into their correct position. It is not always successful. I don't supply the large data files (train.csv, test.csv, valid.csv, or the .jpg images). I list the bad files I found in a cursory review of the 223,000 images. The program uses parameters encoded in the script, so you will have to edit the script to use it. I include only a few examples because of the size of the saved script.
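To give a feel for the row-shifting idea, here is a minimal Python sketch. It is not the algorithm in Rectifylistimagetest.nb. It assumes, purely for illustration, that each bad row was shifted circularly, that the first row is in its correct position, and that neighboring rows of a clean image look alike. The names `shift_row`, `row_difference`, `realign_rows`, and the `max_shift` parameter are hypothetical.

```python
def shift_row(row, offset):
    """Circularly shift a row right by offset pixels (left if negative)."""
    if not row:
        return list(row)
    offset %= len(row)
    return list(row[-offset:]) + list(row[:-offset]) if offset else list(row)


def row_difference(a, b):
    """Sum of absolute pixel differences between two rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def realign_rows(image, max_shift):
    """Undo per-row horizontal shifts by matching each row to the one above."""
    if not image:
        return []
    corrected = [list(image[0])]  # assumption: the first row is not shifted
    for row in image[1:]:
        reference = corrected[-1]
        # Try every shift in range; prefer the smallest shift on ties.
        best = min(
            range(-max_shift, max_shift + 1),
            key=lambda s: (row_difference(shift_row(row, s), reference), abs(s)),
        )
        corrected.append(shift_row(row, best))
    return corrected
```

Because each row is aligned to the previously corrected row, one wrong match can propagate down the image, so a greedy approach like this one will not always succeed either.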

About

Preprocessing programs I use to prep medical image data for machine learning (Chest x-rays)
