This project is an implementation of the neural network found in miniMNIST-c, in Python, with NumPy.
It is a minimal neural network for classifying handwritten digits from the MNIST dataset, and the entire implementation is 87 lines of code according to cloc.
Unlike miniMNIST-c, this project makes use of one library: NumPy.
NumPy is used for its powerful N-dimensional arrays which are extremely fast and allow the entire network to be vectorised and trained rapidly.
It also makes translating the mathematics behind a simple feed-forward neural network into Python much easier, as it provides useful functions such as the matrix dot product, sampling from the normal distribution, and the argmax function.
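As a rough illustration of how those NumPy primitives map onto the network (this is not the exact code from `main.py`; the shapes and variable names below are placeholders):

```python
import numpy as np

# Placeholder shapes: 784 input pixels and 256 hidden neurons
# (not necessarily the sizes used in main.py).
weights = np.random.normal(0.0, 0.01, size=(784, 256))  # normal distribution for weight initialisation
batch = np.random.rand(32, 784)                          # a dummy batch of 32 flattened images

hidden = np.dot(batch, weights)        # matrix dot product: one vectorised layer forward pass
predicted = np.argmax(hidden, axis=1)  # argmax: highest-scoring class index for each image
```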
- Two-layer Neural Network (Input -> Hidden -> Output)
- ReLU activation for the Hidden Layer and SoftMax activation for the Output Layer
- Cross-entropy Loss function (Log Loss)
- Stochastic Gradient Descent (SGD) optimizer
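The features above correspond to a training step along the lines of the following sketch. This is an illustrative NumPy version of the same ideas, not the code from `main.py`; the layer sizes, initialisation, and variable names are assumptions.

```python
import numpy as np

# Assumed sizes and learning rate, for illustration only
INPUT, HIDDEN, OUTPUT, LR = 784, 256, 10, 0.01

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (INPUT, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.01, (HIDDEN, OUTPUT))
b2 = np.zeros(OUTPUT)

def train_step(x, y_onehot):
    """One SGD step on a mini-batch: forward pass, cross-entropy loss, backward pass, update."""
    # Forward pass: Input -> Hidden (ReLU) -> Output (SoftMax)
    z1 = x @ W1 + b1
    h = np.maximum(z1, 0.0)                       # ReLU activation
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)  # SoftMax activation

    # Cross-entropy (log) loss, averaged over the batch
    loss = -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))

    # Backward pass (gradient of softmax + cross-entropy is probs - targets)
    n = x.shape[0]
    d_logits = (probs - y_onehot) / n
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_h = d_logits @ W2.T
    d_z1 = d_h * (z1 > 0)                         # ReLU derivative
    dW1 = x.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Plain (momentum-free) Stochastic Gradient Descent update
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= LR * grad

    return loss
```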
- The code for `miniMNIST-python` is commented to an almost extreme degree, as this implementation was developed for a workshop presented by CompSoc (University of Galway's Computer Society) on 2024/10/16
- Many of the optimisations made to `miniMNIST-c` since its initial release have not been implemented in `miniMNIST-python`
- Does not implement the Momentum-based variation of Stochastic Gradient Descent
- Utilises the `t10k` testing dataset rather than taking a slice of the `MNIST` training dataset for testing
- Python 3.12
- NumPy
- MNIST dataset files:
  - `train-images.idx3-ubyte`
  - `train-labels.idx1-ubyte`
  - `t10k-images.idx3-ubyte`
  - `t10k-labels.idx1-ubyte`
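The IDX files listed above are plain binary: a small header followed by raw pixel or label bytes (a 16-byte header for images, an 8-byte header for labels). If you want to see how they can be parsed with NumPy alone, a sketch along these lines works; this is not necessarily how `main.py` reads them, and the function names are hypothetical:

```python
import numpy as np

def load_images(path):
    # IDX image files: 16-byte header (magic, count, rows, cols), then uint8 pixels
    with open(path, "rb") as f:
        data = np.frombuffer(f.read(), dtype=np.uint8, offset=16)
    return data.reshape(-1, 28 * 28).astype(np.float64) / 255.0  # flatten and scale to [0, 1]

def load_labels(path):
    # IDX label files: 8-byte header (magic, count), then one uint8 label per image
    with open(path, "rb") as f:
        return np.frombuffer(f.read(), dtype=np.uint8, offset=8)

train_images = load_images("train-images.idx3-ubyte")
train_labels = load_labels("train-labels.idx1-ubyte")
```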
- Place the MNIST dataset files in the same directory as `main.py` (the root of the project)
- Install NumPy: `pip install numpy`
- Execute the program with Python 3.12: `python main.py`

The script will train the neural network and output the accuracy and average loss for each training epoch.
The constants at the top of `main.py` can be adjusted to change the behaviour of the network, namely:

- `HIDDEN_SIZE`: The number of neurons in the Hidden Layer
- `LEARNING_RATE`: The learning rate for Stochastic Gradient Descent
- `EPOCHS`: The number of training epochs
- `BATCH_SIZE`: The batch size for training (in this implementation it must be a number which divides cleanly into 60,000)
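For reference, these constants form a simple block at the top of `main.py`; the values below are placeholders for illustration, not necessarily the ones shipped in the repository:

```python
HIDDEN_SIZE = 256     # number of neurons in the hidden layer
LEARNING_RATE = 0.01  # step size for SGD updates
EPOCHS = 10           # passes over the training data
BATCH_SIZE = 100      # must divide cleanly into the 60,000 training images
```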
This project is open-source and available under the MIT License.