The goal of this project is to use principles of Natural Language Processing to discover how the ancient language Sanskrit and its descendant Pali influenced the development of the two modern languages Hindi and Marathi differently.
This program is written for Python 2.7 and requires NLTK to be installed. It builds on the IndicNLP library, which is included in the project.
To run the program on our data files on Windows, run python confusability_matrix.py.
To run the program on our data files on Unix, run make.
Either command writes its output to a file called program.data.
Note: the program takes a very long time to run.
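The run steps above can also be wrapped in a small launcher. This is only a sketch, not part of the project: the helper names build_command and run_and_check are hypothetical, and it assumes that python and make are on the PATH and that it is started from the project directory. It picks the command this README gives for the current platform, runs it, and checks that program.data was written.

```python
import os
import subprocess
import sys

SCRIPT = "confusability_matrix.py"  # script named in this README
OUTPUT = "program.data"             # output file named in this README


def build_command(platform_name):
    """Hypothetical helper: return the command this README gives for a platform."""
    # sys.platform is "win32" on Windows; every other platform uses make here.
    if platform_name.startswith("win"):
        return ["python", SCRIPT]
    return ["make"]


def run_and_check(platform_name=sys.platform):
    """Hypothetical helper: run the documented command and check for the output file."""
    command = build_command(platform_name)
    exit_code = subprocess.call(command)  # blocks until the run finishes
    return exit_code == 0 and os.path.exists(OUTPUT)
```

Calling run_and_check() returns True only if the command exits cleanly and program.data exists afterwards. Because the run is slow, the call will block for a long time.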
Our paper, FinalPaper.pdf, summarizes the results of our historical linguistic experiments. We found that Hindi was much more similar to Sanskrit than Marathi was, and that Marathi was much more similar to Pali than Hindi was.