This project received a First Class equivalent grade. While there are minor issues, it was one of my biggest accomplishments during my university journey, and I will continue to work on it when free time allows.
- MongoDB - database used to store and retrieve unstructured data
- PyMongo - Python library to integrate database and server
- Server - Flask
- Browser - HTML, CSS, JS, Bootstrap (if time allows, use React for scalability)
For further details, please refer to the Honours Project report, from page 27 onwards.
- Install MongoDB and MongoDB Compass (Compass is used to visualize the data more clearly)
- Create a virtual environment in Python and activate it
python -m venv venv
- Ensure the Python dependencies from `requirements.txt` are installed in the virtual environment, especially Flask
- Execute `setup_settings.py` on console
- Go to `settings.ini` and type in the respective API and database settings
  - Database by default is called `db`
  - Collections can either be named `bbcgoodfood` or `tasty`.
- Run MongoDB in your local environment and connect to it on localhost, port 27017 (or use a MongoDB cloud database; the lowest tier is free as of Feb 2023)
- Run `main.py`. This starts the Flask server for local development only, not for production use.
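
The settings step above can be sketched as follows. This is a minimal sketch, not the project's actual code: the section name `database` and the keys `host`, `port`, `name` and `collection` are assumptions, since the real layout of `settings.ini` is produced by `setup_settings.py`. It only shows how values such as the default database `db` and port 27017 could be read with the standard library and turned into a MongoDB connection string.

```python
import configparser

# Hypothetical settings.ini content; the real section and key names may differ.
EXAMPLE_SETTINGS = """
[database]
host = localhost
port = 27017
name = db
collection = bbcgoodfood
"""


def load_database_settings(text):
    """Parse settings text and return the database options as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["database"]
    return {
        "host": section.get("host", fallback="localhost"),
        "port": section.getint("port", fallback=27017),
        "name": section.get("name", fallback="db"),
        "collection": section.get("collection", fallback="bbcgoodfood"),
    }


def mongo_uri(settings):
    """Build a connection string for a MongoDB instance."""
    return f"mongodb://{settings['host']}:{settings['port']}"


def get_collection(settings):
    """Return the configured PyMongo collection (requires pymongo)."""
    from pymongo import MongoClient  # imported lazily so parsing works without it

    client = MongoClient(mongo_uri(settings))
    return client[settings["name"]][settings["collection"]]


if __name__ == "__main__":
    settings = load_database_settings(EXAMPLE_SETTINGS)
    print(mongo_uri(settings))
```

Keeping the parsing separate from the connection means the settings can be checked without a running MongoDB instance.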
The `static` and `templates` folders hold the static files and the templates respectively.
The `web_scraping` folder contains the Python files required to scrape the website.
The `honours_project_report` folder contains the Honours Project report and images.
NOTE: Selenium and Google Chrome must be installed to scrape data into the local database. Scraping is done separately from deploying to Heroku and should ideally be run on a personal local machine.
The websites that can be configured for scraping are BBCGoodFood and Tasty. BBCGoodFood collections are supported, such as https://www.bbcgoodfood.com/recipes/collection/february-recipes
The Tasty website offers more flexibility; its base URL is configured as https://tasty.co/search
The `base_url` of the website to scrape can be configured in the `settings.ini` file.
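
As an illustration of how a configured `base_url` could be mapped onto the two supported sites, the sketch below uses only the standard library. The helper name `collection_for` and the rule that a BBCGoodFood URL must point at a `/recipes/collection/` page are assumptions drawn from the example URLs in this README, not the project's actual logic.

```python
from urllib.parse import urlparse


def collection_for(base_url):
    """Return the MongoDB collection name matching a configured base_url.

    Hypothetical helper: the host-to-collection mapping is assumed from the
    collection names and example URLs given in this README.
    """
    parsed = urlparse(base_url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    if host == "bbcgoodfood.com":
        # Only collection pages are assumed to be supported for BBCGoodFood.
        if not parsed.path.startswith("/recipes/collection/"):
            raise ValueError("expected a BBCGoodFood collection URL")
        return "bbcgoodfood"
    if host == "tasty.co":
        return "tasty"
    raise ValueError(f"unsupported website: {base_url}")
```

Failing early on an unsupported URL avoids starting a Selenium session that cannot produce usable recipes.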
- Run the respective Python website module: open cmd, go to the `web_scraping` folder, then type `python web_scrapping_module.py`
- Ensure the recipes are stored in the MongoDB collections
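
One way to confirm that the scrape stored recipes is to count the documents in each collection. The sketch below is an assumption about how such a check could look: `count_documents` is PyMongo's collection method, while the helper names `recipe_counts` and `missing_collections` are hypothetical. It expects a database object such as `MongoClient("mongodb://localhost:27017")["db"]` backed by a running MongoDB.

```python
def recipe_counts(database, collection_names=("bbcgoodfood", "tasty")):
    """Return {collection name: number of stored documents}.

    `database` is expected to behave like a PyMongo Database, i.e. support
    database[name].count_documents(filter).
    """
    return {name: database[name].count_documents({}) for name in collection_names}


def missing_collections(counts):
    """List the collections that contain no documents."""
    return [name for name, count in counts.items() if count == 0]
```

An empty result from `missing_collections` suggests that every configured collection received at least one recipe.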
NOTE: Figure out how to deploy the settings config file securely.
- Go to terminal
- Create the app on heroku.com and obtain its Heroku app name, then follow the instructions given
- Remove `settings.ini` from `.gitignore`
- Check in `requirements.txt` file to ensure Python dependencies are installed
- Execute `git push heroku master` on your terminal
- Add `settings.ini` to `.gitignore`
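
Regarding the note on deploying the settings file securely, one common option is to keep secrets out of the repository and read them from environment variables, falling back to `settings.ini` for local development. Heroku exposes its config vars to the app as environment variables. The sketch below is only an assumption about how this could be wired in; the variable name `MONGODB_URI`, the section `database` and the key `uri` are hypothetical and not part of the current project.

```python
import configparser
import os


def get_mongodb_uri(environ=None, settings_path="settings.ini"):
    """Return the MongoDB URI from the environment, else from settings.ini.

    MONGODB_URI, the [database] section and its `uri` key are assumed names.
    """
    environ = os.environ if environ is None else environ
    if environ.get("MONGODB_URI"):
        return environ["MONGODB_URI"]

    parser = configparser.ConfigParser()
    parser.read(settings_path)  # silently skips a missing file
    return parser.get("database", "uri", fallback="mongodb://localhost:27017")
```

With this approach `settings.ini` could stay in `.gitignore` during deployment, since the deployed app would not need the file.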