This is my first basic web crawler, and possibly the first working Python 3 code I've uploaded to GitHub, pushed, or rather, copy-pasted from the Documents folder. Made in Jupyter Notebook, as an .ipynb file. Motivated by Manish sir, to be used under the WEB CRAWLER for PEC.
DESCRIPTION - THIS IS A BASIC WEB CRAWLER BUILT TO CRAWL THROUGH PEC.AC.IN. IT VISITS A PAGE, COLLECTS ALL THE LINKS FOUND IN THAT PAGE'S [A HREF=""] TAGS, AND STORES THEM IN A LIST. THIS SAME LIST IS USED AS A REFERENCE TO VISIT THE LISTED PAGES AND PERFORM THE SAME OPERATIONS ON THEM, THEREBY GROWING THE LIST OF URLs. IT ALSO STORES THE TEXT OF EVERY PAGE IT VISITS. A SPECIAL FEATURE IS SEARCHING BY KEYWORDS. THE DEPTH OF EACH PAGE RELATIVE TO THE HOME PAGE IS ALSO RECORDED. ALL THIS DATA, i.e. THE LINK, THE CORRESPONDING LINKS FOUND ON THAT PAGE, ITS TEXT, ETC., IS STORED IN A CSV FILE.
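For reference, here is a minimal sketch of the crawl loop described above. This is not the actual notebook code: the depth limit, CSV column names, and function name are illustrative assumptions, and it uses `requests` and `BeautifulSoup` for fetching and parsing.

```python
# Minimal sketch of the crawl described above (illustrative, not the notebook code).
import csv
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://pec.ac.in/"   # home page, depth 0
MAX_DEPTH = 2                      # assumed limit so the sketch terminates

def crawl(start_url=START_URL, max_depth=MAX_DEPTH, out_file="crawl.csv"):
    seen = {start_url}
    queue = deque([(start_url, 0)])  # BFS queue of (url, depth from home page)
    with open(out_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "depth", "status", "links", "text"])
        while queue:
            url, depth = queue.popleft()
            try:
                resp = requests.get(url, timeout=10)
                status = str(resp.status_code)
            except requests.RequestException as exc:
                # Record unreachable pages instead of crashing the crawl.
                writer.writerow([url, depth, f"error: {exc}", "", ""])
                continue
            soup = BeautifulSoup(resp.text, "html.parser")
            # Collect every link found in the page's <a href=""> tags.
            links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
            writer.writerow([url, depth, status, " ".join(links),
                             soup.get_text(" ", strip=True)])
            if depth < max_depth:
                for link in links:
                    # Stay on pec.ac.in and skip already-seen pages.
                    if urlparse(link).netloc.endswith("pec.ac.in") and link not in seen:
                        seen.add(link)
                        queue.append((link, depth + 1))

crawl()
```

The breadth-first queue is what makes the recorded depth meaningful: every page is written out at the shortest number of clicks from the home page.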
USES - CHECKING FOR DEAD LINKS AND ERRORS IN PAGES/URLs, GETTING A BIRD'S-EYE VIEW OF PEC.AC.IN, SEARCHING FOR SPECIFIC KEYWORDS, DEPTH DETECTION, ETC.
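For example, dead links and keyword hits can both be pulled out of the CSV after a crawl. This sketch assumes the column names from the snippet above:

```python
# Sketch: flag non-200 pages and pages mentioning a keyword
# (assumes the "url"/"status"/"text" columns written by the crawl sketch).
import csv

with open("crawl.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row["status"] != "200":
            print("DEAD/ERROR:", row["url"], "->", row["status"])
        if "admissions" in row["text"].lower():   # example keyword
            print("KEYWORD HIT:", row["url"])
```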
CREATIVITY OF THE USER AND USES ARE PROPORTIONAL!!
PROBLEMS - PERFORMS UP TO EXPECTATIONS AS OF NOW; MAY FAIL WHEN NEWER REQUIREMENTS ARE MADE KNOWN IN THE FUTURE.