A web-based search engine for Cal Poly Pomona (CPP) faculty profiles, with a modern GUI in CPP's green and gold theme.
- Web scraping of CPP faculty profiles
- TF-IDF based search algorithm
- Cosine similarity for result ranking
- Modern GUI with CPP branding
- Real-time search results
- MongoDB integration for data storage
- Python 3.9 or higher
- MongoDB installed and running locally
- pip (Python package installer)
The following Python packages are required:
beautifulsoup4>=4.12.0
pymongo>=4.6.0
scikit-learn>=1.4.0
nltk>=3.8.1
- Clone the repository:
    git clone <repository-url>
    cd ScratchBuiltSearchEngine
- Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install the required dependencies:
    pip install -r requirements.txt
- Ensure MongoDB is running locally on port 27017:
    # On macOS with Homebrew
    brew services start mongodb-community
- First, run the parser to collect faculty data:
    python3 parser_1.py
  This will scrape faculty profiles and store them in MongoDB.
- Launch the search engine GUI:
    python3 SearchEngineGUI.py
- Use the search interface:
  - Type your search query in the search bar
  - Press Enter or click the Search button
  - View the top 5 matching results
  - Results are ranked by relevance
- Scrapes faculty profiles from CPP's website
- Extracts text content from each profile
- Stores data in MongoDB for efficient retrieval
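
The collection step above could be sketched roughly as follows. This is a minimal illustration and not the code in parser_1.py: the function names `extract_text` and `store_profile`, the `url` and `text` field names, and the upsert-by-URL policy are all assumptions. The `collection` argument is expected to be a pymongo collection, or any object with a compatible `update_one` method.

```python
from bs4 import BeautifulSoup


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page as one whitespace-normalized string."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements whose contents are never shown to the reader.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())


def store_profile(collection, url: str, html: str) -> dict:
    """Upsert one profile document keyed by its URL (hypothetical schema)."""
    doc = {"url": url, "text": extract_text(html)}
    # upsert=True avoids duplicate documents when the parser is re-run.
    collection.update_one({"url": url}, {"$set": doc}, upsert=True)
    return doc
```

With MongoDB running, `collection` would typically come from `pymongo.MongoClient("mongodb://localhost:27017")`; the database and collection names the project uses are not stated here.
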
Text Processing:
- Tokenizes and stems search queries
- Removes stopwords
- Converts text to lowercase
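
As a rough sketch of the text-processing steps listed above, assuming NLTK's `PorterStemmer` for stemming and a simple regular expression for tokenizing. The `preprocess` name and the tiny stopword set are illustrative only; the project may instead use NLTK's full English stopword list, which requires `nltk.download("stopwords")` first.

```python
import re

from nltk.stem import PorterStemmer

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

_stemmer = PorterStemmer()


def preprocess(text: str) -> list[str]:
    """Lowercase the text, tokenize it, drop stopwords, and stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [_stemmer.stem(tok) for tok in tokens if tok not in STOPWORDS]
```

Stemming is what lets a query such as "Searching" match a profile containing "searches": both reduce to the same stem, so they land in the same vector dimension later on.
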
Search Algorithm:
- Uses TF-IDF (Term Frequency-Inverse Document Frequency) to convert text to vectors
- Applies cosine similarity to rank results
- Returns most relevant matches
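
The ranking described above can be sketched with scikit-learn's `TfidfVectorizer` and `cosine_similarity`. scikit-learn is listed in the requirements, but whether the project uses exactly these calls is an assumption, and the `rank` function and its `top_k` parameter are hypothetical names for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank(query: str, documents: list[str], top_k: int = 5) -> list[tuple[int, float]]:
    """Return (document index, score) pairs for the best matches, best first."""
    vectorizer = TfidfVectorizer()
    # Learn the vocabulary and IDF weights from the documents only.
    doc_matrix = vectorizer.fit_transform(documents)
    # Project the query into the same vector space.
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    best = scores.argsort()[::-1][:top_k]
    # Skip documents that share no terms with the query.
    return [(int(i), float(scores[i])) for i in best if scores[i] > 0]
```

Because the query is transformed with the vocabulary fitted on the documents, query terms that appear in no profile contribute nothing, and a query made up entirely of such terms returns an empty list.
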
User Interface:
- Modern GUI with CPP's green and gold theme
- Real-time search results
- Scrollable results view
- Responsive design
MongoDB Connection Issues:
- Ensure MongoDB is running locally
- Check if port 27017 is available
- Verify MongoDB service status
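
One quick way to run the port check above from Python is a plain TCP connection attempt. This is a standard-library sketch; the `port_is_open` helper is a hypothetical name and not part of the project.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("Port 27017 reachable:", port_is_open())
```

A `True` result only shows that some process is listening on the port; it does not prove that the process is MongoDB or that it is healthy.
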
Import Errors:
- Make sure all dependencies are installed
- Verify virtual environment is activated
- Check Python version compatibility
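
The three checks above can be combined in a small diagnostic script. This is a sketch using only the standard library; the helper names are hypothetical, and the module list is derived from the packages named in this README (note that `beautifulsoup4` imports as `bs4` and `scikit-learn` as `sklearn`).

```python
import importlib.util
import sys

# Import names for the packages listed in requirements.txt.
REQUIRED_MODULES = ["bs4", "pymongo", "sklearn", "nltk"]


def python_ok(minimum: tuple = (3, 9)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # sys.executable shows whether the virtual environment's Python is in use.
    print("Interpreter:", sys.executable)
    print("Python version OK:", python_ok())
    print("Missing modules:", missing_modules() or "none")
```

If the interpreter path does not point inside `venv/`, the virtual environment is probably not activated in the current shell.
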
SSL Certificate Issues:
- The program includes SSL context handling
- If issues persist, check your system's SSL certificates
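
This README does not say how the program's SSL context handling works. As a hedged sketch of one common approach, the standard library can build a verifying context, optionally pointed at a specific CA bundle, and pass it to `urllib.request.urlopen`. The `make_ssl_context` and `fetch` names and the `cafile` parameter are assumptions for illustration.

```python
import ssl
import urllib.request
from typing import Optional


def make_ssl_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying TLS context, optionally trusting a custom CA bundle."""
    # create_default_context enables certificate and hostname verification.
    return ssl.create_default_context(cafile=cafile)


def fetch(url: str, context: ssl.SSLContext, timeout: float = 10.0) -> bytes:
    """Download a page using the given TLS context."""
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read()
```

If certificate errors persist with the default context, passing the path of an up-to-date CA bundle as `cafile` is generally a safer fix than disabling verification.
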
Feel free to submit issues and enhancement requests!
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or feedback, please contact:
- Email: [email protected]