GitHub - BigData-Fall2024-Team4/Assignment1

Assignemnt1

The objective of this project is to enhance AI model performance analysis by developing a model evaluation tool with Streamlit. The application will enable users to select validation test cases from the GAIA dataset and evaluate responses from OpenAI models. By comparing the OpenAI-generated answers to pre-defined correct answers, users can compare the OpenAI manually if the answer is correct.

Key technologies involved include: Streamlit: A framework for building interactive data applications. It serves as the user interface for selecting test cases, displaying outcomes, and capturing user feedback.

GAIA dataset: A dataset containing test cases, including questions and final answers, which serves as the foundation for model evaluation.

OpenAI models: To respond to the user-selected test cases, these language models will be queried to get a response

Azure Cloud Storage: It stores additional files, such as spreadsheets, pdf, txt, etc that certain test cases refer to, while the file names are stored in the MySQL database.

MySQL: This database is used to store metadata, such as test case files and test case files.

Project Resources

Google collab notebook: https://colab.research.google.com/drive/1-u0u6Ib5aPGprUhVwmp_Yj_Ie-FEaNgi?usp=sharing

Google codelab: https://codelabs-preview.appspot.com/?file_id=1Ih2p01AQZP2_p7pM-CWIECQJQams-EnPEwdwYNav838#0

App link (hosted on Streamlit Cloud): https://mainpy-heznqzbq2wxhheb66pts6x.streamlit.app/

Demo Video URL: https://drive.google.com/file/d/15uZEUIzM380tWLgTcy5BQN5SA_6WAFyi/view?usp=sharing

Tech Stack

Python | Streamlit | OpenAI | Azure SQL | Azure Blob Storage

Architecture diagram

Project Flow

The application starts when a user selects a test case from the GAIA dataset via the frontend (Streamlit). The backend retrieves the metadata from a MySQL database and, if applicable, fetches external files stored in Azure Cloud Storage.
The backend prepares the test case and context, sending it to OpenAI through an API call. The OpenAI model generates a response, which is then compared to the final answer in the GAIA dataset. The results, along with the generated answer, are returned to the frontend for display.
User feedback and results are stored in the MySQL database. The frontend generates visualizations, such as pie charts, to show model performance across test cases, allowing users to view metrics like total attempts, correct answers, and evaluation summaries.

Contributions

Name	Percentage Contribution
Sarthak Somvanshi	33%
Yuga Kanse	33%
Tanvi Inchanalkar	33%

Additional Notes

WE ATTEST THAT WE HAVEN’T USED ANY OTHER STUDENTS’ WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK.

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
.devcontainer		.devcontainer
Architecture		Architecture
my_page		my_page
utils		utils
.gitignore		.gitignore
README.md		README.md
main.py		main.py
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
validation_data.py		validation_data.py
your_database.db		your_database.db

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Assignemnt1

Project Resources

Tech Stack

Architecture diagram

Project Flow

Contributions

Additional Notes

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

BigData-Fall2024-Team4/Assignment1

Folders and files

Latest commit

History

Repository files navigation

Assignemnt1

Project Resources

Tech Stack

Architecture diagram

Project Flow

Contributions

Additional Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages