BigData-Fall2024-Team4/Assignment1

Assignment1

The objective of this project is to support AI model performance analysis by developing a model evaluation tool with Streamlit. The application lets users select validation test cases from the GAIA dataset and evaluate responses from OpenAI models. Each OpenAI-generated answer is shown alongside the predefined correct answer, so users can manually judge whether the model's answer is correct.

Key technologies involved include:

Streamlit: A framework for building interactive data applications. It serves as the user interface for selecting test cases, displaying outcomes, and capturing user feedback.

GAIA dataset: A dataset containing test cases, including questions and final answers, which serves as the foundation for model evaluation.

OpenAI models: These language models are queried to generate responses to the user-selected test cases.

Azure Cloud Storage: Stores additional files (spreadsheets, PDFs, text files, etc.) that certain test cases refer to. The file names themselves are stored in the MySQL database.

MySQL: This database stores metadata, such as test case details and the names of their associated files.

Project Resources

Google Colab notebook: https://colab.research.google.com/drive/1-u0u6Ib5aPGprUhVwmp_Yj_Ie-FEaNgi?usp=sharing

Google codelab: https://codelabs-preview.appspot.com/?file_id=1Ih2p01AQZP2_p7pM-CWIECQJQams-EnPEwdwYNav838#0

App link (hosted on Streamlit Cloud): https://mainpy-heznqzbq2wxhheb66pts6x.streamlit.app/

Demo Video URL: https://drive.google.com/file/d/15uZEUIzM380tWLgTcy5BQN5SA_6WAFyi/view?usp=sharing

Tech Stack

Python | Streamlit | OpenAI | Azure SQL | Azure Blob Storage

Architecture diagram

image

Project Flow

  1. The application starts when a user selects a test case from the GAIA dataset via the frontend (Streamlit). The backend retrieves the metadata from a MySQL database and, if applicable, fetches external files stored in Azure Cloud Storage.

  2. The backend prepares the test case and context, sending it to OpenAI through an API call. The OpenAI model generates a response, which is then compared to the final answer in the GAIA dataset. The results, along with the generated answer, are returned to the frontend for display.

  3. User feedback and results are stored in the MySQL database. The frontend generates visualizations, such as pie charts, to show model performance across test cases, allowing users to view metrics like total attempts, correct answers, and evaluation summaries.
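The comparison and summary steps in the flow above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Attempt` record, the `normalize`, `is_correct`, and `summarize` helpers, and the exact-match rule after normalization are all assumptions made for the example. In the application itself, the final judgment may come from the user's manual feedback.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One evaluated test case (hypothetical record shape)."""
    question: str
    expected: str   # final answer from the GAIA dataset
    generated: str  # answer returned by the OpenAI model


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.strip().lower().split())


def is_correct(attempt: Attempt) -> bool:
    """Exact match after normalization (an assumed rule for this sketch)."""
    return normalize(attempt.generated) == normalize(attempt.expected)


def summarize(attempts: list[Attempt]) -> dict:
    """Compute the counts a pie chart of model performance would be built from."""
    total = len(attempts)
    correct = sum(is_correct(a) for a in attempts)
    return {
        "total": total,
        "correct": correct,
        "incorrect": total - correct,
        "accuracy": correct / total if total else 0.0,
    }
```

A dictionary like the one returned by `summarize` could then be passed to whatever charting call the frontend uses to display total attempts and correct answers.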

Contributions

Name | Percentage Contribution
Sarthak Somvanshi | 33%
Yuga Kanse | 33%
Tanvi Inchanalkar | 33%

Additional Notes

WE ATTEST THAT WE HAVEN’T USED ANY OTHER STUDENTS’ WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK.
