This repository is for creating, managing, and sharing cultural heritage and museum datasets used to train AI models, particularly for image classification and image captioning. It aims to help researchers, developers, and practitioners working on the digital preservation, analysis, and AI-driven exploration of museum collections. The project is a work in progress and will continue to expand.
One project in this repository scrapes, processes, and prepares data from the Europeana database, using only openly available data. It includes tools for:
- Data Crawling: Retrieve open data and media from the Europeana API.
- Data Processing: Filter and link descriptions with their associated media.
- Dataset Preparation: Organize image links and metadata for AI model training, so that images are fetched directly from their links at training time.
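The processing and preparation steps above can be illustrated with a minimal offline sketch. It is not the repository's implementation: the record layout (`id`, `rights`, `description`, `image_url`), the `Sample` class, the function names, and the list of rights markers treated as "open" are all hypothetical and chosen only for illustration. They do not reflect the Europeana API schema or the project's actual code.

```python
import csv
import io
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator

# Assumption: rights URLs containing one of these fragments count as "open".
# The project's real filter may differ.
OPEN_RIGHTS_MARKERS = (
    "creativecommons.org/publicdomain/",
    "creativecommons.org/licenses/by/",
    "creativecommons.org/licenses/by-sa/",
)


@dataclass(frozen=True)
class Sample:
    """One training sample: a description linked to its image URL (hypothetical)."""
    record_id: str
    image_url: str
    description: str


def is_open(record: dict) -> bool:
    """Return True if the record's rights statement matches an open marker."""
    rights = record.get("rights", "")
    return any(marker in rights for marker in OPEN_RIGHTS_MARKERS)


def link_descriptions_to_media(records: Iterable[dict]) -> Iterator[Sample]:
    """Keep open records that have both a description and an image link."""
    for record in records:
        description = record.get("description", "").strip()
        image_url = record.get("image_url", "").strip()
        if is_open(record) and description and image_url:
            yield Sample(record["id"], image_url, description)


def to_manifest_csv(samples: Iterable[Sample]) -> str:
    """Serialize samples as CSV text; images stay remote and are only linked."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["record_id", "image_url", "description"],
        lineterminator="\n",
    )
    writer.writeheader()
    for sample in samples:
        writer.writerow(asdict(sample))
    return buffer.getvalue()
```

Storing only links and metadata keeps the manifest small; a training loop would then download each `image_url` when the sample is needed, which matches the "Dataset Preparation" step described above.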
The EMA project is a public dataset with approximately 12,000 images of more than 2,900 Brazilian historical objects, associated with 31 different labels. The EMA dataset can be used in contexts involving interior objects, either to improve the training of automated image captioning models or to evaluate their performance.
- Clone the Repository:
git clone https://github.com/AI-Unicamp/heritage-data-hub.git
cd heritage-data-hub
- Project Setup: Each project contains its own README file with detailed instructions on installation, dependencies, and usage.
This project is licensed under the MIT License. See the LICENSE file for more details.