A repository designed for creating, managing, and sharing datasets related to cultural heritage and museums. This hub includes modules for building datasets for training AI models, specifically for image classification and image captioning tasks.


AI-Unicamp/heritage-data-hub


Heritage Data Hub

This repository supports creating, managing, and sharing datasets related to cultural heritage and museums for training AI models, particularly for image classification and image captioning. It aims to help researchers, developers, and practitioners working on the digital preservation, analysis, and AI-driven exploration of museum collections. The project is a work in progress and will continue to expand its capabilities.

Projects

1. europeana_db

A project focused on scraping, processing, and preparing data from the Europeana database, using only openly available data. It includes tools for:

  • Data Crawling: Retrieve open and media data from the Europeana API.
  • Data Processing: Filter and link descriptions with their associated media.
  • Dataset Preparation: Organize image links and metadata for AI model training, so that images can be fetched directly from their links during training.
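
The crawling and processing steps above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in europeana_db: the helper names `build_search_url` and `link_descriptions_to_media` are hypothetical, and the endpoint, query parameters (`wskey`, `reusability`, `media`, `cursor`), and response fields (`dcDescription`, `edmIsShownBy`) are assumptions about the Europeana Search API that should be checked against its official documentation.

```python
from urllib.parse import urlencode

# Assumed Europeana Search API endpoint; verify against the official docs.
SEARCH_URL = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(api_key, query, rows=50, cursor="*"):
    """Build a search URL limited to openly reusable records that have media.

    Hypothetical helper; the parameter names are assumptions about the API.
    """
    params = {
        "wskey": api_key,        # personal API key
        "query": query,          # free-text search query
        "reusability": "open",   # keep only openly licensed records
        "media": "true",         # keep only records with attached media
        "rows": rows,            # page size
        "cursor": cursor,        # cursor-based pagination marker
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def link_descriptions_to_media(items):
    """Pair each record's first description with its first media URL.

    Records missing either a description or a media URL are dropped.
    The field names are assumptions about the search response.
    """
    linked = []
    for item in items:
        descriptions = item.get("dcDescription") or []
        media_urls = item.get("edmIsShownBy") or []
        if descriptions and media_urls:
            linked.append(
                {
                    "id": item.get("id"),
                    "caption": descriptions[0],
                    "image_url": media_urls[0],
                }
            )
    return linked
```

In practice, the URL from `build_search_url` would be requested with an HTTP client, and the `items` list of the JSON response passed to `link_descriptions_to_media`. Storing only `image_url` alongside the caption keeps the prepared dataset small, since the images themselves are downloaded only when training needs them.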

2. ema

The EMA project is a public dataset of approximately 12,000 images of more than 2,900 Brazilian historical objects, annotated with 31 different labels. The dataset can be used in contexts involving interior objects to improve the training, or evaluate the performance, of automated image captioning models.

Getting Started

  1. Clone the Repository:
git clone https://github.com/AI-Unicamp/heritage-data-hub.git
cd heritage-data-hub
  2. Project Setup: Each project contains its own README file with detailed instructions on installation, dependencies, and usage.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
