VitalSource Supplemental Data Repository

This repository contains supporting datasets and analysis code for several of our papers evaluating the use of artificial intelligence to enhance electronic textbooks at scale. These projects include automatic question generation (AQG) as well as other generative AI–based features such as text simplification. All datasets are drawn from real student interactions in the VitalSource Bookshelf ereader platform.

Our earliest research focused on AQG as a method for adding formative practice to textbooks. Millions of automatically generated questions have been added to thousands of textbooks in Bookshelf as part of a free study feature called CoachMe. CoachMe is based on the Doer Effect, the learning science principle that students who practice as they read have better learning outcomes than those who only read. Our efforts have since expanded beyond AQG to include other generative AI–based interventions that support student learning and engagement. All of our published research papers can be found on our research site.

The datasets available are:

Directory	Paper
l@s-2021	Toward effective courseware at scale: Investigating automatically generated questions as formative practice
aied-2021-itextbooks	Transforming textbooks into learning by doing environments: An evaluation of textbook-based automatic question generation
l@s-2022	Discrimination of automatically generated questions used as formative practice
ijaied-2024	Automatic question generation for Spanish textbooks: Evaluating Spanish questions generated with the parallel construction method
ife-2024	An expert evaluation of formative practice generated for Spanish textbooks using Artificial Intelligence
aied-2024-evallac	Exploring large language models for evaluating automatically generated questions
edm-2024	Investigating student ratings with features of automatically generated questions: A large-scale analysis using data from natural learning contexts
jedm-2025	Intrinsic and contextual factors impacting student ratings of automatically generated questions: A large-scale data analysis
edm-2025-causaledm	Improving automatically generated fill-in-the-blank answer selection with an LLM-based agreement filter
l@s-2025	Refining sentence selection for automatic cloze question generation with large language models
aied-2025-evallac	Open-ended questions need personalized feedback: Analyzing LLM-enabled features with student data
aied-2025-itextbooks	Improving textbook accessibility through AI simplification: Readability improvements and meaning preservation

Unless otherwise noted, our datasets are available under the Creative Commons Attribution 4.0 International License.

Contact Us

If you have questions, please feel free to email [email protected].
