WikiplagWS17/wiki-test-data

Wiki test data

Contains some small sample XML files extracted from the Wikipedia articles XML dump of October 2017. All samples contain well-formed XML; closing tags such as </page> and </mediawiki> were added manually to achieve this. The original dump has a size of ~4.5 GB compressed and ~18 GB uncompressed. The latest XML articles dump can be obtained from here.

  • first_5k.xml - the first 5000 lines of the XML-Dump file in plain XML format.

  • first_100k.xml - the first 100 000 lines of the XML-Dump file in plain XML format.

  • first_500k.zip (21.4 MB) - the first 500 000 lines of the XML-Dump file as a zipped XML file.
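
The samples can be read with a streaming parser, which is also how the full dump would have to be processed given its size. The following is a minimal sketch, not part of this repository: the function names `iter_page_titles` and `iter_zipped_page_titles` are hypothetical, and it assumes that each article is a `<page>` element with a `<title>` child (as in the MediaWiki export format) and that the XML file is the first member of the zip archive.

```python
import xml.etree.ElementTree as ET
import zipfile


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(source):
    """Yield the <title> text of every <page> element in an XML export.

    `source` is a file path or a binary file-like object.
    """
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "page":
            for child in elem:
                if local_name(child.tag) == "title":
                    yield child.text
                    break
            # Drop the finished page's children to keep memory use low.
            elem.clear()


def iter_zipped_page_titles(zip_path):
    """Yield page titles from the first member of a zip archive.

    Assumption: the archive's first member is the XML file.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(archive.namelist()[0]) as member:
            yield from iter_page_titles(member)
```

With the files from this repository in the working directory, `iter_page_titles("first_5k.xml")` and `iter_zipped_page_titles("first_500k.zip")` would be the entry points. Matching on the local tag name avoids hard-coding the export namespace, which differs between dump versions.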

