Contains some small sample XML-Files extracted from the Wikipedia Articles XML-Dump of October 2017.
All samples contain well-formed XML. To achieve this, closing tags such as </page> and </mediawiki> were added manually. The original dump has a size of ~4.5 GB compressed and ~18 GB uncompressed. The latest XML-Articles dump can be obtained from here.
- first_5k.xml - the first 5000 lines of the XML-Dump file in plain XML format.
- first_100k.xml - the first 100 000 lines of the XML-Dump file in plain XML format.
- first_500k.zip (21.4MB) - the first 500 000 lines of the XML-Dump file as a zipped XML file.
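
Because the samples are well-formed, they can be read with any standard XML parser. The following is a minimal sketch in Python, not part of the original samples, that streams through one of the files and yields the title of each page. The helper names `local_name` and `iter_page_titles` are hypothetical, and the sketch assumes each <page> element has a <title> child and that element tags may carry an XML namespace prefix, as is usual for MediaWiki exports.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(source):
    """Yield the <title> text of each <page>, streaming through the input.

    `source` may be a file path or a binary file object.
    Assumption: every <page> has a direct <title> child.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "page":
            for child in elem:
                if local_name(child.tag) == "title":
                    yield child.text
                    break
            # Drop the page's children once processed to limit memory use.
            elem.clear()


if __name__ == "__main__":
    # Assumes first_5k.xml is in the current working directory.
    for title in iter_page_titles("first_5k.xml"):
        print(title)
```

Streaming with `iterparse` avoids loading a whole file into memory, which matters more for the larger samples. The zipped sample would need to be extracted first, or opened through the `zipfile` module and passed in as a file object.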