Description
Currently the Semantic Climate project converts PDFs to HTML.
The content is the IPCC Climate report AR6, and we need to improve its markup for further semantic annotation, reuse, and presentation. From a typesetting perspective, and to free us from destructive reliance on PDF (note that we can get PDF-like results in a non-destructive way using Vivliostyle), I (that's me, @mrchristian) would like to produce HTML that renders better in Vivliostyle than the current output does.
The output needs improvement. It currently contains a number of elements that may not be needed, e.g., page numbers and inline styles.
The objective would be to improve the output with tooling that can integrate with the current workflow.
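As a starting point for discussion, here is a minimal sketch of one such clean-up step: removing inline `style` attributes from converted HTML and counting how many were found. It is not the project's current tooling. The function name `strip_inline_styles` and the sample markup are assumptions made for illustration, and it uses only the Python standard library. Page numbers are not handled here, because spotting them depends on how the converter actually marks them up.

```python
from html import escape
from html.parser import HTMLParser

# Elements whose text content must be passed through without escaping.
RAW_TEXT_TAGS = {"script", "style"}


class _StyleStripper(HTMLParser):
    """Re-emit HTML while dropping every inline style attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments, joined at the end
        self.removed = 0     # number of style attributes dropped
        self._raw = False    # True while inside <script> or <style>

    def _emit_open(self, tag, attrs, closer=""):
        parts = []
        for name, value in attrs:
            if name == "style":
                self.removed += 1
                continue
            if value is None:  # boolean attribute such as "hidden"
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(parts)}{closer}>")

    def handle_starttag(self, tag, attrs):
        self._emit_open(tag, attrs)
        if tag in RAW_TEXT_TAGS:
            self._raw = True

    def handle_startendtag(self, tag, attrs):
        self._emit_open(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag in RAW_TEXT_TAGS:
            self._raw = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text was unescaped by the parser, so escape it again on output.
        self.out.append(data if self._raw else escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_inline_styles(html_text):
    """Return (cleaned_html, number_of_style_attributes_removed)."""
    parser = _StyleStripper()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out), parser.removed


if __name__ == "__main__":
    # Hypothetical fragment; real converter output will look different.
    sample = '<div style="left:10px" id="p1"><span style="font-size:9px">Urban systems</span></div>'
    cleaned, count = strip_inline_styles(sample)
    print(cleaned)
    print(count, "inline styles removed")
```

The count gives a rough measure of how much presentational markup a chapter carries, which could feed into the evaluation of problems with the outputs. A streaming parser is used so that imperfect HTML is tolerated. A step like this would run after the existing PDF to HTML conversion rather than replace it.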
The suggestion is to evaluate the process by collating information on the following:
- Current tooling
- Condition of the source PDFs
- Problems with outputs
- List of parts and markup whose integrity we need to retain
- Define what we want in our target outputs
- Do we want other output formats for richer markup and other interoperability?
- List and evaluate tools
- Consult experts in the field: pandoc, le-tex, fidus, vivlio, css-rocks, etc
This research can be conducted in a wiki page on the Semantic Climate repository.
Here are sample files:
PDF source - Chapter 8 https://github.com/petermr/semanticClimate/blob/main/ipcc/ar6/wg3/Chapter08/fulltext.pdf
HTML full text - Chapter 8 https://github.com/petermr/semanticClimate/blob/main/ipcc/ar6/wg3/Chapter08/fulltext.html
Tasks
- Link to the current PDF-to-HTML tooling.
- Consult the Single Source Publishing Community https://github.com/singlesourcepub/community/discussions and others: le-tex, pandoc, css rocks?