Closed
Enhancement
1 of 1 issue completed
Description
After a set of USFM tokens is parsed and handled, we will want to apply some updates, namely intra-verse insertion (#578) and punctuation normalization (#614). To do this, it would likely be cleanest to architect the process as follows:
- When the tokens for a verse (or a non-verse segment) are processed, a data structure called a ScriptureBlock is created. It represents the block as an overall reference together with its text, embeds, style markers and paragraph markers, and provides helper routines to iterate through various portions of the data.
- This structure is passed to a number of handlers:
  - A handler to update the text based on pretranslations and on the "strip" or "preserve" setting for various portions
  - A handler to re-align intra-verse markers (paragraphs, embeds and style markers)
  - A handler to de-normalize punctuation
- The ScriptureBlock can then output the tokens, ready to be put on the token stack.
This allows multiple transformations to be applied to the scripture text on a verse-by-verse basis using a unified data model. Any cross-verse data a handler needs would be managed within that handler itself.
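A minimal Python sketch of this handler pipeline, assuming a simplified block that carries only a reference and a text string (the full design would carry the element list described below). All names here (`ScriptureBlock`, `ScriptureBlockHandler`, `PretranslationHandler`, `QuoteDenormalizationHandler`, `run_handlers`) are hypothetical, not an existing API, and the "strip"/"preserve" options are omitted. The quote handler is only an illustrative stand-in for punctuation de-normalization; its point is that state spanning verses (here, whether a quotation is still open) lives inside the handler.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class ScriptureBlock:
    """Hypothetical, simplified block: one verse (or non-verse) reference and its text."""
    ref: str
    text: str
    # Copy of the text before any handler runs, so later handlers can consult it.
    original_text: str = field(init=False)

    def __post_init__(self) -> None:
        self.original_text = self.text

    def to_tokens(self) -> List[str]:
        # Hypothetical rendering: a verse marker token followed by the text token.
        return [f"\\v {self.ref}", self.text]


class ScriptureBlockHandler(Protocol):
    def process(self, block: ScriptureBlock) -> ScriptureBlock: ...


class PretranslationHandler:
    """Replaces the block text with a pretranslation when one exists for its reference."""

    def __init__(self, pretranslations: Dict[str, str]) -> None:
        self.pretranslations = pretranslations

    def process(self, block: ScriptureBlock) -> ScriptureBlock:
        block.text = self.pretranslations.get(block.ref, block.text)
        return block


class QuoteDenormalizationHandler:
    """Hypothetical punctuation handler: turns straight double quotes into curly
    opening/closing quotes, remembering across blocks whether a quote is open."""

    def __init__(self) -> None:
        self.quote_open = False  # cross-verse state kept inside the handler

    def process(self, block: ScriptureBlock) -> ScriptureBlock:
        chars = []
        for ch in block.text:
            if ch == '"':
                chars.append("\u201d" if self.quote_open else "\u201c")
                self.quote_open = not self.quote_open
            else:
                chars.append(ch)
        block.text = "".join(chars)
        return block


def run_handlers(blocks: List[ScriptureBlock],
                 handlers: List[ScriptureBlockHandler]) -> List[str]:
    """Runs each block through every handler in order, then collects its output tokens."""
    tokens: List[str] = []
    for block in blocks:
        for handler in handlers:
            block = handler.process(block)
        tokens.extend(block.to_tokens())
    return tokens
```

With a quotation that opens in one verse and closes in the next, the quote handler still pairs the marks correctly, because its `quote_open` flag carries over from block to block while each `ScriptureBlock` stays verse-local.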
Scripture Block
The data structure could include:
- All types of data in a single list with unified indexes:
  - Called "ScriptureBlockElements"
  - Types: verse (or non-verse) text; paragraph marker; embed (the entire embed); style marker
  - Includes: text (including the \ft content for notes); a Render() method; a list of the "indexes" that the element covers
- A copy of the "original" text elements, taken before any transformations are made, so that later steps can still use them if the text is transformed in an intermediate stage.
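A minimal Python sketch of such a data structure, under stated assumptions: the class and method names (`ElementType`, `ScriptureBlockElement`, `render`, `element_for_index`, `text_elements`, `marker_elements`) are hypothetical; `indexes` is assumed to hold the positions of the USFM tokens each element covers; markers are stored as raw marker strings; and an embed is assumed to render as a footnote of the form `\f + \ft ...\f*`.

```python
import copy
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Optional


class ElementType(Enum):
    TEXT = "text"            # verse (or non-verse) text
    PARAGRAPH = "paragraph"  # paragraph marker, e.g. "\p "
    EMBED = "embed"          # the entire embed, e.g. a footnote
    STYLE = "style"          # character style marker, e.g. "\w " or "\w*"


@dataclass
class ScriptureBlockElement:
    kind: ElementType
    text: str           # plain text; for an embed, the note (\ft) text
    indexes: List[int]  # unified indexes of the tokens this element covers
    marker: str = ""    # raw marker string for markers and embeds (assumption)

    def render(self) -> str:
        """Renders the element back to USFM-like text (simplified)."""
        if self.kind is ElementType.TEXT:
            return self.text
        if self.kind is ElementType.EMBED:
            # Assumed footnote shape: <marker> + \ft <text><marker>*
            return f"{self.marker} + \\ft {self.text}{self.marker}*"
        # Paragraph and style markers render as their raw marker text.
        return self.marker


@dataclass
class ScriptureBlock:
    ref: str
    elements: List[ScriptureBlockElement]
    # Deep copy taken before any handler runs, so mutations leave the originals intact.
    original_elements: List[ScriptureBlockElement] = field(init=False)

    def __post_init__(self) -> None:
        self.original_elements = copy.deepcopy(self.elements)

    def text_elements(self) -> Iterator[ScriptureBlockElement]:
        return (e for e in self.elements if e.kind is ElementType.TEXT)

    def marker_elements(self) -> Iterator[ScriptureBlockElement]:
        return (e for e in self.elements if e.kind is not ElementType.TEXT)

    def element_for_index(self, index: int) -> Optional[ScriptureBlockElement]:
        """Finds the element whose index list contains the given token index."""
        for element in self.elements:
            if index in element.indexes:
                return element
        return None

    def plain_text(self) -> str:
        return "".join(e.text for e in self.text_elements())

    def render(self) -> str:
        return "".join(e.render() for e in self.elements)
```

For example, a verse such as `In the \w beginning\w* God\f + \ft Or: gods\f* created.` would become seven elements; the footnote is a single `EMBED` element whose `indexes` span its opening, `\ft` and closing tokens, and `plain_text()` yields only `In the beginning God created.`, which is what a text or punctuation handler would operate on.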
Metadata
Labels: No labels
Status: ✅ Done