
UpdateBlock data structure for updating pipeline #619

@johnml1135

Description

After a set of USFM tokens is parsed and handled, we will want to apply further updates, namely intra-verse insertion (#578) and punctuation normalization (#614). To do this, it would likely be cleanest to structure the code as follows:

  • When the set of tokens for a verse (or a non-verse segment) is processed, a data structure called a ScriptureBlock is created. It represents the block as an overall reference plus its text, embeds, style markers and paragraph markers, with helper routines to iterate through various portions of the data.
  • This structure is passed to a number of handlers:
    • A handler to update the text based on pretranslations and on whether various portions are stripped or preserved
    • A handler to re-align intra-verse markers (paragraphs, embeds and style markers)
    • A handler to de-normalize punctuation
  • The ScriptureBlock can then output tokens that are ready to be pushed onto the token stack
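A minimal Python sketch of the flow in the list above, assuming a heavily simplified token model. Everything here is hypothetical and for illustration only: `Token`, `ScriptureBlock.to_tokens`, `make_text_updater`, `make_punctuation_denormalizer`, `run_pipeline` and the comma mapping are not an existing API. The marker re-alignment handler is omitted; the text updater just leaves markers where they were.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Token:
    """Hypothetical, simplified stand-in for a parsed USFM token."""
    kind: str  # e.g. "verse", "text", "para"
    text: str


@dataclass
class ScriptureBlock:
    """One verse (or non-verse) segment, handed from handler to handler."""
    ref: str
    tokens: List[Token]

    def to_tokens(self) -> List[Token]:
        # Emit the tokens that are ready to be pushed onto the output token stack.
        return list(self.tokens)


Handler = Callable[[ScriptureBlock], ScriptureBlock]


def make_text_updater(pretranslations: Dict[str, str]) -> Handler:
    """Hypothetical handler: put the pretranslation for this ref into the first text token."""
    def update(block: ScriptureBlock) -> ScriptureBlock:
        new_text = pretranslations.get(block.ref)
        if new_text is None:
            return block  # no pretranslation: preserve the existing text
        tokens: List[Token] = []
        placed = False
        for tok in block.tokens:
            if tok.kind != "text":
                tokens.append(tok)  # markers stay put; re-aligning them is another handler's job
            elif not placed:
                tokens.append(Token("text", new_text))
                placed = True
            # further text tokens are dropped, since the new text replaces all of them
        return replace(block, tokens=tokens)
    return update


def make_punctuation_denormalizer(mapping: Dict[str, str]) -> Handler:
    """Hypothetical handler: map normalized punctuation back to project-specific characters."""
    table = str.maketrans(mapping)

    def denormalize(block: ScriptureBlock) -> ScriptureBlock:
        tokens = [Token(t.kind, t.text.translate(table)) if t.kind == "text" else t
                  for t in block.tokens]
        return replace(block, tokens=tokens)
    return denormalize


def run_pipeline(block: ScriptureBlock, handlers: List[Handler]) -> List[Token]:
    # Each handler sees the output of the previous one; the final block emits tokens.
    for handler in handlers:
        block = handler(block)
    return block.to_tokens()


block = ScriptureBlock(
    "MAT 1:1",
    [Token("verse", "1"), Token("text", "Old text "), Token("para", "p"), Token("text", "more.")],
)
tokens = run_pipeline(
    block,
    [make_text_updater({"MAT 1:1": "New text, here."}),
     make_punctuation_denormalizer({",": "\u060c"})],  # hypothetical: Latin comma -> Arabic comma
)
print(tokens)
```

Modeling each handler as a plain function from block to block keeps the handlers independent of each other, and the pipeline order becomes just the order of the list.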

This allows multiple transformations to be applied to the scripture text on a verse-by-verse basis using a unified data model. Any cross-verse data a handler needs would be tracked within that handler itself.
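As a hedged illustration of keeping cross-verse data inside the handler, here is a sketch of a stateful handler. The class name `QuoteDenormalizer`, the straight-to-curly quote rule, the minimal `ScriptureBlock` (reference plus plain text) and the sample text are all hypothetical. Whether a quotation is currently open carries over from one verse to the next, so that state lives on the handler instance rather than on any single block.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class ScriptureBlock:
    """Minimal hypothetical block: a reference plus its verse text."""
    ref: str
    text: str


class QuoteDenormalizer:
    """Hypothetical stateful handler: turns straight double quotes into curly ones.

    The open/closed quote state is cross-verse data, so it is stored on the
    handler and persists from one block to the next.
    """

    def __init__(self) -> None:
        self.quote_open = False

    def __call__(self, block: ScriptureBlock) -> ScriptureBlock:
        out: List[str] = []
        for ch in block.text:
            if ch == '"':
                # Closing quote if one is open, otherwise an opening quote.
                out.append("\u201d" if self.quote_open else "\u201c")
                self.quote_open = not self.quote_open
            else:
                out.append(ch)
        return replace(block, text="".join(out))


handler = QuoteDenormalizer()
verses = [ScriptureBlock("GEN 1:1", 'Sample "quote begins'),
          ScriptureBlock("GEN 1:2", 'and ends" here.')]
results = [handler(v) for v in verses]
print([r.text for r in results])
```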

Scripture Block

The data structure could include:

  • All types of data in a list with unified indexes:
    • Called "ScriptureBlockElements"
    • Types: Verse (or non-verse) text; Paragraph marker; Embed (totality of the embed); Style marker
    • Includes: Text (including \ft for notes); Render(); a list of the "indexes" that this element covers
  • A copy of the "original" text elements is kept from before any transformations, for use in later steps if the text is transformed by an intermediate handler.
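A sketch of the element list described above, under stated assumptions: `ElementType`, the `marker` field, `text_elements()` and the exact USFM produced by `render()` are hypothetical choices for illustration. Embeds and style markers both render as an opening marker, their content and a closing `*` marker; the original elements are deep-copied when the block is created so that later handlers can still consult them.

```python
import copy
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class ElementType(Enum):
    TEXT = auto()       # verse (or non-verse) text
    PARAGRAPH = auto()  # paragraph marker
    EMBED = auto()      # the totality of an embed, e.g. a whole footnote
    STYLE = auto()      # character style marker


@dataclass
class ScriptureBlockElement:
    type: ElementType
    text: str                    # for a note embed, includes its \ft content
    marker: str = ""             # hypothetical: USFM marker name, e.g. "p", "f", "bd"
    indexes: List[int] = field(default_factory=list)  # token indexes this element covers

    def render(self) -> str:
        """Hypothetical rendering of this element back to USFM text."""
        if self.type is ElementType.TEXT:
            return self.text
        if self.type is ElementType.PARAGRAPH:
            return f"\n\\{self.marker} "
        # EMBED and STYLE both wrap their content in opening and closing markers.
        return f"\\{self.marker} {self.text}\\{self.marker}*"


@dataclass
class ScriptureBlock:
    ref: str
    elements: List[ScriptureBlockElement]
    original: List[ScriptureBlockElement] = field(init=False)

    def __post_init__(self) -> None:
        # Deep copy so handlers can modify elements without touching the original.
        self.original = copy.deepcopy(self.elements)

    def text_elements(self) -> List[ScriptureBlockElement]:
        return [e for e in self.elements if e.type is ElementType.TEXT]

    def render(self) -> str:
        return "".join(e.render() for e in self.elements)


block = ScriptureBlock("MAT 1:1", [
    ScriptureBlockElement(ElementType.TEXT, "Some text ", indexes=[0]),
    ScriptureBlockElement(ElementType.EMBED, "+ \\ft A note.", marker="f", indexes=[1, 2, 3]),
    ScriptureBlockElement(ElementType.PARAGRAPH, "", marker="p", indexes=[4]),
    ScriptureBlockElement(ElementType.TEXT, "more text.", indexes=[5]),
])
block.text_elements()[0].text = "Changed text "  # a handler edits the working copy
print(block.render())
print(block.original[0].text)  # the original text is still available
```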

Metadata

  • Labels: No labels
  • Projects: Status ✅ Done
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests