Skip to content

Conversation

isaac091
Copy link
Collaborator

@isaac091 isaac091 commented Jul 10, 2025

This adds paragraph and style marker behavior to the metadata for marker placement and fixes the bug related to headers when stripping paragraphs. This will also enable us to have the third option for paragraph markers, which is to move them to the end of the verse.

A couple of questions I have about this:

  1. I only added style_behavior to the metadata for completeness, it's not actually necessary, i.e. the desired behavior for style markers always matches the marked_for_removal property. Should I remove it? I don't forsee it ever being needed, but I could be wrong.
  2. I may just be forgetting, but are there plans to explicitly add "end of verse" as an option for UpdateUsfmMarkerBehavior? We should be able to create that behavior for paragraph markers within the current system (including these changes) by passing "strip" in the alignment info (so the markers get pushed to the end) and "preserve" in update_usfm (so they aren't removed).

After this is in I would like to do another minor release so I can add the "end of verse" option in silnlp.

Closes #206


This change is Reviewable

@isaac091 isaac091 requested review from ddaspit and Enkidu93 July 10, 2025 05:37
@Enkidu93
Copy link
Collaborator

1. I only added `style_behavior` to the metadata for completeness, it's not actually necessary, i.e. the desired behavior for style markers always matches the `marked_for_removal` property. Should I remove it? I don't forsee it ever being needed, but I could be wrong.

In general, could you elaborate on this issue: Why is not possible for marked_for_removal to be set such that it is accurate?

2. I may just be forgetting, but are there plans to explicitly add "end of verse" as  an option for UpdateUsfmMarkerBehavior? We should be able to create that behavior for paragraph markers within the current system (including these changes) by passing "strip" in the alignment info (so the markers get pushed to the end) and "preserve" in `update_usfm` (so they aren't removed).

At the moment in Serval, we just don't pass the place markers handler when we want the 'end' behavior. Do you see any issues with that? I'm indifferent in general in regard to whether we add an option in the place marks handler or just don't pass that handler in the 'end' case.

Copy link
Collaborator Author

@isaac091 isaac091 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the marked_for_removal property of a marker does align with whether or not we want to place that marker (and in fact always does for style markers, as far as I know). The exception to this is paragraph markers that follow headers, which are always marked_for_removal=False (because they are essentially the end marker for the header). So, in the case that we are not placing paragraph markers, where all other paragraph markers are marked_for_removal=True, we need need to look at the actual marker placement setting.

There is no issue with not passing the place markers handler when we are only concerned with one type of marker behavior, as is currently the case because we officially support placing paragraph markers but not yet style markers. However, once we are handling multiple types of markers, it becomes an issue. If, for example, we want to place style markers but not paragraph markers, we need to create a place markers handler, which creates the environment for the bug I described above. Similarly, if we want to place style markers and put paragraph markers at the ends of verses, we would create the handler. But once we're in the handler, all of the paragraph markers will have marked_for_removal=False, which signals to the current code that they need to be placed, when in fact we don't want them to be.

Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on @ddaspit and @Enkidu93)

Copy link
Collaborator Author

@isaac091 isaac091 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight tangent:
An alternative approach to these problems is to create separate place markers handler instances for each type of marker. Since the tokenizations/alignments are generated beforehand and the placement of each marker is determined independent of other markers, it would not change the behavior of the algorithm, and then we would not have to consider the implications of each intersection of marker behaviors. The downside of this would be if we developed an alternate or improved algorithm in the future that does not place markers independently of one another, it would potentially not be implementable with this setup.

For example, if we wanted to place all paragraph markers first and then place style markers with the constraint that they must be placed in the same paragraph they originated in, that would be fine, but if we wanted to place markers left-to-right and say that each marker must come after the one before it (in the linear order), that would not be possible to do.

Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on @ddaspit and @Enkidu93)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this approach rather than creating separate handler classes.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@isaac091 isaac091 merged commit ed8b7fa into main Jul 15, 2025
14 checks passed
@isaac091 isaac091 deleted the marker_behavior_metadata branch July 15, 2025 18:32
benjaminking pushed a commit that referenced this pull request Jul 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pass USFM marker behaviors through UpdateBlock metadata
3 participants