Skip to content

Conversation

isaac091
Copy link
Collaborator

@isaac091 isaac091 commented May 10, 2025

This change is Reviewable

@isaac091 isaac091 requested review from Enkidu93 and ddaspit May 10, 2025 02:26
@codecov-commenter
Copy link

codecov-commenter commented May 10, 2025

Codecov Report

Attention: Patch coverage is 99.60159% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.87%. Comparing base (45f24a9) to head (96e85e3).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
...corpora/place_markers_usfm_update_block_handler.py 99.25% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #175      +/-   ##
==========================================
+ Coverage   88.66%   88.87%   +0.20%     
==========================================
  Files         280      282       +2     
  Lines       16708    16951     +243     
==========================================
+ Hits        14814    15065     +251     
+ Misses       1894     1886       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93)


tests/corpora/test_place_markers_usfm_update_block_handler.py line 10 at r1 (raw file):

    parse_usfm,
)
from machine.corpora.place_markers_usfm_update_block_handler import PlaceMarkersUsfmUpdateBlockHandler

We should import from the corpora package.


machine/corpora/place_markers_usfm_update_block_handler.py line 7 at r1 (raw file):

from ..jobs.eflomal_aligner import to_word_alignment_matrix
from ..jobs.translation_file_service import PretranslationInfo
from ..translation import WordAlignmentMatrix

The convention for internal imports is to import directly from the appropriate module instead of the package.


machine/corpora/place_markers_usfm_update_block_handler.py line 14 at r1 (raw file):

class PlaceMarkersUsfmUpdateBlockHandler(UsfmUpdateBlockHandler):

This class should be exposed in the machine.corpora package exports.


machine/corpora/place_markers_usfm_update_block_handler.py line 16 at r1 (raw file):

class PlaceMarkersUsfmUpdateBlockHandler(UsfmUpdateBlockHandler):

    def __init__(self, pt_info: Sequence[PretranslationInfo]) -> None:

Can you create a separate type to hold the alignment info for a verse? I don't want to create a dependency on the DTO that we use to pass pretranslations from a build job.

Copy link
Collaborator Author

@isaac091 isaac091 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @ddaspit and @Enkidu93)


machine/corpora/place_markers_usfm_update_block_handler.py line 7 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

The convention for internal imports is to import directly from the appropriate module instead of the package.

Done.


machine/corpora/place_markers_usfm_update_block_handler.py line 14 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This class should be exposed in the machine.corpora package exports.

Done.


machine/corpora/place_markers_usfm_update_block_handler.py line 16 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Can you create a separate type to hold the alignment info for a verse? I don't want to create a dependency on the DTO that we use to pass pretranslations from a build job.

Done. Turns out I had to anyways because importing from jobs was causing a circular dependency once I exported the handler.


tests/corpora/test_place_markers_usfm_update_block_handler.py line 10 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We should import from the corpora package.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93)


machine/corpora/place_markers_usfm_update_block_handler.py line 12 at r2 (raw file):

class AlignmentInfo(TypedDict):

I would like to rename this to something that is more closely associated with the PlaceMarkersUsfmUpdateBlockHandler class, so that its purpose is clearer. Maybe PlaceMarkersAlignmentInfo.


machine/corpora/place_markers_usfm_update_block_handler.py line 14 at r2 (raw file):

class AlignmentInfo(TypedDict):
    refs: List[str]
    source_toks: List[str]

I would rename this to source_tokens.


machine/corpora/place_markers_usfm_update_block_handler.py line 16 at r2 (raw file):

    source_toks: List[str]
    translation_toks: List[str]
    alignment: str

This can be a WordAlignmentMatrix or a collection of AlignedWordPairs.


machine/corpora/place_markers_usfm_update_block_handler.py line 21 at r2 (raw file):

class PlaceMarkersUsfmUpdateBlockHandler(UsfmUpdateBlockHandler):

    def __init__(self, align_info: Sequence[AlignmentInfo]) -> None:

The type can be an Iterable.

Copy link
Collaborator Author

@isaac091 isaac091 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @ddaspit and @Enkidu93)


machine/corpora/place_markers_usfm_update_block_handler.py line 12 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I would like to rename this to something that is more closely associated with the PlaceMarkersUsfmUpdateBlockHandler class, so that its purpose is clearer. Maybe PlaceMarkersAlignmentInfo.

Done.


machine/corpora/place_markers_usfm_update_block_handler.py line 14 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I would rename this to source_tokens.

Done.


machine/corpora/place_markers_usfm_update_block_handler.py line 16 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This can be a WordAlignmentMatrix or a collection of AlignedWordPairs.

Done.


machine/corpora/place_markers_usfm_update_block_handler.py line 21 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

The type can be an Iterable.

Done.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Excited to port this 😆

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ddaspit)


machine/corpora/place_markers_usfm_update_block_handler.py line 14 at r2 (raw file):

Previously, isaac091 (Isaac Schifferer) wrote…

Done.

I don't know if you'd rather it were addressed in a separate PR, @ddaspit, but along the same lines, I noticed while working on sillsdev/serval#663 that source_toks is also used in that pipeline of passing the alignments up with the pretranslations. It'd make our side cleaner if we want to use SourceTokens in our data model if this were just called source_tokens (and something similar for translation_toks).

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


machine/corpora/place_markers_usfm_update_block_handler.py line 14 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I don't know if you'd rather it were addressed in a separate PR, @ddaspit, but along the same lines, I noticed while working on sillsdev/serval#663 that source_toks is also used in that pipeline of passing the alignments up with the pretranslations. It'd make our side cleaner if we want to use SourceTokens in our data model if this were just called source_tokens (and something similar for translation_toks).

Yeah, let's go ahead and make the change now.

Copy link
Collaborator Author

@isaac091 isaac091 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)


machine/corpora/place_markers_usfm_update_block_handler.py line 14 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Yeah, let's go ahead and make the change now.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @isaac091)

@isaac091 isaac091 merged commit f533cfb into main May 15, 2025
14 checks passed
@isaac091 isaac091 deleted the paragraph_marker_placement branch May 15, 2025 22:41
@isaac091
Copy link
Collaborator Author

Can one of you do a minor release so I can start working on plugging it into silnlp? Thanks!

@Enkidu93
Copy link
Collaborator

Can one of you do a minor release so I can start working on plugging it into silnlp? Thanks!

Yeah, I can do that: #177

benjaminking pushed a commit that referenced this pull request Jun 6, 2025
* Marker placement update block handler

* Refactor marker placement handler, small bug fixes

* Extend and clean up tests, more code cleanup

* Fix imports, use separate AlignmentInfo type

* Adjust (PlaceMarkers)AlignmentInfo type

* 'toks' --> 'tokens'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants