carmenbianca
Member

@carmenbianca carmenbianca commented Mar 4, 2025

This is preliminary work for #947.

I'd like to merge this before I work on the refactoring. It will keep these PRs smaller.

  • Added a change log entry in changelog.d/<directory>/.
    • Specifically, `reuse annotate --year` works differently now.
  • Added self to copyright blurb of touched files.
  • Wrote tests.
  • My changes do not contradict
    the current specification.
  • I agree to license my contribution under the licenses indicated in the
    changed files.

@carmenbianca carmenbianca force-pushed the copyright-line branch 4 times, most recently from 80c1953 to 7f871ea Compare March 6, 2025 15:06
@carmenbianca carmenbianca force-pushed the copyright-line branch 2 times, most recently from 9156136 to 73729eb Compare August 26, 2025 17:29
@carmenbianca
Member Author

So this is basically done. All tests pass, and the whole code base is converted to using these new data types instead of strings. There are probably some comments or variable names that still hint at the old string types, or patterns in the code that make less sense given the new data types, but on the whole, everything appears to work.

The benefits of doing this aren't immediately obvious yet, except that #328 is now implemented in an obvious and elegant way that doesn't require dark string manipulation magics.

The downside is that the new data type is somewhat less flexible than strings. It makes some assumptions (years come before the author, separated by spaces, dashes, and commas; contacts are between square brackets at the end; etc.), and tries to remain flexible outside of those assumptions (that is: everything that isn't recognised goes into the name field of `CopyrightNotice`), but it's quite possible that someone will open a bug report about something I had not anticipated.
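
To make those assumptions concrete, here is a minimal sketch of a parser that follows them. It is not the code in this PR: `NoticeSketch`, `parse_notice`, and the pattern are hypothetical names invented for illustration, and the pattern is far simpler than anything a real parser would need.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified pattern (an assumption, not the PR's regex):
# optional leading years, a free-form name, and an optional contact in
# square brackets at the very end of the line.
_SKETCH_PATTERN = re.compile(
    r"^(?P<years>\d{4}(?:\s*[-,]\s*\d{4}|\s+\d{4})*)?\s*"
    r"(?P<name>.*?)\s*"
    r"(?:\[(?P<contact>[^\[\]]*)\])?$"
)


@dataclass(frozen=True)
class NoticeSketch:
    """Hypothetical stand-in for a structured copyright notice."""

    name: str
    years: Optional[str] = None
    contact: Optional[str] = None


def parse_notice(text: str) -> NoticeSketch:
    """Split a single-line notice into years, name, and contact."""
    match = _SKETCH_PATTERN.match(text.strip())
    if match is None:
        # Only reachable for multi-line input, because `.` stops at newlines.
        raise ValueError(f"cannot parse notice: {text!r}")
    # Anything that is neither leading years nor a trailing bracketed
    # contact ends up in the name field.
    return NoticeSketch(
        name=match["name"], years=match["years"], contact=match["contact"]
    )
```

With this sketch, `parse_notice("something (odd) 1999")` recognises neither years nor a contact, so the whole string lands in `name`.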

The other downside is that one test is broken on Windows. I'll fix that, probably.

the :class:`CopyrightPrefix`es, the most common is chosen. If there is a
tie in frequency, choose the one which appears first in the enum.
"""
# TODO: Consider making a match on contact optional.
Member Author

Don't do this; name collisions
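
For reference, the selection rule in the docstring fragment above (most common prefix wins, ties go to the member that appears first in the enum) could be sketched roughly as follows. `PrefixSketch` and its members are hypothetical placeholders, not the actual `CopyrightPrefix` definition.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class PrefixSketch(Enum):
    """Hypothetical members; declaration order is the tie-break priority."""

    SPDX = "SPDX-FileCopyrightText:"
    SPDX_C = "SPDX-FileCopyrightText: (C)"
    STRING = "Copyright"


def most_common_prefix(prefixes: Iterable[PrefixSketch]) -> PrefixSketch:
    """Return the most frequent prefix, preferring earlier enum members."""
    counts = Counter(prefixes)
    if not counts:
        raise ValueError("no prefixes given")
    order = list(PrefixSketch)
    # Highest count wins; on a tie, the member declared first wins.
    return min(counts, key=lambda p: (-counts[p], order.index(p)))
```

Sorting on the pair `(-count, enum position)` keeps the tie-break explicit instead of relying on the insertion order of `Counter`.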

carmenbianca and others added 22 commits September 6, 2025 18:00
Signed-off-by: Carmen Bianca BAKKER <[email protected]>
This is the basis for refactoring ReuseInfo to use CopyrightNotice
instead of strings for copyright lines.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
Signed-off-by: Carmen Bianca BAKKER <[email protected]>
ReuseInfo will depend on CopyrightNotice, so it makes sense to move it.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
These functions were causing cyclical import problems.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
These should be immutable. Editing them makes `original` useless.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
This object needs to be hashable, and lists aren't hashable. Altogether
a simple change.
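
A small illustration of why the container type matters here, using a hypothetical `YearsSketch` class rather than the real one: a frozen dataclass derives its hash from its fields, so a tuple field works while a list field fails at hash time.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class YearsSketch:
    """Hypothetical immutable value object holding a tuple of years."""

    years: Tuple[int, ...]


def is_hashable(value: object) -> bool:
    """Report whether hash() accepts the value."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# Equal frozen instances hash alike, so they collapse inside a set.
unique = {YearsSketch((2017, 2019)), YearsSketch((2017, 2019))}

# Nothing stops a caller from passing a list at runtime, but the generated
# __hash__ then fails because lists are unhashable.
list_backed = YearsSketch([2017, 2019])  # type: ignore[arg-type]
```

Here `len(unique)` is 1, and `is_hashable(list_backed)` is `False`.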
This is more correct with the newly created class CopyrightNotice. It
will also help identify the spots to refactor, because everything is
broken now. Effectively every single line touched by this commit needs a
touch-up.
- I made the regex patterns public so that others can use them.
- For efficiency, CopyrightNotice.from_match now exists. You can match a
  file against the COPYRIGHT_PATTERN, and pass the results to the new
  factory.
- I stripped trailing whitespace using the regex.
I don't believe the error can ever be reached given the regex. This
should be fine.
This apparently wasn't enabled because the tests themselves weren't
typed. By enabling `check_untyped_defs`, all calls made inside of test
functions are type-checked. I need this as a sanity check because I am
changing the type of a very core component used _everywhere_.
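
As a rough illustration of what the setting catches (the function and test names here are made up): mypy skips the bodies of unannotated functions by default, so the bad call below only gets reported once `check_untyped_defs` is enabled.

```python
def year_label(year: int) -> str:
    """Render a year for display."""
    return f"{year}"


def test_year_label():
    # This test has no annotations, so mypy does not check its body by
    # default. With check_untyped_defs enabled, mypy reports the str
    # argument below as incompatible with the `int` parameter, even though
    # the call happens to work at runtime.
    assert year_label("2017") == "2017"
```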
While doing this, I changed the return type from YearRange.compact to
tuple.
This is, I believe, the last module that needed refactoring.
The symbol © was not correctly encoded into UTF-8 without explicitly
declaring the encoding, and reuse assumes UTF-8 encoding in all files.
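
A minimal sketch of the underlying issue, with a made-up file name and notice: when no encoding is passed, Python's text I/O may fall back to a locale-dependent encoding, and a legacy single-byte encoding such as cp1252 writes © as one byte that is not valid UTF-8.

```python
import tempfile
from pathlib import Path

LINE = "SPDX-FileCopyrightText: © 2017 Jane Doe\n"


def write_and_read_raw(text: str) -> bytes:
    """Write text with an explicit UTF-8 encoding and return the raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.txt"
        # Passing encoding="utf-8" makes the result independent of the
        # platform's default text encoding.
        path.write_text(text, encoding="utf-8")
        return path.read_bytes()


raw = write_and_read_raw(LINE)
utf8_copyright = "©".encode("utf-8")      # two bytes: 0xC2 0xA9
cp1252_copyright = "©".encode("cp1252")   # one byte: 0xA9
```

The single cp1252 byte cannot be decoded as UTF-8, which is why a file written that way breaks a tool that assumes UTF-8 everywhere.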
`2017 -2019` and `2017 -2019` are not valid. This code ensures that an
appropriate error is raised, and also ensures that tuple_from_string
splits on correct whitespace.
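
A hedged sketch of that kind of validation, assuming the invalid forms are ranges with whitespace next to the dash. `split_years` is a hypothetical helper written for this note, not the `tuple_from_string` in this PR.

```python
import re
from typing import Tuple

# A single year, or two years joined by a dash with no surrounding spaces.
_YEAR_OR_RANGE = re.compile(r"\d{4}(?:-\d{4})?")


def split_years(text: str) -> Tuple[str, ...]:
    """Split on whitespace and reject malformed years or ranges."""
    # str.split() without arguments splits on any run of whitespace.
    tokens = tuple(token.rstrip(",") for token in text.split())
    for token in tokens:
        if not _YEAR_OR_RANGE.fullmatch(token):
            raise ValueError(f"invalid year or year range: {token!r}")
    return tokens
```

Under this assumption, `2017 -2019` splits into `2017` and `-2019`, and the second token fails validation instead of being silently accepted.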
@carmenbianca carmenbianca mentioned this pull request Sep 21, 2025
7 tasks