
alok1304
Collaborator

@alok1304 alok1304 commented Aug 16, 2025

Reference #4515
Updated..
This will be merged after #4432.

In this PR, I added extra-phrase markers to rules. I am working through the rules one license expression at a time, starting with the bsd-new license_expression.

Introducing a new repo, named-entity-utils. What it does:

Named-Entity Removal:

  • Automatically removes named-entities (e.g., PERSON, ORG, GPE) from license rules.
  • Users can provide a specific license-expression, ensuring that named-entity removal is applied only to those rules.
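
As a rough illustration of the removal step, here is a minimal sketch that assumes the entity spans have already been produced by some NER model. The `EntitySpan` type and `remove_entities` function are hypothetical names for this example only and are not the actual named-entity-utils API.

```python
from typing import Iterable, NamedTuple


class EntitySpan(NamedTuple):
    """Character span of a detected entity (hypothetical helper type)."""
    start: int
    end: int
    label: str


# Labels mentioned in this PR description; the real tool may use others.
REMOVED_LABELS = frozenset({"PERSON", "ORG", "GPE"})


def remove_entities(text: str, spans: Iterable[EntitySpan],
                    labels: frozenset = REMOVED_LABELS) -> str:
    """Return text with the spans carrying one of the labels cut out.

    Whitespace is collapsed afterwards, so line breaks are not preserved.
    """
    pieces = []
    cursor = 0
    for span in sorted(spans):
        # Skip unwanted labels and spans overlapping an already removed one.
        if span.label not in labels or span.start < cursor:
            continue
        pieces.append(text[cursor:span.start])
        cursor = span.end
    pieces.append(text[cursor:])
    return " ".join("".join(pieces).split())
```

For example, removing an ORG span covering "Agere Systems Inc." from "Neither the name of Agere Systems Inc. nor the names of the contributors" leaves "Neither the name of nor the names of the contributors".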

Duplicate Rule Detection:

  • After named-entity removal, the tool can also detect and report duplicate rules.
  • Results are written in JSON format, showing which rule files match each other once named-entities are removed.
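
The duplicate check could look roughly like the sketch below. It assumes rules are compared by their entity-stripped text after lowercasing and whitespace normalization. The function names, the file names in the test data, and the JSON layout are assumptions for illustration and may not match the real output file.

```python
import json
from collections import defaultdict


def find_duplicates(stripped_rules: dict) -> list:
    """Group rule file names whose entity-stripped text is identical.

    stripped_rules maps a rule file name to its text after entity removal.
    """
    groups = defaultdict(list)
    for filename, text in stripped_rules.items():
        # Normalize case and whitespace so trivial differences are ignored.
        key = " ".join(text.lower().split())
        groups[key].append(filename)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


def duplicates_as_json(stripped_rules: dict) -> str:
    """Serialize the duplicate groups (the layout here is an assumption)."""
    return json.dumps({"duplicates": find_duplicates(stripped_rules)}, indent=2)
```

Grouping on a normalized key keeps the comparison linear in the number of rules, rather than comparing every pair of rule files.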

A detailed README is included with instructions on how to set up and run the tool.

Processing Steps for BSD-New Rules (as an example)

  1. Remove all named-entities from BSD-New rules using named-entity-utils.
  2. Detect duplicate rules (after named-entity removal) and export them in JSON format.
  3. For each group of duplicate rules:
    • If a rule already exists without a named-entity at the target position, add an extra-phrase marker there.
    • If no such rule exists, create a new rule with an extra-phrase marker at the position of the named-entity.
  4. Handle edge cases:
    • If the removed text is not actually a named-entity (a false positive), do not modify that rule (no new rule, no extra-phrase).
    • This ensures we only adjust rules where named-entities truly exist.
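
The per-group decision in steps 3 and 4 can be sketched as follows. This is only one reading of the steps above: `RuleCandidate`, `plan_group`, and the action names are hypothetical, and the sketch deliberately returns a decision instead of writing any marker text, since the marker syntax is not shown in this PR description.

```python
from dataclasses import dataclass


@dataclass
class RuleCandidate:
    """One rule in a duplicate group (hypothetical structure)."""
    filename: str
    has_entity: bool               # something was removed from this rule
    entity_confirmed: bool = True  # False when the removal was a false positive


def plan_group(group: list) -> tuple:
    """Decide what to do for one group of duplicate rules.

    Returns (action, filename), where action is "skip", "add_marker"
    or "create_rule"; filename is set only for "add_marker".
    """
    confirmed = [r for r in group if r.has_entity and r.entity_confirmed]
    if not confirmed:
        # Step 4: no true named-entity in the group, so change nothing.
        return ("skip", None)
    entity_free = [r for r in group if not r.has_entity]
    if entity_free:
        # Step 3a: reuse the rule that already lacks the entity.
        return ("add_marker", entity_free[0].filename)
    # Step 3b: no entity-free rule exists, so a new rule is needed.
    return ("create_rule", None)
```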

For the bsd-new license, these are the duplicate rules we found after removing named-entities from all bsd-new rules.
File:
bsd-new_duplicates_rules.json

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Signed-off-by: Alok Kumar [email protected]

Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

@alok1304 thanks for the PR, but these changes are not mergeable.

See #4515 (comment) for more details on why you cannot:

  • replace named entities with extra words markers
  • deprecate rules

See also alok1304/mark-extra-phrase#1 :)

this list of conditions and the following Disclaimer in the documentation
and/or other materials provided with the distribution.
.
. Neither the name of Agere Systems Inc. nor the names of the contributors
Member

We cannot replace these named entities with extra words markers; doing so has a huge performance hit. See #4515 (comment) for more details.

Signed-off-by: Alok Kumar <[email protected]>
This reverts commit ad4d85c.

Signed-off-by: Alok Kumar <[email protected]>
@alok1304
Collaborator Author

Some test cases may be failing because the previous PR has not been merged yet.
