Conversation

@ArthurZucker
Collaborator

What does this PR do?

  • adds pretty print
  • adds simple model-name as file
  • adds release date parsing
  • adds highest overall match
  • optional Jaccard
  • snapshot download
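
For the optional Jaccard item, here is a minimal sketch of Jaccard similarity over token sets. It assumes each file has already been reduced to a set of identifier tokens; the function name and the sample token sets are illustrative and not taken from the PR.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical token sets for two modeling files.
tokens_a = {"attention", "mlp", "rotary", "norm"}
tokens_b = {"attention", "mlp", "conv", "norm"}
score = jaccard(tokens_a, tokens_b)  # 3 shared tokens out of 5 distinct ones
```

Returning 0.0 for two empty sets is a design choice made here to avoid a division by zero; the script may handle that case differently.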

@ArthurZucker ArthurZucker requested a review from molbap October 6, 2025 10:03
Contributor

@molbap molbap left a comment

Looks super nice, thanks!

Comment on lines +280 to +286
snapshot_path = snapshot_download(repo_id=self.hub_dataset, repo_type="dataset")
snapshot_dir = Path(snapshot_path)
missing = [
    fname for fname in (EMBEDDINGS_PATH, INDEX_MAP_PATH, TOKENS_PATH) if not (snapshot_dir / fname).exists()
]
if missing:
    raise FileNotFoundError("Missing expected files in Hub snapshot: " + ", ".join(missing))
Contributor

Cleaner handling, thanks :D
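
The check in the snippet above can be tried without touching the Hub. The sketch below applies the same pattern to any local directory; `find_missing` and `require_files` are hypothetical helper names, and the file names passed in are placeholders rather than the script's real constants.

```python
from pathlib import Path


def find_missing(snapshot_dir: Path, expected: tuple[str, ...]) -> list[str]:
    """Return the expected file names that are absent from snapshot_dir."""
    return [name for name in expected if not (snapshot_dir / name).exists()]


def require_files(snapshot_dir: Path, expected: tuple[str, ...]) -> None:
    """Raise a single FileNotFoundError that names every missing file."""
    missing = find_missing(snapshot_dir, expected)
    if missing:
        raise FileNotFoundError("Missing expected files in snapshot: " + ", ".join(missing))
```

Collecting all missing names before raising means one run reports every absent file, rather than failing on the first one.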

dict[str, str]: mapping of model_id -> ISO date string (YYYY-MM-DD).
Files without a match are simply omitted.
"""
import transformers
Contributor

Suggested change
- import transformers

Comment on lines 601 to 606
for md_path in root.glob("*.md"):
    try:
        text = md_path.read_text(encoding="utf-8", errors="ignore")
    except Exception:
        # Skip unreadable files quietly
        continue
Contributor

We could log the skipped file at INFO level, I suppose.
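
One way to act on this suggestion is sketched below: the loop reports each skipped file at INFO level instead of dropping it silently. The function name, the logger, and the `released on YYYY-MM-DD` pattern are assumptions made for illustration; the script's real matching logic is not shown in this thread. The sketch also narrows `except Exception` to `OSError`, which is a change from the reviewed code.

```python
import logging
import re
from pathlib import Path

logger = logging.getLogger(__name__)

# Assumed sentence format; the real docs may phrase this differently.
RELEASE_RE = re.compile(r"released on (\d{4}-\d{2}-\d{2})")


def collect_release_dates(root: Path) -> dict[str, str]:
    """Map each doc file stem to its ISO release date; files with no match are omitted."""
    dates: dict[str, str] = {}
    for md_path in sorted(root.glob("*.md")):
        try:
            text = md_path.read_text(encoding="utf-8", errors="ignore")
        except OSError as exc:
            # Report the skipped file instead of dropping it silently.
            logger.info("Skipping unreadable file %s: %s", md_path, exc)
            continue
        match = RELEASE_RE.search(text)
        if match:
            dates[md_path.stem] = match.group(1)
    return dates
```

With `errors="ignore"`, decoding problems never raise, so the remaining failures are filesystem errors, which is why `OSError` is enough in this sketch.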

Comment on lines 660 to 668
try:
    source = file_path.read_text(encoding="utf-8")
except (FileNotFoundError, OSError):
    return {}

try:
    tree = ast.parse(source)
except SyntaxError:
    return {}
Contributor

Curious, why the try/except patterns here? Fine to keep them, though.

Collaborator Author

Codex going crazy, haha

Collaborator Author

But it does let the script keep going.
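
The "keep going" behaviour can be sketched as follows: a per-file helper returns an empty result on an unreadable or unparsable file, so a batch run over many files never aborts. The helper name and its return shape are hypothetical; only the two guarded steps mirror the snippet under review.

```python
import ast
from pathlib import Path


def top_level_definitions(file_path: Path) -> dict[str, str]:
    """Map each top-level class/function name to its kind, or {} if the file cannot be parsed."""
    try:
        source = file_path.read_text(encoding="utf-8")
    except OSError:  # FileNotFoundError is a subclass of OSError
        return {}
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return {}
    kinds = {ast.ClassDef: "class", ast.FunctionDef: "function", ast.AsyncFunctionDef: "function"}
    return {node.name: kinds[type(node)] for node in tree.body if type(node) in kinds}
```

Since `FileNotFoundError` derives from `OSError`, catching `OSError` alone covers both cases listed in the original `except` clause.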

  parser = argparse.ArgumentParser(prog="hf-code-sim")
  parser.add_argument("--build", action="store_true")
- parser.add_argument("--modeling-file", type=str)
+ parser.add_argument("--modeling-file", type=str, help='You can just specify "vits" if you are lazy like me.')
Contributor

😁
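
A rough sketch of how a bare model name such as "vits" might be expanded into a modeling file path. The `resolve_modeling_file` helper and the `<models_root>/<name>/modeling_<name>.py` layout are assumptions made for illustration; the PR's actual resolution logic is not shown in this thread.

```python
from pathlib import Path


def resolve_modeling_file(value: str, models_root: Path) -> Path:
    """Expand a bare model name such as "vits" to its modeling file path.

    Anything that already looks like a path (contains "/" or ends in .py) is returned unchanged.
    """
    if value.endswith(".py") or "/" in value:
        return Path(value)
    return models_root / value / f"modeling_{value}.py"
```

For example, `resolve_modeling_file("vits", Path("src/transformers/models"))` yields `src/transformers/models/vits/modeling_vits.py` under that assumed layout.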

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker merged commit 0452f28 into main Oct 6, 2025
17 checks passed
@ArthurZucker ArthurZucker deleted the add-dates branch October 6, 2025 10:52
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
* update

* fancy table fancy prints

* download to cache folder, never need it ever again

* style

* update based on review
4 participants