Skip to content

Conversation

hoodmane
Copy link

This is a reland of #12962. That PR broke accesses to spacy.cli:

import spacy
spacy.cli # NameError

This avoids such a problem by using module __getattr__.

@hoodmane
Copy link
Author

Well this does not work quite yet.

@hoodmane
Copy link
Author

Okay now it's better.

@nro-bot
Copy link

nro-bot commented Dec 23, 2024

Hi @adrianeboyd, spacy + pyodide still blocked on this and wondering if you could take a look! Thanks (and also to hoodmane)

@nro-bot
Copy link

nro-bot commented Jan 3, 2025

Would a comment like

Lazily load `cli` submodule (used by `info`) to avoid requiring packages such as `requests` to run SpaCy

be appropriate for __getattr__?

And for curiosity, is __all__ required for lazy loading or added as "this is good practice"?
Thanks!

@hoodmane
Copy link
Author

Yes. And __all__ is good practice but it's also needed to avoid dropping cli from the things imported by from mod import *

@nro-bot
Copy link

nro-bot commented Feb 13, 2025

@adrianeboyd Another ping on this (as it's been two months) :)

@honnibal
Copy link
Member

@nro-bot Sorry for the lack of attention on this. Adriane no longer works at Explosion (we're no longer venture-backed), and my time for support has been split over a lot of different things.

I'm now getting more on top of it, having just finished the support for Python 3.13.

Could you summarise a little more what the lazy loading is intending to do? Is it that you want to avoid the import of the pipeline at loading, so that you can avoid triggering stuff that doesn't work?

An issue with the lazy-loading is that spaCy heavily depends on entry-points. These end up importing everything to get decorators to run as side-effects. I've refactored this in the patch for 3.13, so we no longer rely on these import side-effects to run decorators. That said, the registration functions (which need to run to do anything much) still need to import pretty much everything. So it's not clear to me what the net impact for your use-case would be.

@hoodmane
Copy link
Author

Could you sumarise a little more what the lazy loading is intending to do? Is it that you want to avoid the import of the pipeline at loading, so that you can avoid triggering stuff that doesn't work?

Yeah that's what we're after.

it's not clear to me what the net impact for your use-case would be.

The test here works fine with this patch:
https://github.com/pyodide/pyodide/blob/144c3de286ad41575b7d7fcc10438465a1e093fe/packages/spacy/test_spacy.py

@jakob1379
Copy link

The impact here is the __getattr__ effectively deferre the importing of the submodule and whatever cascade of loading that does until cli or info is called.

This prevents the need for moving import into the code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants