-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Avoid Unidecoding Query Strings in Deezer Metadata Plugin #5882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR moves duplicated search query construction logic from the Spotify and Deezer plugins into a shared base class and changes the default behavior to avoid ASCII encoding of search queries. The key motivation is to improve search results for non-Latin characters while maintaining backward compatibility.
- Refactored
_construct_search_query
method into sharedSearchApiMetadataSourcePlugin
base class - Changed default behavior to preserve Unicode characters in search queries instead of ASCII encoding
- Added configuration option
search_query_ascii
to restore legacy ASCII encoding behavior
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
beets/metadata_plugins.py | Added shared _construct_search_query method and search_query_ascii config option to base class |
beetsplug/spotify.py | Removed duplicated _construct_search_query method and config option |
beetsplug/deezer.py | Removed duplicated _construct_search_query method and unidecode import |
docs/plugins/deezer.rst | Added documentation for new search_query_ascii configuration option |
docs/changelog.rst | Added changelog entry describing the fix and new configuration option |
deezer and spotify functionalities.
Codecov ReportAttention: Patch coverage is
🚀 New features to boost your workflow:
|
if self.config["search_query_ascii"].get(): | ||
query = unidecode.unidecode(query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say we should just bomb unidecode
. I just checked that deezer
returns crap if I unidecode
a japanese album name, and it returns the correct album if I don't.
requests
these days are more than capable of dealing with any sort of strings, so I think we can safely unidecode
for good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so too before I looked into why this was implemented in the first place. I don't think we should just remove it. The query never was the issue. The main issue is the terminal or badly pasted queries if I remember correctly.
Please have a look at #5705
beets/metadata_plugins.py
Outdated
|
||
query_components = [ | ||
keywords, | ||
" ".join(f'{k}:"{v}"' for k, v in filters.items()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there is a double quote character in the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea. Depends on spotify or deezer, I guess. I check that both support quoted queries and they do but have not checked double quoted.
Deezer even needs quoted strings otherwise the api just returns rubbish. Before this PR spotify did not quote string whereas deezer did. Was the only difference in the functions.
beets/metadata_plugins.py
Outdated
"""Construct a query string with the specified filters and keywords to | ||
be provided to the Spotify (or similar) Search API. | ||
|
||
At the moment, this is used to construct a query string for: | ||
- Spotify (https://developer.spotify.com/documentation/web-api/reference/search). | ||
- Deezer (https://developers.deezer.com/api/search). | ||
|
||
:param filters: Field filters to apply. | ||
:param keywords: Query keywords to use. | ||
:return: Query string to be provided to the Search API. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface sets up this method, but it shouldn't know the specifics about what implements it.
Since the parameters' names are clear, I don't think we need to repeat this information in the docstring.
"""Construct a query string with the specified filters and keywords to | |
be provided to the Spotify (or similar) Search API. | |
At the moment, this is used to construct a query string for: | |
- Spotify (https://developer.spotify.com/documentation/web-api/reference/search). | |
- Deezer (https://developers.deezer.com/api/search). | |
:param filters: Field filters to apply. | |
:param keywords: Query keywords to use. | |
:return: Query string to be provided to the Search API. | |
""" | |
"""Construct a query with the specified filters and query string.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the issue is tho that we lose the references to the docs if we just remove it, as these methods aren't present in the plugins file anymore. There is no repeated information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this function kinda is bound to the implementation specific as it only makes sense in the context of these two plugins and for the deezer and spotify api. Again kinda an ambiguity with our deeper structure if you ask me:
plugin = beetsplug(beets)
# vs
plugin = beets(beetsplug(beets))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring sort of violates the Open/Closed SOLID principle: if we add another data source that inherits SearchApiMetadataSourcePlugin
and uses this method, we will need to update this docstring, which is bad practice. Practically speaking, this docstring would not get updated and end up being misleading.
This method (including the docstring) should be implemented/documented in a generic manner and closed for modification in order to prevent this issue.
You can move the URLs under respective _search_api
implementations instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can move the comment, but I still think it fits better alongside the query function. The root issue here is that this setup already violates the dependency inversion principle, so moving the comment doesn’t really solve anything, it just reinforces the design flaw.
This docstring sort of violates the Open/Closed SOLID principle: if we add another data source that inherits SearchApiMetadataSourcePlugin and uses this method, we will need to update this docstring, which is bad practice. Practically speaking, this docstring would not get updated and end up being misleading.
I get the concern about the docstring violating the Open/Closed Principle, and that’s valid, but we’re already operating in a structure that makes adhering to these principles tricky. Rather than selectively applying them, I think we should either address the underlying design or be explicit about the compromises we're making.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly think it is important for someone looking at the function to know why the query string looks that way. And this is only clear in the context of the spotify api documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I'm following you regarding these two points:
The root issue here is that this setup already violates the dependency inversion principle
And this is only clear in the context of the spotify api documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I've been trying to point out for quite some time that there's a problem with how plugins and core logic are not properly separated (see, for example, Avoid Unidecoding Query Strings in Deezer Metadata Plugin #5882 (comment) or the get_artists discussion). Our ongoing debates stem from this fundamental issue. Since you value design principles, this essentially comes down to a violation of the dependency inversion principle.
-
Consider encountering the
create_query
function somewhere without any context. A comment explaining why the query is constructed that way, such as linking to spotifys or deezers specifications, would be very helpful. Having to dive into the spotify or deezer plugin just to understand the rationale makes maintenance harder, even if adding such comments slightly bends some design principles.
removed legacy isinstance check
Moved the
_construct_search_query
method that was duplicated in the spotify and deezer plugins into the (new) shared helper baseclass (SearchApiMetadataSourcePlugin
).By default deezer now also does not ascii encode a query when it is send but the legacy behaviour can be restored using the
deezer.search_query_ascii
option.closes #5860