Skip to content

Avoid Unidecoding Query Strings in Deezer Metadata Plugin #5882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

semohr
Copy link
Contributor

@semohr semohr commented Jul 17, 2025

Moved the _construct_search_query method that was duplicated in the spotify and deezer plugins into the (new) shared helper baseclass ( SearchApiMetadataSourcePlugin).

By default deezer now also does not ascii encode a query when it is send but the legacy behaviour can be restored using the deezer.search_query_ascii option.

closes #5860

  • Changelog
  • Documentation

@semohr semohr requested review from Copilot and snejus and removed request for Copilot July 17, 2025 15:14
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR moves duplicated search query construction logic from the Spotify and Deezer plugins into a shared base class and changes the default behavior to avoid ASCII encoding of search queries. The key motivation is to improve search results for non-Latin characters while maintaining backward compatibility.

  • Refactored _construct_search_query method into shared SearchApiMetadataSourcePlugin base class
  • Changed default behavior to preserve Unicode characters in search queries instead of ASCII encoding
  • Added configuration option search_query_ascii to restore legacy ASCII encoding behavior

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
beets/metadata_plugins.py Added shared _construct_search_query method and search_query_ascii config option to base class
beetsplug/spotify.py Removed duplicated _construct_search_query method and config option
beetsplug/deezer.py Removed duplicated _construct_search_query method and unidecode import
docs/plugins/deezer.rst Added documentation for new search_query_ascii configuration option
docs/changelog.rst Added changelog entry describing the fix and new configuration option

Copy link

codecov bot commented Jul 17, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 63.77%. Comparing base (bde5de4) to head (d2cca4a).

Files with missing lines Patch % Lines
beets/metadata_plugins.py 90.90% 1 Missing ⚠️
beetsplug/deezer.py 0.00% 1 Missing ⚠️
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines +422 to +423
if self.config["search_query_ascii"].get():
query = unidecode.unidecode(query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we should just bomb unidecode. I just checked that deezer returns crap if I unidecode a japanese album name, and it returns the correct album if I don't.

requests these days are more than capable of dealing with any sort of strings, so I think we can safely unidecode for good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought so too before I looked into why this was implemented in the first place. I don't think we should just remove it. The query never was the issue. The main issue is the terminal or badly pasted queries if I remember correctly.

Please have a look at #5705


query_components = [
keywords,
" ".join(f'{k}:"{v}"' for k, v in filters.items()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there is a double quote character in the query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. Depends on spotify or deezer, I guess. I check that both support quoted queries and they do but have not checked double quoted.

Deezer even needs quoted strings otherwise the api just returns rubbish. Before this PR spotify did not quote string whereas deezer did. Was the only difference in the functions.

Comment on lines 402 to 412
"""Construct a query string with the specified filters and keywords to
be provided to the Spotify (or similar) Search API.

At the moment, this is used to construct a query string for:
- Spotify (https://developer.spotify.com/documentation/web-api/reference/search).
- Deezer (https://developers.deezer.com/api/search).

:param filters: Field filters to apply.
:param keywords: Query keywords to use.
:return: Query string to be provided to the Search API.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface sets up this method, but it shouldn't know the specifics about what implements it.

Since the parameters' names are clear, I don't think we need to repeat this information in the docstring.

Suggested change
"""Construct a query string with the specified filters and keywords to
be provided to the Spotify (or similar) Search API.
At the moment, this is used to construct a query string for:
- Spotify (https://developer.spotify.com/documentation/web-api/reference/search).
- Deezer (https://developers.deezer.com/api/search).
:param filters: Field filters to apply.
:param keywords: Query keywords to use.
:return: Query string to be provided to the Search API.
"""
"""Construct a query with the specified filters and query string."""

Copy link
Contributor Author

@semohr semohr Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the issue is tho that we lose the references to the docs if we just remove it, as these methods aren't present in the plugins file anymore. There is no repeated information.

Copy link
Contributor Author

@semohr semohr Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this function kinda is bound to the implementation specific as it only makes sense in the context of these two plugins and for the deezer and spotify api. Again kinda an ambiguity with our deeper structure if you ask me:

plugin = beetsplug(beets)
# vs
plugin = beets(beetsplug(beets))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring sort of violates the Open/Closed SOLID principle: if we add another data source that inherits SearchApiMetadataSourcePlugin and uses this method, we will need to update this docstring, which is bad practice. Practically speaking, this docstring would not get updated and end up being misleading.

This method (including the docstring) should be implemented/documented in a generic manner and closed for modification in order to prevent this issue.

You can move the URLs under respective _search_api implementations instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can move the comment, but I still think it fits better alongside the query function. The root issue here is that this setup already violates the dependency inversion principle, so moving the comment doesn’t really solve anything, it just reinforces the design flaw.

This docstring sort of violates the Open/Closed SOLID principle: if we add another data source that inherits SearchApiMetadataSourcePlugin and uses this method, we will need to update this docstring, which is bad practice. Practically speaking, this docstring would not get updated and end up being misleading.

I get the concern about the docstring violating the Open/Closed Principle, and that’s valid, but we’re already operating in a structure that makes adhering to these principles tricky. Rather than selectively applying them, I think we should either address the underlying design or be explicit about the compromises we're making.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mainly think it is important for someone looking at the function to know why the query string looks that way. And this is only clear in the context of the spotify api documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'm following you regarding these two points:

The root issue here is that this setup already violates the dependency inversion principle

And this is only clear in the context of the spotify api documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I've been trying to point out for quite some time that there's a problem with how plugins and core logic are not properly separated (see, for example, Avoid Unidecoding Query Strings in Deezer Metadata Plugin #5882 (comment) or the get_artists discussion). Our ongoing debates stem from this fundamental issue. Since you value design principles, this essentially comes down to a violation of the dependency inversion principle.

  2. Consider encountering the create_query function somewhere without any context. A comment explaining why the query is constructed that way, such as linking to spotifys or deezers specifications, would be very helpful. Having to dive into the spotify or deezer plugin just to understand the rationale makes maintenance harder, even if adding such comments slightly bends some design principles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid Unidecoding Query Strings in Deezer Metadata Plugin
2 participants