Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor lyrics tests, do not search for empty metadata #5452

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

snejus
Copy link
Member

@snejus snejus commented Oct 3, 2024

Description

Fixes #2635
Fixes #5133

I realised that #5406 has gotten too big, thus I'm splitting it into several smaller PRs.

This PR refactors lyrics plugin tests and fixes an empty metadata issue in the lyrics logic.

CI

  • Added --extras=lyrics to the Poetry install command to include the lyrics plugin dependencies.
  • In the main task which measures coverage, set LYRICS_UPDATED environment variable based on changes detected in the lyrics files.

Test setup

  • Introduced ConfigMixin to centralize configuration setup for tests, reducing redundancy. This can be used by tests based on pytest.

Lyrics logic

  • Trimmed whitespace from item.title, item.artist, and item.artist_sort in search_pairs function.
  • Added checks to avoid searching for lyrics if either the artist or title is missing.
  • Improved _scrape_strip_cruft function to remove Google Ads tags and unnecessary HTML tags.

Lyrics tests overhaul

  • Migrated lyrics tests to use pytest for better isolation and configuration management.
  • Deleted redundant lyrics text files and some unused utils.
  • Marked tests that should only run when lyrics source code is updated (LYRICS_UPDATED is set from the CI) using the on_lyrics_update marker.

Documentation and Dependencies

  • Added requests-mock version 1.12.1 to pyproject.toml and poetry.lock for mocking HTTP requests in tests.
  • Updated setup.cfg to include a new marker on_lyrics_update.

@snejus snejus self-assigned this Oct 3, 2024
@snejus snejus requested a review from bal-e October 3, 2024 13:41
@snejus snejus force-pushed the lyrics-refactor-tests branch 4 times, most recently from 626bf91 to 273e39b Compare October 3, 2024 19:11
Comment on lines +47 to +48
beetsplug/lyrics.py
test/plugins/test_lyrics.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely comfortable with this mode of testing-as-needed. These files necessarily reference many other parts of the codebase and it's always possible that they can break tests (even if they're pretty independent of the rest of the codebase right now, that can and probably will change in the future).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently these integration tests only run once a week, and no one finds out about failures 😟

CONTRIBUTING.rst Outdated Show resolved Hide resolved
@snejus snejus force-pushed the lyrics-refactor-tests branch 3 times, most recently from 200cd19 to 9e67a56 Compare October 12, 2024 01:15
@snejus snejus changed the base branch from master to lyrics-fix-tekstowo October 12, 2024 01:39
@snejus snejus force-pushed the lyrics-refactor-tests branch 2 times, most recently from b4d78da to 038ca48 Compare October 12, 2024 01:48
test/plugins/test_lyrics.py Outdated Show resolved Hide resolved
test/plugins/test_lyrics.py Outdated Show resolved Hide resolved
Base automatically changed from lyrics-fix-tekstowo to master October 12, 2024 21:52
Two google sources failed to return the expected output. I looked into
each case why parsing failed:

- lyrics on musica.com contain <aside> Google Ads
- each lyrics line on lacoccinelle.net is wrapped within alternating
  <em> and <strong> tags

Thus remove these tags as part of the HTML cleanup logic.
@snejus snejus force-pushed the lyrics-refactor-tests branch 2 times, most recently from 2fb6682 to d02e017 Compare October 12, 2024 22:10
@snejus snejus requested a review from bal-e October 12, 2024 22:18
Create 'helpers.ConfigMixin' which sets up testing configuration.
This is helpful for tests (e.g. test_lyrics.py) that only need the
configuration and do not require temp dir.

(#5102) Refactor lyrics tests to fix the issue global beets config
issue.

Additionally, add 'integration_test' mark that can be used to mark tests
that should only run once a week.
- Replaced unittest.mock with pytest fixtures for better test isolation and readability.
- Simplified test cases by using parameterized tests.
- Added `requests-mock` dependency to `pyproject.toml` and `poetry.lock`.
- Removed redundant helper functions and classes.
The test for GeniusLyrics was heavily patched and no longer provided
useful coverage. It has been removed to clean up the test suite.
- Consolidated multiple test cases into parameterized tests for better
  readability and maintainability.
- Simplified assertions by comparing lists of actual and expected
  artists/titles.
- Added `unexpected_empty_artist` marker to handle cases which
  unexpectedly return an empty artist. This seems to be happen when
  `artist_sort` field is empty.
Modified `search_pairs` function in `lyrics.py` to:

* Firstly strip each of `artist`, `artist_sort` and `title` fields
* Only generate alternatives if both `artist` and `title` are not empty
* Ensure that `artist_sort` is not empty and not equal to artist (ignoring
  case) before appending it to the artists

Extended tests to cover the changes.
Since at least one Backend requires album` and `duration` arguments
(`LRCLib`), the caller (`LyricsPlugin.fetch_item_lyrics`) must always
provide them.

Since they need to provided, we need to enforce this by defining them as
positional arguments.

Why is this important? I found that integrated `LRCLib` tests have been
passing, but they called `LRCLib.fetch` with values for `artist` and
`title` fields only, while the actual functionality *always* provides
values for `album` and `duration` fields too.

When I adjusted the test to provide values for the missing fields,
I found that it failed. This makes sense: Lib `album` and `duration`
filters are strict on LRCLib, so I was not surprised the lyrics could
not be found.

Thus I adjusted `LRCLib` backend implementation to only filter by each
of these fields when their values are truthy.
@snejus snejus force-pushed the lyrics-refactor-tests branch 2 times, most recently from 54f06f9 to 19dff1c Compare October 18, 2024 21:14
@snejus
Copy link
Member Author

snejus commented Oct 18, 2024

@bal-e added a pair of additional commits as I noticed that LRCLib integrated tests have been testing no-op logic and failed once I configured it correctly.

I also added lyrics texts that we're expecting to receive from each of the sources we're testing. These will be helpful to have when changes are made in the backends' scraping logic.

Add explicit checks for lyrics texts fetched from the tested sources.

- Introduced `LyricsPage` class to represent lyrics pages for integrated
  tests.
- Configured expected lyrics for each of the URLs that are being
  fetched.
- Consolidated integrated tests in a new `TestLyricsSources` class.
- Mocked Google Search API to return the lyrics page under test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants