Skip to content

Fix/getitem pagination walk#184

Open
GFlores17 wants to merge 3 commits intojoalla:masterfrom
GFlores17:fix/getitem-pagination-walk
Open

Fix/getitem pagination walk#184
GFlores17 wants to merge 3 commits intojoalla:masterfrom
GFlores17:fix/getitem-pagination-walk

Conversation

@GFlores17
Copy link
Copy Markdown

Problem

Discogs sometimes returns fewer items per page than the per_page
metadata claims. This causes __getitem__ to calculate incorrect
page indices and offsets, resulting in IndexError: list index out of range on indices near page boundaries.

Verified that page sizes vary and don't match metadata:
print(len(results.page(1))) # 47, not 50
print(len(results.page(2))) # 49, not 50

Fix

Rather than trusting per_page metadata for offset math, walk
through actual page sizes sequentially to find the correct page
and offset.

Tradeoff

Accessing a high index now requires fetching all preceding pages
sequentially. Performance could be improved further by caching
actual page sizes as they're fetched.

Closes #183

@prcutler
Copy link
Copy Markdown
Contributor

prcutler commented Apr 4, 2026

@JOJ0 or @AnssiAhola - any chance one of you could review this PR when you get a chance? Thanks!

Comment thread discogs_client/models.py
@@ -354,18 +354,26 @@ def _transform(self, item):
return item

def __getitem__(self, index):
Copy link
Copy Markdown
Contributor

@AnssiAhola AnssiAhola Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you pointed out, this would query each page starting from page 1, which isn't ideal.
Could we compare the offset with actual page item count and walk backwards if the actual count is less that offset? This way we could dramatically reduce requests for large indexes.

Also this change in logic could/should(?) be opt-in (like our "exponential backoff" feature) and could be enabled with something like

# Different ideas for enabling/disabling this feature

client.safe_pagination = True # Default: False.
client.unsafe_pagination = False # Default: True.
client.safe_pagination_indexing = True # Default: False
client.unsafe_pagination_indexing = False # Default: True

# ... Other options?

@JOJ0 Any ideas/suggestions on this?

Also, please add tests that at least reproduces this bug so we can make sure it is fixed and stays fixed. Look in tests/test_models.py for examples how to mock api responses.

Copy link
Copy Markdown
Contributor

@JOJ0 JOJ0 Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @GFlores17 thanks for the submission, it is a good idea to finally find a solution for this even though it's not optimal. I agree with all @AnssiAhola mentioned. We need:

  • A test to reproduce
  • It should be configurable
  • And we need to document it

If you are you still motivated to move this PR forward, we can assist with the details then.

I asked and AI to draft some tests that might be a starting point. @AnssiAhola please have a look at them if good enough or too much. Thanks!

Copy link
Copy Markdown
Contributor

@JOJ0 JOJ0 Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GFlores17 I rebased your 2 commits into one and force pushed. Please reset your branch to upstream. I also pushed the drafted tests.

Copy link
Copy Markdown
Contributor

@JOJ0 JOJ0 Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the config option, what about:

trust_per_page = False  # Default: True

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests seem fine, even though a bit verbose, as is tradition for AI.

For the config option, what about:

trust_per_page = False  # Default: True

This sounds good to me. With some documentation this should be clear enough.

Also realized that my suggestion for "reverse walk" wouldn't work, so the original idea to start always from page 1 is pretty much necessary.

@JOJ0 JOJ0 force-pushed the fix/getitem-pagination-walk branch from 403761e to a792460 Compare April 7, 2026 15:50
@JOJ0 JOJ0 force-pushed the fix/getitem-pagination-walk branch from a792460 to eabf016 Compare April 7, 2026 15:51
fix: add trust_per_page config, update __getitem__ and tests accordingly

fix: add trust_per_page config, update __getitem__ and tests accordingly#
@GFlores17
Copy link
Copy Markdown
Author

GFlores17 commented Apr 8, 2026

I implemented the trust_per_page config option and updated getitem to branch based on the setting.

One thing I wanted to flag: the drafted tests assume sequential walking is the default, so having trust_per_page = True as the default causes the test suite to fail out of the box.

I updated the tests to explicitly set trust_per_page = False so the suite passes with the default config. Does that align with what you both had in mind?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"list index out of range" on random releases in search

4 participants