Conversation
@JOJ0 or @AnssiAhola - any chance one of you could review this PR when you get a chance? Thanks!
@@ -354,18 +354,26 @@ def _transform(self, item):
        return item


    def __getitem__(self, index):
As you pointed out, this would query each page starting from page 1, which isn't ideal.
Could we compare the offset with the actual page item count and walk backwards if the actual count is less than the offset? This way we could dramatically reduce requests for large indexes.
Also this change in logic could/should(?) be opt-in (like our "exponential backoff" feature) and could be enabled with something like
# Different ideas for enabling/disabling this feature
client.safe_pagination = True # Default: False.
client.unsafe_pagination = False # Default: True.
client.safe_pagination_indexing = True # Default: False
client.unsafe_pagination_indexing = False # Default: True
# ... Other options?
@JOJ0 Any ideas/suggestions on this?
Also, please add tests that at least reproduce this bug so we can make sure it is fixed and stays fixed. Look in tests/test_models.py for examples of how to mock API responses.
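For illustration, a minimal self-contained reproduction of the bug. The classes and names here are hypothetical stand-ins, not the real discogs_client models or the mocking style used in tests/test_models.py:

```python
import unittest


class FakePaginatedList:
    """Hypothetical stand-in for the client's paginated list, not the
    real discogs_client model: per_page metadata claims 50 items, but
    the server actually returns short pages."""
    per_page = 50

    def __init__(self, pages):
        self._pages = pages  # maps 1-based page number -> list of items

    def page(self, number):
        return self._pages[number]

    def __getitem__(self, index):
        # Buggy offset math that trusts the per_page metadata.
        page_number = index // self.per_page + 1
        offset = index % self.per_page
        return self.page(page_number)[offset]


class ShortPageTest(unittest.TestCase):
    def test_index_near_page_boundary_raises(self):
        # Page 1 really holds 47 items, page 2 holds 49.
        plist = FakePaginatedList({1: list(range(47)),
                                   2: list(range(47, 96))})
        # Index 48 lives on page 2 in reality, but the per_page math
        # looks for it at offset 48 of the 47-item first page.
        with self.assertRaises(IndexError):
            plist[48]
```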
Hi @GFlores17, thanks for the submission. It is good to finally find a solution for this, even if it's not optimal. I agree with everything @AnssiAhola mentioned. We need:
- A test to reproduce
- It should be configurable
- And we need to document it
If you are still motivated to move this PR forward, we can assist with the details.
I asked an AI to draft some tests that might be a starting point. @AnssiAhola please have a look at them and judge whether they are good enough or too much. Thanks!
@GFlores17 I rebased your 2 commits into one and force pushed. Please reset your branch to upstream. I also pushed the drafted tests.
For the config option, what about:
trust_per_page = False # Default: True
Tests seem fine, even though a bit verbose, as is tradition for AI.
For the config option, what about:
trust_per_page = False # Default: True
This sounds good to me. With some documentation this should be clear enough.
Also, I realized that my suggestion for a "reverse walk" wouldn't work, so the original idea of always starting from page 1 is pretty much necessary.
Force-pushed 403761e to a792460
Force-pushed a792460 to eabf016
fix: add trust_per_page config, update __getitem__ and tests accordingly
I implemented the trust_per_page config option and updated __getitem__ to branch based on the setting. One thing I wanted to flag: the drafted tests assume sequential walking is the default, so I updated them to explicitly set trust_per_page = False.
Problem
Discogs sometimes returns fewer items per page than the per_page metadata claims. This causes __getitem__ to calculate incorrect page indices and offsets, resulting in IndexError: list index out of range on indices near page boundaries.

Verified that page sizes vary and don't match the metadata:

print(len(results.page(1)))  # 47, not 50
print(len(results.page(2)))  # 49, not 50
Fix
Rather than trusting per_page metadata for offset math, walk through actual page sizes sequentially to find the correct page and offset.
Tradeoff
Accessing a high index now requires fetching all preceding pages
sequentially. Performance could be improved further by caching
actual page sizes as they're fetched.
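One way that caching could look, sketched with a hypothetical fetch callable and a plain dict as the cache; none of these names are the client's real API:

```python
def locate_index_cached(fetch_page, index, size_cache):
    """Walk pages to find (page_number, offset), remembering each page's
    actual item count in size_cache so later lookups skip the network
    requests for pages already seen.

    fetch_page is a hypothetical callable returning the items on a
    1-based page; size_cache maps page number -> actual item count.
    """
    page_number, remaining = 1, index
    while True:
        if page_number not in size_cache:
            # Only pages with unknown sizes cost a request.
            size_cache[page_number] = len(fetch_page(page_number))
        size = size_cache[page_number]
        if size == 0:
            raise IndexError('list index out of range')
        if remaining < size:
            return page_number, remaining
        remaining -= size
        page_number += 1
```

After one lookup has populated the cache, any index within the pages already seen resolves with zero further requests.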
Closes #183