12 generalize downloader by liellnima · Pull Request #13 · RolnickLab/ClimateSetExtension

liellnima · 2024-10-26T00:36:54Z

I am creating this pull request so that the branches do not diverge too much over time.

Current state:

Removed some parts that were blocking generalization, e.g. sticking to the selected_scenario file instead of the broader files. Removed other files (or moved them somewhere else) where I realized we are not using them and won't need them in the future.
Simplified the constant files: a simple model list instead of a dictionary. Generalized to one node link per project.
Extended options: we can now use all desired models for cmip6.
Extended scenarios: we can now use all desired scenarios/experiments for cmip6.
Extended projects: a first naive implementation extends the downloader to different projects. This will need some major rewriting, which is why I stopped here and would like to do it in a separate PR. I think we need an abstract Downloader, and then children for the different projects (input4mips, cmip6, cmip7). This way users can implement Downloader classes for the projects we are not supporting.
max_ensemble number of ensemble_members: this was already supported by the previous code and still works.
Removed some other things that we don't need, to simplify the code.
Implemented some failure/breaking points where needed (because the code otherwise downloads data the user doesn't need). This still needs to be done for the input4mips part as well.
Each project gets its own constant files (not sure this is elegant). This way we can load the relevant constants for the project and a separate node link (necessary!). Each project might have different models/variables or, e.g., no models at all (see input4mips).
The cmip7 (or cmip6plus) config is there, and I tested it; it is unclear whether the problem is a server issue, a pyesgf issue, or an issue on our side.
Updated the yaml downloader configs according to the changes above (tag "project"; if omitted, it should default to cmip6).
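The abstract-Downloader idea and the "project" default described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `BaseDownloader`, `Input4MipsDownloader`, `DOWNLOADERS`, `make_downloader`, and the config keys are hypothetical names, and the `download()` bodies merely build identifier strings instead of contacting any ESGF node.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

# Per the PR text: an omitted "project" tag should fall back to cmip6.
DEFAULT_PROJECT = "cmip6"


class BaseDownloader(ABC):
    """Hypothetical base class; the real class in the repository may differ."""

    project: str = ""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def download(self) -> list[str]:
        """Return identifiers of the items this downloader would fetch."""


class CMIP6Downloader(BaseDownloader):
    project = "cmip6"

    def download(self) -> list[str]:
        # CMIP6 requests are a product of models, experiments and variables.
        return [
            f"{self.project}/{model}/{experiment}/{variable}"
            for model in self.config.get("models", [])
            for experiment in self.config.get("experiments", [])
            for variable in self.config.get("variables", [])
        ]


class Input4MipsDownloader(BaseDownloader):
    project = "input4mips"

    def download(self) -> list[str]:
        # input4mips has no models, so only variables are iterated.
        return [f"{self.project}/{v}" for v in self.config.get("variables", [])]


# Registry keyed by project name (illustrative only).
DOWNLOADERS = {cls.project: cls for cls in (CMIP6Downloader, Input4MipsDownloader)}


def make_downloader(config: dict) -> BaseDownloader:
    """Pick the downloader class from the 'project' tag, defaulting to cmip6."""
    project = str(config.get("project", DEFAULT_PROJECT)).lower()
    if project not in DOWNLOADERS:
        raise ValueError(f"Unsupported project {project!r}; known: {sorted(DOWNLOADERS)}")
    return DOWNLOADERS[project](config)
```

With a layout like this, supporting an additional project would mean adding one subclass and registering it, without touching the existing ones.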

The other future_cases are not working yet and have not been implemented yet. Please ignore those.

Next immediate steps (Input4Mips handling):

input4mips: rewrite some funcs, such as generate_variables
split the raw / model vars in the cmip6 constant files for input4mips vs cmip6
get input4mips configs running as expected
remove default downloader behaviour

Other next steps:

Rewrite downloader class, so that the project types are handled in a clean way. Each project needs its own downloading function, different handling for special variables if desired, and init_globs (or sth like that) imo.
grid handling: we should get rid of the explicit grid mentions. First, different variables need different grids (e.g. ocean variables need gr instead of gn). Second, each variable should have only one grid available, i.e. we want to retrieve it automatically. This is a potential failure point for downloading certain data.
extend configs for testing the downloader so each model component is tested once (not just atmosphere and sea-ice as it is now)
think about reducing logger output / terminal output. Is it too much?
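The grid-handling step above (retrieve the grid automatically when a variable has exactly one, and fail loudly otherwise) might look roughly like this. It is a sketch under assumptions: `resolve_grid_label` is a hypothetical helper, and the list of available grid labels is assumed to come from an earlier search step that is not shown.

```python
from __future__ import annotations


def resolve_grid_label(variable: str, available_grids: list[str]) -> str:
    """Return the single grid label found for a variable, or raise if ambiguous.

    `available_grids` is assumed to be the grid labels (e.g. "gn", "gr")
    reported by a search for this variable; duplicates are tolerated.
    """
    unique = sorted(set(available_grids))
    if len(unique) == 1:
        return unique[0]
    if not unique:
        raise LookupError(f"No grid label found for variable {variable!r}")
    # More than one grid: refuse to guess, since this is the failure point
    # mentioned in the list above.
    raise LookupError(f"Ambiguous grid labels for {variable!r}: {unique}")
```

Raising instead of picking a grid silently keeps the failure visible at the point where the wrong data could otherwise be downloaded.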

Backlog:

time-window (we can safely ignore that for now)


liellnima · 2024-10-26T00:40:20Z

I haven't run the tests yet and I can already see that the linters are failing here (it worked locally for me though).

I just wanted to make sure that you have access to the latest version and can look into the changes. And I think it would be good to merge current changes before I continue adapting the downloader further.

With those changes all my cmip6 test cases are now working (6/6), so that's already a big step :D


f-PLT · 2025-02-27T04:50:20Z

@liellnima

Pushed my updates for review/testing.

Updated download configs
Added config classes that are used by the different downloaders
- These config classes handle some of the validating and preparations
Fixed the downloader tests
Added an example script, scripts/download_example.py
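As a rough idea of what "config classes that handle validation and preparation" can mean, here is a minimal sketch. The class name `DownloadConfig`, its fields, and `KNOWN_PROJECTS` are assumptions for illustration and are not taken from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative list only; the projects actually supported may differ.
KNOWN_PROJECTS = ("cmip6", "cmip6plus", "input4mips")


@dataclass
class DownloadConfig:
    """Hypothetical config object that validates itself on construction."""

    variables: list[str]
    project: str = "cmip6"
    models: list[str] = field(default_factory=list)
    max_ensemble_members: int = 1

    def __post_init__(self) -> None:
        # Preparation: normalize the project name and accept a single string.
        self.project = self.project.lower()
        if isinstance(self.variables, str):
            self.variables = [self.variables]
        # Validation: fail early instead of downloading unwanted data.
        if self.project not in KNOWN_PROJECTS:
            raise ValueError(f"Unknown project {self.project!r}")
        if not self.variables:
            raise ValueError("At least one variable is required")
        if self.max_ensemble_members < 1:
            raise ValueError("max_ensemble_members must be >= 1")
        if self.project == "input4mips" and self.models:
            raise ValueError("input4mips has no models")
```

A downloader could then accept such an object and assume its fields are already consistent.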

Let me know what you think and if you find any bugs, typos, or problems!

liellnima · 2025-02-27T17:55:55Z

…i-value querying
- Added `esgpull` to `pyproject.toml` as part of the overarching task to implement an async `esgpull` downloader client.
- Refactored `climateset/download/constraints.py` to support native multi-value lists seamlessly compatible with `esgpull.models.Query(selection=...)`.
- Added a `to_esgpull_query()` method while retaining the original `to_esgf_params()` boundary, avoiding breakage of `esgf-pyclient` dependent logic.
- Updated `test_constraints.py` with corresponding multi-value list tests.
- Marked task 01 as completed.
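The idea of keeping one constraints object with two output shapes (one per client) can be illustrated as follows. This is a stand-in, not the code in `climateset/download/constraints.py`: `FacetConstraints` and both method names are hypothetical, no `esgpull` or `esgf-pyclient` call is made, and the comma-joined form is an assumption about how a single-string parameter set might be encoded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FacetConstraints:
    """Hypothetical container for search facets with multi-value support."""

    facets: dict[str, list[str]] = field(default_factory=dict)

    def add(self, facet: str, values: str | list[str]) -> "FacetConstraints":
        # Accept a single string or a list; keep order and drop duplicates.
        new_values = [values] if isinstance(values, str) else list(values)
        current = self.facets.setdefault(facet, [])
        for value in new_values:
            if value not in current:
                current.append(value)
        return self

    def as_flat_params(self) -> dict[str, str]:
        # One string per facet (assumed comma-joined encoding).
        return {facet: ",".join(values) for facet, values in self.facets.items()}

    def as_multi_value_selection(self) -> dict[str, list[str]]:
        # Lists are kept as lists; copies protect the internal state.
        return {facet: list(values) for facet, values in self.facets.items()}
```

Keeping both conversions on one object means the stored constraints stay the single source of truth while each client receives the shape it expects.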

- Created isolated_esgpull_context context manager in climateset/download/utils.py to prevent SQLite file lock collisions during parallel execution.
- Configured Esgpull to initialize locally within a unique UUID path inside RAW_DATA/.esgpull_jobs.
- Wrapped initialization and execution in a try/finally block with shutil.rmtree to ensure ephemeral state is securely purged after use.
- Added test_isolated_esgpull_context in tests/test_download/test_utils.py to assert behavior.
- Marked task 02 as completed.
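The isolation pattern described in this commit message (a unique UUID directory per job, removed in a `finally` block) can be sketched without `esgpull` at all. The helper name `isolated_job_dir` is hypothetical, and a temporary directory stands in for RAW_DATA; how the real context manager initializes Esgpull inside that directory is not shown here.

```python
import shutil
import tempfile
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_job_dir(base_dir):
    """Yield a unique per-job directory and always remove it afterwards."""
    job_dir = Path(base_dir) / ".esgpull_jobs" / uuid.uuid4().hex
    job_dir.mkdir(parents=True)
    try:
        yield job_dir
    finally:
        # Runs on success and on error, so no stale state is left behind.
        shutil.rmtree(job_dir, ignore_errors=True)


# Usage: a temporary directory stands in for RAW_DATA.
with tempfile.TemporaryDirectory() as raw_data:
    with isolated_job_dir(raw_data) as job_dir:
        (job_dir / "state.db").write_text("placeholder")
        existed_inside = job_dir.exists()
    removed_afterwards = not job_dir.exists()
```

Because every job gets its own directory, two parallel jobs never open the same database file, which is the collision the commit message aims to avoid.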

…ntract
- Created EsgpullDownloader mimicking utils.py functions to perform search via esgpull.
- Replaced iterative fallback logic with efficient hints discovery queries mapped onto esgpull.models.Query.
- Integrated missing dynamically configured esgpull facet properties (version, target_mip) via Selection.configure().
- Implemented option parsing (distrib, latest) ensuring robust bulk constraints execution.
- Covered context logic and mock validations thoroughly via test_esgpull_downloader.py.
- Marked task 03 as completed.

- Added test_esgpull_downloader_integration_search to perform a real search on ESGF nodes.
- Validated that results successfully fetch and parse esgpull.models.File items.
- Asserted Dataset ID mapping properties (model, experiment, variable) accurately returned matching properties.

- Implemented async download runner _download_and_move_files for extracting queries natively via asyncio.
- Transferred resulting .nc files utilizing shutil.move() from the localized internal isolated UUID DB directly into target RAW_DATA project-specific locations.
- Safely integrated esg.db.add tracking models bypassing previously dependent ESGF bash THREDDS downloads.
- Validated via rigorous patches mocking asyncio downloads and ensuring mapping matches correctly via test_esgpull_downloader.py.
- Verified file integration structure fully correctly during E2E assertions resolving the Task 4 goal.
- Marked task 04 as completed.
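The download-then-move flow described here can be sketched with the standard library alone. This is not `_download_and_move_files`: `fake_download` and `download_and_move` are hypothetical stand-ins, and the "download" just creates an empty file so that the asyncio and `shutil.move()` steps can be shown without network access.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


async def fake_download(name: str, staging_dir: Path) -> Path:
    """Stand-in for a real async download: creates an empty file."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    path = staging_dir / name
    path.write_bytes(b"")
    return path


async def download_and_move(names, staging_dir: Path, target_dir: Path):
    """Download all files concurrently, then move them to the target directory."""
    target_dir.mkdir(parents=True, exist_ok=True)
    downloaded = await asyncio.gather(*(fake_download(n, staging_dir) for n in names))
    moved = []
    for src in downloaded:
        dst = target_dir / src.name
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved


# Usage: temporary directories stand in for the staging area and RAW_DATA.
with tempfile.TemporaryDirectory() as staging, tempfile.TemporaryDirectory() as raw:
    results = asyncio.run(
        download_and_move(["a.nc", "b.nc"], Path(staging), Path(raw) / "cmip6")
    )
    moved_names = sorted(p.name for p in results)
    all_present = all(p.exists() for p in results)
    staging_empty = not any(Path(staging).iterdir())
```

Moving files only after all downloads have finished keeps the target directory free of partially written files.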

- Implemented real-search automated tests for EsgpullDownloader, fulfilling the mandate to never mock the search querying phase.
- Added global AsyncMock intercept for Esgpull.download to prevent data bandwidth usage in CI while allowing end-to-end flow verification.
- Performed manual verification of storage independence (using .esgpull_jobs UUID isolation) and subprocess elimination (async native downloads).
- Confirmed backward compatibility with existing esgf-pyclient downloader tests.
- Refined iterative search logic by removing problematic nominal_resolution constraint which caused empty overlaps in esgpull.
- Marked task 05 as completed.
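Intercepting an async download method with `AsyncMock`, so that the rest of the flow runs without transferring data, could look like the sketch below. `FakeClient` and `run_pipeline` are hypothetical stand-ins for the real client and pipeline; only `unittest.mock` and `asyncio` from the standard library are used.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class FakeClient:
    """Stand-in for a client whose download() would use real bandwidth."""

    async def download(self, files):
        raise RuntimeError("network access is not allowed in this sketch")


async def run_pipeline(client, files):
    # The surrounding flow runs unchanged; only download() is replaced.
    return await client.download(files)


def run_with_download_intercepted(files):
    """Run the pipeline with download() replaced by an AsyncMock."""
    with patch.object(FakeClient, "download", new_callable=AsyncMock) as mocked:
        mocked.return_value = list(files)
        result = asyncio.run(run_pipeline(FakeClient(), files))
    return result, mocked
```

The mock records how it was awaited, so a test can still assert that the download step was reached with the expected files.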


f-PLT · 2026-03-24T18:48:56Z

Right now, the previous esgf-pyclient (pyesgf) client has been refactored and still works, but it has problems with some sources (such as biomass-burning).

A new implementation using esgpull is now available and tested; it is also demonstrated in the scripts/download_example.py script.

I still need to do some more review and add information about the new client and how to use it.

liellnima added 7 commits October 22, 2024 15:39

remove model source center and reformat

9e92fbc

move selected scenario mip files to docs

7e08575

update download configs for project, and ensemble members

68dabf5

remove unused esm_constants

114eb10

add new constant files for each esgf project type

47f0c78

remove get_selected_scenario as it is too restricting

aa89ff6

remove restricting funcs, extend to broader model set, extend to broa…

a61dc2a

…der experiment/scenario set, remove some defaults that result in unituitive results, add some failure points where needed, add naive approach for scenario handling

liellnima linked an issue Oct 26, 2024 that may be closed by this pull request

Generalize Downloader #12

Open

f-PLT assigned liellnima Nov 5, 2024

f-PLT self-requested a review November 5, 2024 15:55

liellnima and others added 16 commits November 19, 2024 20:23

move constants into constant classes, and collect them in a dict in e…

b868033

…sgf.py. split up raw and model vars. remove unused constants.

update configs: move project id to the top

bb7b8f1

update download_from_config func with new constant and config handlin…

5a0c38f

…g. update attribute handling of class. rewrite some if-else blocks. unify model and raw input vars handling. update constants. rename emission handling funcs. add comments for attributes in downloader class.

Add base structure for abstract downloader and implementations

ad5e0b0

Refactor ESGF constants and project constants

aa0e451

Add first base structure of Config classes

6a76fa9

Integrate config class for Input4mips

23b0bea

Implement config classes

770c003

Update tests

0ea3aae

Refactor CMIP6Downloader for multiple models

9df2456

Cleanup of downloader.py file

df32336

Cleanup of downloader.py file

1f2ff66

Update all download config files

7789c1b

Add download example

ecf8a41

Update download_from_config_file() to use existing functions for each…

260952d

… Downloader

Fix Pylint errors

1017f14

fix typo

445115f

f-PLT and others added 27 commits May 23, 2025 16:06

Update with new QA tools and new Makefile version

2c341dc

Ruff fix lint + formatting

73b56a2

Update and fix failing test

4f0283b

Refactor input4mips constants for safety

0ceafa0

Handle pylint warnings

3ed950c

Update github actions

99561a3

Formatting for pyproject.toml

bf53e46

Refactor downloader constants

665d77b

Refactor downloader_config from Abstract to base inheritance

6717393

Update .pre-commit-config.yaml

ee52825

Update pyproject.toml

cf437bd

Save progress - Prototype url search

fe35c4f

Create constraints classes

05e98cc

Remove pytest xfail for test_downloader_model_params

29c1fca

Implement new search client

a5a8b9e

Use new client in utils.py

37ccda1

Add esgpull downloader

4d4b4bc

Cleanup unused dedicated esgpull downloader

8dc75b4

Remove push from CI triggers

e09705a

Refactor esgpull utils

d15f462

Merge branch 'main' into 12-generalize-downloader

c2c8363

# Conflicts:
#   .github/workflows/lint.yml
#   .github/workflows/precommit.yml
#   .make/base.make
#   Makefile.private.example
#   noxfile.py

f-PLT added 2 commits March 24, 2026 17:03

Update Makefile for bugfix

7099073

Update Makefile for bugfix

04ef0e5

liellnima wants to merge 55 commits into main from 12-generalize-downloader


2 participants
