"[ENH]Implement programmatic lookup for MTYPE and SCITYPE"#986
"[ENH]Implement programmatic lookup for MTYPE and SCITYPE"#986sun-9545sunoj wants to merge 5 commits intosktime:mainfrom
Conversation
…d update contributors
There was a problem hiding this comment.
Pull request overview
Implements lazy, programmatic lookup for MTYPE_SOFT_DEPS and SCITYPE_* registries to break circular imports between _registry.py and _check.py, and expands mtype registry tuples to carry soft dependency metadata.
Changes:
- Expanded
MTYPE_REGISTER_*entries from 3-tuples to 4-tuples to include optionalpython_dependencies. - Added PEP 562
__getattr__-based lazy generation + caching forMTYPE_SOFT_DEPS,SCITYPE_REGISTER, andSCITYPE_LIST, with a reentrancy guard. - Updated
_check.scitypeto avoid importingSCITYPE_LISTat module import time.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| skpro/datatypes/_registry.py | Adds lazy registry lookup via __getattr__, cache, and reentrancy guard; updates scitype/mtype logic to use dynamic registries. |
| skpro/datatypes/_check.py | Avoids eager import of SCITYPE_LIST; lazily imports it when needed in scitype(). |
| skpro/datatypes/_table/_registry.py | Extends Table mtype registry tuples to include soft dependency metadata (e.g., polars). |
| skpro/datatypes/_proba/_registry.py | Extends Proba mtype registry tuples to the new 4-element format. |
| .all-contributorsrc | Adds contributor entry and minor JSON formatting adjustment. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…d update contributors
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _get_registry(name): | ||
| if name in _CACHE: | ||
| return _CACHE[name] | ||
|
|
||
| with _REGISTRY_LOCK: | ||
| if name in _CACHE: | ||
| return _CACHE[name] |
There was a problem hiding this comment.
_get_registry reads _CACHE outside of _REGISTRY_LOCK (lines 75-76). If one thread pre-seeds _CACHE[...] during generation, another thread can observe and return the still-empty placeholders, leading to transient wrong results (e.g., empty SCITYPE_LIST). Consider acquiring _REGISTRY_LOCK before the initial cache-hit check (or using a per-name “generating” sentinel/event) so other threads block until generation is complete while still allowing same-thread re-entrancy via the RLock.
| None, | ||
| ), | ||
| ("numpy1D", "Table", "1D np.ndarray representation of a univariate table", None), | ||
| ("numpy2D", "Table", "2D np.ndarray representation of a univariate table", None), |
There was a problem hiding this comment.
The numpy2D entry description still says "univariate table", but the corresponding checker (TableNp2D) supports multivariate tables (capability:multivariate=True and is_univariate depends on column count). Please update the registry description to avoid misleading users.
| ("numpy2D", "Table", "2D np.ndarray representation of a univariate table", None), | |
| ("numpy2D", "Table", "2D np.ndarray representation of a data table", None), |
| if name in ["MTYPE_SOFT_DEPS", "SCITYPE_REGISTER", "SCITYPE_LIST"]: | ||
| # Pre-seed _CACHE to prevent infinite recursion during generation | ||
| # Re-entrant calls (e.g., from inspect) will receive these references | ||
| _CACHE["MTYPE_SOFT_DEPS"] = {} | ||
| _CACHE["SCITYPE_REGISTER"] = [] | ||
| _CACHE["SCITYPE_LIST"] = [] | ||
|
|
||
| try: | ||
| from skpro.datatypes._check import get_check_dict | ||
|
|
||
| check_dict = get_check_dict(soft_deps="all") | ||
|
|
||
| soft_deps = {} | ||
| scitypes = set() | ||
|
|
||
| for k, cls in check_dict.items(): | ||
| if hasattr(cls, "get_class_tag"): | ||
| scitype = cls.get_class_tag("scitype") | ||
| else: | ||
| scitype = k[1] | ||
| if scitype is not None: | ||
| scitypes.add(scitype) | ||
|
|
||
| for mtype_tuple in MTYPE_REGISTER: | ||
| mtype = mtype_tuple[0] | ||
| scitype = mtype_tuple[1] | ||
| scitypes.add(scitype) | ||
| if len(mtype_tuple) >= 4 and mtype_tuple[3] is not None: | ||
| deps = mtype_tuple[3] | ||
| soft_deps[mtype] = ( | ||
| list(deps) if isinstance(deps, tuple) else deps | ||
| ) | ||
|
|
||
| scitype_register = [] | ||
| for sci in sorted(scitypes): | ||
| desc = _SCITYPE_DESCRIPTIONS.get(sci) | ||
| if desc is None: | ||
| raise ValueError( | ||
| f"scitype '{sci}' is missing a description in " | ||
| "`_SCITYPE_DESCRIPTIONS`. Please add it to " | ||
| "`skpro.datatypes._registry._SCITYPE_DESCRIPTIONS`." | ||
| ) | ||
| scitype_register.append((sci, desc)) | ||
|
|
||
| scitype_list = [x[0] for x in scitype_register] | ||
|
|
||
| # Update the pre-seeded objects in-place | ||
| _CACHE["MTYPE_SOFT_DEPS"].update(soft_deps) | ||
| _CACHE["SCITYPE_REGISTER"].extend(scitype_register) | ||
| _CACHE["SCITYPE_LIST"].extend(scitype_list) | ||
|
|
||
| return _CACHE[name] |
There was a problem hiding this comment.
The new lazy-generated registries (MTYPE_SOFT_DEPS, SCITYPE_REGISTER, SCITYPE_LIST) and the expanded 4-tuple MTYPE_REGISTER format aren’t covered by a targeted regression test. In particular, there’s no assertion that MTYPE_SOFT_DEPS contains the expected keys/values (e.g., polars mtypes require both polars and pyarrow) and that scitype_to_mtype(..., softdeps='exclude'|'present') filters based on those values. Please add a focused test to lock this behavior down, since subtle registry changes can otherwise silently alter API results.
|
@copilot apply changes based on the comments in this thread |
Reference Issues/PRs
Closes #9704
What does this implement/fix? Explain your changes.
This PR implements a programmatic lookup for MTYPE and SCITYPE registries to resolve long-standing circular dependency issues between _registry.py and _check.py.
Dynamic Attributes: Replaced static dictionary/list definitions with PEP 562 getattr logic to allow module-level attributes to be generated on-demand.
Recursion Guard: Implemented a global _IS_GENERATING reentrancy lock to prevent infinite recursion loops during attribute lookup, ensuring compatibility with IDE autocompletion and inspection tools.
Metadata Extraction: Updated the tabular registry (MTYPE_REGISTER_TABLE) to 4-element tuples to support lazy extraction of python_dependencies without triggering ModuleNotFoundError for missing soft dependencies like polars.
Contributor Update: Added myself to the .all-contributorsrc file with code and bug badges.
Does your contribution introduce a new dependency? If yes, which one?
No.
What should a reviewer concentrate their feedback on?
The robustness of the _IS_GENERATING guard in handling edge cases during registry assembly.
The expansion of the MTYPE_REGISTER_TABLE tuples to ensure they correctly map to the new dynamic parsing logic.
Did you add any tests for the change?
I verified the changes using the existing datatype test suite. Run pytest skpro/datatypes/tests/ -v showing 107 passed tests. The logic successfully handles environments with and without optional soft dependencies.
Any other comments?
This cleanup provides a more maintainable foundation for the datatypes module and simplifies the internal import structure of the package.