Skip to content

"[ENH]Implement programmatic lookup for MTYPE and SCITYPE"#986

Closed
sun-9545sunoj wants to merge 5 commits intosktime:mainfrom
sun-9545sunoj:fix/9704-programmatic-lookup
Closed

"[ENH]Implement programmatic lookup for MTYPE and SCITYPE"#986
sun-9545sunoj wants to merge 5 commits intosktime:mainfrom
sun-9545sunoj:fix/9704-programmatic-lookup

Conversation

@sun-9545sunoj
Copy link
Copy Markdown

Reference Issues/PRs
Closes #9704

What does this implement/fix? Explain your changes.

This PR implements a programmatic lookup for MTYPE and SCITYPE registries to resolve long-standing circular dependency issues between _registry.py and _check.py.

Dynamic Attributes: Replaced static dictionary/list definitions with PEP 562 getattr logic to allow module-level attributes to be generated on-demand.

Recursion Guard: Implemented a global _IS_GENERATING reentrancy lock to prevent infinite recursion loops during attribute lookup, ensuring compatibility with IDE autocompletion and inspection tools.

Metadata Extraction: Updated the tabular registry (MTYPE_REGISTER_TABLE) to 4-element tuples to support lazy extraction of python_dependencies without triggering ModuleNotFoundError for missing soft dependencies like polars.

Contributor Update: Added myself to the .all-contributorsrc file with code and bug badges.

Does your contribution introduce a new dependency? If yes, which one?
No.

What should a reviewer concentrate their feedback on?

The robustness of the _IS_GENERATING guard in handling edge cases during registry assembly.

The expansion of the MTYPE_REGISTER_TABLE tuples to ensure they correctly map to the new dynamic parsing logic.

Did you add any tests for the change?
I verified the changes using the existing datatype test suite. Run pytest skpro/datatypes/tests/ -v showing 107 passed tests. The logic successfully handles environments with and without optional soft dependencies.

Any other comments?
This cleanup provides a more maintainable foundation for the datatypes module and simplifies the internal import structure of the package.

Copilot AI review requested due to automatic review settings March 22, 2026 08:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements lazy, programmatic lookup for MTYPE_SOFT_DEPS and SCITYPE_* registries to break circular imports between _registry.py and _check.py, and expands mtype registry tuples to carry soft dependency metadata.

Changes:

  • Expanded MTYPE_REGISTER_* entries from 3-tuples to 4-tuples to include optional python_dependencies.
  • Added PEP 562 __getattr__-based lazy generation + caching for MTYPE_SOFT_DEPS, SCITYPE_REGISTER, and SCITYPE_LIST, with a reentrancy guard.
  • Updated _check.scitype to avoid importing SCITYPE_LIST at module import time.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
skpro/datatypes/_registry.py Adds lazy registry lookup via __getattr__, cache, and reentrancy guard; updates scitype/mtype logic to use dynamic registries.
skpro/datatypes/_check.py Avoids eager import of SCITYPE_LIST; lazily imports it when needed in scitype().
skpro/datatypes/_table/_registry.py Extends Table mtype registry tuples to include soft dependency metadata (e.g., polars).
skpro/datatypes/_proba/_registry.py Extends Proba mtype registry tuples to the new 4-element format.
.all-contributorsrc Adds contributor entry and minor JSON formatting adjustment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skpro/datatypes/_registry.py Outdated
Comment thread skpro/datatypes/_registry.py
Comment thread skpro/datatypes/_registry.py Outdated
Comment thread skpro/datatypes/_registry.py Outdated
Comment thread skpro/datatypes/_registry.py Outdated
Comment thread skpro/datatypes/_check.py
Comment thread skpro/datatypes/_check.py
@sun-9545sunoj sun-9545sunoj requested a review from Copilot March 22, 2026 08:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skpro/datatypes/_registry.py Outdated
Comment thread skpro/datatypes/_registry.py Outdated
Comment thread skpro/datatypes/_registry.py
Comment thread skpro/datatypes/_registry.py Outdated
@sun-9545sunoj sun-9545sunoj requested a review from Copilot March 24, 2026 05:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skpro/datatypes/_table/_registry.py Outdated
Comment thread skpro/datatypes/_registry.py
Comment thread skpro/datatypes/_registry.py Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skpro/datatypes/_registry.py
Comment thread skpro/datatypes/_table/_registry.py Outdated
Comment thread skpro/datatypes/_registry.py
Comment thread skpro/datatypes/_registry.py Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +74 to +80
def _get_registry(name):
if name in _CACHE:
return _CACHE[name]

with _REGISTRY_LOCK:
if name in _CACHE:
return _CACHE[name]
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_registry reads _CACHE outside of _REGISTRY_LOCK (lines 75-76). If one thread pre-seeds _CACHE[...] during generation, another thread can observe and return the still-empty placeholders, leading to transient wrong results (e.g., empty SCITYPE_LIST). Consider acquiring _REGISTRY_LOCK before the initial cache-hit check (or using a per-name “generating” sentinel/event) so other threads block until generation is complete while still allowing same-thread re-entrancy via the RLock.

Copilot uses AI. Check for mistakes.
None,
),
("numpy1D", "Table", "1D np.ndarray representation of a univariate table", None),
("numpy2D", "Table", "2D np.ndarray representation of a univariate table", None),
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numpy2D entry description still says "univariate table", but the corresponding checker (TableNp2D) supports multivariate tables (capability:multivariate=True and is_univariate depends on column count). Please update the registry description to avoid misleading users.

Suggested change
("numpy2D", "Table", "2D np.ndarray representation of a univariate table", None),
("numpy2D", "Table", "2D np.ndarray representation of a data table", None),

Copilot uses AI. Check for mistakes.
Comment on lines +82 to +133
if name in ["MTYPE_SOFT_DEPS", "SCITYPE_REGISTER", "SCITYPE_LIST"]:
# Pre-seed _CACHE to prevent infinite recursion during generation
# Re-entrant calls (e.g., from inspect) will receive these references
_CACHE["MTYPE_SOFT_DEPS"] = {}
_CACHE["SCITYPE_REGISTER"] = []
_CACHE["SCITYPE_LIST"] = []

try:
from skpro.datatypes._check import get_check_dict

check_dict = get_check_dict(soft_deps="all")

soft_deps = {}
scitypes = set()

for k, cls in check_dict.items():
if hasattr(cls, "get_class_tag"):
scitype = cls.get_class_tag("scitype")
else:
scitype = k[1]
if scitype is not None:
scitypes.add(scitype)

for mtype_tuple in MTYPE_REGISTER:
mtype = mtype_tuple[0]
scitype = mtype_tuple[1]
scitypes.add(scitype)
if len(mtype_tuple) >= 4 and mtype_tuple[3] is not None:
deps = mtype_tuple[3]
soft_deps[mtype] = (
list(deps) if isinstance(deps, tuple) else deps
)

scitype_register = []
for sci in sorted(scitypes):
desc = _SCITYPE_DESCRIPTIONS.get(sci)
if desc is None:
raise ValueError(
f"scitype '{sci}' is missing a description in "
"`_SCITYPE_DESCRIPTIONS`. Please add it to "
"`skpro.datatypes._registry._SCITYPE_DESCRIPTIONS`."
)
scitype_register.append((sci, desc))

scitype_list = [x[0] for x in scitype_register]

# Update the pre-seeded objects in-place
_CACHE["MTYPE_SOFT_DEPS"].update(soft_deps)
_CACHE["SCITYPE_REGISTER"].extend(scitype_register)
_CACHE["SCITYPE_LIST"].extend(scitype_list)

return _CACHE[name]
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new lazy-generated registries (MTYPE_SOFT_DEPS, SCITYPE_REGISTER, SCITYPE_LIST) and the expanded 4-tuple MTYPE_REGISTER format aren’t covered by a targeted regression test. In particular, there’s no assertion that MTYPE_SOFT_DEPS contains the expected keys/values (e.g., polars mtypes require both polars and pyarrow) and that scitype_to_mtype(..., softdeps='exclude'|'present') filters based on those values. Please add a focused test to lock this behavior down, since subtle registry changes can otherwise silently alter API results.

Copilot uses AI. Check for mistakes.
@sun-9545sunoj
Copy link
Copy Markdown
Author

@copilot apply changes based on the comments in this thread

@sun-9545sunoj sun-9545sunoj deleted the fix/9704-programmatic-lookup branch April 17, 2026 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants