[ENH] V1 → V2 API Migration - studies#1610
[ENH] V1 → V2 API Migration - studies#1610rohansen856 wants to merge 277 commits intoopenml:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1610 +/- ##
==========================================
- Coverage 54.67% 54.64% -0.04%
==========================================
Files 63 63
Lines 5108 5124 +16
==========================================
+ Hits 2793 2800 +7
- Misses 2315 2324 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Implementing def _check_fold_timing_evaluations( # noqa: PLR0913
self,
fold_evaluations: dict[str, dict[int, dict[int, float]]],
num_repeats: int,
num_folds: int,
*,
max_time_allowed: float = 60000.0,
task_type: TaskType = TaskType.SUPERVISED_CLASSIFICATION,
check_scores: bool = True,
) -> None:Final function signature: def list( # noqa: PLR0913
self,
limit: int | None = None,
offset: int | None = None,
status: str | None = None,
main_entity_type: str | None = None,
uploader: list[int] | None = None,
benchmark_suite: int | None = None,
) -> Any: |
Signed-off-by: rohansen856 <rohansen856@gmail.com>
geetu040
left a comment
There was a problem hiding this comment.
Good work. Just use the listing as suggested in #1575 (comment) which is already similar to what you have done.
|
@geetu040 I reviewed the specific changes needed and have a slight doubt in the pandas implementation. class StudiesAPI(ResourceAPI, ABC):
@abstractmethod
def list( # noqa: PLR0913
self,
limit: int | None = None,
offset: int | None = None,
status: str | None = None,
main_entity_type: str | None = None,
uploader: list[int] | None = None,
benchmark_suite: int | None = None,
) -> pd.DataFrame: ...and similarly i have to change the return object in xml_string = response.text
# Parse XML and convert to DataFrame
study_dict = xmltodict.parse(xml_string, force_list=("oml:study",))
# Minimalistic check if the XML is useful
assert isinstance(study_dict["oml:study_list"]["oml:study"], list), type(
study_dict["oml:study_list"],
)
assert (
study_dict["oml:study_list"]["@xmlns:oml"] == "http://openml.org/openml"
), study_dict["oml:study_list"]["@xmlns:oml"]
studies = {}
for study_ in study_dict["oml:study_list"]["oml:study"]:
# maps from xml name to a tuple of (dict name, casting fn)
expected_fields = {
"oml:id": ("id", int),
"oml:alias": ("alias", str),
"oml:main_entity_type": ("main_entity_type", str),
"oml:benchmark_suite": ("benchmark_suite", int),
"oml:name": ("name", str),
"oml:status": ("status", str),
"oml:creation_date": ("creation_date", str),
"oml:creator": ("creator", int),
}
study_id = int(study_["oml:id"])
current_study = {}
for oml_field_name, (real_field_name, cast_fn) in expected_fields.items():
if oml_field_name in study_:
current_study[real_field_name] = cast_fn(study_[oml_field_name])
current_study["id"] = int(current_study["id"])
studies[study_id] = current_study
return pd.DataFrame.from_dict(studies, orient="index")A total of 3 files would be affected: Can you please confirm my approach... After that i will update the PR. |
|
@rohansen856 yes sounds right |
Signed-off-by: rohansen856 <rohansen856@gmail.com>
|
Updated! Ready for review. |
geetu040
left a comment
There was a problem hiding this comment.
Almost fine, just complety remove _list_studies as well and replace _list_studies with api_context.backend.studies.list as the parameter for partial in list_studies. Hope I didnot confuse you, just search for the exact method names in code. Let me know if I am not clear enough.
Oh definitely! I prolly missed that in |
…list Signed-off-by: rohansen856 <rohansen856@gmail.com>
geetu040
left a comment
There was a problem hiding this comment.
update with #1576 (comment)
for more information, see https://pre-commit.ci
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
|
Please add in description |
…into studies-migration
geetu040
left a comment
There was a problem hiding this comment.
The base PR is merged now, please sync with main.
geetu040
left a comment
There was a problem hiding this comment.
Nicely done overall, I am a bit skeptical about the overuse of mocking, since it adds alot of hardcoded response content, but let's see if Pieter thinks otherwise, when he reviews.
| main_entity_type: str | None = None, | ||
| uploader: list[int] | None = None, | ||
| benchmark_suite: int | None = None, | ||
| ) -> pd.DataFrame: |
There was a problem hiding this comment.
use abstractmethod and I'd say remove the docstring and simply use the placeholder ... inplace of NotImplementedError
tests/test_api/test_study.py
Outdated
|
|
||
| result = study_v1.delete(study_id) | ||
|
|
||
| assert result is True |
There was a problem hiding this comment.
| assert result is True | |
| assert result |
geetu040
left a comment
There was a problem hiding this comment.
also fix the failing tests, you probably just need to fix the fixture, see other test files for reference
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
Signed-off-by: rohansen856 <rohansen856@gmail.com>
geetu040
left a comment
There was a problem hiding this comment.
Thanks for the changes, looks good now.
@PGijsbers please review/merge.
| mock_request.return_value._content = mock_response.encode("utf-8") | ||
|
|
||
| studies_df = study_v1.list(limit=5, offset=0) | ||
|
|
There was a problem hiding this comment.
| mock_request.assert_called_once() |
There was a problem hiding this comment.
Make sure that we check the mocks are used so that we're testing what we think we are testing.
tests/test_api/test_study.py
Outdated
|
|
||
| page1_ids = set(page1["id"]) | ||
| page2_ids = set(page2["id"]) | ||
| assert page1_ids.isdisjoint(page2_ids) |
There was a problem hiding this comment.
I think the real test here is that we make the right request, e.g., that the pagination parameters are set correctly in the URL. We should assert that. The parsed responses are not really under test, and are dictated by the mocked responses anyway.
| f"</oml:upload_study>\n" | ||
| ).encode("utf-8") | ||
|
|
||
| published_id = study_v1.publish("study", files=study_files) |
There was a problem hiding this comment.
note (no action required): as discussed in the call today, the publish method is going to be used internally through from OpenMLBase.publish. Users will not be expected to pass files manually to the function.
Similar work will be done for tag/untag so that the interface for those operations doesn't change.
This is listed as follow up work to be handled in a separate PR.
Just leaving the comment here to clarify the intent and double check my understanding is correct.
Signed-off-by: rohansen856 <rohansen856@gmail.com>
…enml-python into studies-migration
Metadata
Details
Stackend PR, Depends on #1576
This PR adds
Studies v2migration.A question:
Due to the pre commit hook i could not put 6 arguments in a function, so i had to workaround that with this instead:
openml_api\resources\studies.py (line 10-15)
I would like to confirm if this approach is correct or not. Raising a draft PR for now.