Predict from selected or from all models models (Issues #798 and #423) by giovannimicaroni · Pull Request #805 · mljar/mljar-supervised

giovannimicaroni · 2025-12-04T00:25:16Z

This PR is an attempt at solving issues #798 and #423. We added new parameters in the _predict function of the base_automl class and adapted the parts of the code that we found necessary. We did not implement direct unit tests since we did not find unit tests that directly test the predict method. If possible, it would be nice to get a direction in that sense, so that this implementation can be merged. Thanks

…ssage

…ling

pplonski

thank you for PR,

pplonski · 2025-12-05T08:52:55Z

+        X : array-like, pandas.DataFrame
+            Input data to generate predictions for.
+
+        prediction_mode : str, default='best'


Thank you for PR! You assumed that user already has a list with loaded models. Do you have example code for such use?

The use for this case would be something like: predictions = automl.predict(X_test, prediction_mode = 'custom', custom_models=["2_DecisionTree", "3_Linear"]). I was not sure how to change the names for the models, since they are objects from the ModelFramework class, and I didn't want to change it. To minimize that I thought about adding the available models names to the documentation and printing an error that shows their names aswell, in case the user enters invalid names.

pplonski · 2025-12-05T08:54:22Z

+        prediction_mode : str, default='best'
+            Model selection strategy:
+
+            - 'best': selects the top `n_models` models ranked by performance.


I think it is too much options, can options be set based on list of models?

Sure! I think I can leave the default mode as best, so that it doesnt affect previously working code, and leave the option to select all modes as a prediction mode, but change the custom mode to be used based on the list of models the user provides. What do you think?

pplonski · 2025-12-05T08:55:59Z

+            - 'custom': selects only the models explicitly listed in `custom_models`.
+            - 'all': uses all trained models.
+
+        n_models : int, default=1


why do we need n_models? can it be set from list of models?

The idea for n_models was that if a user is using the 'best' mode, but, for example, wants the predictions for the top 3 performing models, n_models could be set to 3, so that the user gets the 3 best predictions. Should I keep it?

I suggest to keep only list of models ad additional argument and use all models provided in the list.

The first case would be:

# use the best model to compute predictions preds = automl.predict(X_test)

the second case:

# use the provideds models to compute predictions preds = automl.predict(X_test, models=["model_1", "model_2"])

pplonski · 2025-12-05T08:56:50Z

+        selected_models = []
+
+        # Model selection logic
+        match prediction_mode:


by using match we need Python 3.10+

True! I'll change the logic to use if's.

pplonski · 2025-12-05T08:58:26Z

+            model_list_str = ", ".join(selected_model_names)
+
+            if n_selected == 1:
+                print(


is it for debug only?

Yes, the idea of this part of the code is to serve as a explanation for the user regarding what models are being used for prediction, and how the output matrix is structured.

I suggest to remove prints.

pplonski

Thank you for update, let's create test cases to check how it is working.

pplonski · 2025-12-09T09:17:19Z



-    def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame]) -> numpy.ndarray:
+    def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame], models = []) -> numpy.ndarray:


I think that comment with description how the functions is working should be here. Docs are generated based on this comment https://supervised.mljar.com/api/#supervised.automl.AutoML.predict

pplonski · 2025-12-09T09:20:50Z

-            if self._ml_task != REGRESSION
-            else predictions["prediction"].to_numpy()
-        )
+    def _predict(self, X, models=[]):


It would be good to add tests for predict. It would be good if we have following test cases:

no model selected, predict should use the best model,

models selected and predict should provide predictions for each model,

automl trained, and then loaded back from hard drive, and here two cases: (1) prediction computed on best model, (2) and predictions computed on selected models

If we are going to provide such functionality to predict then we should at it to predict_proba, predict_all as well.

Sure. For the predict_all method, does it make sense for it to just call the new predict implementation? since both of them originally just called the base predict method.

Yes, it makes sense.

giovannimicaroni added 4 commits December 3, 2025 21:11

add option for predictions for all or selected models

cec9a50

add option for predictions for all or selected models. fix comment me…

e63b384

…ssage

add option for predictions for all or selected models. fix error hand…

2219b6b

…ling

remove unnecessary else

366f85f

pplonski requested changes Dec 5, 2025

View reviewed changes

refactor for PR

b0dfadb

giovannimicaroni requested a review from pplonski December 8, 2025 21:07

update docstring

3f9a5a8

pplonski requested changes Dec 9, 2025

View reviewed changes



		def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame]) -> numpy.ndarray:
		def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame], models = []) -> numpy.ndarray:

Conversation

giovannimicaroni commented Dec 4, 2025

Uh oh!

pplonski left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pplonski left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants