Predict from selected or from all models models (Issues #798 and #423)#805
Predict from selected or from all models models (Issues #798 and #423)#805giovannimicaroni wants to merge 6 commits intomljar:masterfrom
Conversation
| X : array-like, pandas.DataFrame | ||
| Input data to generate predictions for. | ||
|
|
||
| prediction_mode : str, default='best' |
There was a problem hiding this comment.
Thank you for PR! You assumed that user already has a list with loaded models. Do you have example code for such use?
There was a problem hiding this comment.
The use for this case would be something like: predictions = automl.predict(X_test, prediction_mode = 'custom', custom_models=["2_DecisionTree", "3_Linear"]). I was not sure how to change the names for the models, since they are objects from the ModelFramework class, and I didn't want to change it. To minimize that I thought about adding the available models names to the documentation and printing an error that shows their names aswell, in case the user enters invalid names.
| prediction_mode : str, default='best' | ||
| Model selection strategy: | ||
|
|
||
| - 'best': selects the top `n_models` models ranked by performance. |
There was a problem hiding this comment.
I think it is too much options, can options be set based on list of models?
There was a problem hiding this comment.
Sure! I think I can leave the default mode as best, so that it doesnt affect previously working code, and leave the option to select all modes as a prediction mode, but change the custom mode to be used based on the list of models the user provides. What do you think?
| - 'custom': selects only the models explicitly listed in `custom_models`. | ||
| - 'all': uses all trained models. | ||
|
|
||
| n_models : int, default=1 |
There was a problem hiding this comment.
why do we need n_models? can it be set from list of models?
There was a problem hiding this comment.
The idea for n_models was that if a user is using the 'best' mode, but, for example, wants the predictions for the top 3 performing models, n_models could be set to 3, so that the user gets the 3 best predictions. Should I keep it?
There was a problem hiding this comment.
I suggest to keep only list of models ad additional argument and use all models provided in the list.
The first case would be:
# use the best model to compute predictions
preds = automl.predict(X_test)the second case:
# use the provideds models to compute predictions
preds = automl.predict(X_test, models=["model_1", "model_2"])| selected_models = [] | ||
|
|
||
| # Model selection logic | ||
| match prediction_mode: |
There was a problem hiding this comment.
by using match we need Python 3.10+
There was a problem hiding this comment.
True! I'll change the logic to use if's.
| model_list_str = ", ".join(selected_model_names) | ||
|
|
||
| if n_selected == 1: | ||
| print( |
There was a problem hiding this comment.
Yes, the idea of this part of the code is to serve as a explanation for the user regarding what models are being used for prediction, and how the output matrix is structured.
There was a problem hiding this comment.
I suggest to remove prints.
pplonski
left a comment
There was a problem hiding this comment.
Thank you for update, let's create test cases to check how it is working.
|
|
||
|
|
||
| def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame]) -> numpy.ndarray: | ||
| def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame], models = []) -> numpy.ndarray: |
There was a problem hiding this comment.
I think that comment with description how the functions is working should be here. Docs are generated based on this comment https://supervised.mljar.com/api/#supervised.automl.AutoML.predict
| if self._ml_task != REGRESSION | ||
| else predictions["prediction"].to_numpy() | ||
| ) | ||
| def _predict(self, X, models=[]): |
There was a problem hiding this comment.
It would be good to add tests for predict. It would be good if we have following test cases:
- no model selected, predict should use the best model,
- models selected and predict should provide predictions for each model,
- automl trained, and then loaded back from hard drive, and here two cases: (1) prediction computed on best model, (2) and predictions computed on selected models
If we are going to provide such functionality to predict then we should at it to predict_proba, predict_all as well.
There was a problem hiding this comment.
Sure. For the predict_all method, does it make sense for it to just call the new predict implementation? since both of them originally just called the base predict method.
This PR is an attempt at solving issues #798 and #423. We added new parameters in the _predict function of the base_automl class and adapted the parts of the code that we found necessary. We did not implement direct unit tests since we did not find unit tests that directly test the predict method. If possible, it would be nice to get a direction in that sense, so that this implementation can be merged. Thanks