How Do I Change From Using For Loops To Call Multiple Functions To Using A Pipeline To Call A Class?
Solution 1:
You can consider using map(); details here: https://www.geeksforgeeks.org/python-map-function/
Some programmers make a habit of avoiding raw loops: "A raw loop is any loop inside a function where the function serves purpose larger than the algorithm implemented by the loop". More details here: https://sean-parent.stlab.cc/presentations/2013-09-11-cpp-seasoning/cpp-seasoning.pdf
I think that's the reason you were asked to remove the for loop.
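As a minimal sketch of the map() suggestion, assuming the "multiple functions" all take the same single argument (square, negate and describe below are hypothetical stand-ins, not from the question), the raw loop and the map() call build the same list:

```python
# Hypothetical helper functions standing in for the "multiple functions"
def square(x):
    return x * x

def negate(x):
    return -x

def describe(x):
    return f"value={x}"

functions = [square, negate, describe]

# Raw loop version: call every function on the same input and collect the results
results_loop = []
for func in functions:
    results_loop.append(func(3))

# map() version: the same calls, expressed without an explicit loop body
results_map = list(map(lambda func: func(3), functions))

print(results_map)  # [9, -3, 'value=3']
```

A list comprehension such as `[func(3) for func in functions]` is an equally common alternative; which one reads better is mostly a matter of taste.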
Solution 2:
I have implemented a working solution. I should have worded my question better. I initially misunderstood how GridSearchCV or RandomizedSearchCV works internally: cv_results_ contains the results for every parameter combination in the grid, whereas I thought only the best estimator was available to us.
Using this, for each type of model, I took the row with the best rank_test_score (rank 1 is the best, so this is the lowest rank among that model's rows) and got the parameters making up that model. In this example, there are 4 models. I then fit each of those models, i.e. the best combination of parameters for each model, on the training data, predicted on my test data, and computed the required scores. I think this solution can be extended to RandomizedSearchCV and many other options.
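A minimal sketch of the selection step described above, using a smaller hypothetical setup (two models on the breast cancer data, with a pipeline step named 'clf' instead of the ClfSwitcher wrapper). It groups cv_results_ by model, keeps the lowest rank_test_score in each group, then rebuilds that parameter set, refits it on the training split and scores it on the test split. Using groupby(...).idxmin() for the per-model pick is one option among several, not necessarily what the author's class does.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# The single 'clf' step is replaced by whichever model the current sub-grid names
pipe = Pipeline([('clf', LogisticRegression())])
param_grid = [
    {'clf': [LogisticRegression(solver='liblinear')], 'clf__C': [0.1, 1.0]},
    {'clf': [DecisionTreeClassifier(random_state=0)], 'clf__max_depth': [2, 4]},
]
grid = GridSearchCV(pipe, param_grid, scoring='f1', cv=3).fit(X_train, y_train)

results = pd.DataFrame(grid.cv_results_)
results['model_name'] = results['param_clf'].apply(lambda m: type(m).__name__)
# rank_test_score == 1 is the best candidate overall,
# so the lowest rank within each group is that model's best candidate
best = results.loc[results.groupby('model_name')['rank_test_score'].idxmin()]

test_f1 = {}
for name, params in zip(best['model_name'], best['params']):
    # Rebuild the pipeline with this candidate's full parameter set; parameter values
    # are cloned so the objects stored in cv_results_ are not modified
    model = clone(grid.estimator).set_params(
        **{k: clone(v, safe=False) for k, v in params.items()})
    test_f1[name] = round(f1_score(y_test, model.fit(X_train, y_train).predict(X_test)), 4)

print(test_f1)
```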
NOTE: This is just a trivial solution. Many modifications may be necessary, such as scaling the data for specific models. This solution is meant to serve as a starting point that can be modified according to the user's needs.
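One way to handle the scaling caveat, sketched under the assumption that the pipeline gets an extra step named 'scaler' (a name chosen here for illustration): each sub-grid sets that step to StandardScaler() for scale-sensitive models or to 'passthrough' for tree ensembles, so scaling is applied only where it is wanted and is fit inside each cross-validation fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# 'scaler' is a placeholder step; each sub-grid decides whether it actually scales
pipe = Pipeline([('scaler', 'passthrough'), ('clf', LogisticRegression())])
param_grid = [
    # Scale-sensitive model: standardize the features first
    {'scaler': [StandardScaler()],
     'clf': [LogisticRegression(max_iter=1000)],
     'clf__C': [0.1, 1.0]},
    # Tree ensemble: scaling is unnecessary, so the step is skipped
    {'scaler': ['passthrough'],
     'clf': [RandomForestClassifier(n_estimators=50, random_state=0)],
     'clf__max_depth': [3, 5]},
]
grid = GridSearchCV(pipe, param_grid, scoring='f1', cv=3).fit(X, y)
print(grid.best_params_)
```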
Credit to this answer for the ClfSwitcher() class.
The following is the implementation of the class (suggestions for improvement are welcome).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, roc_auc_score, recall_score, precision_score
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator, clone
import warnings
warnings.filterwarnings('ignore')

cancer = datasets.load_breast_cancer()
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = cancer.target
target = df['target']
X_train, X_test, y_train, y_test = train_test_split(df.drop(columns='target'), target, test_size=0.4, random_state=13, stratify=target)


class ClfSwitcher(BaseEstimator):
    def __init__(self, model=RandomForestClassifier()):
        """
        A Custom BaseEstimator that can switch between classifiers.
        :param model: sklearn object - The classifier
        """
        self.model = model

    def fit(self, X, y=None, **kwargs):
        self.model.fit(X, y)
        return self

    def predict(self, X, y=None):
        return self.model.predict(X)

    def predict_proba(self, X):
        return self.model.predict_proba(X)

    def score(self, X, y):
        # Delegate to the wrapped classifier (stored as self.model, not self.estimator)
        return self.model.score(X, y)


class report(ClfSwitcher):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.grid = None
        self.full_report = None
        self.concise_report = None
        # calc_score passes hard predictions, so 'roc_auc' is computed from labels, not probabilities
        self.scoring_metrics = {
            'precision': precision_score,
            'recall': recall_score,
            'f1': f1_score,
            'roc_auc': roc_auc_score
        }

    def griddy(self, pipeLine, parameters, scoring='accuracy', **kwargs):
        # Use the requested scoring instead of always hard-coding 'accuracy'
        self.grid = GridSearchCV(pipeLine, parameters, scoring=scoring, n_jobs=-1, **kwargs)

    def fit_grid(self, X_train, y_train=None, **kwargs):
        self.grid.fit(X_train, y_train)

    def make_grid_report(self):
        self.full_report = pd.DataFrame(self.grid.cv_results_)

    @staticmethod
    def get_names(col):
        return col.__class__.__name__

    def calc_score(self, params, metric):
        # Rebuild the pipeline with this candidate's full parameter set (model plus its
        # hyperparameters), fit on the training split and score on the test split.
        # Parameter values are cloned so the objects stored in cv_results_ are not mutated.
        model = clone(self.grid.estimator).set_params(**{k: clone(v, safe=False) for k, v in params.items()})
        return round(metric(y_test, model.fit(X_train, y_train).predict(X_test)), 4)

    def make_concise_report(self):
        self.concise_report = pd.DataFrame(self.grid.cv_results_)
        self.concise_report['model_names'] = self.concise_report['param_cst__model'].apply(self.get_names)
        # rank_test_score == 1 is the best candidate, so sort ranks ascending and keep the first row per model
        self.concise_report = self.concise_report.sort_values(['model_names', 'rank_test_score'], ascending=[True, True]) \
            .groupby(['model_names']).head(1)[['model_names', 'params']] \
            .reset_index(drop=True)
        for metric_name, metric_func in self.scoring_metrics.items():
            self.concise_report[metric_name] = self.concise_report['params'].apply(self.calc_score, metric=metric_func)
        self.concise_report = self.concise_report[['model_names', 'precision', 'recall', 'f1', 'roc_auc', 'params']]
pipeline = Pipeline([
    ('cst', ClfSwitcher()),
])

parameters = [
    {
        'cst__model': [RandomForestClassifier()],
        'cst__model__n_estimators': [10, 20],
        'cst__model__max_depth': [5, 10],
        'cst__model__criterion': ['gini', 'entropy']
    },
    {
        'cst__model': [SVC()],
        'cst__model__C': [10, 20],
        'cst__model__kernel': ['linear'],
        # gamma only affects the 'rbf', 'poly' and 'sigmoid' kernels, so it has no effect here
        'cst__model__gamma': [0.0001, 0.001]
    },
    {
        'cst__model': [LogisticRegression()],
        'cst__model__C': [13, 17],
        'cst__model__penalty': ['l1', 'l2'],
        # The default 'lbfgs' solver rejects penalty='l1'; 'liblinear' supports both
        'cst__model__solver': ['liblinear']
    },
    {
        'cst__model': [GradientBoostingClassifier()],
        'cst__model__n_estimators': [10, 50],
        'cst__model__max_depth': [3, 5],
        'cst__model__min_samples_leaf': [1, 2]
    }
]
my_report = report()
my_report.griddy(pipeline, parameters, scoring='f1')
my_report.fit_grid(X_train, y_train)
my_report.make_concise_report()
my_report.concise_report
This outputs the report as desired.
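Since the report only relies on cv_results_, the extension to RandomizedSearchCV mentioned above should carry over: its cv_results_ has the same param_*, params and rank_test_score keys. A minimal sketch, assuming a small hypothetical two-model search with a step named 'cst' (not the full report class); with n_iter smaller than the number of candidates, a model may not be sampled at all, so the per-model table can have fewer rows than there are models.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([('cst', LogisticRegression())])
# A list of dicts is accepted here, just as in GridSearchCV
param_distributions = [
    {'cst': [LogisticRegression(solver='liblinear')], 'cst__C': [0.1, 1.0, 10.0]},
    {'cst': [RandomForestClassifier(random_state=0)],
     'cst__n_estimators': [10, 20], 'cst__max_depth': [3, 5]},
]
search = RandomizedSearchCV(pipe, param_distributions, n_iter=5,
                            scoring='f1', cv=3, random_state=0)
search.fit(X, y)

# Same per-model selection as with GridSearchCV: lowest rank within each model
results = pd.DataFrame(search.cv_results_)
results['model_name'] = results['param_cst'].apply(lambda m: type(m).__name__)
best = results.loc[results.groupby('model_name')['rank_test_score'].idxmin()]
print(best[['model_name', 'params', 'mean_test_score']])
```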