Model Tuning
In model tuning we use search strategies to find the hyperparameters that give the best performance. Several frameworks are available for this; below is an example with Optuna. Whereas during training we fit the parameters of an ML model, such as the weights of a linear regression or the splits of a decision tree, here we train several models with different starting settings (hyperparameters), for example how many leaves a decision tree may have.
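To make the idea concrete before handing the search over to Optuna, a hand-rolled hyperparameter search could look like the sketch below; the candidate values for max_depth are arbitrary, illustrative choices.

import sklearn.datasets
import sklearn.model_selection
import sklearn.tree

iris = sklearn.datasets.load_iris()
x, y = iris.data, iris.target

best_score, best_depth = 0.0, None
# Evaluate a handful of candidate hyperparameter values and keep the best one.
for max_depth in [2, 4, 8, 16, 32]:
    model = sklearn.tree.DecisionTreeClassifier(max_depth=max_depth)
    score = sklearn.model_selection.cross_val_score(model, x, y, cv=3).mean()
    if score > best_score:
        best_score, best_depth = score, max_depth

print(best_depth, best_score)

Optuna automates exactly this loop, but samples the candidate values more cleverly and can search over much larger, mixed spaces: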
"""
Optuna example that optimizes a classifier configuration for Iris dataset using sklearn.
In this example, we optimize a classifier configuration for Iris dataset. Classifiers are from
scikit-learn. We optimize both the choice of classifier (among SVC and RandomForest) and their
hyperparameters.
"""
import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm
# FYI: Objective functions can take additional arguments
# (https://optuna.readthedocs.io/en/stable/faq.html#objective-func-additional-args).
def objective(trial):
    iris = sklearn.datasets.load_iris()
    x, y = iris.data, iris.target

    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma="auto")
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=10
        )

    score = sklearn.model_selection.cross_val_score(classifier_obj, x, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
study.best_trial
[I 2021-09-16 08:10:06,261] A new study created in memory with name: no-name-d9aba38c-c791-47d9-a666-fb9888c66eba
[I 2021-09-16 08:10:07,041] Trial 0 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 14487247.93005794}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,110] Trial 1 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,123] Trial 2 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 3583528790.7717648}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,134] Trial 3 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 0.01374049215957742}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,173] Trial 4 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,185] Trial 5 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 3.2958652818826705e-05}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,197] Trial 6 finished with value: 0.84 and parameters: {'classifier': 'SVC', 'svc_c': 0.0388890805913242}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,209] Trial 7 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 0.21357522077535437}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,321] Trial 8 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 3257546634.4158716}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,362] Trial 9 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 17}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,379] Trial 10 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 17244.09094347225}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,396] Trial 11 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 8267320975.982889}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,411] Trial 12 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 318770.4637190416}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,425] Trial 13 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 165014.34293923172}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,440] Trial 14 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 5.019654334487846e-10}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,486] Trial 15 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 29}. Best is trial 15 with value: 0.9666666666666667.
[I 2021-09-16 08:10:07,532] Trial 16 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 30}. Best is trial 15 with value: 0.9666666666666667.
[I 2021-09-16 08:10:07,578] Trial 17 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 10}. Best is trial 15 with value: 0.9666666666666667.
[I 2021-09-16 08:10:07,625] Trial 18 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 27}. Best is trial 15 with value: 0.9666666666666667.
[I 2021-09-16 08:10:07,670] Trial 19 finished with value: 0.9733333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 7}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,715] Trial 20 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,759] Trial 21 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 11}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,806] Trial 22 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,852] Trial 23 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 16}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,899] Trial 24 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,945] Trial 25 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,990] Trial 26 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 18}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,036] Trial 27 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 9}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,085] Trial 28 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,130] Trial 29 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,177] Trial 30 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 13}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,222] Trial 31 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 7}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,275] Trial 32 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 7}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,321] Trial 33 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 13}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,368] Trial 34 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 8}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,383] Trial 35 finished with value: 0.9466666666666667 and parameters: {'classifier': 'SVC', 'svc_c': 398.29566574264686}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,428] Trial 36 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,473] Trial 37 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 22}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,519] Trial 38 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 32}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,571] Trial 39 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 13}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,624] Trial 40 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,641] Trial 41 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 3.838389826638352e-06}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,658] Trial 42 finished with value: 0.9466666666666667 and parameters: {'classifier': 'SVC', 'svc_c': 433.18881122499346}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,703] Trial 43 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 9}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,720] Trial 44 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 42277094.49734071}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,765] Trial 45 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 14}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,811] Trial 46 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 20}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,858] Trial 47 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 20}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,905] Trial 48 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 25}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,952] Trial 49 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 19}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,998] Trial 50 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 16}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,054] Trial 51 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 19}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,102] Trial 52 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 28}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,150] Trial 53 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 26}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,197] Trial 54 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 32}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,253] Trial 55 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 20}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,300] Trial 56 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 15}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,352] Trial 57 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 16}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,398] Trial 58 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 11}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,443] Trial 59 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 26}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,489] Trial 60 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,536] Trial 61 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 29}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,582] Trial 62 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 14}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,627] Trial 63 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 21}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,675] Trial 64 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 28}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,722] Trial 65 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 18}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,768] Trial 66 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 18}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,813] Trial 67 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,866] Trial 68 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 12}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,912] Trial 69 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 17}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,958] Trial 70 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 15}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,005] Trial 71 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,051] Trial 72 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,099] Trial 73 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 24}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,155] Trial 74 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 29}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,207] Trial 75 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 21}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,259] Trial 76 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 12}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,306] Trial 77 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 25}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,360] Trial 78 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 21}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,377] Trial 79 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 1.9599998329780468e-10}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,423] Trial 80 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,470] Trial 81 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,516] Trial 82 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,565] Trial 83 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,610] Trial 84 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,664] Trial 85 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 18}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,709] Trial 86 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,754] Trial 87 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 9}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,801] Trial 88 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 19}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,818] Trial 89 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 1.6448891749672307e-07}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,872] Trial 90 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 31}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,918] Trial 91 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,963] Trial 92 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 22}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,009] Trial 93 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 27}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,056] Trial 94 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 8}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,102] Trial 95 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,158] Trial 96 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,210] Trial 97 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,256] Trial 98 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,301] Trial 99 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 15}. Best is trial 19 with value: 0.9733333333333333.
FrozenTrial(number=19, values=[0.9733333333333333], datetime_start=datetime.datetime(2021, 9, 16, 8, 10, 7, 626788), datetime_complete=datetime.datetime(2021, 9, 16, 8, 10, 7, 670289), params={'classifier': 'RandomForest', 'rf_max_depth': 7}, distributions={'classifier': CategoricalDistribution(choices=('SVC', 'RandomForest')), 'rf_max_depth': IntLogUniformDistribution(high=32, low=2, step=1)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=19, state=TrialState.COMPLETE, value=None)
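After the study has finished, the winning hyperparameters can be read back from the study object and used to refit a final model. A minimal sketch, assuming the study from the run above is still in scope:

import sklearn.datasets
import sklearn.ensemble
import sklearn.svm

print(study.best_params)  # e.g. {'classifier': 'RandomForest', 'rf_max_depth': 7}
print(study.best_value)   # e.g. 0.9733...

iris = sklearn.datasets.load_iris()
x, y = iris.data, iris.target

# Rebuild the winning configuration and fit it on the full dataset.
if study.best_params["classifier"] == "RandomForest":
    final_model = sklearn.ensemble.RandomForestClassifier(
        max_depth=study.best_params["rf_max_depth"], n_estimators=10
    )
else:
    final_model = sklearn.svm.SVC(C=study.best_params["svc_c"], gamma="auto")

final_model.fit(x, y)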
PyCaret also provides a tuning step, which is based on a grid search over the hyperparameters.
from pycaret.datasets import get_data
diabetes = get_data('diabetes')
# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')
# train a decision tree model
dt = create_model('dt')
# tune hyperparameters with custom_grid
import numpy as np
params = {"max_depth": np.random.randint(1, int(len(diabetes.columns) * .85), 20),
          "max_features": np.random.randint(1, len(diabetes.columns), 20),
          "min_samples_leaf": [2, 3, 4, 5, 6],
          "criterion": ["gini", "entropy"]
          }
tuned_dt_custom = tune_model(dt, custom_grid = params)
| | Number of times pregnant | Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Diastolic blood pressure (mm Hg) | Triceps skin fold thickness (mm) | 2-Hour serum insulin (mu U/ml) | Body mass index (weight in kg/(height in m)^2) | Diabetes pedigree function | Age (years) | Class variable |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
[PyCaret setup status widget: Initiated 08:10:12, Status: Preprocessing Data]
| | Data Type |
|---|---|
| Number of times pregnant | Categorical |
| Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Numeric |
| Diastolic blood pressure (mm Hg) | Numeric |
| Triceps skin fold thickness (mm) | Numeric |
| 2-Hour serum insulin (mu U/ml) | Numeric |
| Body mass index (weight in kg/(height in m)^2) | Numeric |
| Diabetes pedigree function | Numeric |
| Age (years) | Numeric |
| Class variable | Label |
---------------------------------------------------------------------------
StdinNotImplementedError Traceback (most recent call last)
/tmp/ipykernel_4692/563638542.py in <module>
3 # Importing module and initializing setup
4 from pycaret.classification import *
----> 5 clf1 = setup(data = diabetes, target = 'Class variable')
6 # train a decision tree model
7 dt = create_model('dt')
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pycaret/classification.py in setup(data, target, train_size, test_data, preprocess, imputation_type, iterative_imputation_iters, categorical_features, categorical_imputation, categorical_iterative_imputer, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, numeric_iterative_imputer, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, remove_perfect_collinearity, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_selection_method, feature_interaction, feature_ratio, interaction_threshold, fix_imbalance, fix_imbalance_method, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, custom_pipeline, html, session_id, log_experiment, experiment_name, log_plots, log_profile, log_data, silent, verbose, profile, profile_kwargs)
653 verbose=verbose,
654 profile=profile,
--> 655 profile_kwargs=profile_kwargs,
656 )
657
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pycaret/internal/tabular.py in setup(data, target, ml_usecase, available_plots, train_size, test_data, preprocess, imputation_type, iterative_imputation_iters, categorical_features, categorical_imputation, categorical_iterative_imputer, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, numeric_iterative_imputer, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, remove_perfect_collinearity, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_selection_method, feature_interaction, feature_ratio, interaction_threshold, fix_imbalance, fix_imbalance_method, transform_target, transform_target_method, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, custom_pipeline, html, session_id, log_experiment, experiment_name, log_plots, log_profile, log_data, silent, verbose, profile, profile_kwargs, display)
1328 test_data = pd.concat([X_test, y_test], axis=1)
1329
-> 1330 train_data = prep_pipe.fit_transform(train_data)
1331 # workaround to also transform target
1332 dtypes.final_training_columns.append(target)
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
365 """
366 fit_params_steps = self._check_fit_params(**fit_params)
--> 367 Xt = self._fit(X, y, **fit_params_steps)
368
369 last_step = self._final_estimator
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params_steps)
294 message_clsname='Pipeline',
295 message=self._log_message(step_idx),
--> 296 **fit_params_steps[name])
297 # Replace the transformer of the step with the fitted
298 # transformer. This is necessary when loading the transformer
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)
350
351 def __call__(self, *args, **kwargs):
--> 352 return self.func(*args, **kwargs)
353
354 def call_and_shelve(self, *args, **kwargs):
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
738 with _print_elapsed_time(message_clsname, message):
739 if hasattr(transformer, 'fit_transform'):
--> 740 res = transformer.fit_transform(X, y, **fit_params)
741 else:
742 res = transformer.fit(X, y, **fit_params).transform(X)
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pycaret/internal/preprocess.py in fit_transform(self, dataset, y)
419
420 # since this is for training , we dont nees any transformation since it has already been transformed in fit
--> 421 data = self.fit(data)
422
423 # additionally we just need to treat the target variable
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pycaret/internal/preprocess.py in fit(self, dataset, y)
321
322 display(dt_print_out[["Data Type"]])
--> 323 self.response = input()
324
325 if self.response in [
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/ipykernel/kernelbase.py in raw_input(self, prompt)
1002 if not self._allow_stdin:
1003 raise StdinNotImplementedError(
-> 1004 "raw_input was called, but this frontend does not support input requests."
1005 )
1006 return self._input_request(
StdinNotImplementedError: raw_input was called, but this frontend does not support input requests.
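The traceback above arises because setup() pauses to ask for confirmation of the inferred data types via input(), which is not available when the notebook is executed non-interactively (for example during a docs build). A possible workaround, assuming PyCaret 2.x and that the earlier cells defining diabetes and params have already run, is to pass silent=True so the confirmation step is skipped:

# silent=True skips the interactive dtype-confirmation prompt (PyCaret 2.x)
clf1 = setup(data = diabetes, target = 'Class variable', silent = True)
dt = create_model('dt')
tuned_dt_custom = tune_model(dt, custom_grid = params)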