Model Tuning
In model tuning we use search strategies to find the hyperparameters that give the best performance. Several frameworks are available for this; below is an example with Optuna. Whereas during training we fit the parameters of an ML model, such as the weights of a linear regression or the splits of a decision tree, here we train several models with different starting settings (hyperparameters), for example how many leaves a decision tree may have.
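To make the idea concrete before handing the search over to Optuna, a hand-rolled hyperparameter search could look like the sketch below; the candidate values for max_depth are arbitrary, illustrative choices.

import sklearn.datasets
import sklearn.model_selection
import sklearn.tree

iris = sklearn.datasets.load_iris()
x, y = iris.data, iris.target

best_score, best_depth = 0.0, None
# Evaluate a handful of candidate hyperparameter values and keep the best one.
for max_depth in [2, 4, 8, 16, 32]:
    model = sklearn.tree.DecisionTreeClassifier(max_depth=max_depth)
    score = sklearn.model_selection.cross_val_score(model, x, y, cv=3).mean()
    if score > best_score:
        best_score, best_depth = score, max_depth

print(best_depth, best_score)

Optuna automates exactly this loop, but samples the candidate values more cleverly and can search over much larger, mixed spaces: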
"""
Optuna example that optimizes a classifier configuration for Iris dataset using sklearn.
In this example, we optimize a classifier configuration for Iris dataset. Classifiers are from
scikit-learn. We optimize both the choice of classifier (among SVC and RandomForest) and their
hyperparameters.
"""
import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm
# FYI: Objective functions can take additional arguments
# (https://optuna.readthedocs.io/en/stable/faq.html#objective-func-additional-args).
def objective(trial):
    iris = sklearn.datasets.load_iris()
    x, y = iris.data, iris.target

    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma="auto")
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=10
        )

    score = sklearn.model_selection.cross_val_score(classifier_obj, x, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
study.best_trial
[I 2021-09-16 08:10:06,261] A new study created in memory with name: no-name-d9aba38c-c791-47d9-a666-fb9888c66eba
[I 2021-09-16 08:10:07,041] Trial 0 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 14487247.93005794}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,110] Trial 1 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,123] Trial 2 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 3583528790.7717648}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,134] Trial 3 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 0.01374049215957742}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,173] Trial 4 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,185] Trial 5 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 3.2958652818826705e-05}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,197] Trial 6 finished with value: 0.84 and parameters: {'classifier': 'SVC', 'svc_c': 0.0388890805913242}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,209] Trial 7 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 0.21357522077535437}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,321] Trial 8 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 3257546634.4158716}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,362] Trial 9 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 17}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,379] Trial 10 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 17244.09094347225}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,396] Trial 11 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 8267320975.982889}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,411] Trial 12 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 318770.4637190416}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,425] Trial 13 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 165014.34293923172}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,440] Trial 14 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 5.019654334487846e-10}. Best is trial 0 with value: 0.96.
[I 2021-09-16 08:10:07,486] Trial 15 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 29}. Best is trial 15 with value: 0.9666666666666667.
[I 2021-09-16 08:10:07,532] Trial 16 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 30}. Best is trial 15 with value: 0.9666666666666667.
[I 2021-09-16 08:10:07,578] Trial 17 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 10}. Best is trial 15 with value: 0.9666666666666667.
[I 2021-09-16 08:10:07,625] Trial 18 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 27}. Best is trial 15 with value: 0.9666666666666667.
[I 2021-09-16 08:10:07,670] Trial 19 finished with value: 0.9733333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 7}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,715] Trial 20 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,759] Trial 21 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 11}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,806] Trial 22 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,852] Trial 23 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 16}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,899] Trial 24 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,945] Trial 25 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:07,990] Trial 26 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 18}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,036] Trial 27 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 9}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,085] Trial 28 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,130] Trial 29 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,177] Trial 30 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 13}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,222] Trial 31 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 7}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,275] Trial 32 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 7}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,321] Trial 33 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 13}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,368] Trial 34 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 8}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,383] Trial 35 finished with value: 0.9466666666666667 and parameters: {'classifier': 'SVC', 'svc_c': 398.29566574264686}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,428] Trial 36 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,473] Trial 37 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 22}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,519] Trial 38 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 32}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,571] Trial 39 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 13}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,624] Trial 40 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,641] Trial 41 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 3.838389826638352e-06}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,658] Trial 42 finished with value: 0.9466666666666667 and parameters: {'classifier': 'SVC', 'svc_c': 433.18881122499346}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,703] Trial 43 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 9}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,720] Trial 44 finished with value: 0.96 and parameters: {'classifier': 'SVC', 'svc_c': 42277094.49734071}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,765] Trial 45 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 14}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,811] Trial 46 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 20}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,858] Trial 47 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 20}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,905] Trial 48 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 25}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,952] Trial 49 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 19}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:08,998] Trial 50 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 16}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,054] Trial 51 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 19}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,102] Trial 52 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 28}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,150] Trial 53 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 26}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,197] Trial 54 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 32}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,253] Trial 55 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 20}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,300] Trial 56 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 15}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,352] Trial 57 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 16}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,398] Trial 58 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 11}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,443] Trial 59 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 26}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,489] Trial 60 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,536] Trial 61 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 29}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,582] Trial 62 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 14}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,627] Trial 63 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 21}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,675] Trial 64 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 28}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,722] Trial 65 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 18}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,768] Trial 66 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 18}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,813] Trial 67 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,866] Trial 68 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 12}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,912] Trial 69 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 17}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:09,958] Trial 70 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 15}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,005] Trial 71 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,051] Trial 72 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,099] Trial 73 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 24}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,155] Trial 74 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 29}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,207] Trial 75 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 21}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,259] Trial 76 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 12}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,306] Trial 77 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 25}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,360] Trial 78 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 21}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,377] Trial 79 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 1.9599998329780468e-10}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,423] Trial 80 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,470] Trial 81 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,516] Trial 82 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,565] Trial 83 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,610] Trial 84 finished with value: 0.9666666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,664] Trial 85 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 18}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,709] Trial 86 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,754] Trial 87 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 9}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,801] Trial 88 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 19}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,818] Trial 89 finished with value: 0.32 and parameters: {'classifier': 'SVC', 'svc_c': 1.6448891749672307e-07}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,872] Trial 90 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 31}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,918] Trial 91 finished with value: 0.96 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 5}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:10,963] Trial 92 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 22}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,009] Trial 93 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 27}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,056] Trial 94 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 8}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,102] Trial 95 finished with value: 0.94 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,158] Trial 96 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 4}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,210] Trial 97 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 6}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,256] Trial 98 finished with value: 0.9533333333333333 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 19 with value: 0.9733333333333333.
[I 2021-09-16 08:10:11,301] Trial 99 finished with value: 0.9466666666666667 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 15}. Best is trial 19 with value: 0.9733333333333333.
FrozenTrial(number=19, values=[0.9733333333333333], datetime_start=datetime.datetime(2021, 9, 16, 8, 10, 7, 626788), datetime_complete=datetime.datetime(2021, 9, 16, 8, 10, 7, 670289), params={'classifier': 'RandomForest', 'rf_max_depth': 7}, distributions={'classifier': CategoricalDistribution(choices=('SVC', 'RandomForest')), 'rf_max_depth': IntLogUniformDistribution(high=32, low=2, step=1)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=19, state=TrialState.COMPLETE, value=None)
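After the study has finished, the winning hyperparameters can be read back from the study object and used to refit a final model. A minimal sketch, assuming the study from the run above is still in scope:

import sklearn.datasets
import sklearn.ensemble
import sklearn.svm

print(study.best_params)  # e.g. {'classifier': 'RandomForest', 'rf_max_depth': 7}
print(study.best_value)   # e.g. 0.9733...

iris = sklearn.datasets.load_iris()
x, y = iris.data, iris.target

# Rebuild the winning configuration and fit it on the full dataset.
if study.best_params["classifier"] == "RandomForest":
    final_model = sklearn.ensemble.RandomForestClassifier(
        max_depth=study.best_params["rf_max_depth"], n_estimators=10
    )
else:
    final_model = sklearn.svm.SVC(C=study.best_params["svc_c"], gamma="auto")

final_model.fit(x, y)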
PyCaret also provides a tuning step, which is based on a grid search over the hyperparameters.
from pycaret.datasets import get_data
diabetes = get_data('diabetes')
# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')
# train a decision tree model
dt = create_model('dt')
# tune hyperparameters with custom_grid
import numpy as np
params = {"max_depth": np.random.randint(1, int(len(diabetes.columns) * .85), 20),
          "max_features": np.random.randint(1, len(diabetes.columns), 20),
          "min_samples_leaf": [2, 3, 4, 5, 6],
          "criterion": ["gini", "entropy"]
          }
tuned_dt_custom = tune_model(dt, custom_grid = params)
| | Number of times pregnant | Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Diastolic blood pressure (mm Hg) | Triceps skin fold thickness (mm) | 2-Hour serum insulin (mu U/ml) | Body mass index (weight in kg/(height in m)^2) | Diabetes pedigree function | Age (years) | Class variable |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
[PyCaret setup status widget: Initiated 08:10:12, Status: Preprocessing Data]
| | Data Type |
|---|---|
| Number of times pregnant | Categorical |
| Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Numeric |
| Diastolic blood pressure (mm Hg) | Numeric |
| Triceps skin fold thickness (mm) | Numeric |
| 2-Hour serum insulin (mu U/ml) | Numeric |
| Body mass index (weight in kg/(height in m)^2) | Numeric |
| Diabetes pedigree function | Numeric |
| Age (years) | Numeric |
| Class variable | Label |
---------------------------------------------------------------------------
StdinNotImplementedError Traceback (most recent call last)
/tmp/ipykernel_4692/563638542.py in <module>
3 # Importing module and initializing setup
4 from pycaret.classification import *
----> 5 clf1 = setup(data = diabetes, target = 'Class variable')
6 # train a decision tree model
7 dt = create_model('dt')
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pycaret/classification.py in setup(data, target, train_size, test_data, preprocess, imputation_type, iterative_imputation_iters, categorical_features, categorical_imputation, categorical_iterative_imputer, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, numeric_iterative_imputer, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, remove_perfect_collinearity, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_selection_method, feature_interaction, feature_ratio, interaction_threshold, fix_imbalance, fix_imbalance_method, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, custom_pipeline, html, session_id, log_experiment, experiment_name, log_plots, log_profile, log_data, silent, verbose, profile, profile_kwargs)
653 verbose=verbose,
654 profile=profile,
--> 655 profile_kwargs=profile_kwargs,
656 )
657
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pycaret/internal/tabular.py in setup(data, target, ml_usecase, available_plots, train_size, test_data, preprocess, imputation_type, iterative_imputation_iters, categorical_features, categorical_imputation, categorical_iterative_imputer, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, numeric_iterative_imputer, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, remove_perfect_collinearity, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_selection_method, feature_interaction, feature_ratio, interaction_threshold, fix_imbalance, fix_imbalance_method, transform_target, transform_target_method, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, custom_pipeline, html, session_id, log_experiment, experiment_name, log_plots, log_profile, log_data, silent, verbose, profile, profile_kwargs, display)
1328 test_data = pd.concat([X_test, y_test], axis=1)
1329
-> 1330 train_data = prep_pipe.fit_transform(train_data)
1331 # workaround to also transform target
1332 dtypes.final_training_columns.append(target)
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
365 """
366 fit_params_steps = self._check_fit_params(**fit_params)
--> 367 Xt = self._fit(X, y, **fit_params_steps)
368
369 last_step = self._final_estimator
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params_steps)
294 message_clsname='Pipeline',
295 message=self._log_message(step_idx),
--> 296 **fit_params_steps[name])
297 # Replace the transformer of the step with the fitted
298 # transformer. This is necessary when loading the transformer
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)
350
351 def __call__(self, *args, **kwargs):
--> 352 return self.func(*args, **kwargs)
353
354 def call_and_shelve(self, *args, **kwargs):
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
738 with _print_elapsed_time(message_clsname, message):
739 if hasattr(transformer, 'fit_transform'):
--> 740 res = transformer.fit_transform(X, y, **fit_params)
741 else:
742 res = transformer.fit(X, y, **fit_params).transform(X)
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pycaret/internal/preprocess.py in fit_transform(self, dataset, y)
419
420 # since this is for training , we dont nees any transformation since it has already been transformed in fit
--> 421 data = self.fit(data)
422
423 # additionally we just need to treat the target variable
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pycaret/internal/preprocess.py in fit(self, dataset, y)
321
322 display(dt_print_out[["Data Type"]])
--> 323 self.response = input()
324
325 if self.response in [
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/ipykernel/kernelbase.py in raw_input(self, prompt)
1002 if not self._allow_stdin:
1003 raise StdinNotImplementedError(
-> 1004 "raw_input was called, but this frontend does not support input requests."
1005 )
1006 return self._input_request(
StdinNotImplementedError: raw_input was called, but this frontend does not support input requests.
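The traceback above arises because setup() pauses to ask for confirmation of the inferred data types via input(), which is not available when the notebook is executed non-interactively (for example during a docs build). A possible workaround, assuming PyCaret 2.x and that the earlier cells defining diabetes and params have already run, is to pass silent=True so the confirmation step is skipped:

# silent=True skips the interactive dtype-confirmation prompt (PyCaret 2.x)
clf1 = setup(data = diabetes, target = 'Class variable', silent = True)
dt = create_model('dt')
tuned_dt_custom = tune_model(dt, custom_grid = params)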