Computer Science Graduation Project Sharing (with Algorithms): Breast Cancer Data Analysis Based on Machine Learning

Table of Contents

  • 0 Introduction
      • Model Evaluation
            • KNN Classifier
            • Logistic Regression Classifier
            • Random Forest Classifier
            • Decision Tree Classifier
            • GBDT(Gradient Boosting Decision Tree) Classifier
            • AdaBoost
            • Bagging
            • SVM
            • Finally

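              0 Introduction

              Today I am sharing a graduation project with you:

              Graduation project: breast cancer data analysis based on machine learning

              Project demo:

              Graduation project: data mining and analysis of breast cancer data with machine learning

              Project source:

              https://gitee.com/sinonfin/algorithm-sharing

              Model Evaluation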

              1. Common machine learning classification models:

              1. K-nearest neighbors (KNN Classifier)

              2. Logistic regression (Logistic Regression Classifier)

              3. Gaussian naive Bayes (GaussianNB)

              4. Multinomial naive Bayes (Multinomial Naive Bayes Classifier)

              5. Decision tree (Decision Tree Classifier)

              6. Ensemble methods

              • Gradient boosting decision tree (GBDT Classifier)
              • Adaptive boosting (AdaBoost Classifier)
              • Random forest (Random Forest Classifier)
              • Bagging

                7. Support vector machine (SVM Classifier)

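                2. Evaluating classification models:

                • Evaluation metrics

                  Accuracy, precision and recall, F1 score, and the ROC curve; mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) are the corresponding metrics for regression

                • Evaluation methods

                  Holdout validation, cross-validation, bootstrapping, hyperparameter tuning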

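                • Dealing with overfitting and underfitting

                  • Ways to reduce the risk of overfitting:

                    (1). Start from the data and obtain more training data. Using more training data is the most effective way to address overfitting, because more samples let the model learn more, and more useful, features and reduce the influence of noise. Collecting more experimental data directly is usually difficult, but the training set can be expanded by applying certain rules. In image classification, for example, the data can be augmented by translating, rotating, and scaling images; going further, generative adversarial networks can synthesize large amounts of new training data

                    (2). Reduce model complexity. When data is scarce, an overly complex model is the main cause of overfitting; lowering the complexity appropriately keeps the model from fitting too much sampling noise. For example, use fewer layers or neurons in a neural network, or limit the depth of a decision tree and prune it

                    (3). Regularization

                    (4). Ensemble learning. Ensemble learning combines multiple models to reduce the overfitting risk of any single model

                  • Ways to reduce the risk of underfitting:

                    (1). Add new features. When features are insufficient, or the existing features correlate only weakly with the sample labels, the model tends to underfit. Mining new features such as 'context features', 'ID features', and 'combined features' often yields better results. In the deep learning era, many models can help with feature engineering, for example factorization machines

                    (2). Increase model complexity. Simple models have limited learning capacity; increasing the complexity gives the model stronger fitting ability, for example by adding higher-order terms to a linear model or adding layers or neurons to a neural network

                    (3). Reduce the regularization coefficient. Regularization exists to prevent overfitting, so when the model underfits, the regularization coefficient should be reduced accordingly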
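The point about model complexity can be illustrated with a small sketch. It assumes the same breast cancer dataset used later in this article and compares decision trees of different `max_depth`; the fixed `random_state` values are my own assumption, added only so the sketch is repeatable. A large gap between training and test accuracy hints at overfitting, while low accuracy on both hints at underfitting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# random_state is fixed only for repeatability (an assumption, not part of the article's code)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

results = {}
for depth in (1, 3, None):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this depth
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, test={results[depth][1]:.3f}")
```

Comparing the two columns for each depth is a quick way to decide whether to constrain the tree further or to give it more capacity.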

                    1. Import libraries

                    import time
                    from sklearn import metrics
                    import pickle
                    import pandas as pd
                    from sklearn import tree
                    from sklearn.tree import export_graphviz
                    import graphviz
                    from IPython.display import Image, display
                    import pydotplus
                    import os
                    from sklearn.datasets import load_breast_cancer
                    from sklearn.model_selection import train_test_split
                    from sklearn.naive_bayes import MultinomialNB
                    from sklearn.naive_bayes import GaussianNB
                    from sklearn.neighbors import KNeighborsClassifier
                    from sklearn.linear_model import LogisticRegression
                    from sklearn.tree import DecisionTreeClassifier
                    from sklearn.svm import SVC
                    from sklearn.tree import DecisionTreeRegressor
                    from sklearn.ensemble import GradientBoostingClassifier
                    from sklearn.ensemble import RandomForestClassifier
                    from sklearn.ensemble import AdaBoostClassifier
                    from sklearn.ensemble import BaggingClassifier
                    from sklearn.model_selection import GridSearchCV
                    from sklearn.model_selection import learning_curve
                    # plot_learning_curve and plot_param_curve come from the project's own common.utils module
                    from common.utils import plot_learning_curve
                    from common.utils import plot_param_curve
                    from sklearn.metrics import roc_curve, auc
                    # plot_roc_curve exists in scikit-learn 0.22 to 1.1 (it was removed in 1.2)
                    from sklearn.metrics import plot_roc_curve
                    from sklearn.metrics import confusion_matrix
                    from sklearn.metrics import classification_report
                    from sklearn.model_selection import ShuffleSplit
                    import numpy as np
                    import matplotlib.pyplot as plt
                    import matplotlib as mpl
                    mpl.rcParams['font.sans-serif'] = ['SimHei']   # use the SimHei font so Chinese labels render
                    mpl.rcParams['axes.unicode_minus'] = False # render the minus sign '-' correctly with this font
                    # Jupyter magic: show figures inline (notebook only)
                    %matplotlib inline
                    import warnings
                    warnings.filterwarnings("ignore")
                    

                    2. Prepare the training data

                    cancer = load_breast_cancer() # load the dataset
                    df = pd.DataFrame(cancer.data,columns=cancer.feature_names)
                    df['target'] = cancer.target
                    x = cancer.data
                    y = cancer.target
                    print('data:',x.shape)
                    print('target:',y.shape)
                    # Print the first five rows
                    df.head()
                    

                    data: (569, 30)
                    target: (569,)
                    

                    | | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | … | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
                    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
                    | 0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | 0.2419 | 0.07871 | … | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | 0 |
                    | 1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | … | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | 0 |
                    | 2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | 0.2069 | 0.05999 | … | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | 0 |
                    | 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | 0.2597 | 0.09744 | … | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | 0 |
                    | 4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | 0.1809 | 0.05883 | … | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | 0 |

                    5 rows × 31 columns

                    # Show a summary of the DataFrame (columns, non-null counts, dtypes)
                    df.info()
                    

                    RangeIndex: 569 entries, 0 to 568
                    Data columns (total 31 columns):
                     #   Column                   Non-Null Count  Dtype  
                    ---  ------                   --------------  -----  
                     0   mean radius              569 non-null    float64
                     1   mean texture             569 non-null    float64
                     2   mean perimeter           569 non-null    float64
                     3   mean area                569 non-null    float64
                     4   mean smoothness          569 non-null    float64
                     5   mean compactness         569 non-null    float64
                     6   mean concavity           569 non-null    float64
                     7   mean concave points      569 non-null    float64
                     8   mean symmetry            569 non-null    float64
                     9   mean fractal dimension   569 non-null    float64
                     10  radius error             569 non-null    float64
                     11  texture error            569 non-null    float64
                     12  perimeter error          569 non-null    float64
                     13  area error               569 non-null    float64
                     14  smoothness error         569 non-null    float64
                     15  compactness error        569 non-null    float64
                     16  concavity error          569 non-null    float64
                     17  concave points error     569 non-null    float64
                     18  symmetry error           569 non-null    float64
                     19  fractal dimension error  569 non-null    float64
                     20  worst radius             569 non-null    float64
                     21  worst texture            569 non-null    float64
                     22  worst perimeter          569 non-null    float64
                     23  worst area               569 non-null    float64
                     24  worst smoothness         569 non-null    float64
                     25  worst compactness        569 non-null    float64
                     26  worst concavity          569 non-null    float64
                     27  worst concave points     569 non-null    float64
                     28  worst symmetry           569 non-null    float64
                     29  worst fractal dimension  569 non-null    float64
                     30  target                   569 non-null    int32  
                    dtypes: float64(30), int32(1)
                    memory usage: 135.7 KB
                    

                    The data contains no null values.

                    # Print the classes and the number of samples in each class
                    df['target'].value_counts()
                    

                    1    357
                    0    212
                    Name: target, dtype: int64
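It helps to know what the two labels mean before reading the metrics further on. The sketch below looks up the label names that ship with the dataset; the loop and variable names are my own. Note that scikit-learn's `precision_score` and `recall_score` treat label 1 as the positive class by default, so the precision and recall printed later in this article refer to the benign class.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
# target_names[i] is the meaning of integer label i
label_names = list(cancer.target_names)
counts = np.bincount(cancer.target)

for label, (name, count) in enumerate(zip(label_names, counts)):
    share = count / counts.sum()
    print(f"label {label}: {name:<9} {count} samples ({share:.1%})")
```

The classes are moderately imbalanced, which is one reason to look at per-class precision and recall in addition to overall accuracy.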
                    

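                    # Summary statistics of the numeric attributes
                    df.describe()


                    # Plot a histogram of each feature's distribution
                    df.hist(bins=50,figsize=(20,15))


                    # Hold out 33% of the samples as the test set
                    x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.33)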
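The split above is random, so the numbers reported later change from run to run. As a hedged variation (not what the article's run used), the sketch below fixes `random_state` and passes `stratify=y` so that both subsets keep roughly the same class ratio; the seed value 42 is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the benign/malignant ratio similar in both subsets;
# random_state makes the split repeatable (the value 42 is an arbitrary choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42)

print("train:", X_train.shape, "test:", X_test.shape)
print("share of label 1 -> train: %.3f, test: %.3f" % (y_train.mean(), y_test.mean()))
```

With 569 samples and `test_size=0.33`, the test set holds 188 samples, which matches the `support` totals in the classification reports below.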
                    

                    Training set

                    df_train = pd.DataFrame(x_train,columns=cancer.feature_names)
                    df_train['target'] = y_train
                    df_train
                    

                    Test set

                    df_test = pd.DataFrame(x_test,columns=cancer.feature_names)
                    df_test['target'] = y_test
                    df_test
                    

                    3. Create the models

                    # Multinomial Naive Bayes Classifier (requires non-negative features)
                    def mul_naive_bayes_classifier(train_x, train_y):
                        model = MultinomialNB(alpha=0.01)
                        model.fit(train_x, train_y)
                        return model

                    # Gaussian Naive Bayes Classifier
                    def naive_bayes_classifier(train_x, train_y):
                        model = GaussianNB(priors=None)
                        model.fit(train_x, train_y)
                        return model

                    # KNN Classifier
                    def knn_classifier(train_x, train_y):
                        model = KNeighborsClassifier()
                        model.fit(train_x, train_y)
                        return model

                    # Logistic Regression Classifier
                    def logistic_regression_classifier(train_x, train_y):
                        model = LogisticRegression(penalty='l2')
                        model.fit(train_x, train_y)
                        return model

                    # Random Forest Classifier
                    def random_forest_classifier(train_x, train_y):
                        model = RandomForestClassifier(n_estimators=8)
                        model.fit(train_x, train_y)
                        return model

                    # Decision Tree Classifier
                    def decision_tree_classifier(train_x, train_y):
                        model = DecisionTreeClassifier()
                        model.fit(train_x, train_y)
                        return model

                    # GBDT(Gradient Boosting Decision Tree) Classifier
                    def gradient_boosting_classifier(train_x, train_y):
                        model = GradientBoostingClassifier(n_estimators=200)
                        model.fit(train_x, train_y)
                        return model

                    # SVM Classifier (probability=True enables predict_proba)
                    def svm_classifier(train_x, train_y):
                        model = SVC(kernel='rbf', probability=True)
                        model.fit(train_x, train_y)
                        return model

                    # AdaBoost Classifier with decision trees as base estimators
                    def adaboost_classifier(train_x, train_y):
                        model = AdaBoostClassifier(DecisionTreeClassifier(),algorithm="SAMME", n_estimators=7, learning_rate=0.4)
                        model.fit(train_x, train_y)
                        return model

                    # Bagging Classifier with decision trees trained on bootstrap samples
                    def bagging_classifier(train_x, train_y):
                        model = BaggingClassifier(DecisionTreeClassifier(), bootstrap=True)
                        model.fit(train_x,train_y)
                        return model
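The factory functions above fit every model on the raw features, whose scales differ by several orders of magnitude (compare `mean area` with `mean smoothness`). Distance-based models such as KNN and SVM are usually sensitive to this. The sketch below shows one possible variation; `scaled_knn_classifier` is a hypothetical helper of my own, not part of the article's code, and it simply puts a `StandardScaler` in front of the classifier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def scaled_knn_classifier(train_x, train_y):
    """Hypothetical variant of knn_classifier that standardizes the features first."""
    # The pipeline learns the scaling on the training data only and
    # reapplies it automatically at prediction time
    model = make_pipeline(StandardScaler(), KNeighborsClassifier())
    model.fit(train_x, train_y)
    return model

X, y = load_breast_cancer(return_X_y=True)
# random_state is fixed only so the sketch is repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

model = scaled_knn_classifier(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"scaled KNN accuracy: {accuracy:.4f}")
```

Because the function has the same signature as the others, it could be registered under its own key next to them and compared in the same loop.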
                    

                    4. Test the models

                    # Display names, in the order in which the models are evaluated
                    test_classifiers = ['NB(高斯朴素贝叶斯)','MNB(多项式分布朴素贝叶斯)', 'KNN(最近邻)', 'LR(Logistic回归)', 'RF(随机森林)', 'DT(决策树)', 'SVM(支持向量机)', 'GBDT(梯度提升决策树)','Adaboost','Bagging']
                    # Map each display name to the function that builds and fits the model
                    classifiers = {
                        'GBDT(梯度提升决策树)':gradient_boosting_classifier,
                        'Adaboost':adaboost_classifier,
                        'Bagging':bagging_classifier,
                        'NB(高斯朴素贝叶斯)':naive_bayes_classifier,
                        'MNB(多项式分布朴素贝叶斯)':mul_naive_bayes_classifier,
                        'KNN(最近邻)':knn_classifier,
                        'LR(Logistic回归)':logistic_regression_classifier,
                        'RF(随机森林)':random_forest_classifier,
                        'DT(决策树)':decision_tree_classifier,
                        'SVM(支持向量机)':svm_classifier
                    }
                    

                    for classifier in test_classifiers:
                        print('******************* %s ********************' % classifier)
                        start_time = time.time()
                        model = classifiers[classifier](x_train, y_train)
                        print(model)
                        print('training took %fs!' % (time.time() - start_time))
                        predict = model.predict(x_test)
                        # Precision and recall are computed for the positive class (label 1)
                        score = metrics.precision_score(y_test, predict)
                        recall = metrics.recall_score(y_test, predict)
                        print('precision: %.2f%%, recall: %.2f%%' % (100 * score, 100 * recall))
                        accuracy = metrics.accuracy_score(y_test, predict)
                        print('accuracy: %.2f%%' % (100 * accuracy))
                        c_matrix = confusion_matrix(
                            y_test,   # array, ground-truth (correct) target values
                            predict,  # array, estimated targets as returned by a classifier
                            labels=[0,1],  # array, list of labels to index the matrix
                            sample_weight=None  # array-like of shape = [n_samples], optional sample weights
                        )
                        print('\nclassification_report:')
                        print(classification_report( y_test,predict,labels=[0,1]))
                        print('\nconfusion_matrix:')
                        print(c_matrix)

                        # Learning curve over 10 random 75/25 splits of the full dataset
                        cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
                        title = classifier+' Learning Curves'
                        start = time.perf_counter()  # time.clock() was removed in Python 3.8
                        plot_learning_curve(plt, model,title,cancer.data, cancer.target, ylim=(0.5, 1.01), cv=cv)
                        print('elaspe: {0:.6f}'.format(time.perf_counter()-start))

                        # ROC curve, drawn on the held-out test set so it reflects generalization
                        curve1 = plot_roc_curve(model, x_test, y_test,  alpha=0.8,name=classifier)
                        curve1.figure_.suptitle("Breast cancer ROC")

                        # Draw the decision tree
                        if classifier == 'DT(决策树)':
                            dot_data = export_graphviz(model,
                                                    out_file = None,
                                                    feature_names = cancer.feature_names,
                                                    class_names = cancer.target_names,
                                                    filled=True,
                                                    rounded=True
                                                   )
                            graph = pydotplus.graph_from_dot_data(dot_data)
                            display(Image(graph.create_png()))
                        plt.show()
                        print()
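The loop calls `plot_learning_curve` from the project's `common.utils` module, whose source is not shown in this article. The sketch below is only my guess at what such a helper might look like: it keeps the call shape used in the loop (`plt, estimator, title, X, y, ylim, cv`) and builds on scikit-learn's `learning_curve`; the `train_sizes` parameter and the return value are my own additions. The real helper may well differ.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB

def plot_learning_curve(plt, estimator, title, X, y, ylim=None, cv=None,
                        train_sizes=np.linspace(0.1, 1.0, 5)):
    """Assumed stand-in for common.utils.plot_learning_curve (the original is not shown)."""
    # learning_curve refits a clone of the estimator on growing subsets of the training folds
    sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, train_sizes=train_sizes)
    train_mean = train_scores.mean(axis=1)
    test_mean = test_scores.mean(axis=1)

    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    plt.plot(sizes, train_mean, "o-", label="Training score")
    plt.plot(sizes, test_mean, "o-", label="Cross-validation score")
    plt.legend(loc="best")
    plt.grid(True)
    return sizes, train_mean, test_mean

cancer = load_breast_cancer()
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
sizes, train_mean, test_mean = plot_learning_curve(
    plt, GaussianNB(), "NB Learning Curves", cancer.data, cancer.target,
    ylim=(0.5, 1.01), cv=cv)
print(sizes, test_mean.round(3))
```

When the two curves converge at a high score, more data is unlikely to help much; a persistent gap between them suggests overfitting.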
                    

                    ******************* NB(高斯朴素贝叶斯) ********************
                    GaussianNB(priors=None, var_smoothing=1e-09)
                    training took 0.004002s!
                    precision: 91.67%, recall: 94.02%
                    accuracy: 90.96%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.90      0.86      0.88        71
                               1       0.92      0.94      0.93       117
                        accuracy                           0.91       188
                       macro avg       0.91      0.90      0.90       188
                    weighted avg       0.91      0.91      0.91       188
                    

                    confusion_matrix:
                    [[ 61  10]
                     [  7 110]]
                    elaspe: 0.299883
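The printed metrics can be recomputed by hand from the confusion matrix, which makes it clearer what each number means. The sketch below uses the Gaussian NB matrix printed above; rows are true labels, columns are predicted labels, and label 1 is treated as the positive class. The variable names are my own.

```python
import numpy as np

# Confusion matrix copied from the Gaussian NB output above
c_matrix = np.array([[61, 10],
                     [7, 110]])
# With labels=[0, 1], ravel() yields: true negatives, false positives,
# false negatives, true positives
tn, fp, fn, tp = c_matrix.ravel()

precision = tp / (tp + fp)          # of everything predicted as 1, how much is really 1
recall = tp / (tp + fn)             # of everything that is really 1, how much was found
accuracy = (tp + tn) / c_matrix.sum()
f1 = 2 * precision * recall / (precision + recall)
recall_label0 = tn / (tn + fp)      # recall of label 0

print(f"precision: {precision:.2%}, recall: {recall:.2%}")   # precision: 91.67%, recall: 94.02%
print(f"accuracy: {accuracy:.2%}, f1: {f1:.2f}")             # accuracy: 90.96%, f1: 0.93
print(f"recall for label 0: {recall_label0:.2f}")            # recall for label 0: 0.86
```

These values agree with the `precision`, `recall`, and `accuracy` lines and with the classification report for this model.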
                    

                    ******************* MNB(多项式分布朴素贝叶斯) ********************
                    MultinomialNB(alpha=0.01, class_prior=None, fit_prior=True)
                    training took 0.008931s!
                    precision: 88.19%, recall: 95.73%
                    accuracy: 89.36%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.92      0.79      0.85        71
                               1       0.88      0.96      0.92       117
                        accuracy                           0.89       188
                       macro avg       0.90      0.87      0.88       188
                    weighted avg       0.90      0.89      0.89       188
                    

                    confusion_matrix:
                    [[ 56  15]
                     [  5 112]]
                    elaspe: 0.272553
                    


                    ******************* KNN(最近邻) ********************
                    KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                                         metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                                         weights='uniform')
                    training took 0.006923s!
                    precision: 93.28%, recall: 94.87%
                    accuracy: 92.55%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.91      0.89      0.90        71
                               1       0.93      0.95      0.94       117
                        accuracy                           0.93       188
                       macro avg       0.92      0.92      0.92       188
                    weighted avg       0.93      0.93      0.93       188
                    

                    confusion_matrix:
                    [[ 63   8]
                     [  6 111]]
                    elaspe: 1.937058
                    

                    ******************* LR(Logistic回归) ********************
                    LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                                       intercept_scaling=1, l1_ratio=None, max_iter=100,
                                       multi_class='auto', n_jobs=None, penalty='l2',
                                       random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                                       warm_start=False)
                    training took 0.132035s!
                    precision: 95.73%, recall: 95.73%
                    accuracy: 94.68%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.93      0.93      0.93        71
                               1       0.96      0.96      0.96       117
                        accuracy                           0.95       188
                       macro avg       0.94      0.94      0.94       188
                    weighted avg       0.95      0.95      0.95       188
                    

                    confusion_matrix:
                    [[ 66   5]
                     [  5 112]]
                    elaspe: 5.063377
                    

                    ******************* RF(随机森林) ********************
                    RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                                           criterion='gini', max_depth=None, max_features='auto',
                                           max_leaf_nodes=None, max_samples=None,
                                           min_impurity_decrease=0.0, min_impurity_split=None,
                                           min_samples_leaf=1, min_samples_split=2,
                                           min_weight_fraction_leaf=0.0, n_estimators=8,
                                           n_jobs=None, oob_score=False, random_state=None,
                                           verbose=0, warm_start=False)
                    training took 0.044998s!
                    precision: 94.83%, recall: 94.02%
                    accuracy: 93.09%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.90      0.92      0.91        71
                               1       0.95      0.94      0.94       117
                        accuracy                           0.93       188
                       macro avg       0.93      0.93      0.93       188
                    weighted avg       0.93      0.93      0.93       188
                    

                    confusion_matrix:
                    [[ 65   6]
                     [  7 110]]
                    elaspe: 1.873387
                    

                    ******************* DT(决策树) ********************
                    DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                                           max_depth=None, max_features=None, max_leaf_nodes=None,
                                           min_impurity_decrease=0.0, min_impurity_split=None,
                                           min_samples_leaf=1, min_samples_split=2,
                                           min_weight_fraction_leaf=0.0, presort='deprecated',
                                           random_state=None, splitter='best')
                    training took 0.014005s!
                    precision: 93.16%, recall: 93.16%
                    accuracy: 91.49%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.89      0.89      0.89        71
                               1       0.93      0.93      0.93       117
                        accuracy                           0.91       188
                       macro avg       0.91      0.91      0.91       188
                    weighted avg       0.91      0.91      0.91       188
                    

                    confusion_matrix:
                    [[ 63   8]
                     [  8 109]]
                    elaspe: 0.448771
                    


                    ******************* SVM(支持向量机) ********************
                    SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
                        decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
                        max_iter=-1, probability=True, random_state=None, shrinking=True, tol=0.001,
                        verbose=False)
                    training took 0.028140s!
                    precision: 90.48%, recall: 97.44%
                    accuracy: 92.02%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.95      0.83      0.89        71
                               1       0.90      0.97      0.94       117
                        accuracy                           0.92       188
                       macro avg       0.93      0.90      0.91       188
                    weighted avg       0.92      0.92      0.92       188
                    

                    confusion_matrix:
                    [[ 59  12]
                     [  3 114]]
                    elaspe: 1.027975
                    

                    ******************* GBDT(梯度提升决策树) ********************
                    GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                                               learning_rate=0.1, loss='deviance', max_depth=3,
                                               max_features=None, max_leaf_nodes=None,
                                               min_impurity_decrease=0.0, min_impurity_split=None,
                                               min_samples_leaf=1, min_samples_split=2,
                                               min_weight_fraction_leaf=0.0, n_estimators=200,
                                               n_iter_no_change=None, presort='deprecated',
                                               random_state=None, subsample=1.0, tol=0.0001,
                                               validation_fraction=0.1, verbose=0,
                                               warm_start=False)
                    training took 0.996242s!
                    precision: 94.07%, recall: 94.87%
                    accuracy: 93.09%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.91      0.90      0.91        71
                               1       0.94      0.95      0.94       117
                        accuracy                           0.93       188
                       macro avg       0.93      0.93      0.93       188
                    weighted avg       0.93      0.93      0.93       188
                    

                    confusion_matrix:
                    [[ 64   7]
                     [  6 111]]
                    elaspe: 39.072309
                    


                    ******************* Adaboost ********************
                    AdaBoostClassifier(algorithm='SAMME',
                                       base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                              class_weight=None,
                              criterion='gini',
                              max_depth=None,
                              max_features=None,
                              max_leaf_nodes=None,
                              min_impurity_decrease=0.0,
                              min_impurity_split=None,
                              min_samples_leaf=1,
                              min_samples_split=2,
                              min_weight_fraction_leaf=0.0,
                              presort='deprecated',
                              random_state=None,
                              splitter='best'),
                                       learning_rate=0.4, n_estimators=7, random_state=None)
                    training took 0.025010s!
                    precision: 93.22%, recall: 94.02%
                    accuracy: 92.02%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.90      0.89      0.89        71
                               1       0.93      0.94      0.94       117
                        accuracy                           0.92       188
                       macro avg       0.92      0.91      0.91       188
                    weighted avg       0.92      0.92      0.92       188
                    

                    confusion_matrix:
                    [[ 63   8]
                     [  7 110]]
                    elaspe: 0.960197
                    


                    ******************* Bagging ********************
                    BaggingClassifier(base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                             class_weight=None,
                             criterion='gini',
                             max_depth=None,
                             max_features=None,
                             max_leaf_nodes=None,
                             min_impurity_decrease=0.0,
                             min_impurity_split=None,
                             min_samples_leaf=1,
                             min_samples_split=2,
                             min_weight_fraction_leaf=0.0,
                             presort='deprecated',
                             random_state=None,
                             splitter='best'),
                                      bootstrap=True, bootstrap_features=False, max_features=1.0,
                                      max_samples=1.0, n_estimators=10, n_jobs=None,
                                      oob_score=False, random_state=None, verbose=0,
                                      warm_start=False)
                    training took 0.106950s!
                    precision: 94.02%, recall: 94.02%
                    accuracy: 92.55%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.90      0.90      0.90        71
                               1       0.94      0.94      0.94       117
                        accuracy                           0.93       188
                       macro avg       0.92      0.92      0.92       188
                    weighted avg       0.93      0.93      0.93       188
                    

                    confusion_matrix:
                    [[ 64   7]
                     [  7 110]]
                    elaspe: 4.000736
                    


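                    With the (mostly default) parameters used above, logistic regression reached the highest accuracy in this run (94.68%) and SVM the highest recall (97.44%). GBDT (gradient boosting decision tree) took the longest, both to train (about 1.0 s) and to produce its learning curve (about 39 s). Gaussian naive Bayes trained the fastest (about 0.004 s), and MNB (multinomial naive Bayes) had the shortest learning-curve time (about 0.27 s). Because the train/test split is random, these figures vary from run to run.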
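To compare the runs logged above side by side, the printed figures can be collected into one table. This is a small sketch: the numbers are copied by hand from the output above, and the short labels and column names are my own.

```python
import pandas as pd

# (precision %, recall %, accuracy %, training time in seconds), copied from the logs above
logged = {
    "NB":       (91.67, 94.02, 90.96, 0.004002),
    "MNB":      (88.19, 95.73, 89.36, 0.008931),
    "KNN":      (93.28, 94.87, 92.55, 0.006923),
    "LR":       (95.73, 95.73, 94.68, 0.132035),
    "RF":       (94.83, 94.02, 93.09, 0.044998),
    "DT":       (93.16, 93.16, 91.49, 0.014005),
    "SVM":      (90.48, 97.44, 92.02, 0.028140),
    "GBDT":     (94.07, 94.87, 93.09, 0.996242),
    "Adaboost": (93.22, 94.02, 92.02, 0.025010),
    "Bagging":  (94.02, 94.02, 92.55, 0.106950),
}
summary = pd.DataFrame.from_dict(
    logged, orient="index",
    columns=["precision", "recall", "accuracy", "train_seconds"])

print(summary.sort_values("accuracy", ascending=False))
print("highest accuracy:", summary["accuracy"].idxmax())
print("highest recall:  ", summary["recall"].idxmax())
print("slowest training:", summary["train_seconds"].idxmax())
print("fastest training:", summary["train_seconds"].idxmin())
```

Since these numbers come from a single random split, small differences between models should not be over-interpreted.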

                    5. Hyperparameter tuning

                    Default parameters of each classification model

                    KNN Classifier

                    KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                                             metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                                             weights='uniform')
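Starting from defaults like these, a grid search is one common way to tune a model. The sketch below is a minimal example under my own assumptions: the candidate values for `n_neighbors` and `weights`, the 5-fold cross-validation, and the accuracy scoring are illustrative choices rather than settings taken from this project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate values are illustrative assumptions, not the article's settings
param_grid = {
    "n_neighbors": [3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
}
# Every combination is scored with 5-fold cross-validation
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy: %.4f" % search.best_score_)
```

The fitted `search` object can be used like a classifier afterwards, since it is refit on the whole dataset with the best combination by default.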
                    
                    Logistic Regression Classifier

                    LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                                       intercept_scaling=1, l1_ratio=None, max_iter=100,
                                       multi_class='auto', n_jobs=None, penalty='l2',
                                       random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                                       warm_start=False)
                    
                    Random Forest Classifier

                    RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                                           criterion='gini', max_depth=None, max_features='auto',
                                           max_leaf_nodes=None, max_samples=None,
                                           min_impurity_decrease=0.0, min_impurity_split=None,
                                           min_samples_leaf=1, min_samples_split=2,
                                           min_weight_fraction_leaf=0.0, n_estimators=8,
                                           n_jobs=None, oob_score=False, random_state=None,
                                           verbose=0, warm_start=False)
                    
                    Decision Tree Classifier

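                    The following parameters can also be tuned:
                    max_depth (maximum depth of the tree)
                    max_leaf_nodes (maximum number of leaf nodes)
                    max_features (maximum number of features considered at each split)
                    min_samples_leaf (minimum number of samples at a leaf node)
                    min_samples_split (minimum number of samples required to split an internal node)
                    min_weight_fraction_leaf (minimum fraction of the total sample weight required at a leaf node)
                    min_impurity_split (impurity threshold for splitting; deprecated in favor of min_impurity_decrease)
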
                    DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                                           max_depth=None, max_features=None, max_leaf_nodes=None,
                                           min_impurity_decrease=0.0, min_impurity_split=None,
                                           min_samples_leaf=1, min_samples_split=2,
                                           min_weight_fraction_leaf=0.0, presort='deprecated',
                                           random_state=None, splitter='best')
                                           
                    [Pruning parameters of sklearn decision trees (The Zen of Data Analysis, CSDN blog)](https://blog.csdn.net/gracejpw/article/details/102239574)
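
To make the effect of these parameters concrete, here is a minimal sketch that fits an unconstrained tree and a constrained one and compares their size. It assumes scikit-learn's bundled `load_breast_cancer` data as a stand-in for the project's dataset, and the values `max_depth=3` and `min_samples_leaf=5` are illustrative choices, not the author's settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (an assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

# Unconstrained tree: keeps splitting until leaves are pure or cannot be split.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Constrained tree: the parameters below cap its depth and leaf size.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,         # illustrative value
    min_samples_leaf=5,  # illustrative value
    random_state=0,
).fit(X, y)

print("full:   depth=%d, leaves=%d" % (full_tree.get_depth(), full_tree.get_n_leaves()))
print("pruned: depth=%d, leaves=%d" % (pruned_tree.get_depth(), pruned_tree.get_n_leaves()))
```

A smaller tree usually fits the training data less tightly, which is why these parameters are common targets when tuning against overfitting.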

                    GBDT(Gradient Boosting Decision Tree) Classifier

                    GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                                               learning_rate=0.1, loss='deviance', max_depth=3,
                                               max_features=None, max_leaf_nodes=None,
                                               min_impurity_decrease=0.0, min_impurity_split=None,
                                               min_samples_leaf=1, min_samples_split=2,
                                               min_weight_fraction_leaf=0.0, n_estimators=200,
                                               n_iter_no_change=None, presort='deprecated',
                                               random_state=None, subsample=1.0, tol=0.0001,
                                               validation_fraction=0.1, verbose=0,
                                               warm_start=False)
                    
                    AdaBoost

                    AdaBoostClassifier(algorithm='SAMME',  
                                       base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,  
                              class_weight=None,  
                              criterion='gini',  
                              max_depth=None,  
                              max_features=None,  
                              max_leaf_nodes=None,  
                              min_impurity_decrease=0.0,  
                              min_impurity_split=None,  
                              min_samples_leaf=1,  
                              min_samples_split=2,  
                              min_weight_fraction_leaf=0.0,  
                              presort='deprecated',  
                              random_state=None,  
                              splitter='best'),  
                                       learning_rate=0.4, n_estimators=7, random_state=None)

                    Bagging

                    BaggingClassifier(base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                             class_weight=None,
                             criterion='gini',
                             max_depth=None,
                             max_features=None,
                             max_leaf_nodes=None,
                             min_impurity_decrease=0.0,
                             min_impurity_split=None,
                             min_samples_leaf=1,
                             min_samples_split=2,
                             min_weight_fraction_leaf=0.0,
                             presort='deprecated',
                             random_state=None,
                             splitter='best'),
                                      bootstrap=True, bootstrap_features=False, max_features=1.0,
                                      max_samples=1.0, n_estimators=10, n_jobs=None,
                                      oob_score=False, random_state=None, verbose=0,
                                      warm_start=False)
                    
                    SVM

                    SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
                        decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
                        max_iter=-1, probability=True, random_state=None, shrinking=True, tol=0.001,
                        verbose=False)
                    

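                    from sklearn.model_selection import GridSearchCV

                    def grid_search(model, param_grid, train_x, train_y, cv=5):
                        """Run an exhaustive grid search and return the best estimator, refit on the training data."""
                        # cv: number of cross-validation folds; pass it through so the argument actually takes effect
                        searcher = GridSearchCV(model, param_grid=param_grid, cv=cv, n_jobs=-1, verbose=1)
                        searcher.fit(train_x, train_y)
                        # get_params() returns every parameter of the best estimator, not only the tuned ones
                        best_parameters = searcher.best_estimator_.get_params()
                        print('最优参数:', best_parameters)  # '最优参数' means 'best parameters'
                        return searcher.best_estimator_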
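
As a usage illustration of the same idea, the sketch below runs `GridSearchCV` directly and contrasts `best_params_` (only the searched parameters) with `best_estimator_.get_params()` (all parameters, which is what the function above prints). It assumes scikit-learn's bundled `load_breast_cancer` data as a stand-in for the project's dataset; the grid is a simplified version of the KNN grid used later.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (an assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

param_grid = {"n_neighbors": [4, 5, 6, 7], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=5)
search.fit(X, y)

# Only the parameters that were searched over.
print("tuned parameters:", search.best_params_)
# Mean cross-validated accuracy of the best candidate.
print("best CV accuracy: %.4f" % search.best_score_)
# Every parameter of the winning estimator, including untouched defaults.
all_params = search.best_estimator_.get_params()
print("total parameters:", len(all_params))
```

Printing `best_params_` instead of the full `get_params()` dictionary keeps the log short when only a few parameters are being tuned.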
                    

                    # Classifier names, base estimators, and the parameter grids to search for each one
                    common_classifiers = ['KNN(最近邻)', 'LR(Logistic回归)',  'DT(决策树)', 'SVM(支持向量机)' ] 
                    ensem_classifiers = ['RF(随机森林)','GBDT(梯度提升决策树)','Adaboost']
                    basic_classifiers = {
                        'KNN(最近邻)':KNeighborsClassifier(),
                        'LR(Logistic回归)':LogisticRegression(penalty='l2'),
                        'DT(决策树)': DecisionTreeClassifier() ,
                        'SVM(支持向量机)': SVC(kernel='rbf', probability=True),
                        'GBDT(梯度提升决策树)': GradientBoostingClassifier(n_estimators=200),
                        'RF(随机森林)': RandomForestClassifier(n_estimators=8) ,
                        'Adaboost': AdaBoostClassifier(DecisionTreeClassifier(),algorithm="SAMME", n_estimators=7, learning_rate=0.4)
                    }
                    grid_params = {
                        'KNN(最近邻)':[
                            {'weights':['uniform'],'n_neighbors':np.arange(4,8,1)},
                            {'weights':['distance'],'n_neighbors':np.arange(4,8,1)},
                        ],
                        'LR(Logistic回归)':[
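                            # 'l1' needs a solver that supports it; the default 'lbfgs' only handles 'l2' or no penalty
                            {'C':[0.01,0.1,1.0,10.0,100.0],'penalty':['l1'],'solver':['liblinear']},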
                            {'C':[0.01,0.1,1.0,10.0,100.0],'penalty':['l2'],'solver':['liblinear','newton-cg','sag','lbfgs']},
                        ],
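                        'DT(决策树)':[
                            # Note: scikit-learn rejects min_samples_split=1 (an integer value must be >= 2), so those candidates fail to fit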
                            {'min_samples_split':np.arange(1,15,1),'min_samples_leaf':np.arange(1,15,1),'splitter':['random']},
                            {'min_samples_split':np.arange(1,15,1),'min_samples_leaf':np.arange(1,15,1),'splitter':['best']},
                        ],
                        'SVM(支持向量机)':[
                          {'C': [1e-1, 1, 10, 100, 1000], 'kernel': ['linear']},
                          {'C': [1e-1, 1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
                        ]
                        
                    }
                    ensem_params = {
                        'GBDT(梯度提升决策树)':{'n_estimators':np.arange(20,500,50),'max_depth':np.arange(3,14,2), 'min_samples_split':np.arange(2,10,2)},#'min_samples_split':list(range(800,1900,200)), 'min_samples_leaf':list(range(60,101,10))
                        'RF(随机森林)':{'n_estimators':np.arange(10,71,10),'max_depth':np.arange(3,14,2), 'min_samples_split':np.arange(80,150,20), 'min_samples_leaf':np.arange(10,60,10)},
                        'Adaboost':{'n_estimators':np.arange(1,11,1),'learning_rate':np.arange(0.1,1,0.1)}
                    }
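
The size of each search follows directly from these grids: the number of candidates is the product of the list lengths in a dictionary, summed over the dictionaries in a list, and the number of fits is that count times the number of folds. A minimal sketch using `ParameterGrid` from scikit-learn, with three of the grids copied from above (the short names `knn_grid`, `svm_grid`, `rf_grid` and `counts` are introduced here for illustration):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Grids copied from the dictionaries above.
knn_grid = [
    {"weights": ["uniform"], "n_neighbors": np.arange(4, 8, 1)},
    {"weights": ["distance"], "n_neighbors": np.arange(4, 8, 1)},
]
svm_grid = [
    {"C": [1e-1, 1, 10, 100, 1000], "kernel": ["linear"]},
    {"C": [1e-1, 1, 10, 100, 1000], "gamma": [0.001, 0.0001], "kernel": ["rbf"]},
]
rf_grid = {
    "n_estimators": np.arange(10, 71, 10),
    "max_depth": np.arange(3, 14, 2),
    "min_samples_split": np.arange(80, 150, 20),
    "min_samples_leaf": np.arange(10, 60, 10),
}

cv_folds = 5
counts = {}
for name, grid in [("KNN", knn_grid), ("SVM", svm_grid), ("RF", rf_grid)]:
    counts[name] = len(ParameterGrid(grid))
    print("%s: %d candidates, %d fits" % (name, counts[name], counts[name] * cv_folds))
# KNN: 8 candidates, 40 fits
# SVM: 15 candidates, 75 fits
# RF: 840 candidates, 4200 fits
```

Checking the count before launching a search is a cheap way to estimate its running time, since the cost grows multiplicatively with every added parameter.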
                    

                    from sklearn.metrics import roc_curve, auc, roc_auc_score
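
The `roc_curve`, `auc` and `roc_auc_score` functions imported above can be used to put a number on an ROC curve. The sketch below is a hedged illustration, not the project's code: it assumes scikit-learn's bundled `load_breast_cancer` data, a scaled logistic regression as the model, and a 33% held-out split chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data and split (assumptions made for this example).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# ROC analysis needs a continuous score; use the predicted probability of class 1.
scores = model.predict_proba(X_test)[:, 1]

# Two equivalent routes to the area under the ROC curve.
fpr, tpr, thresholds = roc_curve(y_test, scores)
area_from_curve = auc(fpr, tpr)
area_direct = roc_auc_score(y_test, scores)
print("AUC on the held-out set: %.4f" % area_from_curve)
```

Computing the curve on held-out data rather than on the training set gives a fairer picture of how the classifier ranks unseen samples.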
                    

                    for classifier in common_classifiers:
                        print('******************* %s ********************' % classifier)
                        start_time = time.time()
                        model = basic_classifiers[classifier]
                        clf = grid_search(model,grid_params[classifier],x_train,y_train,cv=5) 
                        print('training took %fs!' % (time.time() - start_time))
                        print(clf)
                        clf.fit(x_train,y_train)
                        predict = clf.predict(x_test)
                        score = metrics.precision_score(y_test, predict) 
                        recall = metrics.recall_score(y_test, predict)
                        print('precision: %.2f%%, recall: %.2f%%' % (100 * score, 100 * recall)) 
                        accuracy = metrics.accuracy_score(y_test, predict) 
                        print('accuracy: %.2f%%' % (100 * accuracy))
                        c_matrix = confusion_matrix(
                            y_test,   # array, ground-truth (correct) target values
                            predict,  # array, Estimated targets as returned by a classifier
                            labels=[0,1],  # array, List of labels to index the matrix.
                            sample_weight=None  # array-like of shape = [n_samples], Optional sample weights
                        )
                        print('\nclassification_report:')
                        print(classification_report( y_test,predict,labels=[0,1]))
                        print('\nconfusion_matrix:')
                        print(c_matrix)
                        print()
                        # Plot the ROC curve on the held-out test set, consistent with the metrics above
                        curve1 = plot_roc_curve(clf, x_test, y_test, alpha=0.8, name=classifier)
                        curve1.figure_.suptitle("Breast cancer ROC")
                        plt.show()
                    

                    ******************* KNN(最近邻) ********************
                    Fitting 5 folds for each of 8 candidates, totalling 40 fits
                    

                    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
                    [Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    2.7s finished
                    

                    最优参数: {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': 6, 'p': 2, 'weights': 'uniform'}
                    training took 2.806710s!
                    KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                                         metric_params=None, n_jobs=None, n_neighbors=6, p=2,
                                         weights='uniform')
                    precision: 94.92%, recall: 97.39%
                    accuracy: 95.21%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.96      0.92      0.94        73
                               1       0.95      0.97      0.96       115
                        accuracy                           0.95       188
                       macro avg       0.95      0.95      0.95       188
                    weighted avg       0.95      0.95      0.95       188
                    

                    confusion_matrix:
                    [[ 67   6]
                     [  3 112]]
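
The precision, recall and accuracy reported for KNN can be recomputed by hand from this confusion matrix, which is a useful sanity check on how the metrics relate. A short worked sketch, relying on scikit-learn's convention that rows are true labels and columns are predicted labels:

```python
import numpy as np

# Confusion matrix printed above for the tuned KNN model.
# Rows are true labels, columns are predicted labels, ordered [0, 1].
cm = np.array([[67, 6],
               [3, 112]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)       # 112 / 118
recall = tp / (tp + fn)          # 112 / 115
accuracy = (tp + tn) / cm.sum()  # 179 / 188

print("precision: %.2f%%, recall: %.2f%%" % (100 * precision, 100 * recall))
print("accuracy: %.2f%%" % (100 * accuracy))
# precision: 94.92%, recall: 97.39%
# accuracy: 95.21%
```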
                    

                    ******************* LR(Logistic回归) ********************
                    Fitting 5 folds for each of 25 candidates, totalling 125 fits
                    

                    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
                    [Parallel(n_jobs=-1)]: Done 125 out of 125 | elapsed:    3.8s finished
                    

                    最优参数: {'C': 100.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'l1_ratio': None, 'max_iter': 100, 'multi_class': 'auto', 'n_jobs': None, 'penalty': 'l2', 'random_state': None, 'solver': 'liblinear', 'tol': 0.0001, 'verbose': 0, 'warm_start': False}
                    training took 3.939845s!
                    LogisticRegression(C=100.0, class_weight=None, dual=False, fit_intercept=True,
                                       intercept_scaling=1, l1_ratio=None, max_iter=100,
                                       multi_class='auto', n_jobs=None, penalty='l2',
                                       random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                                       warm_start=False)
                    precision: 95.83%, recall: 100.00%
                    accuracy: 97.34%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       1.00      0.93      0.96        73
                               1       0.96      1.00      0.98       115
                        accuracy                           0.97       188
                       macro avg       0.98      0.97      0.97       188
                    weighted avg       0.97      0.97      0.97       188
                    

                    confusion_matrix:
                    [[ 68   5]
                     [  0 115]]
                    


                    ******************* DT(决策树) ********************
                    Fitting 5 folds for each of 392 candidates, totalling 1960 fits
                    

                    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
                    [Parallel(n_jobs=-1)]: Done 312 tasks      | elapsed:    0.9s
                    [Parallel(n_jobs=-1)]: Done 1960 out of 1960 | elapsed:    4.6s finished
                    

                    最优参数: {'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini', 'max_depth': None, 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 4, 'min_weight_fraction_leaf': 0.0, 'presort': 'deprecated', 'random_state': None, 'splitter': 'random'}
                    training took 4.746069s!
                    DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                                           max_depth=None, max_features=None, max_leaf_nodes=None,
                                           min_impurity_decrease=0.0, min_impurity_split=None,
                                           min_samples_leaf=1, min_samples_split=4,
                                           min_weight_fraction_leaf=0.0, presort='deprecated',
                                           random_state=None, splitter='random')
                    precision: 95.54%, recall: 93.04%
                    accuracy: 93.09%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.89      0.93      0.91        73
                               1       0.96      0.93      0.94       115
                        accuracy                           0.93       188
                       macro avg       0.93      0.93      0.93       188
                    weighted avg       0.93      0.93      0.93       188
                    

                    confusion_matrix:
                    [[ 68   5]
                     [  8 107]]
                    

                    ******************* SVM(支持向量机) ********************
                    Fitting 5 folds for each of 15 candidates, totalling 75 fits
                    

                    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
                    [Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  7.4min
                    [Parallel(n_jobs=-1)]: Done  75 out of  75 | elapsed: 10.5min finished
                    

                    最优参数: {'C': 10, 'break_ties': False, 'cache_size': 200, 'class_weight': None, 'coef0': 0.0, 'decision_function_shape': 'ovr', 'degree': 3, 'gamma': 'scale', 'kernel': 'linear', 'max_iter': -1, 'probability': True, 'random_state': None, 'shrinking': True, 'tol': 0.001, 'verbose': False}
                    training took 698.401171s!
                    SVC(C=10, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
                        decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
                        max_iter=-1, probability=True, random_state=None, shrinking=True, tol=0.001,
                        verbose=False)
                    precision: 96.58%, recall: 98.26%
                    accuracy: 96.81%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.97      0.95      0.96        73
                               1       0.97      0.98      0.97       115
                        accuracy                           0.97       188
                       macro avg       0.97      0.96      0.97       188
                    weighted avg       0.97      0.97      0.97       188
                    

                    confusion_matrix:
                    [[ 69   4]
                     [  2 113]]
                    

                    Tuning the ensemble models

                    for classifier in ensem_classifiers:
                        print('******************* %s ********************' % classifier)
                        start_time = time.time()
                        model = basic_classifiers[classifier]
                        clf = grid_search(model,ensem_params[classifier],x_train,y_train,cv=5) 
                        print('training took %fs!' % (time.time() - start_time))
                        print(clf)
                        clf.fit(x_train,y_train)
                        predict = clf.predict(x_test)
                        score = metrics.precision_score(y_test, predict) 
                        recall = metrics.recall_score(y_test, predict)
                        print('precision: %.2f%%, recall: %.2f%%' % (100 * score, 100 * recall)) 
                        accuracy = metrics.accuracy_score(y_test, predict) 
                        print('accuracy: %.2f%%' % (100 * accuracy))
                        c_matrix = confusion_matrix(
                            y_test,   # array, ground-truth (correct) target values
                            predict,  # array, Estimated targets as returned by a classifier
                            labels=[0,1],  # array, List of labels to index the matrix.
                            sample_weight=None  # array-like of shape = [n_samples], Optional sample weights
                        )
                        print('\nclassification_report:')
                        print(classification_report( y_test,predict,labels=[0,1]))
                        print('\nconfusion_matrix:')
                        print(c_matrix)
                        print()
                        # Plot the ROC curve on the held-out test set, consistent with the metrics above
                        curve1 = plot_roc_curve(clf, x_test, y_test, alpha=0.8, name=classifier)
                        curve1.figure_.suptitle("Breast cancer ROC")
                        plt.show()
                    

                    ******************* RF(随机森林) ********************
                    Fitting 5 folds for each of 840 candidates, totalling 4200 fits
                    

                    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
                    [Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    4.7s
                    [Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:   12.4s
                    [Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed:   27.4s
                    [Parallel(n_jobs=-1)]: Done 792 tasks      | elapsed:   55.1s
                    [Parallel(n_jobs=-1)]: Done 1242 tasks      | elapsed:  1.5min
                    [Parallel(n_jobs=-1)]: Done 1792 tasks      | elapsed:  2.2min
                    [Parallel(n_jobs=-1)]: Done 2442 tasks      | elapsed:  3.0min
                    [Parallel(n_jobs=-1)]: Done 3192 tasks      | elapsed:  3.8min
                    [Parallel(n_jobs=-1)]: Done 4042 tasks      | elapsed:  4.8min
                    [Parallel(n_jobs=-1)]: Done 4200 out of 4200 | elapsed:  5.0min finished
                    

                    最优参数: {'bootstrap': True, 'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini', 'max_depth': 3, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 10, 'min_samples_split': 120, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 30, 'n_jobs': None, 'oob_score': False, 'random_state': None, 'verbose': 0, 'warm_start': False}
                    training took 301.064679s!
                    RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                                           criterion='gini', max_depth=3, max_features='auto',
                                           max_leaf_nodes=None, max_samples=None,
                                           min_impurity_decrease=0.0, min_impurity_split=None,
                                           min_samples_leaf=10, min_samples_split=120,
                                           min_weight_fraction_leaf=0.0, n_estimators=30,
                                           n_jobs=None, oob_score=False, random_state=None,
                                           verbose=0, warm_start=False)
                    precision: 94.17%, recall: 94.17%
                    accuracy: 92.55%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.90      0.90      0.90        68
                               1       0.94      0.94      0.94       120
                        accuracy                           0.93       188
                       macro avg       0.92      0.92      0.92       188
                    weighted avg       0.93      0.93      0.93       188
                    

                    confusion_matrix:
                    [[ 61   7]
                     [  7 113]]
                    

                    ******************* GBDT(梯度提升决策树) ********************
                    Fitting 5 folds for each of 240 candidates, totalling 1200 fits
                    

                    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
                    [Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   11.9s
                    [Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:  1.0min
                    [Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed:  2.1min
                    [Parallel(n_jobs=-1)]: Done 792 tasks      | elapsed:  3.8min
                    [Parallel(n_jobs=-1)]: Done 1200 out of 1200 | elapsed:  5.2min finished
                    

                    最优参数: {'ccp_alpha': 0.0, 'criterion': 'friedman_mse', 'init': None, 'learning_rate': 0.1, 'loss': 'deviance', 'max_depth': 3, 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 420, 'n_iter_no_change': None, 'presort': 'deprecated', 'random_state': None, 'subsample': 1.0, 'tol': 0.0001, 'validation_fraction': 0.1, 'verbose': 0, 'warm_start': False}
                    training took 314.633900s!
                    GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                                               learning_rate=0.1, loss='deviance', max_depth=3,
                                               max_features=None, max_leaf_nodes=None,
                                               min_impurity_decrease=0.0, min_impurity_split=None,
                                               min_samples_leaf=1, min_samples_split=2,
                                               min_weight_fraction_leaf=0.0, n_estimators=420,
                                               n_iter_no_change=None, presort='deprecated',
                                               random_state=None, subsample=1.0, tol=0.0001,
                                               validation_fraction=0.1, verbose=0,
                                               warm_start=False)
                    precision: 96.75%, recall: 99.17%
                    accuracy: 97.34%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.98      0.94      0.96        68
                               1       0.97      0.99      0.98       120
                        accuracy                           0.97       188
                       macro avg       0.98      0.97      0.97       188
                    weighted avg       0.97      0.97      0.97       188
                    

                    confusion_matrix:
                    [[ 64   4]
                     [  1 119]]
                    

                    ******************* Adaboost ********************
                    Fitting 5 folds for each of 90 candidates, totalling 450 fits
                    

                    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
                    [Parallel(n_jobs=-1)]: Done 280 tasks      | elapsed:    1.4s
                    [Parallel(n_jobs=-1)]: Done 450 out of 450 | elapsed:    2.0s finished
                    

                    最优参数: {'algorithm': 'SAMME', 'base_estimator__ccp_alpha': 0.0, 'base_estimator__class_weight': None, 'base_estimator__criterion': 'gini', 'base_estimator__max_depth': None, 'base_estimator__max_features': None, 'base_estimator__max_leaf_nodes': None, 'base_estimator__min_impurity_decrease': 0.0, 'base_estimator__min_impurity_split': None, 'base_estimator__min_samples_leaf': 1, 'base_estimator__min_samples_split': 2, 'base_estimator__min_weight_fraction_leaf': 0.0, 'base_estimator__presort': 'deprecated', 'base_estimator__random_state': None, 'base_estimator__splitter': 'best', 'base_estimator': DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                                           max_depth=None, max_features=None, max_leaf_nodes=None,
                                           min_impurity_decrease=0.0, min_impurity_split=None,
                                           min_samples_leaf=1, min_samples_split=2,
                                           min_weight_fraction_leaf=0.0, presort='deprecated',
                                           random_state=None, splitter='best'), 'learning_rate': 0.7000000000000001, 'n_estimators': 3, 'random_state': None}
                    training took 2.143543s!
                    AdaBoostClassifier(algorithm='SAMME',
                                       base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                              class_weight=None,
                              criterion='gini',
                              max_depth=None,
                              max_features=None,
                              max_leaf_nodes=None,
                              min_impurity_decrease=0.0,
                              min_impurity_split=None,
                              min_samples_leaf=1,
                              min_samples_split=2,
                              min_weight_fraction_leaf=0.0,
                              presort='deprecated',
                              random_state=None,
                              splitter='best'),
                                       learning_rate=0.7000000000000001, n_estimators=3,
                                       random_state=None)
                    precision: 95.04%, recall: 95.83%
                    accuracy: 94.15%
                    classification_report:
                                  precision    recall  f1-score   support
                               0       0.93      0.91      0.92        68
                               1       0.95      0.96      0.95       120
                        accuracy                           0.94       188
                       macro avg       0.94      0.94      0.94       188
                    weighted avg       0.94      0.94      0.94       188
                    

                    confusion_matrix:
                    [[ 62   6]
                     [  5 115]]
                    

                    A comparison of the results shows that model accuracy improves after the parameters are tuned with grid search.
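
One way to check such a comparison on equal footing is to score the default model and the tuned model on the same cross-validation folds. The sketch below assumes scikit-learn's bundled `load_breast_cancer` data and a small KNN grid chosen for illustration; it compares cross-validated scores only and says nothing certain about a separate held-out test set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (an assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

# Baseline: default parameters (n_neighbors=5, weights='uniform').
default_score = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()

# Tuned: the grid contains the default setting, so on the same folds the best
# candidate cannot score lower than the baseline.
param_grid = {"n_neighbors": [4, 5, 6, 7], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=5).fit(X, y)
tuned_score = search.best_score_

print("default CV accuracy: %.4f" % default_score)
print("tuned CV accuracy:   %.4f" % tuned_score)
```

Because the winner is selected on these very folds, the tuned score is slightly optimistic; a final check on data that played no part in the search remains the safer measure.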

                    Finally

                    Get the project:

                    https://gitee.com/sinonfin/algorithm-sharing