


from sklearn.datasets import load_digits

1.1 Dataset specification

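  • 1,797 samples, each consisting of an 8x8-pixel image and an integer label in [0, 9].
  • In the dataset's data array, each sample is a vector of 64 values of dtype float64.
  • About the handwritten digit recognition problem: a model is trained on the grayscale values of the pixels in each 8x8 handwritten digit image and then predicts which digit the image shows, which makes this a classification problem.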
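
The figures in the list above can be checked directly against the loaded data. The following is a minimal sketch, not part of the original walkthrough; the variable names (digits, counts) are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# 1,797 samples, each a flat vector of 64 pixel values
print(digits.data.shape)

# The same pixels, kept as 8x8 images
print(digits.images.shape)

# Pixel values are stored as float64 and range from 0 to 16
print(digits.data.dtype, digits.data.min(), digits.data.max())

# Labels are the integers 0 through 9
print(digits.target.min(), digits.target.max())

# Number of samples per class (roughly 180 each)
counts = np.bincount(digits.target)
print(counts)
```

Counting samples per class is a quick way to confirm that the ten digits are roughly balanced, which matters later when accuracy is used as the evaluation metric.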


     """Load and return the digits dataset (classification).
        Each datapoint is a 8x8 image of a digit.
        =================   ==============
        Classes                         10
        Samples per class             ~180
        Samples total                 1797
        Dimensionality                  64
        Features             integers 0-16
        =================   ==============
        This is a copy of the test set of the UCI ML hand-written digits datasets.
        Read more in the User Guide.
        Parameters
        ----------
        n_class : int, default=10
            The number of classes to return. Between 0 and 10.
        return_X_y : bool, default=False
            If True, returns ``(data, target)`` instead of a Bunch object.
            See below for more information about the `data` and `target` object.
            .. versionadded:: 0.18
        as_frame : bool, default=False
            If True, the data is a pandas DataFrame including columns with
            appropriate dtypes (numeric). The target is
            a pandas DataFrame or Series depending on the number of target columns.
            If `return_X_y` is True, then (`data`, `target`) will be pandas
            DataFrames or Series as described below.
            .. versionadded:: 0.23
        Returns
        -------
        data : :class:`~sklearn.utils.Bunch`
            Dictionary-like object, with the following attributes.
            data : {ndarray, dataframe} of shape (1797, 64)
                The flattened data matrix. If `as_frame=True`, `data` will be
                a pandas DataFrame.
            target: {ndarray, Series} of shape (1797,)
                The classification target. If `as_frame=True`, `target` will be
                a pandas Series.
            feature_names: list
                The names of the dataset columns.
            target_names: list
                The names of target classes.
                .. versionadded:: 0.20
            frame: DataFrame of shape (1797, 65)
                Only present when `as_frame=True`. DataFrame with `data` and
                `target`.
                .. versionadded:: 0.23
            images: {ndarray} of shape (1797, 8, 8)
                The raw image data.
            DESCR: str
                The full description of the dataset.
        (data, target) : tuple if ``return_X_y`` is True
            A tuple of two ndarrays by default. The first contains a 2D ndarray of
            shape (1797, 64) with each row representing one sample and each column
            representing the features. The second ndarray of shape (1797) contains
            the target samples.  If `as_frame=True`, both arrays are pandas objects,
            i.e. `X` a dataframe and `y` a series.
            .. versionadded:: 0.18
        To load the data and visualize the images::
            >>> from sklearn.datasets import load_digits
            >>> digits = load_digits()
            >>> print(digits.data.shape)
            (1797, 64)
            >>> import matplotlib.pyplot as plt
            >>> plt.gray()
            >>> plt.matshow(digits.images[0])
            <...>
            >>> plt.show()
     """
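
The docstring describes several ways of calling load_digits. The sketch below tries two of them, return_X_y and n_class, and checks how images relates to data. It is only an illustration; the names X_small, y_small and same are made up for this example.

```python
from sklearn.datasets import load_digits

# return_X_y=True skips the Bunch and returns the (data, target) tuple directly
X, y = load_digits(return_X_y=True)
print(X.shape, y.shape)

# n_class=3 keeps only the samples labelled 0, 1 and 2
X_small, y_small = load_digits(n_class=3, return_X_y=True)
print(sorted(set(y_small.tolist())))

# The Bunch also exposes the unflattened 8x8 images;
# row i of `data` is images[i] flattened row by row
digits = load_digits()
same = bool((digits.images[0].ravel() == digits.data[0]).all())
print(same)
```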


1.2 Loading the data

    # Load the dataset, then extract the feature matrix and the labels
    datas = load_digits()
    X_data = datas.data      # shape (1797, 64)
    y_data = datas.target    # shape (1797,)

1.3 Displaying the first ten samples


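    from matplotlib import pyplot as plt

    # Show the first ten samples as images on a 2x5 grid
    fig, ax = plt.subplots(nrows=2, ncols=5, sharex=True, sharey=True)
    ax = ax.flatten()
    for i in range(10):
        # Each sample is a flat vector of 64 values; reshape it back to 8x8
        ax[i].imshow(datas.data[i].reshape((8, 8)), cmap='Greys', interpolation='nearest')
    plt.tight_layout()
    plt.show()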



2.1 Splitting the dataset

    from sklearn.model_selection import train_test_split

    # Split the data: 70% for training, 30% for testing
    X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.3)
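
A split like the one above is reshuffled on every run unless a seed is fixed. The sketch below is a variant, not the original code: it adds random_state=42 for repeatability and stratify=y_data so that each digit keeps a similar share in both parts. Both arguments are assumptions made for this illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X_data, y_data = load_digits(return_X_y=True)

# random_state fixes the shuffle; stratify keeps class proportions similar
# in the training and test parts (both added here for illustration)
X_train, X_test, y_train, y_test = train_test_split(
    X_data, y_data, test_size=0.3, random_state=42, stratify=y_data
)

# With 1,797 samples and test_size=0.3, the test part gets 540 samples
# and the training part the remaining 1,257
print(X_train.shape, X_test.shape)
```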


3.1 Logistic regression

3.1.1 Main parameters of LogisticRegression()

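  • penalty: the type of regularization. The options covered here are "l1" and "l2"; the default is "l2". Note: l1 regularization shrinks some coefficients to exactly 0, whereas l2 regularization generally does not set coefficients to 0 and only pushes them arbitrarily close to it.
  • C: a float greater than 0, the inverse of the regularization strength. The smaller C is, the more heavily the loss function is penalized.
  • multi_class: tells the model whether to treat the problem as a set of binary problems or as one multiclass problem. "ovr" (one-vs-rest) fits one binary classifier per class and was the default before scikit-learn 0.22. "multinomial" handles the multiclass problem directly and is not available with solver="liblinear". "auto", the default since 0.22, lets the model choose based on the data and the solver. This parameter has been deprecated since scikit-learn 1.5.
  • solver: the optimization algorithm used to fit the model.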
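
The effect of penalty on sparsity can be checked on a small case. The sketch below is an illustration, not the author's code. It assumes a two-class subset of the digits (n_class=2), solver="liblinear" (which supports both penalties) and C=0.1, all chosen for this example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Two-class subset (digits 0 and 1), so liblinear solves a plain binary problem
X, y = load_digits(n_class=2, return_X_y=True)

# Same C and solver; only the penalty differs
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
l2_model = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X, y)

# Count coefficients that are exactly zero under each penalty
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print("zero coefficients with l1:", n_zero_l1)
print("zero coefficients with l2:", n_zero_l2)
```

Pixels that are blank in every image carry no signal, so their coefficients can be zero under either penalty. The l1 model is expected to zero out additional pixels on top of those.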

3.2 Building the logistic regression model

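    from sklearn.linear_model import LogisticRegression

    # Build the logistic regression model (multinomial loss over the ten digit classes)
    model = LogisticRegression(max_iter=10000, random_state=42, multi_class='multinomial')
    # Train the model
    model.fit(X_train, y_train)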


4.1 Ten-fold cross-validation


    from sklearn.model_selection import cross_val_score

    # Ten-fold cross-validation on the training set
    scores = cross_val_score(model, X_train, y_train, cv=10)
    print("Mean 10-fold cross-validation accuracy:", scores.mean())
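
To make explicit what cross_val_score does with cv=10 for a classifier, the sketch below repeats the computation by hand with StratifiedKFold, the splitter scikit-learn uses when cv is an integer and the estimator is a classifier. It is an illustration under assumptions: it uses the full dataset rather than the training split, and it rescales pixels to [0, 1] only so that the solver converges quickly.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixels to [0, 1]; an addition for this sketch

model = LogisticRegression(max_iter=10000)

# One accuracy score per fold
scores = cross_val_score(model, X, y, cv=10)

# Manual version: fit a fresh copy of the model on nine folds,
# score it on the held-out fold, and repeat for every fold
manual_scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=10).split(X, y):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_model.score(X[val_idx], y[val_idx]))

print("cross_val_score mean:", scores.mean())
print("manual loop mean:    ", np.mean(manual_scores))
```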


4.2 Error rate

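    y_pred = model.predict(X_test)
    # score() returns accuracy, so the error rate is its complement
    error_rate = 1 - model.score(X_test, y_test)
    print("Test error rate:", error_rate)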
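
For a classifier, score() returns mean accuracy, so the error rate is the fraction of test samples that are misclassified. The sketch below is a self-contained illustration that computes it directly from the predictions. The fixed random_state=42 in the split and the default LogisticRegression settings are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=10000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy: fraction of test samples predicted correctly
accuracy = accuracy_score(y_test, y_pred)

# Error rate: fraction of test samples predicted incorrectly
error_rate = float(np.mean(y_pred != y_test))

print("accuracy:  ", accuracy)
print("error rate:", error_rate)
```

Computing the error rate from y_pred also puts the otherwise unused predictions to work, and the same array can feed other metrics such as a confusion matrix.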



