Statistical learning for data processing (a scikit-learn tutorial)
Exercise: split the digits dataset into a training set and a test set, then fit a KNN classifier and a logistic regression to it.

from sklearn import datasets, neighbors, linear_model
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target

[Full code]

from sklearn import datasets, neighbors, linear_model
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target
n_samples = len(X_digits)
X_train = X_digits[:int(.9 * n_samples)]
y_train = y_digits[:int(.9 * n_samples)]
X_test = X_digits[int(.9 * n_samples):]
y_test = y_digits[int(.9 * n_samples):]
knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression()
print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f'
      % logistic.fit(X_train, y_train).score(X_test, y_test))
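As a sketch of the same comparison, the manual 90/10 slicing above can also be done with scikit-learn's `train_test_split` helper (passing `shuffle=False` to reproduce the ordered split; `max_iter=1000` is an assumption added here so logistic regression converges on digits):

```python
from sklearn import datasets, neighbors, linear_model
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
# Hold out the last 10% of the samples as a test set, as in the manual split
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.1, shuffle=False)

knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression(max_iter=1000)
print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f'
      % logistic.fit(X_train, y_train).score(X_test, y_test))
```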
(3) Support vector machines (SVMs)

Linear SVMs:
Example: plot different SVM classifiers on the iris dataset. SVMs can be used for regression (SVR, Support Vector Regression) as well as for classification (SVC, Support Vector Classification).

Using kernels:
svc = svm.SVC(kernel='rbf')

Interactive example.

Exercise: try classifying classes 1 and 2 from the iris dataset with an SVM, using only the first two features. Leave out 10% of each class as a test set.

iris = datasets.load_iris()
X = iris.data
y = iris.target
X = X[y != 0, :2]
y = y[y != 0]

Full code:

"""
================================
SVM Exercise
================================

A tutorial exercise for using different SVM kernels.

This exercise is used in the :ref:`using_kernels_tut` part of the
:ref:`supervised_learning_tut` section of the :ref:`stat_learn_tut_index`.
"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm
iris = datasets.load_iris()
X = iris.data
y = iris.target
X = X[y != 0,:2]
y = y[y != 0]
n_sample = len(X)
np.random.seed(0)
order = np.random.permutation(n_sample)
X = X[order]
y = y[order].astype(float)

X_train = X[:int(.9 * n_sample)]
y_train = y[:int(.9 * n_sample)]
X_test = X[int(.9 * n_sample):]
y_test = y[int(.9 * n_sample):]

# fit the model
for fig_num, kernel in enumerate(('linear', 'rbf', 'poly')):
    clf = svm.SVC(kernel=kernel, gamma=10)
    clf.fit(X_train, y_train)

    plt.figure(fig_num)
    plt.clf()
    plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired)

    # Circle out the test data
    plt.scatter(X_test[:, 0], X_test[:, 1], s=80, facecolors='none',
                zorder=10)

    plt.axis('tight')
    x_min = X[:, 0].min()
    x_max = X[:, 0].max()
    y_min = X[:, 1].min()
    y_max = X[:, 1].max()

    XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
    Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(XX.shape)
    plt.pcolormesh(XX, YY, Z > 0, cmap=plt.cm.Paired)
    plt.contour(XX, YY, Z, colors=['k', 'k', 'k'],
                linestyles=['--', '-', '--'], levels=[-.5, 0, .5])

    plt.title(kernel)
plt.show()
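As a minimal numeric companion to the plots above (a sketch, not part of the original exercise), the same three kernels can be compared by test accuracy on the same two-feature, two-class iris split instead of by decision surface:

```python
import numpy as np
from sklearn import datasets, svm

# Same data preparation as the plotting exercise: classes 1 and 2,
# first two features, shuffled with a fixed seed
iris = datasets.load_iris()
X = iris.data[iris.target != 0, :2]
y = iris.target[iris.target != 0].astype(float)

np.random.seed(0)
order = np.random.permutation(len(X))
X, y = X[order], y[order]

n_train = int(.9 * len(X))
for kernel in ('linear', 'rbf', 'poly'):
    clf = svm.SVC(kernel=kernel, gamma=10)
    clf.fit(X[:n_train], y[:n_train])
    print(kernel, clf.score(X[n_train:], y[n_train:]))
```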
III. Model selection: choosing estimators and their parameters

(1) Score, and cross-validated scores

As we have seen, every estimator exposes a score method that judges the quality of the fit (or the prediction) on new data. Bigger is better.

from sklearn import datasets, svm
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target
svc = svm.SVC(C=1, kernel='linear')
svc.fit(X_digits[:-100], y_digits[:-100]).score(X_digits[-100:], y_digits[-100:])

To get a better measure of prediction accuracy, we can successively split the data into folds that we alternately use for training and testing:

import numpy as np
X_folds = np.array_split(X_digits, 3)
y_folds = np.array_split(y_digits, 3)
scores = list()
for k in range(3):
    # We use 'list' to copy, in order to 'pop' later on
    X_train = list(X_folds)
    X_test = X_train.pop(k)
    X_train = np.concatenate(X_train)
    y_train = list(y_folds)
    y_test = y_train.pop(k)
    y_train = np.concatenate(y_train)
    scores.append(svc.fit(X_train, y_train).score(X_test, y_test))
print(scores)
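The manual fold bookkeeping above can be sketched more compactly with `sklearn.model_selection`: `KFold(n_splits=3)` without shuffling reproduces the `array_split`/`pop` loop, and `cross_val_score` collapses the whole loop into one call.

```python
from sklearn import datasets, svm
from sklearn.model_selection import KFold, cross_val_score

digits = datasets.load_digits()
X_digits, y_digits = digits.data, digits.target
svc = svm.SVC(C=1, kernel='linear')

# One score per fold; unshuffled KFold matches np.array_split's contiguous folds
scores = cross_val_score(svc, X_digits, y_digits, cv=KFold(n_splits=3))
print(scores)
```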

