1. Implementing SVM

The SVM algorithm can handle both classification and regression problems.

The svm package of the sklearn library implements the following four SVM estimators:

• LinearSVC: for linear classification problems.
• SVC: for nonlinear classification problems.
• LinearSVR: for linear regression problems.
• SVR: for nonlinear regression problems.

LinearSVC/R use a linear kernel by default, so they handle linear problems.

SVC/R let us choose the kernel: with a linear kernel they handle linear problems, and with a high-dimensional (nonlinear) kernel they handle nonlinear problems.

The SVC constructor signature, with its default values, is:

SVC(C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0,
    shrinking=True, probability=False, tol=0.001, cache_size=200,
    class_weight=None, verbose=False, max_iter=-1,
    decision_function_shape='ovr', break_ties=False, random_state=None)


• kernel: the kernel function, with four choices:
• linear: the linear kernel, for linear problems.
• poly: the polynomial kernel, for nonlinear problems.
• rbf: the Gaussian (RBF) kernel, the default; for nonlinear problems, and generally performs better than poly.
• sigmoid: the sigmoid kernel, which makes the SVM behave like a multilayer neural network; for nonlinear problems.
• C: the penalty coefficient of the objective function, i.e. how heavily misclassified samples are penalized; defaults to 1.0.
• The larger C is, the more accurate the classifier on the training data, but the lower its error tolerance and the weaker its generalization.
• The smaller C is, the stronger the generalization, but accuracy drops.
• gamma: the kernel coefficient. The default 'scale' equals 1 / (n_features * X.var()); the older 'auto' setting is simply the reciprocal of the number of features.
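To make the kernel choice concrete, here is a minimal sketch (not from the original; it uses sklearn's make_moons toy data as an assumption) comparing a linear kernel with the default RBF kernel on data that is not linearly separable:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: not linearly separable.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel, C=1.0)  # other parameters left at their defaults
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
```

On this data the RBF kernel usually scores noticeably higher than the linear kernel, which is one reason it is the default.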

The LinearSVC constructor signature, with its default values, is:

LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=0.0001,
    C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1,
    class_weight=None, verbose=0, random_state=None, max_iter=1000)
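A minimal usage sketch for LinearSVC, assuming a linearly separable toy dataset generated with make_blobs (the toy data is an assumption, not part of the original):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Two well-separated Gaussian clusters: a linear boundary suffices.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

clf = LinearSVC(C=1.0, max_iter=10000)  # raise max_iter to avoid convergence warnings
clf.fit(X, y)
print(clf.score(X, y))
```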


2. Preparing the Dataset

The sklearn library ships with a breast cancer dataset; below we use it to build an SVM classifier. A few raw sample rows look like this (the last column is the class label):

16.13,20.68,108.1,798.8,0.117,0.2022,0.1722,0.1028,0.2164,0.07356,0.5692,1.073,3.854,54.18,0.007026,0.02501,0.03188,0.01297,0.01689,0.004142,20.96,31.48,136.8,1315,0.1789,0.4233,0.4784,0.2073,0.3706,0.1142,0
19.81,22.15,130,1260,0.09831,0.1027,0.1479,0.09498,0.1582,0.05395,0.7582,1.017,5.865,112.4,0.006494,0.01893,0.03391,0.01521,0.01356,0.001997,27.32,30.88,186.8,2398,0.1512,0.315,0.5372,0.2388,0.2768,0.07615,0
13.54,14.36,87.46,566.3,0.09779,0.08129,0.06664,0.04781,0.1885,0.05766,0.2699,0.7886,2.058,23.56,0.008462,0.0146,0.02387,0.01315,0.0198,0.0023,15.11,19.26,99.7,711.2,0.144,0.1773,0.239,0.1288,0.2977,0.07259,1


Each row has 30 feature columns, whose meanings are:

1  radius (mean)             11 radius (std. dev.)             21 radius (largest)
2  texture (mean)            12 texture (std. dev.)            22 texture (largest)
3  perimeter (mean)          13 perimeter (std. dev.)          23 perimeter (largest)
4  area (mean)               14 area (std. dev.)               24 area (largest)
5  smoothness (mean)         15 smoothness (std. dev.)         25 smoothness (largest)
6  compactness (mean)        16 compactness (std. dev.)        26 compactness (largest)
7  concavity (mean)          17 concavity (std. dev.)          27 concavity (largest)
8  concave points (mean)     18 concave points (std. dev.)     28 concave points (largest)
9  symmetry (mean)           19 symmetry (std. dev.)           29 symmetry (largest)
10 fractal dimension (mean)  20 fractal dimension (std. dev.)  30 fractal dimension (largest)

Load the dataset:

from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()


The feature_names attribute stores the meaning of each column:

>>> print(data.feature_names)
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
'mean smoothness' 'mean compactness' 'mean concavity'
'mean concave points' 'mean symmetry' 'mean fractal dimension'
'radius error' 'texture error' 'perimeter error' 'area error'
'smoothness error' 'compactness error' 'concavity error'
'concave points error' 'symmetry error' 'fractal dimension error'
'worst radius' 'worst texture' 'worst perimeter' 'worst area'
'worst smoothness' 'worst compactness' 'worst concavity'
'worst concave points' 'worst symmetry' 'worst fractal dimension']


The data attribute stores the feature values:

>>> print(data.data)
[[1.799e+01 1.038e+01 1.228e+02 ... 2.654e-01 4.601e-01 1.189e-01]
[2.057e+01 1.777e+01 1.329e+02 ... 1.860e-01 2.750e-01 8.902e-02]
[1.969e+01 2.125e+01 1.300e+02 ... 2.430e-01 3.613e-01 8.758e-02]
...
[1.660e+01 2.808e+01 1.083e+02 ... 1.418e-01 2.218e-01 7.820e-02]
[2.060e+01 2.933e+01 1.401e+02 ... 2.650e-01 4.087e-01 1.240e-01]
[7.760e+00 2.454e+01 4.792e+01 ... 0.000e+00 2.871e-01 7.039e-02]]


The target attribute stores the target values:

>>> print(data.target)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 1 1 1 0 1 0 0
1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1
1 1 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 0 1 1 1 1 0 1
1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 1 0
1 0 1 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0 1 1
1 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1
1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0
0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1
1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 1
0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 1 0 0
1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0 0 0 0 0 0 1]
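The dataset also carries a target_names attribute that maps these label values back to class names, which the text above does not show; a quick check:

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
# Label 0 is 'malignant', label 1 is 'benign'.
print(data.target_names)   # ['malignant' 'benign']
# 569 samples, each with 30 feature columns.
print(data.data.shape)     # (569, 30)
```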


3. Preprocessing the Data

• The first 10 columns are the mean of each measurement.
• The middle 10 columns are the standard deviation of each measurement.
• The last 10 columns are the largest value of each measurement.

Here we use only the 10 mean-value columns as the feature set:

>>> features = data.data[:,0:10] # feature set: mean-value columns only
>>> labels = data.target         # target set


Split the data into training and test sets, holding out 33% for testing:

from sklearn.model_selection import train_test_split

train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=0)


Standardize the features so that each dimension has zero mean and unit variance; note that the scaler is fit on the training set only and then applied to the test set:

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
train_features = ss.fit_transform(train_features)
test_features = ss.transform(test_features)
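StandardScaler implements the Z-Score transform z = (x - mean) / std, where the mean and standard deviation are estimated by fit; a minimal check of that behavior (the tiny toy array is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

ss = StandardScaler()
Z = ss.fit_transform(X)

# fit stored the column mean and standard deviation used for scaling.
print(ss.mean_)  # [2.5]
# After transforming, the column has (numerically) zero mean and unit variance.
print(np.isclose(Z.mean(), 0.0), np.isclose(Z.std(), 1.0))  # True True
```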


4. Building the Classifier

from sklearn.svm import SVC

svc = SVC() # all parameters left at their defaults


Train the model on the training set:

svc.fit(train_features, train_labels)


Then predict on the test set:

prediction = svc.predict(test_features)


Finally, measure the prediction accuracy with accuracy_score:

from sklearn.metrics import accuracy_score
score = accuracy_score(test_labels, prediction)

>>> print(score)
0.9414893617021277
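The steps from sections 2 through 4 can be collected into one self-contained script (same dataset, same split, same default SVC as above; the exact accuracy may vary slightly across sklearn versions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
features = data.data[:, 0:10]          # mean-value columns only
labels = data.target

train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=0)

# Standardize: fit on the training set, apply to both sets.
ss = StandardScaler()
train_features = ss.fit_transform(train_features)
test_features = ss.transform(test_features)

svc = SVC()                            # default RBF kernel, C=1.0
svc.fit(train_features, train_labels)
prediction = svc.predict(test_features)
print(accuracy_score(test_labels, prediction))
```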


5. Summary

sklearn implements the SVM algorithm; in this section we showed how to use it to solve a practical problem.

(End of this section.)
