-
Notifications
You must be signed in to change notification settings - Fork 25
/
Copy pathRadialKernelSVMnotebook.Rmd
126 lines (74 loc) · 5.15 KB
/
RadialKernelSVMnotebook.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
---
title: "Radial Kernel SVM"
output:
html_document: default
html_notebook: default
---
###Radial Kernel Support Vector Machine
This article will be all about how to separate non linear data using a *__non-linear decision boundary__ *, which cannot be simply separated by a linear separator.
It is often encountered that Linear Separators and boundaries fail because of the non linear interactions in the data and the non linear dependence between the features in feature space.
The trick here is that ,we will do __feature expansion__.
So how we solve this problem is via doing a non linear transformation on the features($X_i$) and converting them to a higher dimentional space called a feature space.Now by this transformation we are able to saperate non linear data using a non linear decision boundary.
Non linearities can simply be added by using higher dimention terms such as square and cubic polynomial terms .
$$y_i = \beta_0 + \beta_1X_{1} \ + \beta_2X_{1}^2 + \beta_3X_2 + \beta_4X_2^2 + \beta_5X_2^3 .... = 0 $$ is the equation of the non linear hyperplane which is generated if we use *higher degree polynomials* terms to fit to data to get a non linear decision boundary.
What we are actually doing is that we are fitting a SVM is an enlarged space.We enlarge the space of features by doing non linear transformations.
But the problem with __Polynomials__ are that in higher dimentions i.e when having lots of predictors it gets wild and generally overfits at higher degrees of polynomials.
Hence there is another elegant way of adding non linearities in SVM is by the use of *__Kernel trick__*.
-----------------
####Kernel Function
Kernel function is a function of form--
$$K(x,y) = (1 + \sum_{j=1}^{p} x_{ij}. y_{ij})^d$$ where d = degree of polynomial.
Now the type of Kernel function we are going to use here is a __Radial kernel__.
The radial kernel is of form:
$$k(x,y) = \exp(- \ \gamma \ \sum_{j=1}^{p}(x_{ij} - y_{ij})^2) $$
Here $\gamma$ is a hyper parameter or a __tuning parameter__ which accounts for the smoothness of the decision boundary and controls the variance of the model.
If $\gamma$ is very large then we get quiet fluctuating and wiggly decision boundaries which accounts for high variance and overfitting.
If $\gamma$ is small , the decison line or boundary is smoother and has low variance.
--------------
### Implementation in R
```{r,message=FALSE,warning=FALSE}
require(e1071)
require(ElemStatLearn)#package containing the dataset
#Loading the data
attach(mixture.example) #is just a simulated mixture data with 200 rows and 2 classes
names(mixture.example)
```
The following data is also 2-D , so lets plot it.
```{r}
plot(x,col=y+3)
#converting data to a data frame
data<-data.frame(y=factor(y),x)
head(data)
```
Now let's fit a Radial kernel using *svm()* function.
```{r}
Radialsvm<-svm(factor(y) ~ .,data=data,kernel="radial",cost=5,scale=F)
Radialsvm
#number of support vectors are 110
#Confusion matrix to ckeck the accuracy
table(predicted=Radialsvm$fitted,actual=data$y)
#misclassification Rate
mean(Radialsvm$fitted!=data$y)*100 #17% wrong predictions
```
Now let's create a grid and make prediction on that grid values.
```{r}
xgrid=expand.grid(X1=px1,X2=px2) #generating grid points
ygrid=predict(Radialsvm,newdata = xgrid) #ygird consisting of predicted Response values
#lets plot the non linear decision boundary
plot(xgrid,col=as.numeric(ygrid),pch=20,cex=0.3)
points(x,col=y+1,pch=19) #we can see that the decision boundary is non linear
```
Now we can also improve the fit , by actually including the decision boundary using the contour() function.
```{r}
func = predict(Radialsvm,xgrid,decision.values = TRUE)
func=attributes(func)$decision #to pull out all the attributes and use decision attr
plot(xgrid,col=as.numeric(ygrid),pch=20,cex=0.3)
points(x,col=y+1,pch=19)
contour(px1,px2,matrix(func,69,99),level=0,add=TRUE,lwd=3) #adds the non linear decision boundary
contour(px1,px2,matrix(prob,69,99),level=0.5,add=T,col="blue",lwd=3)#this is the true decision boundary i.e Bayes decision boundary
legend("topright",c("True Decision Boundary","Fitted Decision Boundary"),lwd=3,col=c("blue","black"))
```
The above plot shows us the tradeoff between the __True Bayes decision boundary__ and the __Fitted decision boundary__ generated by the Radial kernel by learning from data.Both look quiet similar and seems that SVM has done a good functional approximation of the actual true function.
--------------
### Conclusion
Radial kernel support vector machine is a good approch when the data is not linearly separable.The idea behind generating non linear decision boundaries is that we need to do some non linear transformations on the features $X_i$ which transforms them to a higher dimention space.We do this non linear transformation using the *__Kernel trick__*.Now there are 2 hyperparameters in the SVM i.e the regularization parameter __'c'__ and $\gamma$.We can implement cross validation to find the best values of both these tuning parameters which affect our classifier's $C(X)$ perfomance.Another way of finding the best value for these hyperparameters are by using certain optimization techniques such as *__Bayesian Optimization__*.