MODEL VALIDATION IN R - K-Fold Cross Validation


Description of Model Validation

Model validation is the process of evaluating a trained model on a test data set. It measures how well the trained model generalizes to unseen data.

There are several ways to obtain the training and test data sets for model validation, such as:

  • The 3-way holdout method, which splits the data into training, validation and test sets.

  • k-fold cross-validation with an independent test data set.

  • Leave-one-out cross-validation with an independent test data set.
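The 3-way holdout in the first bullet can be sketched in a few lines of base R. The 60/20/20 proportions and the use of the built-in iris data are illustrative choices, not part of the original text:

```r
# Illustrative 3-way holdout split: 60% train / 20% validation / 20% test
set.seed(42)
n <- nrow(iris)
idx <- sample(n)  # shuffle the row indices

train_idx <- idx[1:floor(0.6 * n)]
valid_idx <- idx[(floor(0.6 * n) + 1):floor(0.8 * n)]
test_idx  <- idx[(floor(0.8 * n) + 1):n]

train_set <- iris[train_idx, ]
valid_set <- iris[valid_idx, ]
test_set  <- iris[test_idx, ]
```

The three index vectors are disjoint, so no row appears in more than one partition.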


K-Fold Cross Validation:
The procedure has a single parameter, k, which refers to the number of groups into which a given data sample is split; hence the name k-fold cross-validation. When a specific value of k is chosen, it may replace k in the name of the method, so that k=10 becomes 10-fold cross-validation.
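The splitting step described above can be sketched in base R without any extra packages. The fold count k=10 and the iris data are illustrative; fold sizes differ by at most one row:

```r
# Assign each of the n rows to one of k folds at random
k <- 10
n <- nrow(iris)
set.seed(42)
fold_id <- sample(rep(1:k, length.out = n))

# Each iteration holds out one fold for testing and trains on the rest
for (i in 1:k) {
  test_rows  <- which(fold_id == i)
  train_data <- iris[-test_rows, ]
  test_data  <- iris[test_rows, ]
  # ... fit a model on train_data and score it on test_data ...
}
```

Over the k iterations every row is used exactly once for testing and k-1 times for training.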

Role / Importance

In machine learning, model validation refers to the process in which a trained model is evaluated on a testing data set. The testing data set is a separate portion of the same data set from which the training set is derived. Its main purpose is to test the generalization ability of the trained model.

Model validation is carried out after model training; together, the two steps aim to find the model with the best performance.
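As an illustration of this idea (not part of the original code), the sketch below holds out 30% of iris as a test set and measures the held-out accuracy of a simple base-R logistic model; the one-versus-rest target, the single predictor and the 0.5 threshold are all illustrative choices:

```r
# Hold out 30% of iris as a test set
set.seed(1)
test_rows  <- sample(nrow(iris), size = 0.3 * nrow(iris))
train_data <- iris[-test_rows, ]
test_data  <- iris[test_rows, ]

# Fit a logistic model that flags virginica from petal length
train_data$is_virginica <- as.integer(train_data$Species == "virginica")
fit <- glm(is_virginica ~ Petal.Length, data = train_data, family = binomial)

# Accuracy on the held-out rows estimates generalization ability
pred <- predict(fit, newdata = test_data, type = "response") > 0.5
acc  <- mean(pred == (test_data$Species == "virginica"))
acc
```

Training accuracy alone would overstate performance; the held-out accuracy is the quantity that model validation reports.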


PROBLEM - Iris Data Set

K-Fold Cross Validation

Source Code

#k-fold Cross Validation

library(caret)

# load the iris dataset

data(iris)

# define training control

train_control <- trainControl(method="cv", number=10)

# fix the tuning parameters of the naive Bayes algorithm
# (recent caret versions expect all three parameters fL, usekernel and
# adjust, named without a leading dot)

grid <- expand.grid(fL=0, usekernel=FALSE, adjust=1)

# train the model (method="nb" uses naive Bayes from the klaR package)

model <- train(Species~., data=iris, trControl=train_control, method="nb", tuneGrid=grid)

# summarize results

print(model)

Output


PROBLEM - Diabetes Data Set

K-Fold Cross Validation

Source Code

library(caret)

# load the diabetes dataset and treat the class label as a factor

diabet <- read.csv('C:/Semester 6/Data Science/diabetes.csv')

diabet$Outcome <- as.factor(diabet$Outcome)

# define training control

train_control <- trainControl(method="cv", number=10)

# fix the tuning parameters of the naive Bayes algorithm
# (recent caret versions expect all three parameters fL, usekernel and
# adjust, named without a leading dot)

grid <- expand.grid(fL=0, usekernel=FALSE, adjust=1)

# train the model; use bare column names in the formula, since
# diabet$-prefixed terms break caret's handling of the data argument

model <- train(Outcome~BMI, data=diabet, trControl=train_control, method="nb", tuneGrid=grid)

# summarize results

print(model)

Output

Comments