CLASSIFICATION IN R - DECISION TREE

    




Description of classification decision tree

A decision tree is one of the most widely used tools for classification and prediction. It is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of that test, and each leaf node (terminal node) holds a class label.

Role / Importance

A decision tree is a simple representation for classifying examples. It is a supervised machine learning method in which the data is recursively split according to the value of a chosen attribute.

Decision Tree consists of:

  • Nodes: Test for the value of a certain attribute.

  • Edges/Branches: Correspond to the outcome of a test and connect to the next node or leaf.

  • Leaf nodes: Terminal nodes that predict the outcome (represent class labels or class distribution).
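
The three components listed above can be sketched in a few lines of base R. This is only an illustration, not how any tree package stores its models: the nested-list layout, the thresholds, and the class labels "small", "medium", and "large" are all hypothetical. Internal nodes carry a test, the left and right entries play the role of branches, and leaves carry a class label.

```r
# A hand-built tree stored as a nested list (hypothetical thresholds and labels).
# Internal nodes hold a test (attribute + threshold); leaves hold a class label.
toy_tree <- list(
  attribute = "huml", threshold = 50,
  left  = list(label = "small"),            # branch taken when huml <= 50
  right = list(                             # branch taken when huml > 50
    attribute = "tibl", threshold = 100,
    left  = list(label = "medium"),         # tibl <= 100
    right = list(label = "large")           # tibl > 100
  )
)

# Follow one branch per test until a leaf node is reached.
classify <- function(node, obs) {
  if (!is.null(node$label)) return(node$label)   # leaf: return its class label
  if (obs[[node$attribute]] <= node$threshold) {
    classify(node$left, obs)
  } else {
    classify(node$right, obs)
  }
}

classify(toy_tree, list(huml = 80, tibl = 60))   # returns "medium"
```

An observation with huml = 80 fails the first test (80 > 50), follows the right branch, passes the second test (60 <= 100), and ends in the "medium" leaf.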




Source Code

# Install the required packages (only needed once)
install.packages(c("partykit", "caret", "pROC", "rattle", "rpart.plot", "RColorBrewer"))

# Load the packages
library(partykit)
library(caret)
library(pROC)
library(rattle)
library(rpart.plot)
library(RColorBrewer)


bird <- read.csv('C:/Neha/bird.csv')

summary(bird) 

names(bird) 



bird$type <- as.factor(bird$type)  # convert the response to a factor (categorical)

summary(bird$type) 

names(bird) 


set.seed(1234) 

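# Assign each row to partition 1 (probability 0.8) or partition 2 (probability 0.2)
pd <- sample(2, nrow(bird), replace = TRUE, prob = c(0.8, 0.2))

trainingset <- bird[pd == 1, ]    # first partition (about 80% of the rows)

validationset <- bird[pd == 2, ]  # second partition (about 20% of the rows)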


# Fit a conditional inference tree that predicts type from the ten predictor columns
tree <- ctree(formula = type ~ huml + ulnal + feml + tibl + tarl + humw + ulnaw + femw + tibw + tarw, data = trainingset)

class(bird$type) 

plot(tree) 


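# Class probabilities for each row of the validation set
pred_prob <- predict(tree, validationset, type = "prob")

pred_prob


# Predicted class labels (default type = "response")
pred <- predict(tree, validationset)

pred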



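# confusionMatrix() from caret may need the e1071 package
install.packages('e1071', dependencies = TRUE)

library(caret)

# Compare the predicted classes with the true classes of the validation set
confusionMatrix(as.factor(pred), as.factor(validationset$type))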


Output



The R package "partykit" is used to create decision trees; its ctree() function fits a conditional inference tree.



set.seed(1234): fixes the seed of the random number generator, so the random split below is the same on every run.

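sample() assigns each row to partition 1 with probability 0.8 or to partition 2 with probability 0.2, which creates the two partitions (training set and validation set).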
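
As a minimal sketch of this split, the same sample() call can be run on R's built-in iris data (an assumption made here only so the example runs without bird.csv). It shows that every row lands in exactly one partition and that re-seeding reproduces the same assignment.

```r
set.seed(1234)                       # same seed, same split on every run
pd <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))

trainingset   <- iris[pd == 1, ]     # rows assigned to partition 1
validationset <- iris[pd == 2, ]     # rows assigned to partition 2

# Every row lands in exactly one partition
nrow(trainingset) + nrow(validationset) == nrow(iris)

# The realised proportions are only approximately 0.8 and 0.2
prop.table(table(pd))

# Re-seeding reproduces the identical assignment
set.seed(1234)
pd_again <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
identical(pd, pd_again)
```

Because each row is assigned independently, the partition sizes vary around 80% and 20% rather than matching them exactly.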

plot(tree) draws the fitted tree, with the class probabilities shown at each terminal node. In the plot, black sections mean "no" and white sections mean "yes".


predict(tree, validationset, type = "prob") returns the predicted class probabilities for each row of the validation set.


predict(tree, validationset) returns the predicted class label for each row of the validation set.
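
The difference between the two kinds of prediction can be sketched on the built-in iris data (used here as a stand-in for bird.csv, which is an assumption of this example). The sketch assumes partykit is already installed.

```r
library(partykit)   # assumes partykit is already installed

set.seed(1234)
pd <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
train <- iris[pd == 1, ]
valid <- iris[pd == 2, ]

# Fit a conditional inference tree on the training partition
fit <- ctree(Species ~ ., data = train)

# type = "prob": one row per observation, one column per class
prob <- predict(fit, newdata = valid, type = "prob")
head(prob)

# Default type = "response": a factor holding one predicted class per row
cls <- predict(fit, newdata = valid)
head(cls)
```

Each row of prob sums to 1, while cls holds a single label per observation, which is the form that a confusion matrix expects.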



A confusion matrix is a summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized with count values and broken down by class. The confusion matrix shows the ways in which your classification model is confused when it makes predictions.
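
To make the counts concrete, a confusion matrix can be built by hand with base R's table(). The eight labels below are made up for illustration and are not taken from the bird data.

```r
# Hypothetical true and predicted labels for eight observations
actual    <- factor(c("yes", "yes", "yes", "no", "no", "no",  "no", "yes"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no",  "yes", "no", "no", "yes", "no", "yes"),
                    levels = c("no", "yes"))

# Rows are predictions, columns are the true classes
cm <- table(Prediction = predicted, Reference = actual)
cm

# Correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
accuracy   # 6 of 8 correct, i.e. 0.75
```

The two off-diagonal cells show the two kinds of confusion: one "yes" predicted as "no" and one "no" predicted as "yes".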

The main arguments of confusionMatrix() are:

data - a factor of predicted classes (for the default method) or an object of class table.

reference - a factor of classes to be used as the true results.

positive - an optional character string for the factor level that corresponds to a "positive" result (if that makes sense for your data).

dnn - a character vector of dimnames for the table.

mode - a single character string, one of "sens_spec", "prec_recall", or "everything".
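
A minimal sketch of these arguments in use, assuming caret (and e1071, if your caret version requires it) is installed. The eight labels are made up for illustration, and the dimnames "Predicted" and "Actual" are arbitrary choices.

```r
library(caret)   # assumes caret is already installed

# Hypothetical true and predicted labels for eight observations
actual    <- factor(c("yes", "yes", "yes", "no", "no", "no",  "no", "yes"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no",  "yes", "no", "no", "yes", "no", "yes"),
                    levels = c("no", "yes"))

cm <- confusionMatrix(data = predicted, reference = actual,
                      positive = "yes",                  # treat "yes" as the positive class
                      dnn = c("Predicted", "Actual"),    # dimnames of the table
                      mode = "prec_recall")              # report precision and recall

cm$table                                # the 2 x 2 table of counts
cm$overall["Accuracy"]                  # overall accuracy
cm$byClass[c("Precision", "Recall")]    # statistics for the positive class
```

With mode = "sens_spec" the printed summary reports sensitivity and specificity instead, and "everything" prints both sets of statistics.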

