CLASSIFICATION
Description of the classification decision tree
The decision tree is one of the most popular tools for classification and prediction. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
Role / Importance
A decision tree is a simple representation for classifying examples. It is a supervised machine learning method in which the data is repeatedly split according to the value of a certain attribute.
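To make "split according to a certain attribute" concrete, here is a minimal base-R sketch of how one split on a numeric attribute can be chosen. Note that it uses Gini impurity, the criterion of CART-style trees such as rpart, not the conditional-inference tests that ctree() uses later in this post; the helper names gini() and best_split() and the toy data are hypothetical.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Find the threshold t on numeric attribute x that minimises the
# weighted Gini impurity of the two child nodes (x <= t and x > t)
best_split <- function(x, y) {
  xs <- sort(unique(x))
  # candidate thresholds: midpoints between consecutive distinct values
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
  scores <- sapply(thresholds, function(t) {
    left  <- y[x <= t]
    right <- y[x > t]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = thresholds[which.min(scores)], impurity = min(scores))
}

# Hypothetical toy data: one measurement, two classes
x <- c(1, 2, 3, 10, 11, 12)
y <- factor(c("A", "A", "A", "B", "B", "B"))
gini(y)            # impurity of the unsplit node: 0.5
best_split(x, y)   # threshold 6.5 separates the classes perfectly (impurity 0)
```

A real tree applies this search to every attribute, keeps the best split, and then repeats it inside each child node until a stopping rule is met.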
A decision tree consists of:
Nodes: test the value of a certain attribute.
Edges/branches: correspond to the outcome of a test and connect to the next node or leaf.
Leaf nodes: terminal nodes that predict the outcome (represent class labels or a class distribution).
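The three parts listed above can be sketched in a few lines of base R. This is a hypothetical hand-built tree, not one learned from data: the attribute names x1 and x2, the thresholds and the class labels A, B and C are made up for illustration.

```r
# A leaf holds a class label
leaf <- function(label) list(leaf = TRUE, label = label)

# An internal node holds a test (attribute <= threshold) and two branches:
# "yes" is followed when the test is true, "no" otherwise
node <- function(attribute, threshold, yes, no) {
  list(leaf = FALSE, attribute = attribute, threshold = threshold,
       yes = yes, no = no)
}

# Hypothetical tree: root tests x1, its "no" branch tests x2
toy_tree <- node("x1", 50,
                 yes = leaf("A"),
                 no  = node("x2", 40, yes = leaf("B"), no = leaf("C")))

# Follow the branches from the root until a leaf is reached
classify <- function(tree, example) {
  while (!tree$leaf) {
    passed <- example[[tree$attribute]] <= tree$threshold
    tree <- if (passed) tree$yes else tree$no
  }
  tree$label
}

classify(toy_tree, list(x1 = 30, x2 = 20))   # returns "A"
classify(toy_tree, list(x1 = 80, x2 = 20))   # returns "B"
classify(toy_tree, list(x1 = 80, x2 = 60))   # returns "C"
```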
Source Code
# Install the required packages (run once); caret may need e1071 for confusionMatrix()
install.packages(c("partykit", "caret", "e1071"))
library(partykit)   # ctree(): conditional inference trees
library(caret)      # confusionMatrix()

# Read the bird data set
bird <- read.csv('C:/Neha/bird.csv')
summary(bird)
names(bird)

bird$type <- as.factor(bird$type)   # convert the class label to categorical
summary(bird$type)
class(bird$type)

# Assign each row to sample 1 or 2 with probabilities 0.8 and 0.2
set.seed(1234)
pd <- sample(2, nrow(bird), replace = TRUE, prob = c(0.8, 0.2))
trainingset <- bird[pd == 1, ]     # first partition (about 80%)
validationset <- bird[pd == 2, ]   # second partition (about 20%)

# Fit the tree on the ten bone measurements
tree <- ctree(type ~ huml + ulnal + feml + tibl + tarl + humw + ulnaw + femw + tibw + tarw,
              data = trainingset)
plot(tree)

# Class probabilities for each validation observation
pred <- predict(tree, validationset, type = "prob")
pred

# Predicted class label for each validation observation
pred <- predict(tree, validationset)
pred

# Compare the predicted labels with the true labels
confusionMatrix(pred, validationset$type)
Output
The R package partykit (a reimplementation of the party package) provides ctree(), which is used here to create the decision tree.
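set.seed(1234): fixes the seed of the random number generator so that the random partition is reproducible.
sample(2, nrow(bird), replace = TRUE, prob = c(0.8, 0.2)): assigns each row to sample 1 (probability 0.8) or sample 2 (probability 0.2), creating the two partitions used as training and validation sets.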
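To see what set.seed() and the prob argument of sample() do, the sketch below repeats the same partitioning on R's built-in iris data (used here only as an assumption so the example runs without bird.csv). Because each row is assigned independently, the split is only approximately 80/20; sampling row indices is one alternative when exact sizes are wanted.

```r
# Reproducibility: the same seed gives the same random assignment
set.seed(1234)
a <- sample(2, 10, replace = TRUE, prob = c(0.8, 0.2))
set.seed(1234)
b <- sample(2, 10, replace = TRUE, prob = c(0.8, 0.2))
identical(a, b)   # TRUE

# Same partitioning as in the source code, applied to iris
set.seed(1234)
pd <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
train <- iris[pd == 1, ]
valid <- iris[pd == 2, ]
nrow(train) + nrow(valid) == nrow(iris)   # every row lands in exactly one partition
prop.table(table(pd))                     # close to 0.8 / 0.2, but not guaranteed exact

# Alternative: an exact 80/20 split by sampling row indices
idx <- sample(nrow(iris), round(0.8 * nrow(iris)))
train_exact <- iris[idx, ]    # 120 rows
valid_exact <- iris[-idx, ]   # 30 rows
```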
plot(tree): draws the fitted tree. Each terminal node shows a bar chart of the class probabilities for the training observations that reach it; the black section means no and the white section means yes.
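predict(tree, validationset, type = "prob"): predicts the class probabilities for each observation in the validation set.
predict(tree, validationset): predicts the class label for each observation in the validation set.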
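The two predict() calls are related: the probability output has one row per observation and one column per class, and a label can be read off it by taking the most probable class in each row. The sketch below does this in base R on a hypothetical probability matrix with made-up classes A, B and C; treating the label prediction as the highest-probability class is an assumption for illustration, not a quote of the partykit documentation.

```r
# Hypothetical probability matrix, shaped like predict(..., type = "prob"):
# one row per observation, one column per class
prob <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("A", "B", "C")))

# Pick the column with the highest probability in each row
best <- max.col(prob, ties.method = "first")
labels <- factor(colnames(prob)[best], levels = colnames(prob))
labels   # A for the first row, C for the second
```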
A confusion matrix is a summary of prediction results on a classification problem.
The numbers of correct and incorrect predictions are summarized with count values and broken down by class.
The confusion matrix shows the ways in which your classification model is confused when it makes predictions.
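In base R, a confusion matrix is simply a cross-tabulation of predicted against true labels. The sketch below uses hypothetical yes/no labels and the same layout that caret's confusionMatrix() prints (predictions in rows, reference in columns); overall accuracy is the diagonal (correct predictions) divided by the total.

```r
# Hypothetical predicted and true labels for six observations
predicted <- factor(c("yes", "no", "yes", "yes", "no", "no"), levels = c("no", "yes"))
actual    <- factor(c("yes", "no", "no",  "yes", "no", "yes"), levels = c("no", "yes"))

# Rows = predictions, columns = true classes
cm <- table(Prediction = predicted, Reference = actual)
cm

# Accuracy = correct predictions (diagonal) / all predictions
accuracy <- sum(diag(cm)) / sum(cm)
accuracy   # 4 correct out of 6, about 0.667
```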
The main arguments of caret's confusionMatrix() are:
data - a factor of predicted classes (for the default method) or an object of class table.
reference - a factor of classes to be used as the true results.
positive - an optional character string for the factor level that corresponds to a "positive" result (if that makes sense for your data).
dnn - a character vector of dimnames for the table.
mode - a single character string, either "sens_spec", "prec_recall", or "everything".
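For a two-class problem, the positive argument decides which level counts as a "positive" result, and the statistics reported under the "sens_spec" and "prec_recall" modes follow from the four cells of the table. The base-R sketch below computes them by hand on hypothetical labels. caret's documentation states that, when positive is not given for two levels, the first level is used; here "yes" is chosen explicitly.

```r
# Hypothetical predicted and true labels
predicted <- factor(c("yes", "yes", "yes", "yes", "no", "no"), levels = c("no", "yes"))
actual    <- factor(c("yes", "yes", "no",  "no",  "yes", "no"), levels = c("no", "yes"))
cm <- table(Prediction = predicted, Reference = actual)

positive <- "yes"
negative <- setdiff(colnames(cm), positive)

TP <- cm[positive, positive]   # predicted yes, actually yes
FP <- cm[positive, negative]   # predicted yes, actually no
FN <- cm[negative, positive]   # predicted no, actually yes
TN <- cm[negative, negative]   # predicted no, actually no

sensitivity <- TP / (TP + FN)  # also called recall
specificity <- TN / (TN + FP)
precision   <- TP / (TP + FP)
c(sensitivity = sensitivity, specificity = specificity, precision = precision)
# sensitivity 2/3, specificity 1/3, precision 1/2
```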