CLASSIFICATION IN R - DECISION TREE

    




Description of classification decision tree

A decision tree is one of the most widely used tools for classification and prediction. It is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of that test, and each leaf node (terminal node) holds a class label.

Role / Importance

A decision tree is a simple representation for classifying examples. It is a supervised machine learning method in which the data is repeatedly split according to the value of a chosen attribute.

A decision tree consists of:

  • Nodes: test the value of a certain attribute.

  • Edges (branches): correspond to the outcome of a test and connect to the next node or leaf.

  • Leaf nodes: terminal nodes that predict the outcome (they represent class labels or a class distribution).
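To make the three parts concrete, here is a minimal hand-written sketch in base R. The function name `classify_passenger`, the attributes, and the split rules are hypothetical and chosen only for illustration; they are not learned from any data.

```r
# A tiny hand-written decision tree (hypothetical rules, not fitted to data).
# Each `if` is an internal node (a test on an attribute), each branch of the
# `if`/`else` is an edge, and each returned string is a leaf (class label).
classify_passenger <- function(sex, age) {
  if (sex == "female") {        # node: test on the attribute `sex`
    "yes"                       # leaf: predicted class
  } else if (age < 10) {        # node: test on the attribute `age`
    "yes"                       # leaf
  } else {
    "no"                        # leaf
  }
}

classify_passenger("female", 30)
classify_passenger("male", 40)
```

A fitted tree has the same shape; the difference is that the tests and their thresholds are chosen from the training data rather than written by hand.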




Select the CSV file to test
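A minimal sketch of this step, assuming the file is read with base R's `read.csv()`. In an interactive session, `read.csv(file.choose())` opens a file picker. So that the sketch runs on its own, it first writes a small stand-in file built from R's built-in `Titanic` table. The file name and the stand-in data are assumptions, not the CSV used in this walkthrough.

```r
# Stand-in data: expand R's built-in Titanic contingency table to one row
# per passenger (this is NOT the original CSV, only a substitute).
counts <- as.data.frame(Titanic)
rows <- counts[rep(seq_len(nrow(counts)), counts$Freq),
               c("Class", "Sex", "Age", "Survived")]

# Write it to a temporary CSV so there is a file to select.
csv_path <- file.path(tempdir(), "titanic.csv")
write.csv(rows, csv_path, row.names = FALSE)

# Read the CSV into a data frame. Interactively, file.choose() could be
# used in place of csv_path to pick the file from a dialog.
titanic <- read.csv(csv_path)
head(titanic)
```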








summary(titanic): gives a summary of each column of the dataset


names(titanic): gives the names of the column headers


install.packages("partykit")


library(partykit)


Converting Survived to a categorical variable (factor)

titanic$Survived <- as.factor(titanic$Survived)
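A small illustration of what the conversion changes, using a hypothetical 0/1 vector rather than the real column. The point is that a factor is summarised as counts per class, and a factor response is what makes `ctree()` fit a classification tree rather than a regression tree.

```r
survived <- c(0, 1, 1, 0, 1)     # hypothetical values, for illustration only

summary(survived)                # numeric: min, quartiles, mean, max

survived <- as.factor(survived)  # treat 0 and 1 as class labels
levels(survived)                 # the two classes
summary(survived)                # factor: number of cases per class
```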



summary(titanic$Survived)

names(titanic)


set.seed(1234): fixes the seed of the random number generator so that the random split is reproducible

# draw two samples with probabilities 0.8 and 0.2 to create two partitions of the data
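One common way to write this split is sketched here. The names `ind`, `train`, and `validate` are assumptions, and the data is a stand-in built from R's built-in `Titanic` table rather than the CSV loaded earlier.

```r
# Stand-in data: one row per passenger from the built-in Titanic table.
counts <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]

set.seed(1234)  # the same seed gives the same split on every run

# Give each row the label 1 (probability 0.8) or 2 (probability 0.2).
ind <- sample(2, nrow(titanic), replace = TRUE, prob = c(0.8, 0.2))

train <- titanic[ind == 1, ]     # partition used to build the tree
validate <- titanic[ind == 2, ]  # partition held back for validation

c(train = nrow(train), validate = nrow(validate))
```

Because each row is assigned independently, the partitions are only approximately 80% and 20% of the rows.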

Plot the tree. The class probabilities are shown in the terminal nodes of the tree: the black section means no and the white section means yes.
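A hedged sketch of fitting and plotting the tree with `ctree()` from partykit. The formula, the object name `tree`, and the stand-in data (R's built-in `Titanic` table, where `Survived` has the levels No and Yes) are assumptions; the original CSV may have different columns.

```r
library(partykit)

# Stand-in data: one row per passenger from the built-in Titanic table.
counts <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]

set.seed(1234)
ind <- sample(2, nrow(titanic), replace = TRUE, prob = c(0.8, 0.2))
train <- titanic[ind == 1, ]

# Survived is a factor, so ctree() grows a classification tree.
tree <- ctree(Survived ~ Class + Sex + Age, data = train)

print(tree)  # text form of the splits
plot(tree)   # terminal nodes show the class proportions as stacked bars
```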





Predicting the class probabilities for the validation set



Predicting the class labels for the validation set using the tree
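The two prediction steps above can be sketched with `predict()` on the fitted tree. As before, the object names and the stand-in data from R's built-in `Titanic` table are assumptions.

```r
library(partykit)

# Stand-in data, split, and tree (same assumptions as the earlier steps).
counts <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]
set.seed(1234)
ind <- sample(2, nrow(titanic), replace = TRUE, prob = c(0.8, 0.2))
train <- titanic[ind == 1, ]
validate <- titanic[ind == 2, ]
tree <- ctree(Survived ~ Class + Sex + Age, data = train)

# Class probabilities: one row per validation passenger, one column per class.
prob <- predict(tree, newdata = validate, type = "prob")
head(prob)

# Predicted class labels: the most probable class for each passenger.
pred <- predict(tree, newdata = validate, type = "response")
head(pred)
```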




Creating a matrix of predicted versus actual survival for the validation set
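A sketch of this matrix using base R's `table()`, cross-tabulating the predicted labels against the actual ones. The names `confusion` and `accuracy` are assumptions, the accuracy line is an optional extra, and the data is again the stand-in built from R's built-in `Titanic` table.

```r
library(partykit)

# Stand-in data, split, tree, and predictions (same assumptions as before).
counts <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]
set.seed(1234)
ind <- sample(2, nrow(titanic), replace = TRUE, prob = c(0.8, 0.2))
train <- titanic[ind == 1, ]
validate <- titanic[ind == 2, ]
tree <- ctree(Survived ~ Class + Sex + Age, data = train)
pred <- predict(tree, newdata = validate, type = "response")

# Rows are the predicted classes, columns are the actual classes.
confusion <- table(predicted = pred, actual = validate$Survived)
confusion

# Share of validation passengers classified correctly (the diagonal).
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```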



















