CLASSIFICATION IN R - DECISION TREE

on October 01, 2020

CLASSIFICATION

Description of classification decision tree

A decision tree is one of the most widely used tools for classification and prediction. It is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of that test, and each leaf node (terminal node) holds a class label.

Role / Importance

A decision tree is a simple representation for classifying examples. It is a supervised machine learning method in which the data is repeatedly split according to the value of a chosen attribute.

A decision tree consists of:

Nodes: Test the value of a certain attribute.
Edges/Branches: Correspond to the outcome of a test and connect to the next node or leaf.
Leaf nodes: Terminal nodes that predict the outcome (represent class labels or a class distribution).
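To make the three parts concrete, here is a minimal sketch of a tree written by hand as nested tests. The function name classify_passenger, the attributes sex and class, and the rules themselves are hypothetical and chosen only for illustration; they are not learned from any data.

```r
# Hypothetical hand-written tree: the rules are illustrative, not learned.
classify_passenger <- function(sex, class) {
  if (sex == "Female") {        # root node: test on the attribute `sex`
    if (class == "3rd") {       # internal node: test on the attribute `class`
      "No"                      # leaf node: class label
    } else {
      "Yes"                     # leaf node: class label
    }
  } else {                      # the other branch of the root test
    "No"                        # leaf node: class label
  }
}

classify_passenger("Female", "1st")
classify_passenger("Male", "2nd")
```

Each if is a node, each outcome of the test is a branch, and each returned string is a leaf. A fitted tree has the same shape, except that the tests are chosen from the data.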

Select the CSV file to test

summary(titanic): gives a summary of the dataset

names(titanic): gives the names of the column headers
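The loading and inspection steps might look like the sketch below. The post does not name the CSV file, so the read.csv(file.choose()) line is shown only as a comment, and R's built-in Titanic table is expanded to one row per passenger as a stand-in. That stand-in is an assumption: its columns (Class, Sex, Age, Survived) may differ from those in your file.

```r
# With your own file you would typically pick it interactively:
# titanic <- read.csv(file.choose())

# Stand-in (assumption): expand R's built-in Titanic contingency table
# into one row per passenger so the example runs without a CSV file.
counts  <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]
rownames(titanic) <- NULL

summary(titanic)   # per-column summary (level counts for factor columns)
names(titanic)     # column headers
```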

install.packages("partykit")

library(partykit)

Converting the target variable to categorical (a factor)

titanic$Survived<-as.factor(titanic$Survived)

summary(titanic$Survived)

names(titanic)

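set.seed(1234): sets the seed of the random number generator so that the random split is reproducible

# Sample partition labels 1 and 2 with probabilities 0.8 and 0.2 to create two partitions (training and validation)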
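A minimal sketch of this split, assuming the built-in Titanic table as stand-in data. The names pd, train and validate are hypothetical; the post does not show the ones it used.

```r
# Stand-in data (assumption): one row per passenger from the built-in table.
counts  <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]

set.seed(1234)  # same seed, same split on every run

# Label every row 1 or 2, drawing 1 with probability 0.8 and 2 with 0.2.
pd <- sample(2, nrow(titanic), replace = TRUE, prob = c(0.8, 0.2))

train    <- titanic[pd == 1, ]  # roughly 80% of the rows
validate <- titanic[pd == 2, ]  # roughly 20% of the rows

nrow(train)
nrow(validate)
```

Because each row is labelled independently, the partition sizes are only approximately 80% and 20% of the data.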

Plot the tree. The class probabilities are shown in the terminal nodes of the tree: the black section means no (did not survive) and the white section means yes (survived).
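Fitting and plotting could be sketched as follows with ctree() from partykit. The formula, the stand-in data and the object names are assumptions, since the post does not show the call it used.

```r
library(partykit)

# Stand-in data and split (assumptions, see the lead-in).
counts  <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]
set.seed(1234)
pd    <- sample(2, nrow(titanic), replace = TRUE, prob = c(0.8, 0.2))
train <- titanic[pd == 1, ]

# Fit a conditional inference tree on the training partition.
tree <- ctree(Survived ~ Class + Sex + Age, data = train)

print(tree)  # text view of the splits
plot(tree)   # tree diagram; terminal nodes show the class distribution
```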

Predicting the class probabilities for the validation set

Predicting the class for the validation set using the tree

Creating a matrix of predicted versus actual survival for the validation set
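The three validation steps can be sketched together as below. The stand-in data, the formula and the names prob, pred and conf are assumptions; the accuracy line is an addition that summarises the matrix in one number.

```r
library(partykit)

# Stand-in data, split and model (assumptions, see the lead-in).
counts  <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]
set.seed(1234)
pd       <- sample(2, nrow(titanic), replace = TRUE, prob = c(0.8, 0.2))
train    <- titanic[pd == 1, ]
validate <- titanic[pd == 2, ]
tree     <- ctree(Survived ~ Class + Sex + Age, data = train)

# Class probabilities: one row per validation passenger, one column per class.
prob <- predict(tree, newdata = validate, type = "prob")
head(prob)

# Predicted class label for each validation passenger.
pred <- predict(tree, newdata = validate, type = "response")

# Cross-tabulate predicted against actual survival.
conf <- table(predicted = pred, actual = validate$Survived)
conf

# Share of validation passengers classified correctly.
accuracy <- sum(diag(conf)) / sum(conf)
accuracy
```

The diagonal of the matrix counts correct predictions and the off-diagonal cells count the two kinds of error.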
