Abstract

            Data mining is the process of discovering needed information from large amounts of data retrieved from different databases. Classification is one of the techniques in data mining: a function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. The decision tree (DT) algorithm is based on the classification technique and is a powerful and popular tool. The algorithm produces rules that can be understood by humans and used in knowledge systems such as databases. A DT is represented by a hierarchical tree structure. It is used in many different disciplines, including diagnosis, cognitive science, artificial intelligence, game theory, engineering and data mining. This paper is intended as a study of decision tree learning algorithms.

Keywords: Hierarchical tree, Decision Tree algorithm, Classification

 

I. INTRODUCTION


            Data mining is a categorization technique that is used to extract previously unknown patterns from large databases. Data mining uses two types of approaches, i.e. supervised learning and unsupervised learning. Classification is based on the supervised learning approach. Classification methods aim to recognize the classes to which objects belong from some descriptive attributes. It finds use in a wide range of human activities, and predominantly in automated decision making.

II. DECISION TREE

            Decision trees are a simple but powerful form of multiple-variable analysis. A DT is a flow-chart-like tree structure. DTs are especially attractive in a data mining environment: they can handle high-dimensional data and are able to break a complex decision-making process down into a collection of simpler decisions, providing a solution that is easy to interpret.

 

The reasons are:

1. Intuitive representation: the resulting model is easy for humans to assimilate.

2. A DT does not require any parameter setting from the user and is well suited to exploratory knowledge discovery.

3. A DT can be constructed relatively fast, and its accuracy is comparable to that of other classification methods.

            The rules corresponding to the tree are easily derived by traversing the tree from the root node down to each leaf. Note that many different leaves of the tree may refer to the same class label, but each leaf corresponds to a different rule. Each element of the tree is denoted as follows:

• Internal node: a test on an attribute.

• Branch node: the outcome of the test.

• Leaf node: class distributions.

• Arc/edge: a split on one attribute.

• Path: a disjunction of tests leading to the final decision.

[Figure: a decision tree in which the root node tests whether the body temperature is warm or cold; the cold branch ends in a leaf node labelled Non-mammals, while the warm branch leads to a second internal node whose Yes and No outcomes end in leaf nodes labelled Mammals and Non-mammals respectively.]
Fig 1: Decision Tree
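As an illustrative sketch (not part of the original paper), the node types above can be encoded in Python with a small class: an internal node holds an attribute test and one child per outcome (the arcs), and a leaf holds a class label. The attribute names "body_temperature" and "gives_birth" are assumptions chosen to mirror Fig 1.

class Node:
    """A decision tree node: internal if it holds a test, leaf if it holds a label."""
    def __init__(self, test_attribute=None, branches=None, label=None):
        self.test_attribute = test_attribute   # attribute tested at an internal node
        self.branches = branches or {}         # outcome value -> child Node (arcs/edges)
        self.label = label                     # class label stored at a leaf node

    def is_leaf(self):
        return self.label is not None

# Hypothetical encoding of the tree sketched in Fig 1.
fig1_tree = Node(
    test_attribute="body_temperature",
    branches={
        "cold": Node(label="non-mammal"),
        "warm": Node(
            test_attribute="gives_birth",      # assumed test for the second level
            branches={"yes": Node(label="mammal"),
                      "no": Node(label="non-mammal")},
        ),
    },
)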

DT features:

• Produces results that communicate very well in symbolic and visual terms, with the ability to incorporate multiple predictors in a simple step-by-step fashion.

• Incorporates various levels of measurement, including qualitative (e.g., good/bad) and quantitative measures.

• Adapts to various twists and turns in data (unbalanced effects, nested effects, offsetting effects, interactions and nonlinearities) that frequently defeat other one-way and multi-way statistical and numeric approaches.

• Nonparametric and highly robust, producing similar results regardless of the level of measurement of the fields used to construct the decision tree branches.

Decision Tree Parameters:

1. Minimal size for split: The size of a node is the number of Examples in its subset. Only those nodes whose size is greater than or equal to the minimal size for split parameter are split. The range is -1 to +∞ and the type must be an integer. The default value is 4.

2. Minimal leaf size: The size of a leaf is the number of Examples in its subset. The tree is generated in such a way that every leaf has at least the minimal leaf size number of Examples. The range is -1 to +∞ and the type must be an integer. The default value is 2.

3. Minimal gain: The gain of a node is calculated before splitting it. The node is split only if its gain is greater than the minimal gain. A higher value of minimal gain results in fewer splits and thus a smaller tree. A value that is too high will completely prevent splitting, and a tree with a single node is generated. The range is 0.0 to +∞ and the type must be real. The default value is 0.1.

4. Maximal depth of tree: The depth of a tree varies depending upon the size and characteristics of the Example Set. This parameter is used to restrict the depth of the decision tree. If its value is set to '-1', the maximal depth parameter puts no bound on the depth of the tree; in this case the tree is built until other stopping criteria are met. If its value is set to '1', a tree with a single node is generated. The default value is 20.

5. Decision criterion: Selects the criterion on which Attributes will be selected for splitting. It can have one of the following values:

• Information_gain: The entropies of all the Attributes are calculated, and the one with the least entropy is selected for the split. This method has a bias towards selecting Attributes with a large number of values.

• Gain_ratio: A variant of information gain that adjusts the information gain of each Attribute to account for the breadth and uniformity of the Attribute values.

• Gini_index: A measure of inequality between the distributions of label characteristics. Splitting on a chosen Attribute results in a reduction in the average Gini index of the resulting subsets.

• Accuracy: An Attribute is selected for splitting that maximizes the accuracy of the whole tree.
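These parameters and criteria correspond closely to options found in common decision tree implementations. As a hedged sketch (not prescribed by this paper), the values above can be mapped onto scikit-learn's DecisionTreeClassifier; the parameter names are scikit-learn's own and are only rough analogues of the ones described.

from sklearn.tree import DecisionTreeClassifier

# Rough mapping of the parameters described above onto scikit-learn;
# exact semantics and defaults differ between tools.
clf = DecisionTreeClassifier(
    criterion="entropy",          # decision criterion ("entropy" ~ information gain, or "gini")
    min_samples_split=4,          # minimal size for split
    min_samples_leaf=2,           # minimal leaf size
    min_impurity_decrease=0.1,    # loosely analogous to the minimal gain threshold
    max_depth=20,                 # maximal depth of the tree (None removes the bound)
)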

 

The construction of the decision tree involves the following three main phases:

Construction phase: The initial decision tree is constructed in this phase, based on the entire training data set. It requires recursively partitioning the training set into two or more sub-partitions using a splitting criterion, until a stopping criterion is met.

Pruning phase: The tree constructed in the previous phase may not result in the best possible set of rules due to over-fitting. The pruning phase removes some of the lower branches and nodes to improve its performance.

Processing the pruned tree: The pruned tree is processed to improve understandability.
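The construction phase can be illustrated with a minimal, assumption-laden Python sketch (my own illustration, not the paper's algorithm): examples are dictionaries of attribute values, the split attribute is chosen by information gain, and recursion stops on a pure node, a node smaller than a minimal size, or when no attributes remain. Pruning would then remove lower branches whose removal does not hurt performance on held-out data.

import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by partitioning on attribute attr."""
    partitions = {}
    for row, lab in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(lab)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attributes, min_size=4):
    """Recursively partition the training set into a nested-dict decision tree."""
    # Stopping criteria: pure node, too few Examples, or no attributes left.
    if len(set(labels)) == 1 or len(rows) < min_size or not attributes:
        return Counter(labels).most_common(1)[0][0]      # leaf: majority class
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    node = {"test": best, "branches": {}}
    for value in set(row[best] for row in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        remaining = [a for a in attributes if a != best]
        node["branches"][value] = build_tree(sub_rows, sub_labels, remaining, min_size)
    return node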

III. DECISION TREE ALGORITHM

A decision tree (DT) model is a computational model consisting of three parts:

1) A decision tree is defined.

2) An algorithm to create the tree.

3) An algorithm that applies the tree to data and solves the problem under consideration.

Algorithm:

Input:  T // decision tree, D // input data

Output: O // model prediction

DTProc algorithm (the simplest algorithm to illustrate the prediction technique using a DT):

for each t ∈ D do
    n = root node of T;
    while n is not a leaf node do
        obtain the answer to the question on n applied to t;
        identify the arc from n which contains the correct answer;
        n = node at the end of this arc;
    make the prediction for t based on the label of n;
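A hedged Python rendering of the same traversal, assuming the nested-dict tree representation from the construction sketch earlier (an internal node is a dict with a "test" attribute and "branches"; anything else is a leaf label):

def predict(tree, example):
    """Apply the decision tree to one example, mirroring the DTProc steps."""
    n = tree                            # start at the root node
    while isinstance(n, dict):          # while n is not a leaf node
        answer = example[n["test"]]     # answer to the question on n applied to the example
        n = n["branches"][answer]       # follow the arc that contains the answer
    return n                            # the leaf label is the prediction

def predict_all(tree, examples):
    """Predict a label for each example t in the data set D."""
    return [predict(tree, t) for t in examples]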

Types of decision tree algorithms:

• Hunt's algorithm

• CART

• SPRINT

• J48

• C4.5

• Random Forest

IV. ADVANTAGES AND DISADVANTAGES OF DECISION TREE

1. Advantage: Self-explanatory and easy in terms of readability; even someone who is not familiar with data mining can interpret a decision tree if it is of small size.
   Disadvantage: The large majority of inducers require the target value of the data instances to be discrete-valued only.

2. Advantage: Both categorical and numerical (continuous) attributes can be handled.
   Disadvantage: Identical subtrees cannot be represented only once, so the tree replicates them, since every path is mutually exclusive.

3. Advantage: The data may contain missing attribute values and decision trees will still handle it.
   Disadvantage: Decision trees are greedy; they sometimes become overly sensitive and tend to choose a split due to noise or an irrelevant attribute, which produces a wrong split and affects the accuracy badly.

4. Advantage: The data might contain errors and decision trees will still handle it; they are robust to noise.
   Disadvantage: There may not be sufficient data to make a statistically significant decision for the class representation of the nodes, which is also known as the data fragmentation problem.

5. Advantage: Constructing models from large training datasets is easier, computationally inexpensive (in comparison with other machine learning algorithms) and faster.
   Disadvantage: Handling missing data requires a lot of computation, which is a drawback in terms of computational time.
                                                Table 1: Advantages and Disadvantages

 

V. PERFORMANCE EVALUATION OF DECISION TREE

CONFUSION MATRIX

            A confusion matrix is a table that is often used to describe the performance of a decision tree model (or "classifier") on a set of test data for which the true values are known. The main diagonal elements, TP (true positive) and TN (true negative), count correctly classified instances, while FP (false positive) and FN (false negative) count incorrectly classified instances.

 

Actual \ Predicted        Survived                 Not Survived
Survived                  True Positive (TP)       False Negative (FN)
Not Survived              False Positive (FP)      True Negative (TN)

                Table 2: Actual vs Predicted Confusion Matrix

Total number of instances = correctly classified instances + incorrectly classified instances

Correctly classified instances = TP + TN

Incorrectly classified instances = FP + FN
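As a small hypothetical illustration (the label names follow Table 2; the example vectors are invented), the four counts can be computed directly from an actual and a predicted label list:

actual    = ["Survived", "Survived", "Not Survived", "Not Survived", "Survived"]
predicted = ["Survived", "Not Survived", "Not Survived", "Survived", "Survived"]

TP = sum(a == "Survived"     and p == "Survived"     for a, p in zip(actual, predicted))
TN = sum(a == "Not Survived" and p == "Not Survived" for a, p in zip(actual, predicted))
FP = sum(a == "Not Survived" and p == "Survived"     for a, p in zip(actual, predicted))
FN = sum(a == "Survived"     and p == "Not Survived" for a, p in zip(actual, predicted))

correctly_classified   = TP + TN                                        # 3 in this toy example
incorrectly_classified = FP + FN                                        # 2 in this toy example
total_instances        = correctly_classified + incorrectly_classified  # 5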

PRECISION

            Precision is the ratio of modules correctly classified as fault-prone to the total number of modules classified as fault-prone. It is the proportion of units correctly predicted as faulty.

Precision = TP / (TP + FP)

RECALL

            Recall is the ratio of modules correctly classified as fault-prone to the total number of faulty modules.

Recall = TP / (TP + FN)

COST MATRIX

            A cost matrix is similar to a confusion matrix; the minor difference is that it is used to find the cost of accuracy through the misclassification error rate.

Misclassification error rate = 1 - Accuracy

F-MEASURE

            FM is a combination of recall and precision; it is defined as the harmonic mean of precision and recall.

F-Measure = 2 * Recall * Precision / (Recall + Precision)

ACCURACY

            It is defined as the ratio of correctly classified instances to the total number of instances.

Accuracy = (TP + TN) / (TP + FP + TN + FN)

CALCULATING TPR, TNR, FPR AND FNR

            For good classifiers, TPR and TNR should both be near 100%. The same holds for the precision and accuracy parameters. Conversely, FPR and FNR should both be as close to 0% as possible. The true positive rate, true negative rate, false positive rate and false negative rate are calculated as shown below:

TPR = TP / (TP + FN)

TNR = TN / (FP + TN)

FPR = FP / (FP + TN)

FNR = FN / (TP + FN)
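A minimal sketch that computes all of the measures above from the four confusion-matrix counts (the example call reuses the hypothetical TP/TN/FP/FN values from the earlier sketch):

def evaluate(TP, TN, FP, FN):
    """Evaluation measures defined above, computed from confusion-matrix counts."""
    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)                      # equal to the true positive rate
    accuracy  = (TP + TN) / (TP + FP + TN + FN)
    f_measure = 2 * recall * precision / (recall + precision)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "TPR": TP / (TP + FN),
        "TNR": TN / (FP + TN),
        "FPR": FP / (FP + TN),
        "FNR": FN / (TP + FN),
        "misclassification_error_rate": 1 - accuracy,   # cost-matrix error rate
    }

print(evaluate(TP=2, TN=1, FP=1, FN=1))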

 

VI. CONCLUSION

            This paper presented the decision tree, a highly efficient tool of machine learning and data mining that is capable of producing accurate and easy-to-understand models. The paper described some fundamental characteristics of decision trees. As discussed, performance evaluation depends strongly on the decision criterion used, such as information gain, Gini index and gain ratio. A DT can handle large training datasets and divide them into smaller subsets of data. Finally, it is concluded that a DT is a readable tree structure: even someone who is not previously familiar with classification techniques can easily understand the results of the tree. In future work, we will implement a DT on a dataset and evaluate its performance.