R is steadily gaining ground in analytics, so why not talk about a few widely
used data mining algorithms in R?
While working with R, I found the algorithms below very useful for data
mining. It's a personal choice, though. There are plenty of tools available
for mining data that deliver respectable results, but as a programmer it's
always great to design the algorithm the way you want.
Let's not waste any more time and go through a few data mining algorithms that
I found best while working on one of my data analytics projects.
1. Decision Tree
A decision tree builds classification or regression models in the
form of a tree structure. It breaks down a dataset into smaller and smaller
subsets while at the same time an associated decision tree is incrementally
developed. The final result is a tree with decision nodes and leaf
nodes.
The core algorithm for building decision
trees is ID3, by J. R. Quinlan, which employs a top-down,
greedy search through the space of possible branches with no backtracking. ID3
uses entropy and information gain to
construct a decision tree.
A decision tree is built top-down from a
root node and involves partitioning the data into subsets that contain
instances with similar (homogeneous) values.
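To make entropy and information gain concrete, here is a minimal sketch in plain Python (standard library only, not the full ID3 algorithm). The toy rows and the column names "outlook", "windy" and "play" are made up for illustration. ID3 would split on the attribute with the highest gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value of the attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": True,  "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rainy", "windy": True,  "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
]

for attribute in ("outlook", "windy"):
    print(attribute, information_gain(rows, attribute, "play"))
```

On this data the gain for "outlook" is 1 bit and the gain for "windy" is 0, so a greedy builder would put "outlook" at the root.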
2. Random Forest
Random Forests are a
combination of tree predictors where each tree depends on the values of a
random vector sampled independently with the same distribution for all trees in
the forest.
Single decision trees often have high variance or high bias. Random Forests attempt to mitigate both problems by averaging the predictions of many trees, finding a natural balance between the two extremes.
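The averaging idea can be sketched in a few lines of plain Python. This is only the bagging half of a Random Forest: each "tree" is a one-split stump trained on a bootstrap sample, and the forest predicts by majority vote. The random choice of features at each split is left out because the toy data has a single feature. All names and data here are made up for illustration.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Return the threshold t (predict 1 when x >= t) with the fewest errors on rows."""
    best_errors, best_threshold = None, None
    for threshold, _ in rows:
        errors = sum((x >= threshold) != bool(label) for x, label in rows)
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return best_threshold

def fit_forest(rows, n_trees, rng):
    """Train n_trees stumps, each on its own bootstrap sample of rows."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        forest.append(fit_stump(sample))
    return forest

def predict(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(int(x >= threshold) for threshold in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: the label is 1 when x is 5 or more.
rows = [(x, int(x >= 5)) for x in range(10)]
forest = fit_forest(rows, n_trees=25, rng=random.Random(42))

print(sorted(set(forest)))                 # the individual thresholds differ between stumps
print(predict(forest, 0), predict(forest, 9))
```

Individual stumps disagree about where the split should be because each sees a different sample. The vote smooths out that disagreement, which is the variance reduction described above.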
3. Association Rule Mining (as used in Market Basket Analysis)
Association
rule learning is a method for discovering interesting
relations between variables in large databases. It is intended to identify
strong rules discovered in databases using some measures of
interestingness.
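The text does not say which measures of interestingness are meant. Support, confidence and lift are the usual ones, so the sketch below assumes those three. It is plain Python with made-up shopping baskets, and it only scores one given rule rather than searching for rules the way a full miner would.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    """Confidence relative to how common rhs is on its own; above 1 suggests a positive association."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)

# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule = ({"bread"}, {"milk"})
print("support   ", support(transactions, rule[0] | rule[1]))
print("confidence", confidence(transactions, *rule))
print("lift      ", lift(transactions, *rule))
```

Here the rule bread -> milk has support 0.5 and confidence 2/3, but its lift is below 1, so in this tiny data set buying bread does not make milk more likely than usual.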
4. Regression Analysis – Linear Regression (Remember Ohm's Law)
Linear regression attempts to model the
relationship between two variables by fitting a linear equation to observed
data. One variable is considered to be an explanatory variable, and the other
is considered to be a dependent variable.
Regression analysis
generates an equation to describe the statistical relationship between one or
more predictor variables and the response variable.
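As a small worked example in the spirit of Ohm's Law (V = I * R), the sketch below fits a straight line by ordinary least squares in plain Python. The current and voltage readings are invented and deliberately noise-free, so the fitted slope is simply the resistance. In R itself the built-in lm() function does this job.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical readings across a 2 ohm resistor.
current = [1.0, 2.0, 3.0, 4.0]   # explanatory variable, in amperes
voltage = [2.0, 4.0, 6.0, 8.0]   # dependent variable, in volts

slope, intercept = fit_line(current, voltage)
print("estimated resistance:", slope)
print("intercept:", intercept)
```

With real measurements the points would not sit exactly on a line, and the slope would be the best-fitting estimate of the resistance rather than an exact value.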
5. K-means Clustering
Clustering is the process
of partitioning a group of data points into a small number of clusters. A
quantitative approach is to measure certain features of the items being
clustered (products, for example). The
goal is to assign a cluster to each data point. K-means is a clustering method
that aims to find the positions of the cluster centers that minimize the
distance from each data point to the center of its cluster.
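A minimal sketch of the usual k-means loop (often called Lloyd's algorithm) in plain Python: assign each point to its nearest center, move each center to the mean of its points, and repeat. The points and the starting centers are made up, and the starting centers are fixed by hand here instead of being chosen at random. In R the built-in kmeans() function covers this.

```python
import math

def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-centering."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(point)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(points, centers=[(1, 1), (8, 8)])

for center, cluster in zip(centers, clusters):
    print(center, cluster)
```

The number of clusters is fixed up front by how many starting centers you pass in, and a poor starting choice can lead to a worse final grouping, which is why implementations usually try several random starts.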