Cluster analysis using Excel and XLMiner
Cluster analysis can be viewed as an optimization problem.
Excel includes an optimization tool called Solver. However, using Solver for cluster analysis is practical only for relatively small datasets.
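To make the optimization view concrete, here is a minimal sketch of one possible formulation. It is an assumption, not necessarily how a given Solver model is set up: pick k objects as cluster centers, assign every object to its nearest center, and minimize the total distance. The sketch solves this exactly by trying every choice of centers, which works only for tiny datasets. The function names and the six sample points are hypothetical (requires Python 3.8+ for `math.dist`).

```python
from itertools import combinations
from math import dist


def total_distance(points, centers):
    """Sum of distances from each point to its nearest chosen center."""
    return sum(min(dist(p, points[c]) for c in centers) for p in points)


def exhaustive_medoids(points, k):
    """Try every way of choosing k points as centers and keep the cheapest."""
    best = None
    for centers in combinations(range(len(points)), k):
        cost = total_distance(points, centers)
        if best is None or cost < best[0]:
            best = (cost, centers)
    return best


# Hypothetical toy data: two well-separated groups of three points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
cost, centers = exhaustive_medoids(points, 2)
print(centers, round(cost, 3))  # (0, 3) 4.0
```

Because the loop visits every combination of centers, its running time grows with the number of combinations, which is exactly what makes exact optimization impractical once the dataset is no longer tiny.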
The Evolutionary Solver (one of Solver's solving methods) is a metaheuristic, which means it cannot guarantee that the solution it holds when it stops is optimal. Instead, it reports the best solution it was able to find.
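The sketch below is not the Evolutionary Solver's algorithm. It is a simpler stand-in, a stochastic local search with random restarts, that shares the property just described: it returns the best solution it found, with no proof that it is optimal. It assumes the same hypothetical "choose k centers, assign each object to the nearest one" formulation; all names and data are illustrative.

```python
import random
from math import dist


def total_distance(points, centers):
    """Sum of distances from each point to its nearest chosen center."""
    return sum(min(dist(p, points[c]) for c in centers) for p in points)


def random_restart_search(points, k, restarts=20, seed=0):
    """Stochastic local search over choices of k centers.

    Each restart begins from a random set of centers and keeps swapping a
    center for a non-center while that lowers the total distance. The best
    local optimum across restarts is returned; it may not be the global one.
    """
    rng = random.Random(seed)
    n = len(points)
    best_cost, best_centers = float("inf"), None
    for _ in range(restarts):
        centers = rng.sample(range(n), k)  # random starting solution
        cost = total_distance(points, centers)
        improved = True
        while improved:
            improved = False
            for i in range(k):
                for cand in range(n):
                    if cand in centers:
                        continue
                    trial = centers[:i] + [cand] + centers[i + 1:]
                    trial_cost = total_distance(points, trial)
                    if trial_cost < cost:  # accept any improving swap
                        centers, cost, improved = trial, trial_cost, True
        if cost < best_cost:
            best_cost, best_centers = cost, sorted(centers)
    return best_cost, best_centers


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
cost, centers = random_restart_search(points, 2)
print(centers, round(cost, 3))
```

On this toy data every local optimum happens to be the global one, but on realistic data different restarts can end in different local optima, which is why such methods report "the best found" rather than "the optimum".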
For example, the number of ways to partition a dataset of 300 survey objects (a relatively small dataset) into 3 non-empty subsets is astronomical, on the order of 10^142. Even if the problem is simplified to just 3 decisions, namely which 3 of the 300 objects serve as cluster centers (with every other object assigned to its nearest center), there are still C(300, 3) = 4,455,100 choices, over 4.4 million.
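Both counts can be checked directly. The number of partitions of n objects into 3 non-empty subsets is the Stirling number of the second kind S(n, 3) = (3^n - 3·2^n + 3) / 6, obtained by inclusion-exclusion, and the simplified "choose 3 centers" problem has C(300, 3) choices. A minimal sketch (requires Python 3.8+ for `math.comb`; `stirling2_k3` is a hypothetical helper name):

```python
from math import comb


def stirling2_k3(n):
    """Ways to partition n objects into 3 non-empty subsets (inclusion-exclusion)."""
    return (3**n - 3 * 2**n + 3) // 6


partitions = stirling2_k3(300)
center_choices = comb(300, 3)  # which 3 objects act as cluster centers

print(f"partitions into 3 clusters: about 10^{len(str(partitions)) - 1}")
print(f"choices of 3 centers: {center_choices:,}")
```

Python integers have arbitrary precision, so the 143-digit partition count is computed exactly rather than overflowing.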
Thus, the optimization approach has severe limitations for large datasets. This is why heuristic methods such as hierarchical clustering and K-means, which find good but not necessarily optimal clusterings, have become the most common procedures for cluster analysis. XLMiner includes both of these methods.
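K-means is commonly computed with Lloyd's procedure, which alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points, until the centers stop moving. The pure-Python sketch below illustrates that procedure; it is not XLMiner's implementation, and the function name, starting centers, and data are hypothetical.

```python
from math import dist


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(coord) / len(members)
                                         for coord in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep old center
        if new_centers == centers:  # converged: centers did not move
            break
        centers = new_centers
    return labels, centers


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
# Deliberately poor start: both initial centers lie in the same group.
labels, centers = kmeans(points, [points[0], points[1]])
print(labels, centers)
```

Even from a poor start the centers migrate to the two groups here, but in general the result depends on the starting centers, so K-means is usually run several times and the best run is kept.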
The most common question about clustering is how many clusters to use. There is no general theory for finding the right number of clusters, and in some settings it may not even be clear what the "right" number of clusters means.
It helps, however, to consider the two extreme choices:
- Put every observation in its own cluster. This gives no predictive power, because there is no cluster in which to place a new observation.
- Put all observations in a single cluster. This yields a trivial solution that tells us nothing about new observations.
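The two extremes can be compared numerically with a common measure of cluster spread, the within-cluster sum of squares (WCSS). This measure is an assumption chosen for illustration; the text does not prescribe one. WCSS is largest when everything is in one cluster and exactly zero when every observation is its own cluster, so minimizing spread alone always favors more clusters and cannot by itself settle how many to use. Names and data below are hypothetical.

```python
def wcss(clusters):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    total = 0.0
    for members in clusters:
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(sum((a - b) ** 2 for a, b in zip(p, centroid))
                     for p in members)
    return total


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
one_cluster = [points]                # every observation in a single cluster
two_clusters = [points[:3], points[3:]]
singletons = [[p] for p in points]    # every observation in its own cluster

for name, clusters in [("one cluster", one_cluster),
                       ("two clusters", two_clusters),
                       ("singletons", singletons)]:
    print(name, len(clusters), round(wcss(clusters), 3))
```

A useful number of clusters lies between these extremes; a common informal practice is to look for the point where adding another cluster stops reducing WCSS substantially.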