JBDON
  • Home
  • Applied Analytics
    • Analytics for Decision Making >
      • What is Cluster Analysis
      • Data Reduction and Unsupervised Learning
      • Preparing Data and Measuring Dissimilarities
      • Hierarchical and k-Means Clustering
      • Defining Output Variables and Analyzing the Results
      • Using Historical Data to Model Uncertainty
      • Models with Correlated Uncertain Variables
      • Creating and Interpreting Charts
      • Using Average Values versus Simulation
      • Optimization and Decision Making
      • Formulating an Optimization Problem
      • Developing a Spreadsheet Model
      • Adding Optimization to a Spreadsheet Model
      • What-if Analysis and the Sensitivity Report
      • Evaluating Scenarios and Visualizing Results to Gain Practical Insights
      • Digital Marketing Application of Optimization
      • Advanced Models for Better Decisions
      • Business Problems with Yes/No Decisions
      • Formulation and Solution of Binary Optimization Problems
      • Metaheuristic Optimization
      • Chance Constraints and Value At Risk
      • Simulation Optimization
    • Analytics for Marketing >
      • Marketing Analytics and Customer Satisfaction
      • Customer Satisfaction
      • Measurements and Scaling Techniques – Introduction
      • Primary Scales of Measurement
      • Comparative Scaling
      • Non-Comparative Scaling
      • Experiment Design: Controlling for Experimental Errors
      • A/B Testing: Introduction
      • A/B Testing: Types of Tests
      • ANOVA – Introduction
      • Example -Inspect Spray and Tooth Growth
      • Logit Model - Binary Outome and Forecastign linear regression
      • Text Summarization
      • Social media Microscope
      • N-Gram - Frequcy Count and phase mining
      • LDA Topic Modeling
      • Machine-Learned Classification and Semantic Topic Tagging
    • Data Engine >
      • Understanding The Growth Of Data
      • Evaluating Methods Of Data Access
      • Communication journey
      • Data Journey
      • Planning for data visualisation
      • Visualisation Component
      • Content Connection and Chart Legitibility
    • Customer Insights >
      • Introduction
      • What is Descriptive Analytics?
      • Survey Overview
      • Net Promoter Score and Self-Reports
      • Survey Design
      • Passive Data Collection
      • Media Planning
      • Data Visualization
      • Causal Data Collection and Summary
      • Asking Predictive Questions
      • Regression Analysis
      • Data Set Predictions
      • Probability Models
      • Results and Predictions
      • Perspective Analytics (Maximize Revenue and Market Structure Competitions)
    • Analytics for Advance Marketing >
      • Visualisation and statistics (Political Advertising,Movie Theater and Data Assembly)
      • Excel Analysis of Motion Picture Industry Data
      • Displaying Conditional Distributions
      • Analyzing Qualitative Variables
      • Steps in Constructing Histograms
      • Common Descriptive Statistics for Quantitative Data
      • Regression-Based Modeling
      • Customer Analytics
      • Illustrating Customer Analytics in Excel
      • Customer Valuation Excel Demonstration
  • Soft Skills
    • Adaptability
    • Confidence
    • Change Management
    • Unlearning and Learning
    • Collaboration and Teamwork
    • Cultural Sensitivity
  • Marketing
  • Finance
  • Economics
    • Introduction to Managerial Economics >
      • Basic Techniques
      • The firm: Stakeholders, Objectives and Decision Issues
      • Demand and Revenue Analysis >
        • Demand Estimation and Forecasting
        • Demand Elasticity
        • Demand Concepts and Analysis >
          • Formulation and Solution of Binary Optimization Problems
      • Scope of Managerial Economics
    • Prodution and Cost Analysis >
      • Production Function
      • Estimation of Production and Cost Functions
      • Cost Concepts and Analysis I
      • Cost Concepts and Analysis II
    • Pricing Decisions >
      • Pricing strategies >
        • Adding Optimization to a Spreadsheet Model
      • Market structure and microbes barriers to entry
      • Pricing under pure competition and pure monopoly
      • Pricing under monopolistic and oligopolistic competition
    • Narendra Modi Development Model of Gujarat
  • JBDON Golf
    • Digital Marketing Application of Optimization
  • Let's Talk
  • MBA Project Sharing
  • About Us
    • Good Read >
      • IIMC says PepsiCo CEO Indra Nooyi was an average student
      • India’s middle class figures in Fortune’s Top Ten list of those who matter
      • The Start-Up of you.
      • BUYING AND MERCHANDISING
      • HUMAN RESOURCE MANAGEMENT
      • Do You Suffer From Decision Fatigue?
      • New Page
      • About social media and web 2.0
      • Building Your Own Start-up Technology Company, Part 1
      • Building Your Own Start-up Technology Company, Part 2
      • Building Your Own Start-up Technology Company, Part 3
      • Building Your Own Start-up Technology Company, Part 4
      • Renewable energy is no longer alternative energy
      • What Makes an Exceptional Social Media Manager?
      • The Forgotten Book that Helped Shape the Modern Economy
      • Home
      • How to Think Creatively
      • A Lighthearted Looks at Project Management and Sports Analogies
      • Why Trust Matters More Than Ever for Brands
  • CET Knowledge Zone
    • Tips From JBIMS Students >
      • Prasad Sawant
      • Chandan Roy
      • Ram
      • Ashmant Tiwari
      • Rajesh Rikame
      • Ami Kothari
      • Ankeet Adani
      • Sonam Jain
      • Marketing Analytics and Customer Satisfaction
      • Mitesh Thakker
      • Tresa Sankoorikal
    • Speed Techniques
    • CET Workshops
  • Untitled
  • New Page
    • Cluster analysis using excel and excel miner
    • Chance Constraints and Value At Risk
    • Adding Uncertainty to a Spreadsheet Model
  • Adidas

Preparing Data and Measuring Dissimilarities

There are three concepts that are critical to performing a valid cluster analysis:
1. The data should be in the correct form, taking into consideration what each variable represents. The two most common data types are:
  • Numerical
    • Continuous: quantities that can take any value within a range, such as time
    • Integer: counts, such as the number of purchases or the number of dependents
  • Categorical
    • Ordinal: An ordinal variable implies some sort of ranking. For example, a customer satisfaction rating may be stated as high, medium, or low. It can be transformed into a numerical variable by setting high equal to 3, medium equal to 2, and low equal to 1.
    • Nominal: A nominal variable, on the other hand, represents choices that do not imply any particular order, so it cannot be transformed into a single numerical variable. The transformation requires binary (0/1) variables.
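The two categorical transformations described above can be sketched in a few lines of Python. This is only an illustration: the ranking of high, medium, and low comes from the example above, while the payment-method variable, the customer records, and the function names are assumptions made up for the sketch. It uses one binary variable per choice, which is one common way to do the nominal transformation.

```python
# Ordinal ranking taken from the example above: high = 3, medium = 2, low = 1.
SATISFACTION_RANK = {"low": 1, "medium": 2, "high": 3}

# Hypothetical nominal variable; these choices are an assumption for illustration.
PAYMENT_METHODS = ["card", "cash", "transfer"]


def encode_ordinal(value, ranking=SATISFACTION_RANK):
    """Map an ordinal label to its numerical rank."""
    return ranking[value]


def encode_nominal(value, categories):
    """Return one binary (0/1) variable per category, in the given order."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical customers: (satisfaction rating, payment method).
customers = [("high", "cash"), ("low", "card"), ("medium", "transfer")]

# Each row: [satisfaction rank, is_card, is_cash, is_transfer]
encoded = [
    [encode_ordinal(rating)] + encode_nominal(method, PAYMENT_METHODS)
    for rating, method in customers
]

for row in encoded:
    print(row)
```

The ordinal variable stays a single column because its values carry an order. The nominal variable is spread over several binary columns so that no artificial ranking is introduced between the choices.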
Datasets may contain variables whose values are on very different scales. Before clustering, the data therefore needs to be normalized (also called standardized).
Normalization removes differences in scale by transforming each original value into its standard value: subtract the variable's mean, then divide by its standard deviation.
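As a minimal sketch of this operation, the snippet below computes a standard value in Python. The figures for Ann (age 35, group mean 42.20, standard deviation 14.55) are the ones quoted in this section's worked example. The list of ages and the function names are hypothetical, and using the sample standard deviation is an assumption, since the section does not say which form is intended.

```python
import statistics


def normalize(value, mean, std_dev):
    """Standard value: subtract the mean, divide by the standard deviation."""
    return (value - mean) / std_dev


def normalize_column(values):
    """Normalize every value of one variable using its own mean and std dev."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (assumption)
    return [normalize(v, mean, std_dev) for v in values]


# Figures quoted in this section: Ann's age, group mean, standard deviation.
ann_normalized_age = normalize(35, 42.20, 14.55)
print(round(ann_normalized_age, 4))  # -0.4948

# Hypothetical ages, only to show a whole column being normalized.
hypothetical_ages = [25, 35, 45, 60]
print([round(z, 3) for z in normalize_column(hypothetical_ages)])
```

After normalization, a column has mean 0 and standard deviation 1, so variables measured in years and variables measured in currency units contribute on a comparable scale.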
In the example table, the last two columns show the normalized values. For instance, the normalized age of Ann is -0.4948. It is obtained by subtracting the average age of the group (42.20 years) from Ann's age (35 years) and dividing the result by the standard deviation (14.55). The normalized value means that Ann's age is 0.4948 standard deviations below the mean.
 
Why should we normalize our data?
  • Normalized values allow us to identify the outliers in our dataset.
  • They eliminate biases from variables with relatively large original values.
  • Normalized values enable an easier interpretation of cluster analysis results.
2. A proper metric should be established to measure the distance between every pair of observations.
The Euclidean distance is the most commonly used measure of the dissimilarity between two observations: the smaller the distance, the more similar the observations. With two variables, it is equivalent to the straight-line distance between two points in a two-dimensional space.
Continuing the previous example, we compute the distance between each pair of persons in the dataset by using the normalized age and income values.
Further, we can create a scatter plot of the normalized values.
Observation: David is at least three times closer to Ann than he is to Clara, and is therefore more similar to Ann, because he is closer to her in both age and income.
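The pairwise distance calculation described above can be sketched as follows. The three observations and their coordinates are hypothetical normalized (age, income) values chosen for illustration; they are not the figures of the persons in the example.

```python
import math
from itertools import combinations

# Hypothetical normalized (age, income) values; not the figures from the example.
observations = {
    "P": (-0.5, -0.4),
    "Q": (-0.2, 0.0),
    "R": (1.0, 1.2),
}


def euclidean_distance(a, b):
    """Square root of the sum of squared differences across all variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Distance for every unordered pair of observations.
distances = {
    (first, second): euclidean_distance(observations[first], observations[second])
    for first, second in combinations(observations, 2)
}

for pair, distance in distances.items():
    print(pair, round(distance, 4))
```

The same function works unchanged with more than two variables, because it sums the squared differences over however many coordinates each observation has.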
3. We must decide how the distance between clusters is going to be measured.
Five commonly used measures of the distance between clusters are:
  1. Single linkage: the minimum distance over all pairs of objects that are not in the same cluster, that is, one object taken from each cluster.
  2. Complete linkage: the maximum distance over all pairs of objects that are not in the same cluster.
  3. Average linkage: the average of all pairwise distances across the two clusters.
  4. Average group linkage: the distance between the centre of one cluster and the centre of the other.
  5. Ward's method: a sum-of-squares criterion. The sum of squares is the sum of the squared distances from each observation to the centroid of the cluster to which it is assigned. At each step, the two clusters that are merged are those whose merger gives the smallest increase in the total sum of squares.
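The five measures can be compared on a small example. The sketch below is illustrative only: the two clusters are hypothetical points, the function names are made up for this example, and Ward's method is represented by the increase in the sum of squares that merging the two clusters would cause.

```python
import math
from itertools import product


def pairwise_distances(a, b):
    """Euclidean distances for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(a, b)]


def single_linkage(a, b):
    return min(pairwise_distances(a, b))


def complete_linkage(a, b):
    return max(pairwise_distances(a, b))


def average_linkage(a, b):
    d = pairwise_distances(a, b)
    return sum(d) / len(d)


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def average_group_linkage(a, b):
    return math.dist(centroid(a), centroid(b))


def sum_of_squares(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    centre = centroid(cluster)
    return sum(math.dist(p, centre) ** 2 for p in cluster)


def ward_increase(a, b):
    """Increase in the total sum of squares if clusters a and b are merged."""
    return sum_of_squares(a + b) - sum_of_squares(a) - sum_of_squares(b)


# Hypothetical clusters of normalized observations.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

print("single:", single_linkage(cluster_a, cluster_b))
print("complete:", round(complete_linkage(cluster_a, cluster_b), 4))
print("average:", round(average_linkage(cluster_a, cluster_b), 4))
print("average group:", average_group_linkage(cluster_a, cluster_b))
print("Ward increase:", ward_increase(cluster_a, cluster_b))
```

For the same pair of clusters, single linkage always gives the smallest of the pairwise values and complete linkage the largest, with average linkage in between, which is why the choice of measure can change which clusters are merged first.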