JBDON

Data Reduction and Unsupervised Learning

Dataset: a table in which the variables (also called features or attributes) are in the columns and the observations are in the rows, so that all the data values sit in the body of the table.

Dimension Reduction (the process of reducing the number of variables):

Why do we need to reduce the number of variables?
  • There is often redundancy among the variables in a dataset.
  • Because of this redundancy, it is possible to reduce the number of dimensions without losing critical information.
Note: Redundancy occurs when different attributes respond in similar ways to some common underlying factor.
Example: The HR department of a company creates an instrument to measure job satisfaction.
Aim of study: The HR manager wants to predict an employee’s intention to quit.
Statements that employees are asked to rate on a scale from one (strongly disagree) to seven (strongly agree):
  • My supervisor treats me with consideration.
  • My supervisor consults me concerning important decisions that affect my work.
  • My supervisor gives me recognition when I do a good job.
  • My supervisor gives me the support I need to do my job well.
  • My pay is fair.
  • My pay is appropriate, given the amount of responsibility that comes with my job.
  • My pay is comparable to the pay earned by other employees whose jobs are similar to mine.
Problem in methodology: Redundancy in the predictor variables. The seven items in the questionnaire do not really measure seven different constructs.
  • Items one to four are measuring a single construct that could be labelled “satisfaction with supervision”.
  • Items five to seven are measuring a different construct that could be labelled “satisfaction with pay”.
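One way to see this kind of redundancy is to look at the correlation between items. The sketch below is only an illustration: the ratings are invented for six hypothetical employees, and the variable names are assumptions, not part of the survey described here. Items that measure the same construct should correlate strongly, while items from different constructs should not.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings (1 to 7) from six employees, invented for illustration.
supervisor_consideration = [6, 7, 2, 5, 3, 6]  # item 1
supervisor_support = [6, 6, 1, 5, 2, 7]        # item 4
pay_is_fair = [3, 6, 5, 2, 6, 4]               # item 5

# Two items about supervision versus one supervision item and one pay item.
r_same = pearson(supervisor_consideration, supervisor_support)
r_diff = pearson(supervisor_consideration, pay_is_fair)

print(f"item 1 vs item 4 (same construct):      {r_same:.2f}")
print(f"item 1 vs item 5 (different construct): {r_diff:.2f}")
```

With these made-up numbers the two supervision items are almost perfectly correlated, so keeping both adds little information, whereas the pay item is only weakly related to them.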
These constructs can be identified using a technique called principal component analysis (PCA). PCA creates new variables as linear combinations of the original variables. These new variables are called principal components.
In the example, a principal component analysis would identify two components. PCA would transform the original seven values into two scores, one for each component.
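As a rough illustration of how PCA could turn seven ratings into two scores, here is a minimal, dependency-free sketch. The four response rows are idealised and invented (they are not the survey data), the function names are hypothetical, and power iteration with deflation is just one simple way to find the components; in practice a numerical library would normally be used.

```python
from math import sqrt


def center_columns(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenpair(m, iterations=200):
    """Power iteration: repeatedly multiply and normalise a start vector."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v


def principal_components(rows, k):
    """Return the first k (variance, loadings) pairs and the scores per row."""
    centered = center_columns(rows)
    m = covariance(centered)
    components = []
    for _ in range(k):
        eigenvalue, v = dominant_eigenpair(m)
        components.append((eigenvalue, v))
        # Deflation: remove the component just found before the next search.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(len(m))]
             for i in range(len(m))]
    scores = [[sum(a * b for a, b in zip(row, v)) for _, v in components]
              for row in centered]
    return components, scores


# Idealised, invented answers: items 1-4 (supervision) move together,
# and items 5-7 (pay) move together.
responses = [
    [2, 2, 2, 2, 3, 3, 3],
    [2, 2, 2, 2, 5, 5, 5],
    [6, 6, 6, 6, 3, 3, 3],
    [6, 6, 6, 6, 5, 5, 5],
]

components, scores = principal_components(responses, k=2)
for variance, loadings in components:
    print(round(variance, 3), [round(x, 2) for x in loadings])
for row in scores:
    print([round(s, 2) for s in row])
```

In this toy data the first component loads only on the four supervision items and the second only on the three pay items, so each employee's seven answers collapse into one supervision score and one pay score. The third hypothetical employee, for instance, gets a positive supervision score and a negative pay score.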
Picture
The employee with ID 102274 seems to be more satisfied with supervision than with pay.
 

Data Reduction:

Clustering falls under data reduction.
It takes a large number of observations and reduces them to a small number of identifiable groups. Each of these groups can be interpreted more easily and is represented by a centroid.
Picture
The scatter plot above shows four clusters for the scores in the job satisfaction survey.
The stars represent the centroids of the clusters; each centroid can be used to describe all the observations in its group.
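To make the idea of a centroid concrete, here is a minimal k-means sketch in plain Python. It is an assumption-laden illustration: the twelve (supervision score, pay score) points are invented, the function names are hypothetical, and the farthest-first starting rule is chosen only to keep the example deterministic. It is not necessarily the procedure used to produce the scatter plot.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Centroid of a group: the coordinate-wise average of its points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first(points, k):
    """Deterministic start: repeatedly pick the point farthest from the picks."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def k_means(points, k, max_iterations=100):
    centroids = farthest_first(points, k)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Invented (supervision score, pay score) pairs forming four loose groups.
points = [
    (2.0, 2.0), (2.5, 1.5), (1.5, 2.5),
    (2.0, -2.0), (2.5, -1.5), (1.5, -2.5),
    (-2.0, 2.0), (-2.5, 1.5), (-1.5, 2.5),
    (-2.0, -2.0), (-2.5, -1.5), (-1.5, -2.5),
]

centroids, clusters = k_means(points, k=4)
for centroid, members in zip(centroids, clusters):
    print(centroid, len(members))
```

On this toy data the twelve observations are summarised by four centroids, one per group of three points, which is the sense in which clustering reduces the data.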

​Unsupervised Learning:
​

​In classification, the objective is to find a set of rules that can be applied to a new observation in order to assign this new observation to a group. The methods for classification develop rules by discovering patterns in historical data.
The critical feature of this historical data is that classification of the observations is known, and it is used to learn how to classify future observations. Because this piece of information is available, the process is known as supervised learning.
For example, the table below shows ten of the answers to the job satisfaction survey together with whether the employee quit the company or not. A prediction model built on this data falls in the category of supervised learning, because the outcome that the model is trying to predict is known in the historical data.
Picture
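The supervised setting can be sketched with a very simple classifier. The example below is a hypothetical illustration: the labelled history is invented, and the nearest-centroid rule is only one of many possible classification methods, chosen here because it is short. The key point is that the known labels ("quit" or "stayed") are used to learn the rule.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def fit_centroids(observations, labels):
    """Learn one centroid per known label from the historical data."""
    groups = {}
    for obs, label in zip(observations, labels):
        groups.setdefault(label, []).append(obs)
    return {label: mean_point(pts) for label, pts in groups.items()}


def classify(model, observation):
    """Assign a new observation to the label with the nearest centroid."""
    return min(model, key=lambda label: squared_distance(observation, model[label]))


# Invented history: (supervision score, pay score) and the known outcome.
history = [
    ((2, 2), "quit"), ((1, 3), "quit"), ((3, 1), "quit"),
    ((6, 6), "stayed"), ((5, 7), "stayed"), ((7, 5), "stayed"),
]
observations = [obs for obs, _ in history]
labels = [label for _, label in history]

model = fit_centroids(observations, labels)

# Classify two new, unlabelled employees.
print(classify(model, (2.5, 3.0)))
print(classify(model, (5.5, 6.0)))
```

Compare this with clustering: here the groups are given in advance by the labels, and the data is only used to learn how to assign new observations to them.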
In unsupervised learning, the observations in the historical data are not labelled. Thus, we don’t know whether an observation belongs to one group or another. We also don’t know how many different groups there are. Discovering the number of groups is therefore one of the main outcomes of the analysis.
For example, Information Resources Incorporated conducted a cluster analysis of survey data to establish that the market of natural and organic products consisted of seven distinct segments, a number that was not known prior to the completion of the analysis.
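How the number of groups might be discovered can be sketched with the so-called elbow heuristic: run the clustering for several values of k and watch where the within-cluster sum of squares stops falling sharply. This is only one common heuristic, shown on invented one-dimensional scores with hypothetical function names; it is not a description of how the study above arrived at its seven segments.

```python
def k_means_1d(values, k, max_iterations=100):
    """k-means on plain numbers with a deterministic farthest-first start."""
    centroids = [values[0]]
    while len(centroids) < k:
        centroids.append(max(
            values, key=lambda v: min(abs(v - c) for c in centroids)))
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def within_cluster_ss(centroids, clusters):
    """Total squared distance of every value to its own centroid."""
    return sum((v - c) ** 2
               for c, members in zip(centroids, clusters)
               for v in members)


# Invented scores that visibly fall into three groups.
scores = [1, 2, 3, 11, 12, 13, 21, 22, 23]

wcss = {}
for k in range(1, 5):
    centroids, clusters = k_means_1d(scores, k)
    wcss[k] = within_cluster_ss(centroids, clusters)
    print(k, round(wcss[k], 2))
```

For these toy scores the sum of squares drops steeply up to k = 3 and barely changes at k = 4, which suggests three groups even though that number was never supplied to the analysis.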
 
Cluster analysis can also be applied to labelled historical data with the purpose of finding new labels.
For example, in one study, cluster analysis was used to categorize mutual funds based on their financial characteristics instead of their investment objectives. The historical data for the study consisted of 904 different funds that fund managers had classified into seven categories according to the investment objectives. However, a cluster analysis determined that there were only three different fund categories. The reduction in the number of categories has significant benefits to investors seeking to diversify their portfolios. The study determined that the consolidated categories were more informative about performance and risk than the original seven categories created by the fund managers.
In terms of the data used, the analysts initially considered 28 financial variables related to risk and return. However, after applying principal component analysis, they found that 16 of the 28 variables were able to explain 98% of the variation in the dataset. Therefore, they used only these 16 variables for the clustering which, as already mentioned, resulted in three fund categories. This example shows that dimensionality reduction and data reduction complement each other. It is common practice to apply dimensionality reduction techniques such as PCA before clustering.
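The "explains 98% of the variation" criterion can be sketched as a cumulative share of variance. In the example below the variances are invented and the function name is hypothetical; the study's actual figures are not reproduced here. The function simply counts how many of the largest variances are needed to reach a chosen share of the total.

```python
def dimensions_needed(variances, threshold):
    """Smallest number of dimensions whose variance share reaches threshold."""
    total = sum(variances)
    cumulative = 0
    for count, variance in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return count
    return len(variances)


# Invented variances for seven dimensions (for example, principal components).
variances = [40, 25, 15, 10, 6, 3, 1]

print(dimensions_needed(variances, 0.98))
print(dimensions_needed(variances, 0.50))
```

The remaining dimensions can then be dropped before clustering, which is the sense in which the two kinds of reduction work together.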