JBDON
N-Gram - Frequency Count and Phrase Mining

Counting approach on pre-processed text
Resulting set of words from pre-processing: “love” “friday” “hate” “monday” “monday” “turn” “21”
Resulting frequency counts:
“monday” = 2
“love” = 1
“friday” = 1
“hate” = 1
“turn” = 1
“21” = 1
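
The counts above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the pre-processing has already been done and starts from the resulting word list, and it uses `collections.Counter` from the standard library as one convenient way to tally tokens.

```python
from collections import Counter

# Tokens left after pre-processing, copied from the example above.
tokens = ["love", "friday", "hate", "monday", "monday", "turn", "21"]

# Counter tallies how many times each token occurs.
counts = Counter(tokens)

# most_common() lists tokens from the highest count to the lowest.
for word, count in counts.most_common():
    print(f"{word} = {count}")
```

Because “monday” is the only token that appears twice, it comes first in the listing.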
 
N-grams
N-grams are multi-word phrases (the same idea also works at other levels, such as characters).
With n-grams, the tokens are multi-word tokens. Instead of splitting the text at every whitespace, the tokenizer groups consecutive words together, so with n = 2 you end up with tokens that are two-word phrases.
Unigram (1-gram) – love
Bigram (2-gram) – love Friday
Trigram (3-gram) – love Friday hate
4-gram – love Friday hate Monday
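
As a rough illustration, n-grams can be generated with a short helper. The function name `ngrams` is hypothetical, and the sketch assumes the common sliding-window convention, in which every contiguous run of n words becomes one token, so neighbouring n-grams overlap.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Same four words as the examples above, lower-cased.
tokens = ["love", "friday", "hate", "monday"]

# Print the unigrams, bigrams, trigrams and 4-grams in turn.
for n in range(1, 5):
    print(n, ngrams(tokens, n))
```

The first item produced for each n matches the corresponding example in the list above; the remaining items are the other windows of the same length.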
 
Cons: The choice of “n” is important. A value of n that works for one set of words may not work for another set of words.
Solution: Phrase mining
 
Phrase mining
Phrase mining refers to the automatic extraction of high-quality phrases (e.g. scientific terms and general entity names) from a given corpus (e.g. research papers and news). Representing the text with quality phrases instead of n-grams can improve computational models for applications such as information extraction/retrieval, taxonomy construction and topic modeling.
 
 











POS-guided phrasal segmentation – Part of speech

Part-of-speech (POS) guided phrasal segmentation works on actual sentences taken from, say, news articles.
Take the first example: “US President Barack Obama speaks at a town hall meeting with CNN's Anderson Cooper.” The computation relies on a tool that is very common in data science, the part-of-speech (POS) tagger, which identifies which words in a sentence are nouns, verbs, adjectives, et cetera. On this side of the process, phrase mining does not look at Wikipedia. It segments the sentence by part of speech and weights nouns very highly, which surfaces candidates such as “US President Barack Obama” and “Anderson Cooper”.
It then combines these segments mathematically (we won't go far into the mathematics here) with the positive pool, that is, phrases that are Wikipedia entries. The result is a confidence score, produced in the box labeled “robust positive-only distant training” near the middle of the figure. With very high confidence (0.9999, or 99.99%), “US President” is judged to be a quality phrase; with 98% confidence, so is “Anderson Cooper”; and so on. By contrast, “speaks at” gets a low score of about 0.3 (30%), or even 0.2 (20%).
This exercise shows that phrase mining, a recent technique, gives a much better picture of the text than choosing an arbitrary value of n and using n-grams.
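
To make the noun-weighting idea concrete, here is a toy sketch. It is not the method described above: the POS tags are assigned by hand (a real pipeline would get them from a POS tagger), all nouns and proper nouns share a single simplified `NOUN` tag, the helper name `noun_runs` is hypothetical, and no confidence scores or Wikipedia matching are involved. It only shows how runs of consecutive nouns can be pulled out as candidate phrases.

```python
# Hand-assigned (word, tag) pairs for the example sentence.
# Assumption: a simplified tag set in which every noun is tagged "NOUN".
tagged = [
    ("US", "NOUN"), ("President", "NOUN"), ("Barack", "NOUN"), ("Obama", "NOUN"),
    ("speaks", "VERB"), ("at", "ADP"), ("a", "DET"),
    ("town", "NOUN"), ("hall", "NOUN"), ("meeting", "NOUN"),
    ("with", "ADP"), ("CNN's", "NOUN"), ("Anderson", "NOUN"), ("Cooper", "NOUN"),
]

def noun_runs(tagged_tokens):
    """Group consecutive NOUN-tagged words into candidate phrases."""
    phrases, current = [], []
    for word, tag in tagged_tokens:
        if tag == "NOUN":
            current.append(word)
        elif current:
            # A non-noun ends the current run of nouns.
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

candidates = noun_runs(tagged)
print(candidates)
```

Non-noun spans such as “speaks at” never become candidates here, which loosely mirrors the low confidence such phrases receive. A full phrase-mining system would go further and score each candidate, which is how shorter phrases such as “US President” and “Anderson Cooper” can be singled out.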

