
Many marketers go through their careers without ever questioning their brand health measures. That is the main reason brand health methods have remained largely unchanged for so long, despite clear opportunities to improve these critical marketing tools.

Existing brand health measures rely almost exclusively on survey data. While surveys are insightful, they are also prone to collection errors arising from inaccurate recall and distorted responses by surveyed consumers.

The problem with surveys lies in the contrived way they take place. A survey does not observe consumers in their natural setting, where they would behave completely normally.

Instead, it places consumers in an artificial situation and proceeds to ask questions that the consumer may or may not answer correctly. In a way, survey insights are like those about animal behavior observed in a zoo—such insights are not fully reflective of how the animal behaves in the wild.

Marketers can do better by creating more-direct and less-biased approaches to exploring brand health to add to our arsenal of marketing research tools. Rather than relying solely on survey data, they should find additional methods that seek to analyze consumers' relationship to brands in the field by observing real-world consumer attitudes.

The natural language processing method of Latent Semantic Indexing (LSI) offers an avenue to do just that.

Brand health is ultimately a measure of how strong the connections are in consumers' minds between a brand and its intended brand equities: the tangible and intangible assets of a brand, including its name, symbols, and associations. LSI is well suited to exploring these connections, since linguists developed this tool to mine large repositories of text for conceptual connections between words.

Accordingly, LSI can be readily repurposed to explore conceptual connections between the words identifying a brand (i.e., brand names) and the words marking its brand equities.

LSI maps the contextual relationships between words in terms of common usage patterns across a collection of documents called a repository. For instance, in documents about dogs (the animal), one would expect that the word "dog" would be accompanied by contextually relevant words such as "collar," "wagging," "puppy," or "leash." These associations are less likely in comparable documents discussing "reptiles." When a large number of documents are put together as a repository, a statistical measure of these connections can be generated via LSI.
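As a minimal sketch of the starting point for this kind of analysis, the snippet below counts how often each word occurs in each document of a tiny repository. The three sample sentences and the function name `term_document_matrix` are invented for illustration; a real repository would hold hundreds or thousands of documents.

```python
from collections import Counter

# Toy repository: each string stands in for one document (invented for illustration).
repository = [
    "the dog tugged at its leash and collar",
    "a puppy is a young dog wagging its tail",
    "the lizard is a reptile that basks on a rock",
]

def term_document_matrix(docs):
    """Return (sorted vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

vocab, matrix = term_document_matrix(repository)
row = dict(zip(vocab, matrix))
print(row["dog"])     # per-document counts for "dog"
print(row["lizard"])  # per-document counts for "lizard"
```

Each row of the matrix is a word's usage pattern across the repository, which is the raw material LSI works from.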

LSI enables an analyst to understand how words relate to one another through the creation of a similarity measure, which indicates how closely the usage pattern of one term resembles that of another.

In the above dog example, the word "hound" may not occur in a document with the word "dog," but the synonymous nature of these two terms will tend to create patterns of similar word use in documents with either term. This similarity will express itself in LSI as a strong similarity measure compared with other terms, whether or not the two terms actually occur together in any document.
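The dog and hound effect can be reproduced on a toy scale. The sketch below is only an illustration under stated assumptions: the six mini-documents are invented, two dimensions are kept because the toy data has two obvious topics, and the decomposition is done with a simple power iteration on the term-by-term matrix rather than with a production SVD routine.

```python
import math

# Toy repository (invented): "dog" and "hound" never share a document,
# but both keep company with "leash" and "puppy".
docs = [
    ["dog", "leash", "puppy"],
    ["dog", "leash"],
    ["hound", "leash", "puppy"],
    ["hound", "puppy"],
    ["lizard", "scales"],
    ["lizard", "scales"],
]
vocab = sorted({word for doc in docs for word in doc})
# Term-by-document count matrix: one row per term, one column per document.
A = [[doc.count(term) for doc in docs] for term in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def lsi_term_vectors(matrix, k, iterations=300):
    """Rank-k term vectors (rows of U_k * Sigma_k), found by power iteration
    with deflation on the term-by-term matrix A * A^T."""
    n = len(matrix)
    gram = [[dot(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    vectors = [[] for _ in range(n)]
    for _ in range(k):
        v = unit([float(i + 1) for i in range(n)])  # fixed start keeps runs repeatable
        for _ in range(iterations):
            v = unit([dot(row, v) for row in gram])
        eigenvalue = dot(v, [dot(row, v) for row in gram])
        sigma = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            vectors[i].append(sigma * v[i])
        # Deflate: remove the component just found so the next pass finds the next one.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return vectors

row = dict(zip(vocab, A))
lsi = dict(zip(vocab, lsi_term_vectors(A, k=2)))

print("raw  dog~hound :", round(cosine(row["dog"], row["hound"]), 3))
print("LSI  dog~hound :", round(cosine(lsi["dog"], lsi["hound"]), 3))
print("LSI  dog~lizard:", round(cosine(lsi["dog"], lsi["lizard"]), 3))
```

On the raw counts the two words look unrelated, because they share no document. In the reduced space they end up pointing in nearly the same direction, while "lizard" stays apart.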

In branding, one should expect that brands and their equities will be conceptually connected in the language consumers use to talk about these brands. When successful, branding should dispose consumers at both a conscious and unconscious level to use particular language when they mention a brand.

For instance, the McDonald's brand should be closely aligned with references to its associated equities such as Ronald McDonald, the golden arches, the Big Mac, the red/yellow color combination, and the word "fun."

LSI should register these associations as higher-than-normal similarity measures between the term identifying the brand (e.g., McDonald's) and the words describing its equities. If a brand lacks these linguistic connections to its equities, this could be an indication of poor brand health, with the intended equities being only weakly associated with the brand in the minds of consumers.

LSI analysis of brands further allows for a comparison between competitors in regard to how well these brands own particular words and concepts within the minds of consumers. Marketers can also explore organically developed brand equities that have been minted in the minds of consumers, independent of a marketer's intent. These organic equities can be positive elements to develop further in the marketing, or negative elements that need to be mitigated.

The math behind LSI is too involved to describe fully in this article. In brief, it combines stemming and stop lists with a series of matrix and vector operations, chiefly a singular value decomposition (SVD) of the term-by-document matrix, to produce its end result: a set of matrices from which a similarity measure can be generated through vector cosine calculation.
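For readers who want the notation, the end result can be summarized in one line. This is the standard textbook formulation, not something specific to the analysis in this article. The symbols are assumptions introduced here: A is the term-by-document matrix, k is the number of dimensions kept, and t_i is the reduced vector for term i.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad
t_i = (U_k \Sigma_k)_{i,:},
\qquad
\cos\theta_{ij} = \frac{t_i \cdot t_j}{\lVert t_i \rVert \, \lVert t_j \rVert}
```

A cosine near 1 (an angle near 0 degrees) means two terms are used in very similar contexts. A cosine near 0 (an angle near 90 degrees) means their usage patterns are essentially unrelated.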

The key to applying LSI to branding analysis is having access to a large consumer-authored text repository upon which to run the analysis. Blogs are such a repository.

The recent surge in blogging by consumers offers an ideal environment for LSI analysis for brand health. Consumer-written blogs reveal largely unedited consumer attitudes in numerous diary-like documents that can be readily harvested for text mining through their RSS or ATOM feeds. Furthermore, the existence of date stamps within these feeds offers better temporal control of data than normal scraping of text content from standard Web sites.
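To make the harvesting idea concrete, here is a minimal sketch that reads an RSS 2.0 feed and keeps only the posts whose date stamps fall inside a chosen window. The feed text, the function name `posts_in_window`, and the date window are all hypothetical. The feed is inlined so the example runs without network access, and Atom feeds, which use different element names and date formats, are not handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed, inlined so the sketch runs offline.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example blog</title>
<item><title>Soda run</title>
<description>Grabbed a Pepsi after class.</description>
<pubDate>Tue, 12 Oct 2004 18:30:00 GMT</pubDate></item>
<item><title>New year</title>
<description>Giving up Coke for good.</description>
<pubDate>Sat, 01 Jan 2005 09:00:00 GMT</pubDate></item>
</channel></rss>"""

def posts_in_window(feed_xml, start, end):
    """Return (published, text) pairs for items with start <= pubDate < end."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        # RSS 2.0 date stamps use the RFC 822 format, which email.utils can parse.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if start <= published < end:
            posts.append((published, item.findtext("description", default="")))
    return posts

start = datetime(2004, 10, 1, tzinfo=timezone.utc)
end = datetime(2004, 12, 1, tzinfo=timezone.utc)
posts = posts_in_window(SAMPLE_FEED, start, end)
for published, text in posts:
    print(published.date(), text)
```

Filtering on the feed's own date stamp is what gives the temporal control described above; scraped page text usually carries no such reliable date.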

LSI in Action

To assess the viability of LSI blog mining for brand health work, I created a test-bed project exploring the brand equities of the two leading carbonated soft drinks: Coke and Pepsi. I began by harvesting all the posts I could from the Live Journal blog site that were authored in October and November 2004 and contained the terms Coke, Coca-Cola, Pepsi, or soda, via a custom Google query ("coke OR coca-cola OR pepsi OR soda").

This provided a sample of 300 blog posts for my analysis, which is a relatively small sample for LSI but adequate to prove out the method.

After processing, I was left with a matrix of 1,409 terms by 300 documents, on which I built an LSI model using Wolfram's Calculation Center 2 and a series of custom Perl scripts for stemming and normalizing the input.
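The stemming and normalizing step can be sketched roughly as follows. This is not a reconstruction of the Perl scripts used in the project: the stop list is a tiny invented sample, and the suffix stripping is deliberately crude rather than a true stemming algorithm such as Porter's.

```python
import re

# Tiny illustrative stop list; real stop lists run to hundreds of words.
STOP_WORDS = {"a", "an", "and", "i", "is", "of", "the", "to", "was"}
# Crude suffix stripping, checked in this order; not a real Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three leading letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalise_text(text):
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(normalise_text("The dogs barked and I was drinking Pepsi"))
# → ['dog', 'bark', 'drink', 'pepsi']
```

Coarse rules like these merge unrelated words now and then, which is one way chance associations can creep into a small term matrix.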

Once the SVD was complete, I calculated the vector relationship between the brand name Coca-Cola and all the other terms in the matrix using the LSI similarity measure. The most closely related terms were as follows:

Rank  Term        Cosine  Angle (degrees)
1.    ok          0.2245  77.0
2.    made        0.1547  81.1
3.    classic     0.1037  84.1
4.    war         0.1033  84.1
5.    mouse       0.0945  84.6
6.    coffee      0.0936  84.6
7.    general     0.0924  84.7
8.    term        0.0904  84.8
9.    stock       0.0885  84.9
10.   stop        0.0863  85.0
11.   read        0.0749  85.7
12.   march       0.0745  85.7
13.   stuff       0.0723  85.9
14.   man         0.0699  86.0
15.   cry         0.0697  86.0
16.   produce     0.0693  86.0
17.   company     0.0685  86.1
18.   near        0.0679  86.1
19.   plant       0.0664  86.2
20.   hold        0.0652  86.3
21.   today       0.0633  86.4
22.   control     0.0629  86.4
23.   taste       0.0617  86.5
24.   red         0.0616  86.5
25.   carbonated  0.0614  86.5
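The Angle column in the table is simply the arccosine of the Cosine column, expressed in degrees. As a rough sketch of how such a ranking could be produced, the function below sorts terms by their cosine with a target term. The name `rank_related` and the two-dimensional `term_vectors` are hypothetical stand-ins for the reduced LSI term vectors, not data from the study.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_related(target, term_vectors, top_n=25):
    """Return (term, cosine, angle in degrees) for the terms most similar to target."""
    scores = [(term, cosine(term_vectors[target], vec))
              for term, vec in term_vectors.items() if term != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    # Clamp before acos so rounding error cannot push the value outside [-1, 1].
    return [(term, cos, math.degrees(math.acos(max(-1.0, min(1.0, cos)))))
            for term, cos in scores[:top_n]]

# Hypothetical reduced vectors, invented for illustration only.
term_vectors = {"coke": [1.0, 0.0], "classic": [1.0, 1.0], "lizard": [0.0, 1.0]}
ranked = rank_related("coke", term_vectors)
for term, cos, angle in ranked:
    print(f"{term:10s} {cos:.4f} {angle:.1f}")

# Checking the first table row: a cosine of 0.2245 corresponds to about 77.0 degrees.
print(round(math.degrees(math.acos(0.2245)), 1))  # → 77.0
```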

I then used this list of terms to explore the original documents to understand exactly how they were related. Some interesting patterns emerged:

  • "War" was a negative association with the Coca-Cola brand that came from the song Amerika by the German band Rammstein. This song has a line that links Coca-Cola to US military aggression in the world. The lyrics for this song were widely quoted in my Live Journal sample, especially the line referencing Coca-Cola and war.

  • "Mouse" was an association with Disney's Mickey Mouse. Although it did not make the top 25, "McDonald's" was, not surprisingly, also closely connected with Coca-Cola in the minds of consumers.

  • "Coffee" had a strong connection with Coca-Cola, given that many individuals see cola as a coffee substitute for their daily dose of caffeine.

  • "Stop" is connected to Coca-Cola because many consumers see this beverage as an addiction that they or others need to break. Interestingly, when soda addiction comes up, Coca-Cola and Pepsi are named, but other cola brands such as Jones Soda commonly are not.

  • The strong "red" association comes from an often-quoted story about a Coca-Cola packaging typo where the phrase "red disk" was misprinted as "red dick."

  • "Stock" was associated with Coca-Cola mainly because blog authors were discussing the goings-on with the Coca-Cola Company's publicly traded stock (NYSE:KO).

  • Several of the terms listed above turned out not to be meaningful (e.g., "OK," "made," "general," "today," and "march"). Upon inspection, these connections appeared to be mostly the result of chance, the coarseness of the stemming process, or the small sample used in this proof of concept. For instance, the term "OK" rates very high, but when this connection is examined in the documents it appears to be driven mostly by chance language associations within the matrix. No clear pattern of association could be found.
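Going back to the source posts to see why a term scored highly can be approximated with a simple keyword-in-context search. The sketch below is an assumption about how such an inspection might be automated. The function name `contexts` and the sample sentences are invented and are not drawn from the Live Journal sample.

```python
import re

def contexts(docs, brand, term, window=4):
    """Return short text windows around `term` from documents that also mention `brand`."""
    hits = []
    for doc in docs:
        # Keep hyphens and apostrophes so "coca-cola" stays a single token.
        words = re.findall(r"[\w'-]+", doc.lower())
        if brand not in words:
            continue
        for i, word in enumerate(words):
            if word == term:
                hits.append(" ".join(words[max(0, i - window): i + window + 1]))
    return hits

# Invented sample posts, for illustration only.
sample_posts = [
    "Trying to stop drinking Coca-Cola every morning",
    "I stop at the store for milk",
]
found = contexts(sample_posts, "coca-cola", "stop")
print(found)
```

Reading a handful of these windows for each high-scoring term is usually enough to tell a genuine association from a chance one.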

A comparison with Pepsi in this same LSI model shows that Pepsi and Coke are closely connected in the minds of consumers, but that consumers talk about Pepsi in different ways. The most striking difference is the strong connection between Pepsi and one of its recent promotions: "Apple" ranked as the term most connected to Pepsi, owing to the iTunes promotion Pepsi ran with Apple.

Furthermore, the various flavors of Pepsi (Cherry and Spice) were more strongly connected with Pepsi than the special flavors of Coke. Consumers seem to see the Coca-Cola brand as being more tightly related to Classic Coke than any of its special flavors.

On the negative side, there was a tight connection between Pepsi and the ability of the cola to corrode pennies, and some consumers also connected Pepsi strongly with weight-gain concerns.

An unexpected brand hero emerged from this study: Jones Soda. While I did not explicitly seek this brand out for my analysis, it was mentioned so often on the blog sites I studied that I decided to analyze its word relationships.

What emerged from this ad hoc study was that Jones Soda has done an excellent job of getting consumers to talk about the interesting labels on its bottles and its sometimes-gross Thanksgiving holiday soda flavors (e.g., casserole). While Jones has begun to own the idea of Thanksgiving soda, it appears that Coca-Cola has lost much of its connection in the minds of consumers to the Christmas season, which historically was a solid equity for the brand.

Moving Forward

As this example shows, there is much to be garnered from even a simple analysis of brand health through LSI on blog content. With the creation of more automated content harvesting, larger sample sizes (n = 1,000-10,000 documents), longitudinal samples and a formal processing infrastructure, this brand health methodology could be readily deployed as a standard marketing research offering for brand exploration and monitoring.

This same analysis can offer the marketer a host of other interesting products, including an analysis of common word usage to aid in the selection of keywords for paid search and search optimization.




Matthew Syrett is a marketing consultant/analyst—a hybrid marketer, film producer, technologist, and statistician. He was vice-president of product development at the LinkShare Corporation and vice-president at Grey Interactive. Reach him via syrett (at) gmail (dot) com.