Not-So-Mad Science: Genetic Algorithms and Web Page Design for Marketers

A big question for Web site designers continues to be this: what is the optimal number and placement of marketing call-outs on a Web page?

The self-service and forward-leaning quality of the Web has led to page designs that all too often serve myriad marketing goals. A single page can have any number of callouts in its layout, trying to catch the attention of a variety of consumers in different states of need.

It is very easy for this messaging complexity to overwhelm consumers, to the point where they are unable to focus on any of the messages put before them. Furthermore, it is possible for a consumer's attention to be hijacked by a lesser-value callout, which can mean of a loss of marketing gains as prime call-to-actions are eclipsed by less-significant ones.

Interactive marketers have traditionally dealt with these layout problems using information architecture, reporting tools and usability research to tune the design of their pages. My earlier work on Web page callout optimization has led me to another approach. We can solve these problems using dynamic messaging optimization embedded into our application servers.

More specifically, a genetic algorithm could be deployed on a site to optimize the placement and numbers of callouts within a page layout to grow, on an ongoing basis, a page's marketing gains.

This algorithm could receive performance feedback from the actions taken by the visitors to the site, which could then be transformed into automated decisions about how most gainfully to display the page for future visitors.

These are the advantages of the genetic approach:

A genetic solution can continually seek optimal solutions to being gainful, which can adapt to changes in the marketing environment as they happen.
It can be easily deployed in an application server environment such as J2EE, PHP, or .Net. This is especially true given its lightweight programming footprint and applicability to the widely used model view controller (MVC) pattern.
Genetic algorithms tend to explore a wide variety of potential solutions, which can lead to solutions that would otherwise not be considered.
The insights garnered from the use of these algorithms can be immediately applied to a site, even without human intervention. This works to limit the problematic latency in manual tuning methods created by a delay between observing a solution and enacting it.
The genetic approach uses real-world feedback, which is not prone to the research biases inherent in laboratory usability studies.

Genetic Decision Making

Genetic algorithms are a family of optimization routines that work by imitating the paired biological processes of natural selection and sexual reproduction. They are best for solving problems of optimization with a large number of potential solutions, which are difficult to solve using standard methods due to the intractability of these problems.

Dr. John Holland pioneered the idea of genetic algorithms in the 1960s and 1970s to allow computers to take a more human-like and unconstrained approach to search and optimization problems. Although earlier researchers had explored the idea of using natural selection for problem solving, Holland's work was revolutionary in its mixing of natural selection and sexual reproduction together into a single optimization framework. It was joked at the time that he was the first researcher to teach computers to have sex.

The result of Holland's work was a highly efficient method for seeking optimal solutions to complex multidimensional problems. Genetic algorithms are ideal for exploring these kinds of problems, since they tend first to seek partial solutions to a problem, and then use these partial solutions to define discrete neighborhoods for locating fuller solutions. In this way, they avoid the intractability of brute-force methods when confronted with high dimensionality, while still being able to explore a wide set of potential solutions.

The process of genetic optimization is deceptively simple.

It begins by defining a problem space as a set of variables describing potential solutions to a problem and a currency for measuring fitness (i.e., success against that problem). Within this model space, the genetic algorithm will seek to find combinations of variables that create the strongest fitness measures. While there is rarely one right answer to these kinds of problems, there are always answers that are clearly better than others. Genetic algorithms will work to find these better solutions while discarding weaker-performing patterns.

After the problem space is defined, the modeler will choose an initial set of candidates to seed the optimization process. These seed candidates are randomly generated or selected using existing domain knowledge. Once chosen, the candidates will start a process of iterative optimization with the following five steps:

Expose a set of candidates to a testing environment and collect feedback on their performance. The number of testing exposures for each candidate will vary depending on the behavior being modeled with sparser and less predictable behavior requiring larger sample sizes.
Evaluate the fitness of each candidate using a uniform success measure.
Create a new set of candidates with the fittest solutions having the best chance of passing on their solution to the next generation of candidates. The new candidates will be generated as a mixture of the variable settings for two or more successful parent solutions in an imitation of sexual reproduction.
Add random mutations into the population using a predefined mutation rate (e.g., 1 in 100 new candidates will experience a mutation), which will allow new traits to enter the population from time to time. These mutations will take the form of random substitutions to one variable's setting within a newly created candidate.
Repeat this process until a predetermined stopping point is reached.

This iterative process will tend to continually create stronger candidates until a point of equilibrium is reached, or the external forces shaping the model's performance significantly change. The use of mutation and sex-like reproduction make sure the model is constantly exploring new approaches and adapting changes as they occur.

Genetics and Page Layouts

Initially, the genetic optimization approach was applied to academic game theory problems such as the well-known Prisoner's Dilemma, but it has subsequently found a wider home in a variety of practical applications.

The issue of optimal page layout for marketing gain is a problem well suited for genetic methods, since this problem often involves a significantly large modeling space. The problem of finding an optimal page layout seems simple, but its apparent simplicity hides an involved puzzle.

If we play around with a standard page layout, it soon becomes clear how many potential variations we can readily create by allowing each area of the page to have even a few variants.

For example, the MarketingProfs.com home page has no fewer than 15 page components (e.g., the Premium Story Callout, ad units, etc.) of which 8 have call-to-actions beyond just reading an article. Even having 2 alternate formats (e.g., 1 skyscraper versus 2 smaller stacked ad units) for each of the 8 call-to-actions components would create 256 potential unique layout combinations (2⁸). If we explore the idea of additionally allowing for the suppression of these components as an option, the number of combinations skyrockets to 6,561 (3⁸).

One cannot easily test the total impact of page layout with so many combinations, each requiring an adequate sample of usability tests. Genetic algorithms offer a means around this problem of testing intractability using the production Web site itself as the testing population.

The process of applying a genetic solution to page layouts begins by understanding how fitness or success will be measured for the various callouts within a layout. For fitness to be measured, a callout must have some direct response measure, such as a click through, download, purchase, signup or other transaction that happens in response to the messaging of the callouts.

In other words, we need some measure of fitness that is tied to viewing the content of a callout in order to optimize the performance of the layout in relation to that callout.

Given this, most branding elements normally cannot be optimized using this approach, since their fitness is hard to measure through consumer behavior on a site. They are rarely defined by direct response and therefore require softer measures to understand their effectiveness.

A marketer has a choice of measuring callout fitness in either absolute or relative terms, but all callouts to be optimized within a given layout need to be measured in a commensurate manner. It should be noted that it is best if the various callouts have differential values, such that callouts that lead to more valuable consumer responses are understood as fitter actions than lesser-value callouts. These fitness values can be based on a known value or set arbitrarily by the marketer.

Once a system of measuring fitness is in place, the marketer can map the various discrete areas of the page layout to be optimized, and define the potential variants and fitness values for each of these areas. This information will then be encoded into a series of scalar or nominal variables with one variable for each callout area describing its potential variants.

For instance, a page could have the following areas for optimization with these options and fitness values:

Area 1: Horizontal Ad Space at Top of Page (value=5)
a) Leader Board Unit (728x90)
b) Full Banner Unit (468x60) + Button no. 2 Unit (120x60)
Area 2: Vertical Ad Space on Right Side of Page (value=5)
a) Skyscraper (120x600)
b) Wide Skyscraper (160x600)
c) No Displayed Unit
Area 3: Main Callout Area (value=10)
a) One Graphic
b) Two Graphics
c) Three Graphics
Area 4: Text Callouts below Left-hand Navigation (value=2)
a) One Text Link
b) Two Text Links
c) Three Text Links
d) Four Text Links

This optimization problem allows for 72 variants (2*3*3*4=72) for the page's layout. Each candidate would look like the following when encoded into a set of variables: {a, c, c, b}, which in this case is a layout with a leader board unit in the area 1, no ad unit displayed in area 2, three graphics in area 3, and two text links below the side navigation.

I arbitrarily set the value for each area to a score between 1 and 10, with 10 being the highest value. If a user has a positive response to a particular callout, the value assigned to the area holding that callout will be ascribed to its layout within the optimization process. We will measure the fitness of a given layout as the aggregate of values ascribed to the layout's callouts over the course of the test divided by the number of displays for that layout. Layouts will be most rewarded for strong response rate for high-value callouts, although some layouts can find fitness by having even stronger response rates for lesser-value callouts.

With the above in place, the setting up of the rest of the process is straightforward. A set of initial candidates will be generated, which will seed the modeling process. The number of candidates should be set between 6 and 10 candidates per iteration. This number can be experimented with to find an optimal candidate count.

The application server creating the page will randomly choose one of the potential candidate layouts, and create a page based on the variables describing the selected candidate. The application server will then track the consumer responses to the various layouts and keep serving pages until a predetermined sample size for exposure is reached.

The number of exposures will vary depending on the level of consumer activity occurring on the page. For instance, if the page has a successful action for every page view, then I would consider a minimal sample size to be equal to 30 times the number of candidates. If the page is less successful, I would increase this minimal sample size to 30 times the inverse frequency of success for the layout times the number of candidates.

Upon reaching the desired sample size of exposures, the application server will create a new batch of candidates by crossbreeding the most successful candidates from the last round. There are a number of ways of doing the breeding process. An easy way is to calculate the frequency for each candidate to pass its callout DNA to the next generation as the total score for a layout divided by the number of its exposures divided in turn by the sum of all scores divided by their respective exposures. Alternatively, one can take the top 50% to 25% of performing layouts and randomly allow these units to pass on their solutions.

Given a pairing of parent candidates {a, c, c, b} and {b, b, c, a}, the resulting offspring will be random mix of the two parent candidates such as {a, b, c, a} or one of another eight possible mixtures of these parent's variables.

Before these new candidates are put into production, the optimization system will randomly introduce mutations into these candidates based on a predefined mutation rate, which would alter the solution for 1% of new candidates by substituting the value of one variable within a new candidate with a random value. In this way, new variations are added in an ongoing experimental way. The actual mutation rate can be set by experiment if desired.

It is also possible to keep the best two performing layouts as is and create six to eight new candidates for each optimization cycle. This will allow for the preservation of particularly strong solutions until even better solutions are found.

The modeling process described above is just a starting point from which deeper optimizations can be done. For instance, this work can be readily mixed with the champion-challenger algorithm I have described in an earlier article to continually strive for more gainful callouts while a genetic algorithm optimizes the page as a whole.

If done right, this work should result in some wonderful marketing gains, and keep a marketer's Web site continually agile in our dynamic marketplace.

Subscribe today...it's free!

MarketingProfs provides thousands of marketing resources, entirely free!

Simply subscribe to our newsletter and get instant access to how-to articles, guides, webinars and more for nada, nothing, zip, zilch, on the house...delivered right to your inbox! MarketingProfs is the largest marketing community in the world, and we are here to help you be a better marketer.

Sign in with your preferred account, below.

ABOUT THE AUTHOR

Matthew Syrett is a marketing consultant/analyst—a hybrid marketer, film producer, technologist, and statistician. He was vice-president of product development at the LinkShare Corporation and vice-president at Grey Interactive. Reach him via syrett (at) gmail (dot) com.

Graphic Design Articles

You may like these other MarketingProfs articles related to Graphic Design: