Note: Please use this page with a modern web browser such as Google Chrome (the HTML5 canvas element is used).
In this article, a phenomenon in statistics called
Simpson's paradox, or Simpson's reversal, is discussed based on
an example that considers a competitor's market share. The marketing perspective is
enriched by a look at vector algebra, which enables a visualization of the effect.
An interactive graph should ease the understanding of
the phenomenon. The market share (i.e., the slope of a
vector) can be manipulated with the mouse via drag and drop while observing the outcome.
Simpson's paradox is defined as the effect "in which a trend that appears in different groups
of data disappears when these groups are combined, and the reverse trend appears for the aggregated data".
From the article's point of view, we consider the development of market share in two different countries and
for the corresponding aggregation of these two countries. The effect discussed certainly may appear for more than
two countries in a region, but we want to keep things simple at first.
We assume activity in two countries and name them country 1 and country 2.
The market player is denoted as competitor and we interpret the numbers given for this competitor as sales volume.
The two countries are managed on a regional (aggregated) level, and we simply denote this aggregation as region.
Furthermore, the total in each country is called the market representing the total sales volume of the country.
The market share is defined as the competitor's sales volume divided by the total sales volume of the country, i.e., the market.
(The market share represents the percentage of a market
held by a specific competitor (or a specific product) in terms of revenue or volume.)
Even though values above 1 would be mathematically possible, a real-world market share cannot be higher than 1 (100%).
Finally, we name the change of market share from one period to another the gain/loss or the development of the market share.
In simple words, the effect of Simpson's paradox for the scenario at hand is:
The market share gain/loss for a region might be negative, although the market share gain/loss of all
countries in this region is positive. This effect is considered a paradox.
Vice versa: The market share gain/loss for a region might be positive, although the market share
gain/loss of all countries in this region is negative.
A competitor's market share can be defined as the
volume for this competitor (vc) divided by the volume of the total market (vm), or
market share = vc / vm
This market share can be expressed with the mathematical equivalent of a vector
(vc,vm) with a slope of vc/vm.
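The definition above can be written down in a few lines. The following is only a minimal sketch: the function name market_share and the sample volumes 80 and 100 are assumptions made for illustration, and volumes are assumed to be integers so that exact fractions can be used.

```python
from fractions import Fraction


def market_share(vc, vm):
    """Return the market share: competitor volume vc divided by market volume vm.

    Seen as a vector (vc, vm), this is the slope vc/vm.
    Integer volumes are assumed so that the result is an exact fraction.
    """
    return Fraction(vc, vm)


# Hypothetical numbers, not taken from the article's data.
share = market_share(80, 100)
print(float(share))  # 0.8, i.e. a market share of 80%
```

Using exact fractions instead of floating-point numbers avoids rounding effects when two nearly equal market shares are compared later on.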
We should further consider that the aggregated market
share is the volume of a certain competitor in country 1
plus the volume of that competitor in country 2, divided by the total volume of
the market in country 1 plus the total volume of the market in country 2,
aggregated market share = (vc1 + vc2) / (vm1 + vm2)
which is different from the arithmetic of fractions (cf.:
Adding unlike quantities)
but equal to the calculation of a vector sum (cf.:
Vectors: Addition and subtraction).
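The difference between adding fractions and adding vectors can be checked with a small sketch. The helper names vector_sum and slope and the volumes used are assumptions for illustration only, not the article's data.

```python
from fractions import Fraction


def vector_sum(*vectors):
    """Component-wise sum of (vc, vm) pairs."""
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))


def slope(vector):
    """Market share of a (vc, vm) pair, i.e. vc divided by vm."""
    vc, vm = vector
    return Fraction(vc, vm)


# Hypothetical volumes: (competitor volume, market volume)
country_1 = (80, 100)
country_2 = (10, 100)

region = vector_sum(country_1, country_2)            # (90, 200)
aggregated_share = slope(region)                     # 9/20 = 45%
fraction_sum = slope(country_1) + slope(country_2)   # 4/5 + 1/10 = 9/10

print(aggregated_share, fraction_sum)
```

The regional market share is the slope of the summed vector (45%), whereas adding the two fractions would give 90%, which has no meaning as a market share of the region.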
Please note: Even though the market share development in country 1 and in country 2 is positive (in the table on the left), the
development for the region, i.e. of the combined market share, is negative.
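Whether a given set of volumes shows the reversal can be tested mechanically. The sketch below is one possible check under stated assumptions: the function names and all numbers are hypothetical and were chosen only so that the paradox appears. They are not the values from the article's table.

```python
from fractions import Fraction


def share(vc, vm):
    """Market share as an exact fraction (integer volumes assumed)."""
    return Fraction(vc, vm)


def region_share(countries):
    """Share of the aggregated region: vector sum of all (vc, vm) pairs."""
    return share(sum(vc for vc, _ in countries), sum(vm for _, vm in countries))


def is_simpsons_paradox(before, after):
    """before/after: one (vc, vm) pair per country for two periods."""
    gains = [share(*a) - share(*b) for b, a in zip(before, after)]
    region_gain = region_share(after) - region_share(before)
    if all(g > 0 for g in gains):
        return region_gain < 0
    if all(g < 0 for g in gains):
        return region_gain > 0
    return False


# Hypothetical data: country 1 has a high share, country 2 a low one.
before = [(80, 100), (10, 100)]   # shares 80% and 10%, region 45%
after = [(41, 50), (18, 150)]     # shares 82% and 12%, region 29.5%
print(is_simpsons_paradox(before, after))  # True
```

In this made-up case both countries gain two percentage points, but the market shifts from the high-share country to the low-share country, so the regional share drops.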
In this section you can find some examples where Simpson's paradox becomes obvious. Of course the paradox
can also occur when adding up more than two vectors, i.e., when aggregating more than two countries to a regional level.
Even though each blue vector
has a higher slope than the corresponding orange vector, the slope of the vector sum of the orange vectors
exceeds the slope of the vector sum of the blue vectors.
In this example, two business domains are compared where the development of volume is not opposed.
All volumes are growing; nonetheless, the development of the total market share is slightly negative.
Again: Even though each blue vector has a higher slope than the corresponding orange vector,
the slope of the vector sum of the orange vectors (combination of both) exceeds the slope of the vector sum of the blue vectors.
You can drag the vectors' bullet points to explore the limits of the paradox:
Findings on observation
The competitor has a high market share (i.e. high vector
slope) in country 1 and a low market share in country 2. A
large difference in market share therefore seems to be a prerequisite for the paradox.
Simpson's paradox can be found for positive and for negative numbers.
- We have a function with 8 (eight) independent variables.
- We can find results by randomly filling the independent variables and checking whether they form a Simpson's paradox (sample-based approach).
- We can only plot a three-dimensional graph, not an eight-dimensional one.
- We could use scatter plots while changing only one parameter at a time (OAT/OFAT).
- The objective is to find a sentence that starts with: A Simpson's paradox can be found if ...
- Are the variables really independent or can we find correlations?
- Can we find common attributes when studying the randomly generated results?
- Can regression analysis help to interpret the results?
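The sample-based idea from the list above could be sketched as follows. This is only an illustration: the variable order, the function names, the sample size of 20,000 and the fixed seed are assumptions, and the sketch does not enforce that a competitor's volume stays below the market volume.

```python
import random
from fractions import Fraction


def gain(c_old, m_old, c_new, m_new):
    """Change of market share from the old to the new period."""
    return Fraction(c_new, m_new) - Fraction(c_old, m_old)


def is_paradox(p):
    """p holds the eight variables: old/new competitor and market volume per country."""
    c1o, m1o, c1n, m1n, c2o, m2o, c2n, m2n = p
    g1 = gain(c1o, m1o, c1n, m1n)
    g2 = gain(c2o, m2o, c2n, m2n)
    gr = gain(c1o + c2o, m1o + m2o, c1n + c2n, m1n + m2n)
    return (g1 > 0 and g2 > 0 and gr < 0) or (g1 < 0 and g2 < 0 and gr > 0)


def sample_paradoxes(n_samples, low=10, high=100, seed=0):
    """Fill the eight variables randomly and keep the combinations that show the paradox."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n_samples):
        p = tuple(rng.randint(low, high) for _ in range(8))
        if is_paradox(p):
            hits.append(p)
    return hits


hits = sample_paradoxes(20_000)
print(len(hits), "paradox cases found in 20000 random samples")
```

The collected hits can then be inspected for common attributes, for example the difference in market share between the two countries.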
Randomly created results
With a sample-based approach, we can find matching results by filling the parameters randomly.
For the data at hand, each of the eight parameters was iterated over the closed interval from 10 to 100 with a loop step of 10, i.e., 10 values per parameter.
This setup results in an algorithm running 10 to the power of 8 (10^8 = 100,000,000) times.
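A loop over all combinations might look like the following sketch. It is not the original algorithm: the function names are assumptions, and to keep the runtime short the demonstration uses a coarser step of 30 (4 values per parameter) instead of the step of 10 described above. The sign of a market share change is computed by cross-multiplication, which is valid only for positive volumes.

```python
from itertools import product


def gain_sign(c_old, m_old, c_new, m_new):
    """Sign of c_new/m_new - c_old/m_old, cross-multiplied (positive volumes only)."""
    d = c_new * m_old - c_old * m_new
    return (d > 0) - (d < 0)


def is_paradox(c1o, m1o, c1n, m1n, c2o, m2o, c2n, m2n):
    """True if both countries move in one direction and the region in the other."""
    s1 = gain_sign(c1o, m1o, c1n, m1n)
    s2 = gain_sign(c2o, m2o, c2n, m2n)
    sr = gain_sign(c1o + c2o, m1o + m2o, c1n + c2n, m1n + m2n)
    return s1 == s2 != 0 and sr == -s1


def grid_search(values):
    """Test every combination of the eight parameters taken from values."""
    return [p for p in product(values, repeat=8) if is_paradox(*p)]


full_grid = range(10, 101, 10)      # 10, 20, ..., 100
print(len(full_grid) ** 8)          # 100000000 combinations

coarse_grid = range(10, 101, 30)    # 10, 40, 70, 100
hits = grid_search(coarse_grid)
print(len(hits), "paradox cases in", len(coarse_grid) ** 8, "combinations")
```

Running the full grid in plain Python would take considerably longer, which is one reason to refine only around the hits of a coarse first run.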
In an extended approach, the loop step could be reduced resulting in a higher number of loops
and in more precise results. A different approach could be to work with the already
found data of the first run and to go into detail for these results. This could reduce the runtime of
the algorithm. E.g., for the result:
Here are the results found in the first approach (rough results):
One factor at a time (OFAT)
The following charts show the effects of changing one input parameter at a time (the parameter is varied by +/- 100% of its original value in 15 steps). All other parameters remain unchanged.
Chart points marked with x do not represent the paradox, while points marked with a bullet point do.
Country 1 in 2012
Country 1 in 2013
Country 2 in 2012
Country 2 in 2013
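The one-factor-at-a-time sweep behind such charts can be sketched as below. Several points are assumptions: the base values are hypothetical, "15 steps" is read as 15 evenly spaced values from 0% to 200% of the original value, the marker "o" stands in for the bullet point, and a market volume of zero is treated as "no paradox" because the market share is undefined there.

```python
# Hypothetical base case (a paradox case); not the article's data.
BASE = {
    "c1_old": 80, "m1_old": 100, "c1_new": 41, "m1_new": 50,
    "c2_old": 10, "m2_old": 100, "c2_new": 18, "m2_new": 150,
}


def is_paradox(p):
    """Check the reversal for one parameter set given as a dict."""
    if any(p[k] <= 0 for k in ("m1_old", "m1_new", "m2_old", "m2_new")):
        return False  # market share undefined or not meaningful
    g1 = p["c1_new"] / p["m1_new"] - p["c1_old"] / p["m1_old"]
    g2 = p["c2_new"] / p["m2_new"] - p["c2_old"] / p["m2_old"]
    region_old = (p["c1_old"] + p["c2_old"]) / (p["m1_old"] + p["m2_old"])
    region_new = (p["c1_new"] + p["c2_new"]) / (p["m1_new"] + p["m2_new"])
    gr = region_new - region_old
    return (g1 > 0 and g2 > 0 and gr < 0) or (g1 < 0 and g2 < 0 and gr > 0)


def ofat_sweep(base, name, steps=15):
    """Vary one parameter from 0% to 200% of its base value; keep the others fixed."""
    points = []
    for i in range(steps):
        factor = 2 * i / (steps - 1)
        params = dict(base, **{name: base[name] * factor})
        points.append((params[name], "o" if is_paradox(params) else "x"))
    return points


for name in BASE:
    markers = "".join(marker for _, marker in ofat_sweep(BASE, name))
    print(f"{name:7s} {markers}")
```

Each printed row corresponds to one chart: the middle position is the unchanged base value, and the run of "o" markers around it indicates how far a single parameter can move before the paradox disappears.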
The order of the countries does not play a role, i.e. the following data is treated as one paradox case, not two:
high difference in market share (75%pts) and low change in volume development
low difference in market share (10%pts) and high change in volume development
contrary volume development competitor / market and country 1 / country 2