When building custom reports, you often need to merge two sets of data, such as blending subdomain search metrics together, or combining monthly web stats for an annual total. Adding up the clicks, impressions and sessions is obvious, but what do you do with things like the click-through rate, average position or conversion rate? [Hint: you do not simply add or average the numbers]
207k impressions with avg position of 35.7 + 735k impressions with avg position of 55.1 = 942k impressions with avg position of ?
The Lure of the Weighted Average
Rates and averages are calculated metrics, and they are all based on a sample of data. In the example above, an average position of 55.1 is based on a specific group of 735k impressions. The actual math behind that average position involves adding up all the individual position values and dividing by the number of them (a.k.a. the impressions):
avg position = (sum of positions) / (number of impressions)
When combining calculated metrics like these, the size of each group matters: merging a small sample into a larger one should not move the average much, so the combined value in the example above should land closer to 55.1 than to 35.7. What we want is a “weighted” average, where the size of each sample is taken into account when the numbers are combined. Searching for a typical weighted-average formula yields something like this: you multiply each average by a weighting factor calculated from the proportion of the impressions it represents, then add the results together:
weighted average = avg position1 * impressions1/(impressions1+impressions2) + avg position2 * impressions2/(impressions1+impressions2)
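To make the formula concrete, here is a minimal Python sketch applied to the two figures from the example above. The function name `weighted_average` is only illustrative and is not part of any tool mentioned in this article.

```python
def weighted_average(values, weights):
    """Weighted average: each value is scaled by its share of the total weight."""
    total = sum(weights)
    return sum(v * w / total for v, w in zip(values, weights))

# Figures from the example above: avg position and impressions for each set.
positions = [35.7, 55.1]
impressions = [207_000, 735_000]

combined = weighted_average(positions, impressions)
print(round(combined, 2))  # 50.84
```

As expected, the result sits much closer to 55.1 than to 35.7, because the larger set carries roughly 78% of the impressions.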
Just describing it sounds complicated, and this formula is hard to implement, especially with large data sets. It would require looking up matching rows, then calculating the proportion or “weight”, then applying that to the avg position number, and finally adding them together. That is a lot of lookups and calculations, and it could be very slow.
Simplifying the Problem, and the Solution
In situations like this, we need to restate the problem in a way that is easier (and faster) to solve. Go back to the simple formula for the average: if the average position is the sum of positions divided by the number of impressions, then the average position for the combined data should be the sum of all of the positions divided by the number of all of the impressions:
avg position = (sum of positions1 + sum of positions2) / (impressions1 + impressions2)
It sounds a lot simpler than the formula above, and if you are good at math formula substitutions, you can actually show this will result in exactly the same number. The difference is that it involves a number we don’t have in our data: the “sum of positions”. But that is easy enough to calculate; since the avg position is the (sum of positions)/impressions, then:
sum of positions = avg position * impressions
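For readers who want to see the substitution written out, here is a short sketch. The symbols are shorthand introduced only for this derivation: p1 and p2 are the two average positions, n1 and n2 are the matching impression counts, and S1 and S2 are the sums of positions, so that S1 = p1 * n1 and S2 = p2 * n2.

```latex
\[
\frac{S_1 + S_2}{n_1 + n_2}
  = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}
  = p_1 \cdot \frac{n_1}{n_1 + n_2} + p_2 \cdot \frac{n_2}{n_1 + n_2}
\]
```

The right-hand side is the weighted average formula from the previous section, so both approaches give the same number.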
Believe it or not, we now have a simple way to combine our averages in a few steps:
- append one data set to the other
- add a column calculating (avg position * impressions) called “sum of positions”
- combine duplicate rows, summing the numbers in matching rows
- calculate a new avg position = (sum of positions)/impressions
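The four steps above can be sketched in plain Python. This is an illustration only: the sample rows, the query names and the function name `combine_avg_position` are all made up, and each row is assumed to be a (query, impressions, avg position) tuple.

```python
from collections import defaultdict

# Hypothetical rows: (query, impressions, avg position) from two reports.
report_a = [("blue widgets", 1000, 4.0), ("red widgets", 200, 10.0)]
report_b = [("blue widgets", 3000, 8.0), ("green widgets", 500, 2.0)]

def combine_avg_position(*reports):
    # Step 1: append one data set to the other.
    rows = [row for report in reports for row in report]

    # Steps 2 and 3: compute the sum of positions for each row,
    # then sum the numbers of rows that share the same query.
    totals = defaultdict(lambda: {"impressions": 0, "sum_of_positions": 0.0})
    for query, impressions, avg_position in rows:
        totals[query]["impressions"] += impressions
        totals[query]["sum_of_positions"] += avg_position * impressions

    # Step 4: recalculate avg position from the summed base metrics.
    # Queries with zero impressions are dropped to avoid dividing by zero.
    return {
        query: {
            "impressions": t["impressions"],
            "avg_position": t["sum_of_positions"] / t["impressions"],
        }
        for query, t in totals.items()
        if t["impressions"] > 0
    }

combined = combine_avg_position(report_a, report_b)
print(combined["blue widgets"])  # {'impressions': 4000, 'avg_position': 7.0}
```

Note that there is no lookup of matching rows and no separate weight calculation; the weighting falls out of the sums.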
A Generic Approach
If you apply the same thinking to the click-through-rate (ctr) in a standard Google Search report, you will realize that you already have the clicks and impressions (they are default metrics) that are needed to calculate ctr (ctr = clicks/impressions). All you need to do is to append the data, combine the duplicate rows, then calculate the new ctr column from clicks/impressions.
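A small sketch with invented numbers shows why recalculating from clicks and impressions matters. The two rows below are hypothetical figures for the same query on two subdomains.

```python
# Hypothetical rows for the same query from two subdomains: (clicks, impressions).
rows = [(50, 1000), (30, 9000)]

# Combine the duplicate rows by summing the base metrics.
clicks = sum(c for c, _ in rows)
impressions = sum(i for _, i in rows)

# Recalculate ctr from the summed base metrics.
combined_ctr = clicks / impressions

# For comparison only: the plain average of the two ctr values.
naive_ctr = sum(c / i for c, i in rows) / len(rows)

print(f"{combined_ctr:.2%}")  # 0.80%
print(f"{naive_ctr:.2%}")     # 2.67%
```

The plain average overstates the ctr more than threefold here, because it gives the small 1,000-impression row the same weight as the 9,000-impression row.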
The ‘extra’ step for the avg position in the first example exists because we did not start with one of the ‘base’ metrics behind the calculated metric — the (sum of positions). Going back to the step-by-step list, we can make it more generic:
- append one data set to the other
- if you need to combine calculated metrics (average, rate, percentage), then first break them down into their base metrics*
- combine duplicate rows, summing the numbers in every column
- recalculate the calculated metrics from the (summed) base metrics
* Note: if you can, include the base metrics in your query to eliminate the need for the second step. For example, include Bounces and Sessions in your query instead of the Bounce Rate.
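The generic steps can be sketched as one small Python function. Everything named here is hypothetical: the `CALCULATED` mapping, the `combine` function and the sample rows are illustrations, and the sketch assumes each row is a dict that already contains the base metrics, as the note above recommends.

```python
from collections import defaultdict

# Calculated metric -> (numerator, denominator) base metrics.
# Only two entries are shown; extend the mapping as needed.
CALCULATED = {
    "ctr": ("clicks", "impressions"),
    "bounce_rate": ("bounces", "sessions"),
}

def combine(rows, key, calculated=CALCULATED):
    """Sum every base column per key, then recompute the calculated metrics."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        for column, value in row.items():
            # Never sum the key or a calculated metric; only base metrics.
            if column != key and column not in calculated:
                totals[row[key]][column] += value

    result = {}
    for k, sums in totals.items():
        out = dict(sums)
        for metric, (num, den) in calculated.items():
            # Recalculate only when both base metrics exist and the
            # denominator is non-zero.
            if num in out and out.get(den):
                out[metric] = out[num] / out[den]
        result[k] = out
    return result

rows = [
    {"page": "/home", "sessions": 100, "bounces": 40, "bounce_rate": 0.4},
    {"page": "/home", "sessions": 300, "bounces": 60, "bounce_rate": 0.2},
]
combined = combine(rows, key="page")
print(combined["/home"]["bounce_rate"])  # 0.25
```

The incoming `bounce_rate` values are ignored on purpose. The combined rate comes only from the summed bounces and sessions, which is why it is 0.25 rather than the plain average of 0.3.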
Putting it into Practice
While you could do all of this manually in Excel, this kind of step-by-step math is what the Analytics Edge Core Add-in is made for, and the whole process can be automated in a couple of minutes.
Base Metrics for Common Rates and Averages
Your first challenge is to figure out what the base metrics are for any rate or average metric you are using. You can then include those metrics in your query, or calculate them from the data you do have. Here are some of the common ones for Google Search and Google Analytics:
ctr = clicks / impressions
[avg] position = (sum of positions) / impressions
bounce rate = bounces / sessions
% new sessions = new users / sessions
avg session duration = session duration / sessions
goal conversion rate = goal completions / sessions
avg order value = revenue / transactions
ecommerce conversion rate = transactions / sessions
If you need to merge two or more sets of numbers and they include calculated metrics like averages, rates or percentages, make sure you combine them the right way. Don’t be put off by complicated formulas for weighted averages – the problem, and solution, is actually quite simple.
Start with (or calculate) the base metrics, combine (sum) the duplicate rows, and recalculate the calculated metrics. Simple operations for accurate reporting.