How the Blog Cohort Analysis Macro Works

If you’re trying to use the free report Cohort Analysis for Blog Articles, but don’t have the same blog path, here’s a behind the scenes look at how the macro was made so you can adapt it to your own article path.

First of all, it works by extracting the year and month from the page path, so if your URLs do not have year and month components, this report won’t work for you. Let’s step through the macro and see what makes it work. You can step through it yourself to see if your results are consistent or where you might need to make changes.

step-through-macro-300x164>> GA Reports

Makes the query to get the data we need. Edit this function to specify the account and Google Analytics view you want to use. It is set to use your default settings, which is convenient for me, but you really should set it to a specific view to avoid reporting the wrong web property by mistake some day in the future (if your default selection changes).

The Filters tab sets a filter to restrict the response to content where the Page dimension matches a regular expression of “/20../”, meaning any pages with /2000/ through /2099/ in it, but it might also match /20AA/ or similar if your site has such things.

Note that the “minimize sampling” is checked on the Options tab, which will cause 12 queries to be run, one for each month over the past year. If you have a low-volume blog site (under 500,000 sessions per year), you can turn off this option to speed up the refresh (making only 1 query for the whole period).

Join

In the end result, we have a column of links that let us go to the actual article. Google’s Page dimension does not include the hostname component (http://www.analyticsedge.com), so we have to join the Hostname column with the Page column to build one. Note that it is still missing the “http://” part, but we will add that later.

www.analyticsedge.com/2014/08/cohort-analysis-blog-articles/

Replace (.*/)(20../..)(/.*)

Now that the Page column is no longer needed, we’ll use it to determine the year and month that the article was published. This takes a series of steps, the first of which is to extract the 2014/08 portion from the URL, and a regular expression replacement does the trick.

2014/08

Split (x2)

Next we split the Month of Year column (201409) into year (2014) and month (09) portions using the fixed number of characters option, then split the Page column (2014/08) using the “/” character. The result is 4 new columns with year and month components we need for the cohort analysis — the month measured and the month published.

split-columns

Convert by name

Note that the 4 columns are text, and to do any calculations with them, we need to convert them into numbers.

Calculate (x6)

Now the challenge: having the year and month the article was published and the year and month when pageviews were counted, calculate the difference in months. Several ways you could tackle this; I divided the months by 12 to create fractions of a year, then added that to the years, then got the difference, and finally multiplied by 12 to get it in months. [In hindsight, if I had multiplied the years by 12, I could have kept it all in months and saved a step and avoided the next step].

age-calculation

Convert by name

Since I divided by 12 and took a difference, when I multiplied by 12 again the numbers sometimes didn’t come to nice clean values (like 0.99999999). To fix this, I used the Rounding function of the Convert function to round off to the nearest month.

Sort

I am going to want to pivot the table by the Age, but before I do that, I sort on the Age column in ascending order. This causes the pivot operation to put the columns in ascending numerical order — they get created in the order they are ‘discovered’ as the pivot takes place.

Pivot

Here I create new columns for each unique age value and collapse the table to show the number of pageviews for each page versus the age of the article.

Total columns

The next step is to decide how to order the results, and you have a choice here. I created a ‘Peak’ column with the maximum value for each row. When I sort this column in descending order, it will put the article with the single highest peak interest (pageview count) at the top. I wanted to see the most popular article for any single month at the top.

peak-pageviews

An alternative is to use a Total (sum) to show the article with the most pageviews for the entire year at the top. I found that favored the old articles and dropped the extremely popular new articles way down the list. Choose what suits your website goals and reporting desires.

Sort

Sort the pivoted table using column D values. Note that I used the By Position option instead of By Name to make it easier for you to change the ‘Peak’ column into a ‘Total’ column and not have to worry about the column column name being different.

Keep top 50

Don’t really want a huge list of hundreds of pages, do you? Pick a number…any number.

Arrange by position

This function lets me rename the ‘Page’ column to what it now represents — the date Published.

Replace *

Finally, I need to add that “http://” to all the values in the Link column.

WriteToWorksheet Cohort Analysis

Time to share your results with the world and write them to a worksheet. In this case, I selected the options to NOT clear the cell formatting — this allows me to use conditional formatting in the results without having the macro wipe it out when refreshing the data.

I also selected to add hyperlinks to the web addresses, which allows you to click a link in the report to see the actual page. Note: you may need to select the cells in this column and change the Excel number format to ‘General’ from ‘Text’ if it is displaying a formula (=HYPERLINK…).

cohort-analysis

That’s it! You can change any of the steps to suit your website URL structure, change the math, or change the sort order to your preference. Heck, you can change anything you want!