How the Blog Cohort Analysis Macro Works

If you’re trying to use the free report Cohort Analysis for Blog Articles, but don’t have the same blog path, here’s a behind the scenes look at how the macro was made so you can adapt it to your own article path.

The macro in this workbook extracts the year and month from the page path, so if your URLs do not have year and month components, this isn’t going to work for you. Let’s step through the macro to see how it works. Remember that you can step through it to see if your results make sense, or if something needs to change to fit your website URL structure.

Each function step is highlighted below, along with a description of what that function does in the overall process. Every function is easily edited (see: Task Pane), so you can make the report your own.

step-through-macro-300x164>> GA Reports

Makes the query to get the data we need. Edit this function to specify the account and Google Analytics view you want to use. It is set to use your default settings, which is convenient for me, but you really should set it to a specific view to avoid reporting the wrong web property by mistake some day in the future (if your default selection changes).

The Filters tab in the wizard sets a filter to restrict the response to content where the Page dimension matches a regular expression of “/20../”, meaning any pages with /2000/ through /2099/ in it, but it might also match /20AA/ or similar if your site has such things.

If your website has more than 500,000 sessions in a year, you should turn on the “minimize sampling” option (Options tab), which will cause 12 queries to be run, one for each month to reduce the sampling done by Google Analytics.

Join

In the end report, we want a column of links that let us go to the actual article. Google’s Page dimension does not include the hostname component (https://www.analyticsedge.com), so we have to join the Hostname dimension with the Page dimension to build one. Note that it is still missing the “https://” part, but we will add that later.

www.analyticsedge.com/2014/08/cohort-analysis-blog-articles/

Replace (.*/)(20../..)(/.*)

Now that the Page column is no longer needed, we’ll use it to determine the year and month that the article was published. This takes a series of steps, the first of which is to extract the 2014/08 portion from the URL, and a regular expression replacement does the trick.

If you don’t have a year and month component in your URL, then you will need to remove these steps.

2014/08

Split (2 separate functions)

Next we split the Month of Year column (201409) into year (2014) and month (09) portions using the fixed number of characters option (resulting in Month of Year-1 and Month of Year-2), then split the Page column (2014/08) using the “/” character (resulting in Page-1, Page-2, etc). The result is 4 new columns with the year and month components we need for a cohort analysis — the month being measured, and the month the article was published.

split-columns

Convert by name

Note that the 4 new columns are text, and to do any calculations with them, we need to convert them into numbers. Convert can easily do this.

Calculate (5 steps)

Now a small math challenge: having the year and month the article was published and the year and month when pageviews were counted, calculate the difference expressed in months. Several ways you could tackle this; I multiplied the years by 12, then added the associated months column, then got the difference between the two dates (in months).

Sort

Just before I pivot the table by the Age, I sort the Age column in ascending order. This causes the pivot operation to put the columns (Ages) in ascending numerical order — they get created in the order they are ‘discovered’ as the pivot takes place.

Pivot

Here I create new columns for each unique age value and collapse the table to show the number of pageviews for each page versus the age of the article.

Total columns

The next step is to decide how to order the results, and you have a choice here. I created a ‘Peak’ column with the maximum value for each row. When I sort this column in descending order, it will put the article with the single highest peak interest (pageview count) at the top. I wanted to see the most popular article for any single month at the top.

peak-pageviews

An alternative is to use a Total (sum) function to show the article with the most pageviews for the entire year at the top. I found that favoured the old articles and dropped the popular new articles way down the list. Choose what suits your website goals and reporting desires.

Sort

Sort the pivoted table using column D values. Note that I used the Sort – By Position option instead of By Name to make it easier for you to change the ‘Peak’ column into a ‘Total’ column and not have to worry about the column name being different.

Keep top 50

To avoid a huge list of irrelevant numbers and uninteresting page stats, truncate the list. Pick a number…any number that suits your site.

Arrange by position

This function lets me rename the ‘Page’ column to what it now represents — the date Published.

Replace *

Finally, I need to add that “https://” to all the values in the Link column so that the links will be clickable.

WriteToWorksheet Cohort Analysis

Time to commit your result to a worksheet. In this case, I selected the options to not clear the cell formatting — this allows me to use conditional formatting in the results without having the macro wipe it out when refreshing the data.

I also selected an option to add hyperlinks to the web addresses, which allows you to click a link in the report to see the actual page. Note: you may need to select the cells in this column and change the Excel number format to ‘General’ from ‘Text’ if it is displaying a formula (=HYPERLINK…).

cohort-analysis

That’s it! You can change any of the steps to suit your website URL structure, change the math, or change the sort order to your preference. Heck, you can change anything you want!