Google Analytics has had a bit of a referral spam problem over the past year, and although you can install filters in your account to prevent most of it (see the Definitive Guide to Removing GA Spam), you can’t apply them retroactively. The good news is that you can use a little-known feature of Analytics Edge to strip it out of your Excel reports.
In order to exclude spam visits from your Google Analytics reports, you would need to apply a Google Analytics Segment with the required filter expressions. While you could create this segment in your account and select it in the Analytics Edge wizard, it can be tedious to maintain the expressions in GA so that you can report in Excel.
The Free and Pro versions of the Google Analytics connectors from Analytics Edge include the ability to specify a custom dynamic segment, so you can embed the expressions right in your worksheet. If you are reporting for multiple clients, you can quickly create custom expressions by modifying worksheet cells instead of bouncing between the Google Analytics web interface and your report.
Referencing a Segment in a Worksheet Cell
Download the sample workbook: GoogleAnalytics-Dynamic-Spam-Filtering-1-0.xlsx
While you can enter a custom segment definition right in the query wizard, you can also reference an Excel worksheet cell. When you select the DYNAMIC segment, a second wizard will open – select Custom and click the button to add the cell reference.
Building a Spam Filter Segment
According to the Definitive Guide, you should use a Valid Hostname filter as well as several Spam Crawler filters. The filter expression must be entered in a way that the Google Analytics API can understand it, and the API limits any single expression to 128 characters, so multiple expressions will be needed.
I have simplified the process in the sample workbook using an Excel formula to concatenate several worksheet cells. The structure of the expression looks like this (formatted for clarity):
sessions::condition::ga:hostname=~<my-hostnames-expression> ;condition::!ga:source=~<expression1> ,ga:source=~<expression2> ,ga:source=~<expression3> (and so on...)
The Excel formula uses cell references for each of the expressions, and it looks like this:
="sessions::condition::ga:hostname=~"&B4&";condition::!ga:source=~"&B6&",ga:source=~"&B7&",ga:source=~"&B8&",ga:source=~"&B9&",ga:source=~"&B10
NOTE: this formulas assumes that you have 5 rows of filter expressions (B6, B7, B8, B9 and B10) as in the example workbook. If you have more or less, adjust the formula accordingly!
Since the API limits expressions to 128 characters while the web interface can handle 256, I had to split up my published spam crawler filters for the sample workbook. They are up to date as of 2016-01-19, but you can expand them as needed over time.
Spam-Free Reporting
Modify the Valid Hostname expression to suit your website, and you should be good to go. The Spam Filters worksheet can be copied to other workbooks and a dynamic segment cell reference added to each of your existing queries.
Quick and easy spam filtering from Analytics Edge.
Cautionary Note About Using Segments in Queries
If the time period you are reporting for will involve more than 500,000 sessions in total (unfiltered), the Google Analytics sampling may affect your results. Analytics Edge has a feature that can minimize sampling, but you will have to have a date dimension in your query.