This Analytics Edge report shows the pages on your site that appear in Search Analytics reports even though they are not in your sitemaps.
This is similar to the Active Pages report, but shows content that Google found that is missing from the sitemaps.
Creating the report with Analytics Edge
The Macro Part 1 – Process the sitemaps
The first part of the Analytics Edge macro processes the sitemaps. Reading sitemaps with Analytics Edge is explained in more detail here.
Multiple sitemap URLs can be entered into the spaces provided in the Setup worksheet. The ReadWorksheet, Arrange, and WriteToWorksheet functions copy this list to the _sitemaps worksheet to be processed. It then filters out the list and writes a blank list to the _pages worksheet, clearing any previous pages left over from previous runs.
The _sitemap sheet is read, and if there are any XML URLs listed, they are processed by another macro (_get_pages_from_sitemaps) described below (image). This other macro reads each sitemap and splits the results into _sitemaps or _pages depending on what it finds.
That 2-step process is repeated twice more to capture the pages from hierarchical sitemaps 3-levels deep, which should be enough for more sites.
The Macro Part 2 – Process the click data
The macro then makes a Google Search Analytics query to get the pages appearing in search. A Match function removes the pages that were found in the sitemap (image).
The Sort function reorders by clicks and impressions descending before WriteToWorksheet commits the data into the worksheet.
Note that you could insert a Filter function if you find too many ‘pages’ reported with unique query parameters — they usually have a very small number of impressions.
The Macro Part 3 – Getting pages from the sitemaps
The ‘_get_pages_from_sitemaps’ macro is a Repeat Macro, which loops through the entries of the _sitemap worksheet. It uses the ReadTextFile function to read the sitemap URL from the internet, then uses Split and Arrange functions to extract the URLs from the XML-formatted content.
TableName is used to hold that list, then it is filtered for other XML URLs (more sitemaps), and they are appended to the _sitemap worksheet. The sitemap being processed is then removed from the _sitemap worksheet by a ReadWorksheet-Filter-WriteWorksheet sequence.
TableName is used to switch back to the unfiltered list of URLs, then any XML sitemap references are removed, leaving only page URLs. Those are appended to the worksheet _pages.
The Repeat Macro function then repeats all of that for the next sitemap.
The listing may reveal navigation pages that are not in your sitemap. These ‘pages’ may detract from or help your SEO efforts — but now you know they exist.
Analytics Edge makes it easy to explore your data in new ways.