Understanding Why Google Analytics 4 Is Necessary

The key to understanding the difference between Universal Analytics (previously known as Google Analytics) and Google Analytics 4 is that UA had things like a Session object, and GA 4 does not. Everything becomes so much easier to understand once you realize this fact. It also becomes clear why GA 4 is needed now.

Universal Analytics Started With A Simple Data Model

UA started as a simple way to track page views as ‘hits’ to a tracking server, basically creating a page-by-page log similar to the web server log itself, but without a lot of the noise. To simplify reporting, they added a Session object to keep track of some session-level things, like the landing page, the referral source, or the total time on site.

The tracking server maintained this information by processing the page-level data — this is why your data was not immediately available in the reports. It looked for pages viewed by the same user, and created a Session object with the first page (Landing Page), the next page (Second Page) and even the last page (Exit Page) in the sequence. It calculated the total time from first to last and recorded it as Time on Site (and Session Duration). It noted the total number of pages viewed (Page Depth), kept track of a counter for the number of sessions previously made for this user (Count of Sessions) and even calculated the Days Since Last Session for you.
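To make that processing step concrete, here is a minimal sketch of how a Session summary could be derived from a list of page hits. This is not Google's actual implementation. The PageHit and Session classes, their field names, and the build_session function are all hypothetical, and the sketch assumes the hits have already been grouped into a single visit for one user.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageHit:
    # Hypothetical shape of one pageview hit.
    user_id: str
    timestamp: float  # seconds since the start of the visit
    page_path: str


@dataclass
class Session:
    # Hypothetical session-level summary, mirroring the values named above.
    landing_page: str
    second_page: Optional[str]
    exit_page: str
    page_depth: int
    session_duration: float


def build_session(hits: List[PageHit]) -> Session:
    """Summarize the page hits of one visit into a Session record."""
    if not hits:
        raise ValueError("a session needs at least one hit")
    ordered = sorted(hits, key=lambda h: h.timestamp)
    return Session(
        landing_page=ordered[0].page_path,
        second_page=ordered[1].page_path if len(ordered) > 1 else None,
        exit_page=ordered[-1].page_path,
        page_depth=len(ordered),
        # Time from first hit to last hit; time spent on the last page is unknown.
        session_duration=ordered[-1].timestamp - ordered[0].timestamp,
    )


hits = [
    PageHit("u1", 0.0, "/home"),
    PageHit("u1", 40.0, "/pricing"),
    PageHit("u1", 95.0, "/signup"),
]
session = build_session(hits)
```

Note that the duration only covers the gap between the first and last hit, which is why a single-page visit ends up with a duration of zero in this kind of model.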

Reporting was easy since the Session object held all these pre-recorded items. Life was good.

Adapting to Demands

Of course, it quickly became clear that people didn’t just view web pages — they interacted with things on the page, creating the need for a different type of ‘hit’ to be recorded: events. While pageview hits tracked things like the page title and page path, the event hit tracked the category, action and label. But then money became involved, and an ecommerce ‘hit’ was needed, and of course as social networks evolved, they needed their own type of ‘hit’. Each one was tailored to a new type of information that was needed.

Ecommerce got more complicated and another barnacle was added: enhanced ecommerce ‘hits’. There were even a few technical needs like exception and user timing ‘hits’. Realizing that mobile apps needed tracking too, a unique screen view ‘hit’ was added.

With all these new hit types, a series of special reports was needed to show the special values associated with each hit type. Custom metrics and dimensions were added and expanded. People were finding things that didn’t fit, and they shoehorned them into existing hit types. As things started shifting more heavily towards mobile, this big boat full of barnacles was simply not meeting the needs of new cross-platform applications and interactive web experiences.

Enter Google Analytics 4. 

Google Analytics 4: Structure Without The Structure

So how do you build a new measurement system without creating a new boat that will grow barnacles in a year or two? Answer: don’t force a structure on it. Go back to basics, and simply record hits…but we’ll call them ‘events’ because it sounds better. Make each hit, uh, event customizable so it can hold any type of information: a page title, a link url, or an ecommerce product name.

To do this technically, you need to give every event a name (event_name) with as many custom parameters as needed. Each parameter is captured as a name (event_params.key) and a value (event_params.value). There could be different types of values, too, like text or numbers. The event would need to hold an unrestricted number of these key-value combinations. This is not a simple table of rows and columns like a spreadsheet; it is a hierarchy.
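A rough sketch of that shape in Python may help. The event_name and event_params key/value naming follows the text above, while the EventParam and Event classes and the get helper are my own illustrative assumptions, not a real GA 4 API.

```python
from dataclasses import dataclass, field
from typing import List, Union

# A parameter value may be text or a number.
ParamValue = Union[str, int, float]


@dataclass
class EventParam:
    key: str
    value: ParamValue


@dataclass
class Event:
    event_name: str
    # Any number of key-value pairs can hang off one event: a nested list,
    # not a fixed set of columns.
    event_params: List[EventParam] = field(default_factory=list)

    def get(self, key, default=None):
        """Return the value of the first parameter with this key, if any."""
        for param in self.event_params:
            if param.key == key:
                return param.value
        return default


# Two events with completely different parameters share the same structure.
page_view = Event("page_view", [
    EventParam("page_title", "Pricing"),
    EventParam("page_location", "https://example.com/pricing"),
])
purchase = Event("purchase", [
    EventParam("item_name", "Blue Widget"),
    EventParam("value", 19.99),
])
```

Because the parameters live in a list under each event, reading a single value means searching that list, which is exactly what makes flat, spreadsheet-style reporting awkward.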

The challenge with this type of structure-less structure is reporting. “Standard” reports become impossible without everyone using the same event names and parameter names. This is why Google pre-populates most of the common event types and publishes clear guidance for creating new ones — they need to do that so they can provide a few core reports.

Reporting Challenges

Without the structure they had in UA, you lose all of those specialty reports for each hit type – the sessions, pages, events and goals. A few of them can be recreated using simple event name filters (which they have done), but some things are simply not possible any more. The landing page, for example, can be extracted from the page_location parameter in the session_start event. But without a Session object, the second page or exit page requires a complex database operation, so you don’t see any reporting options for those.
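The difference between those two lookups can be sketched as follows. The event dictionaries here are a simplified, hypothetical stand-in for exported event data (the session_id and timestamp fields in particular are assumptions for illustration). The landing page is a simple filter on one event, while the exit page needs grouping and ordering across every page view in the session.

```python
from collections import defaultdict

# Hypothetical, simplified event rows for two sessions.
events = [
    {"session_id": "s1", "event_name": "session_start", "timestamp": 0,
     "params": {"page_location": "/home"}},
    {"session_id": "s1", "event_name": "page_view", "timestamp": 0,
     "params": {"page_location": "/home"}},
    {"session_id": "s1", "event_name": "page_view", "timestamp": 30,
     "params": {"page_location": "/pricing"}},
    {"session_id": "s1", "event_name": "page_view", "timestamp": 75,
     "params": {"page_location": "/signup"}},
    {"session_id": "s2", "event_name": "session_start", "timestamp": 10,
     "params": {"page_location": "/blog"}},
    {"session_id": "s2", "event_name": "page_view", "timestamp": 10,
     "params": {"page_location": "/blog"}},
]


def landing_pages(events):
    """Landing page per session: read one parameter from one event type."""
    return {
        e["session_id"]: e["params"]["page_location"]
        for e in events
        if e["event_name"] == "session_start"
    }


def exit_pages(events):
    """Exit page per session: group page views, then find the latest one."""
    by_session = defaultdict(list)
    for e in events:
        if e["event_name"] == "page_view":
            by_session[e["session_id"]].append(e)
    return {
        sid: max(views, key=lambda e: e["timestamp"])["params"]["page_location"]
        for sid, views in by_session.items()
    }
```

On a handful of rows in memory this is trivial, but the same grouping and ordering across millions of nested events is the kind of operation a standard report cannot do on the fly.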

Generic reporting gives way to custom reporting. With the hierarchical data model, getting anything different quickly becomes very complicated. Google’s response is to leverage the technology they have available: the BigQuery database can handle the hierarchical data, so they let you export there for free. They also get to put their artificial intelligence and machine learning to work generating Insights widgets.

Make no mistake: navigating customized hierarchical data is complicated. They have provided some new “exploration” tools like funnel and path exploration templates, but these feel like they were designed for data scientists, not marketing people who check on their websites every now and then. They are playing with the new Insights feature, but it is just scratching the surface right now, and there is a lot of growth needed in the use of this new technology.

Introducing User Engagement

I have not touched on a number of other new technologies built into this new offering. The most notable is the measurement of user engagement, a thing that UA was extremely bad at. GA 4 uses newer browser capabilities to track how long your content is in the foreground, actually being looked at. It also tags sessions as ‘engaged’ if there was at least 10 seconds in the foreground, another page click or a conversion. This fixes a long-standing problem with the bounce rate, which really only tracked single-page sessions, never whether the visitor spent any time looking at the page. It also generates a more accurate time-on-page measurement.
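The ‘engaged’ rule is simple enough to write down as a function. This is only a sketch of the rule as described here. The function name, its arguments, and the exact handling of the 10-second boundary are my assumptions, not Google's code.

```python
def is_engaged(foreground_seconds: float, page_views: int, conversions: int) -> bool:
    """Return True if a session counts as engaged under the rule described above.

    Any one of the three conditions is enough:
    - at least 10 seconds with the content in the foreground
    - more than one page viewed
    - at least one conversion
    """
    return foreground_seconds >= 10 or page_views >= 2 or conversions >= 1


# A single-page visit read for 45 seconds was a "bounce" in UA terms,
# but counts as engaged here.
long_read = is_engaged(foreground_seconds=45, page_views=1, conversions=0)

# A single-page visit abandoned after 3 seconds is not engaged.
quick_exit = is_engaged(foreground_seconds=3, page_views=1, conversions=0)
```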

Less Spam By Design

Speaking of annoyances with UA, Google has closed one of the loopholes in UA allowing spammers to stuff fake traffic into your analytics reports without ever visiting your website — the “measurement protocol”. Anyone from anywhere could send a ‘hit’ to Google’s tracking server for your website, and the data would simply appear in your reports. No one considered security back when that capability was designed, but spammers discovered it and that put Google into the position of policing all the hits to try to keep the fake traffic at bay…and sometimes they let a little [a lot] slip through.

With GA 4, the new ‘measurement protocol’ requires an API secret in addition to the ‘measurement id’ for each web property (data stream), which effectively kills that vector for spammers. Sigh of relief.
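For the curious, here is roughly what a GA 4 Measurement Protocol request looks like. The sketch only builds the URL and JSON body and does not send anything. The build_request helper and all of the ID and secret values are placeholders I made up. The point is that a request carries both a measurement ID and an API secret generated for the data stream, and a spammer who scrapes the measurement ID from your page source still lacks the secret.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_request(measurement_id, api_secret, client_id, event_name, params):
    """Build the URL and JSON body for one event (hypothetical helper)."""
    query = urlencode({
        "measurement_id": measurement_id,  # visible in the page's tracking code
        "api_secret": api_secret,          # created in the GA 4 admin, kept private
    })
    body = json.dumps({
        "client_id": client_id,
        "events": [{"name": event_name, "params": params}],
    })
    return f"{ENDPOINT}?{query}", body


# Placeholder values only; substitute your own stream's ID and secret.
url, body = build_request(
    measurement_id="G-XXXXXXX",
    api_secret="placeholder-secret",
    client_id="555.123",
    event_name="page_view",
    params={"page_title": "Home"},
)
```

Keep the secret on the server side; putting it in client-side code would hand spammers the same key the old protocol never required.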

Bottom Line

The pushback against Google Analytics 4 is understandable – it is just leaving the early adopter phase and is not yet ready for the masses. But they don’t really have much choice – UA has been barnacled to the point it is sinking. It is time to move to a new boat, and we’ll have to work out the kinks along the way.

If you had a look at GA 4 and decided against it, I urge you to reconsider. At least add GA 4 tracking to your site along with UA tracking – they can live happily side-by-side. It is inevitable that the offering will evolve to better meet your needs, and you want to have some historical data available for when you are ready to switch.