Updated 2021-03-27 after a deep dive into additional metrics…
As reported in one of my Misunderstood Metrics articles, one of the quirks we discovered in Google Analytics is that sessions that span midnight are restarted on the first hit after midnight, even though the session id (and Count of Sessions dimension value) doesn’t change. But what quirks await us in Google Analytics 4?
Restarting the session after midnight was probably done historically in Urchin (GA#1) to simplify session count totals for each day. While each day’s session counts might be correct, a lot of other metrics like entrances, exits, pages/session, and bounces were affected.
With multi-day reporting, the sessions were double-counted and the other metrics were muddled. If you used custom events, it was also possible to see sessions that had no pageviews (just events after midnight), which has confused a lot of people.
If you have a lot of visitor traffic around midnight, this problem would become noticeable when you (or your client or your boss) look closely at your reports.
Google Analytics 4 Does Not Start A New Session After Midnight
In Google Analytics 4, sessions spanning midnight do NOT trigger a second ‘session_start’ event, but other things may happen behind the scenes that can equally confuse your reporting.
…Or Does It?
Imagine my surprise when I compared the web reports for this user. Google Analytics 4 shows 2 sessions over the 2 days! WHAT???
AND where did the 3m 44s engagement time come from — the event report showed only 14 seconds of engagement?
AND the total engagement time is different from the one value shown (3m 51s vs 3m 44s)?
AND a row with ‘(not set)’ source with no sessions and no engagement? Where did that come from?
A Deeper Dive Into The Data
OK, it was partially my fault: heavily filtering reports to isolate a single user can hide unexpected events. In this case, it seems the user came back to his open browser tabs and ‘engaged’ with 2 of the pages 8 hours later, correctly triggering a new ‘session_start’ event.
Light gray: second session.
That explains the second session in the 2-day report.
Engagement of 448 seconds is 7m 28s, half of which is 3m 44s — that explains another number.
Add another 14s of reported engagement from the previous session, and you get 7m 42s. Spread over 2 sessions, that yields an average of 3m 51s. Another mystery number explained!
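The arithmetic is easy to check. Here is a small Python sketch that reproduces the numbers; the variable names and the `fmt` helper are mine, chosen for illustration only.

```python
def fmt(seconds: float) -> str:
    """Format a number of seconds as 'Xm YYs'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"

second_session = 448  # engagement seconds in the 8-hours-later session
first_session = 14    # engagement seconds reported for the earlier session
sessions = 2

half_of_second = second_session / 2        # the single value shown in the report
total = first_session + second_session     # all engagement over both days
average = total / sessions                 # average engagement per session

print(fmt(second_session))  # 7m 28s
print(fmt(half_of_second))  # 3m 44s
print(fmt(total))           # 7m 42s
print(fmt(average))         # 3m 51s
```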
And notice the first session starts with a source of ‘google’ but it drops to ‘(not set)’ after midnight? That explains the existence of the ‘(not set)’ row.
Midnight Spanning Quirks Remain In GA 4
As I looked at the data, there still appear to be some unanswered questions. What follows might be a mind-bender for some people, so let me just say that, contrary to all the hype, GA 4 is not that different from the previous version: do not expect accurate reporting if the session spans midnight.
Where did the 14s engagement time go for the ‘…misunderstood-metrics-new-…’ page?
Why did the source change to ‘(not set)’ after midnight?
Since the new Data API is still in v1alpha status, and the web interface is not much better for detailed analysis, I grabbed the gory details from the BigQuery export, and suddenly GA and GA4 look a lot more similar…
Orange background: first session. Red font: first day.
The first page view after midnight is tagged as an ‘entrance’ (same as in GA) and ‘ignore_referrer’ is ‘true’. I assume this translates into the report labelling the source as ‘(not set)’ for these page views. But it seems an ‘entrance’ does not become a session in the reports (both sessions had a source of ‘google’). Just be aware that an entrance does not always identify the ‘session’ landing page.
Since the second session did not include any ‘page_view’ events, there are no entrances. I think the only reason it showed up with the ‘google’ source is that the ‘user_engagement’ event was recorded on a page that was loaded on the first day.
I also notice that there is no engaged_session_event associated with the day 2 engagement in the session that spans midnight, even though the engagement time was recorded.
Wild conjecture: they hold the session_start event until the session ends and log it with the ‘session_engaged’ value. When a session spans midnight, they record what they have so far and start a new buffer. Since the session doesn’t start in day 2, there’s no record in the buffer and no way to record the session was engaged, so it just drops it. Result: no engaged session in day 2 and no way to tell day 1 that the session it recorded was actually engaged.
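If you want to hunt for these sessions in your own export, here is a minimal Python sketch. It assumes the events have already been flattened into plain dictionaries with `user_pseudo_id`, `ga_session_id`, `event_name`, and `event_timestamp` (microseconds since the epoch, as in the BigQuery export). The sample rows, the `micros` helper, and the fixed UTC-5 reporting offset are all made up for illustration; substitute your property's time zone.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: the property reports in a fixed UTC-5 offset.
REPORT_TZ = timezone(timedelta(hours=-5))

def micros(year, month, day, hour, minute):
    """Hypothetical helper: build a microsecond timestamp in REPORT_TZ."""
    dt = datetime(year, month, day, hour, minute, tzinfo=REPORT_TZ)
    return int(dt.timestamp() * 1_000_000)

def local_date(event_timestamp):
    """Calendar date of a microsecond timestamp in the reporting time zone."""
    return datetime.fromtimestamp(event_timestamp / 1_000_000, tz=REPORT_TZ).date()

def sessions_spanning_midnight(events):
    """Return (user, session) keys whose events fall on more than one date."""
    dates = defaultdict(set)
    for e in events:
        key = (e["user_pseudo_id"], e["ga_session_id"])
        dates[key].add(local_date(e["event_timestamp"]))
    return sorted(key for key, seen in dates.items() if len(seen) > 1)

# Made-up rows: session 111 crosses midnight, session 222 does not.
events = [
    {"user_pseudo_id": "u1", "ga_session_id": 111,
     "event_name": "page_view", "event_timestamp": micros(2021, 3, 20, 23, 50)},
    {"user_pseudo_id": "u1", "ga_session_id": 111,
     "event_name": "page_view", "event_timestamp": micros(2021, 3, 21, 0, 5)},
    {"user_pseudo_id": "u1", "ga_session_id": 222,
     "event_name": "user_engagement", "event_timestamp": micros(2021, 3, 21, 8, 10)},
]

spanning = sessions_spanning_midnight(events)
print(spanning)  # [('u1', 111)]
```

Grouping by user and session id together matters, because the session id alone is not guaranteed to be unique across users.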
Page-Level Reporting Differences
When looking at page-level reporting, things line up much better, and the numbers can be easily explained (assuming you know about the 8-hour-later page engagements).
It is worth noting that both systems lose the time on page/engagement time across the midnight hour – that is unfortunate.
Another quirk: note that the Average engagement time at the top is not an average of the individual page engagement times displayed — it is the average engagement time per session. They leave it to you to figure out how the average can be higher than any of the individual values…which won’t be so easy to figure out in a report with lots of sessions and pages.
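A toy example shows how that can happen. The numbers below are invented, not taken from the report above: two sessions, four pages, and an engagement time in seconds for each page.

```python
# Invented data: (session, page, engagement seconds).
page_engagement = [
    ("s1", "/a", 30),
    ("s1", "/b", 40),
    ("s1", "/c", 50),
    ("s2", "/d", 60),
]

# The values you would see on the individual page rows.
per_page = [secs for _, _, secs in page_engagement]

# Total engagement per session, then the average across sessions.
session_totals = {}
for session, _, secs in page_engagement:
    session_totals[session] = session_totals.get(session, 0) + secs

avg_per_session = sum(session_totals.values()) / len(session_totals)

print(max(per_page))     # 60
print(avg_per_session)   # 90.0
```

Each session accumulates the time from every page it touched, so the per-session average (90s) ends up above the largest single page value (60s).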
Summary
The bottom line is that Google Analytics 4 delivers better page-level reporting; specifically, the new engagement-time metric is much better than the old time-on-page numbers were.
The introduction of user engagement events means that delayed sessions without page views (as I experienced) may become more common. People return to the browser tabs they opened earlier, read for a bit, and close them. The result is a new session with user engagement but no page view.
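To gauge how common these sessions are in your own data, a rough Python sketch like the following could flag them. It assumes flattened event rows with `user_pseudo_id`, `ga_session_id`, and `event_name` fields; the rows shown are invented for illustration.

```python
from collections import defaultdict

def sessions_without_page_view(events):
    """Return (user, session) keys that never recorded a 'page_view' event."""
    names = defaultdict(set)
    for e in events:
        names[(e["user_pseudo_id"], e["ga_session_id"])].add(e["event_name"])
    return sorted(key for key, seen in names.items() if "page_view" not in seen)

# Invented rows: session 222 has engagement but no page view.
events = [
    {"user_pseudo_id": "u1", "ga_session_id": 111, "event_name": "session_start"},
    {"user_pseudo_id": "u1", "ga_session_id": 111, "event_name": "page_view"},
    {"user_pseudo_id": "u1", "ga_session_id": 111, "event_name": "user_engagement"},
    {"user_pseudo_id": "u1", "ga_session_id": 222, "event_name": "session_start"},
    {"user_pseudo_id": "u1", "ga_session_id": 222, "event_name": "user_engagement"},
]

no_page_view = sessions_without_page_view(events)
print(no_page_view)  # [('u1', 222)]
```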
As for session-level reporting, there are still quirks for sessions that span midnight. For most people, this should be negligible, but for anyone with a significant number of sessions spanning midnight, assume your numbers can be slightly off, and the only way to explain inconsistencies is to get into gory, gory detail…and you don’t want to go there without some pretty good analysis tools.