Are you seeing major data discrepancies in your Google Analytics account? The issue may be due to Google’s use of data sampling. Data sampling is used to reduce load times and increase overall usability. However, this speed sometimes comes at the expense of precision and accuracy, creating difficulties when it comes to making real business decisions based on your analytics data.
Learn more about data sampling and how you can find non-sampled data sets in Google Analytics.
Why Data Sampling Happens
Data sampling only occurs on the reporting end. All data is still collected accurately, but Google occasionally chooses to sample in order to display reports in a more timely fashion. Typically this occurs when there is too much base data for Google to process quickly.
The biggest driver of whether or not Google uses sampling is the date range. If you look at a full year of data, especially if you look at it by day (rather than by week or by month), there is often simply too much root data to process quickly, and Google will automatically turn on sampling.
You’ll know data is being sampled when you see the yellow notification in the top right corner of your report.
Why Data Sampling Causes Discrepancies
Once Google determines that it needs to sample data, it will pull a sample data set from your site’s base data. A new sample is pulled each time a report is requested, so the numbers can differ slightly from one request to the next. Google always tries to make it a representative sample, pulling consistent, normal data about visits, bounce rate, time on site, etc., to reduce the impact that sampling has on accuracy and create as little error in the data as possible.
Sometimes, however, the system can’t pull a consistent sample. This could be due to high variation in a particular metric during the date range in question. Or it could be due to a low overall volume of the metric in question relative to visits. For example, if your site has an extremely low transaction count compared to total visits, Google can’t confidently provide a representative sample. In this example, some of the sampled data may be accurate, such as engagement metrics or visit counts, but revenue can be drastically different.
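To see why low-volume metrics suffer most, here is a rough simulation in Python. It is not how Google actually samples. The function name, traffic numbers, and conversion rate are all made up for illustration, and it assumes a simple random sample of visits that is scaled back up to the full total.

```python
import random


def simulate_sampled_estimates(total_visits, conversion_rate, sample_size, runs, seed=0):
    """Estimate total transactions from repeated random samples of visits.

    Hypothetical model: a simple random sample of visits, scaled up by
    total_visits / sample_size. Not Google's actual sampling algorithm.
    """
    rng = random.Random(seed)
    # 1 marks a visit with a transaction, 0 a visit without one.
    n_transactions = round(total_visits * conversion_rate)
    visits = [1] * n_transactions + [0] * (total_visits - n_transactions)
    scale = total_visits / sample_size
    estimates = []
    for _ in range(runs):
        sample = rng.sample(visits, sample_size)
        # Scale the sampled transaction count back up to the full data set.
        estimates.append(sum(sample) * scale)
    return n_transactions, estimates


true_count, estimates = simulate_sampled_estimates(
    total_visits=200_000, conversion_rate=0.0005, sample_size=10_000, runs=5
)
print("true transactions:", true_count)
print("sampled estimates:", estimates)
```

Because every sample contains the same number of visits, the scaled visit count always comes back exact. The transaction estimate, on the other hand, can only move in steps of 20 in this setup, so finding a handful more or fewer transactions in the sample can shift the reported total considerably from one run to the next.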
How to Get Accurate Data
To avoid data sampling, you can try one of three things in Google Analytics: (1) use a different report, (2) shorten the date range, or (3) control the sampling level.
Using a Different Report
Oftentimes, analysts will start at the top, viewing the All Traffic - Channels report. This is a great report for comparisons between the major pre-defined channels. However, once you begin to drill down to campaign-level metrics, it becomes less and less useful. This is when sampling kicks in, because the report is requesting and filtering a massive amount of data.
Instead of starting with All Traffic, get a little closer to the end goal by viewing the Source / Medium report. Then use campaign as the primary dimension or add it as a secondary dimension. Sampling should not be turned on when you view reports in this way.
Sampling is also very common when requesting ad-hoc changes to a standardized report. Building a custom report can solve this problem by isolating standard sets of data, which can be viewed on a consistent basis.
Shortening the Date Range
This solution doesn’t need a whole lot of explanation. If you’re viewing a full year’s worth of data, simply change the date range to a shorter time period.
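If you still need the full year, one option is to view it as a series of shorter windows. The sketch below only does the date arithmetic; the function name, the example year, and the 31-day window length are arbitrary choices for illustration, and the window length that actually avoids sampling will depend on your traffic.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days):
    """Split [start, end] (inclusive) into consecutive windows of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    windows = []
    window_start = start
    while window_start <= end:
        # The last window is cut short so it never runs past the end date.
        window_end = min(window_start + timedelta(days=max_days - 1), end)
        windows.append((window_start, window_end))
        window_start = window_end + timedelta(days=1)
    return windows


windows = split_date_range(date(2015, 1, 1), date(2015, 12, 31), max_days=31)
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

Counts such as visits or transactions can be added up across the windows. Ratios such as bounce rate cannot simply be averaged and would need to be recomputed from their underlying counts.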
Controlling the Sampling Level
You do have the option of increasing or decreasing the “level” of sampling that occurs. When you see the notification alerting you that sampling is engaged, click on the grid in the top left. This will open a dialog where you can increase or decrease the sample size. Use the slider to move towards faster processing (smaller sample size) or higher precision (larger sample size). The default sample size is 250,000 visits.
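To get a feel for what a given sample size means, you can work out roughly what share of your visits it covers. This is a back-of-the-envelope sketch that assumes the report is based on at most the chosen number of visits; the function name and the traffic totals are hypothetical, and only the 250,000 default comes from the setting described above.

```python
def sample_fraction(total_visits, sample_size=250_000):
    """Share of visits covered by the sample, capped at 1.0 when every visit fits."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return min(1.0, sample_size / total_visits)


# Hypothetical traffic totals for the selected date range.
for total in (100_000, 1_000_000, 5_000_000):
    fraction = sample_fraction(total)
    print(f"{total:>9,} visits -> {fraction:.1%} of visits in the sample")
```

Under this assumption, a date range with one million visits would be reported from a quarter of its visits at the default setting, and one with five million visits from just 5%.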
When you’re using Google Analytics data to make important business decisions, that data needs to be as accurate as possible. Make sure your reports are using all of the data, not just a sample, so that your decisions are backed by detailed and definitive numbers.