The idea is that Google, through the Chrome team and tools they make available to developers, has collected real user metrics (RUM) for billions of sites around the world and has grouped them according to country.
The first example uses BigQuery to search for all unique origins in Chile as of December 2019.
SELECT COUNT(DISTINCT origin) FROM `chrome-ux-report.country_cl.201912`
let’s unpack the query referencing Rick Viscomi’s Using the Chrome UX Report on BigQuery
SELECT COUNT(DISTINCT origin) means querying for the number of unique origins in the table. Roughly speaking, two URLs are part of the same origin if they have the same scheme, host, and port.
FROM `chrome-ux-report.country_cl.201912` specifies the address of the source table, which has three parts:
- The Cloud project name chrome-UX-report within which all CrUX data is organized
- The dataset
country_cl, representing data from Chile (county code CL)
- The table 201912, representing December 2019 in
With slight modifications, we can get a list of the URLs for the unique origins. We remove
COUNT and leave the select statement as
SELECT DISTINCT origin FROM `chrome-ux-report.country_cl.201912`
You can see a detailed preview of the data in BigQuery. Note that BigQuery is moving to the Google Cloud Console and that the URL may change as a result.
Of particular importance are the performance metrics available as part of the report:
- first_paint (FP)
- first_contentful_paint (FCP)
- dom_content_loaded (DCL)
- onload (OL)
- experimental.first_input_delay (FID)
The data for each metric is organized as an array of objects that we can capture as
The following example will pick up the sum of all First Contentful Paint density values and assign them to the variable
It will pull the data from the 2019/12 report and will flatten all the values in
The matching conditions are: the origin has to be
https://www.vidasecurity.cl/ and the
fcp.start value has to be greater than 10000, meaning that the site has to take 10 seconds or more to load.
SELECT SUM(fcp.density) AS fcp_density FROM `chrome-ux-report.country_cl.201912`, UNNEST(first_contentful_paint.histogram.bin) AS fcp WHERE origin = 'https://www.vidasecurity.cl/' AND fcp.start > 10000
The following example, unlike the other ones, selects multiple elements from a specific table to get more fine-grained information from the data.
This example asks the question: in the country_cl.201912 report, how many origins took more than 20 seconds (20000 milliseconds) to load on a phone. List the results by origin
SELECT origin, fcp FROM `chrome-ux-report.country_cl.201912`, UNNEST(first_contentful_paint.histogram.bin) AS fcp WHERE form_factor.name = 'phone' AND fcp.start > 20000 ORDER BY origin
And this is the equivalent query for desktop connections taking more than 2000 seconds to start.
SELECT _TABLE_SUFFIX AS yyyymm, AVG(fcp.density) AS fast_fcp FROM `chrome-ux-report.country_cl.*`, UNNEST(first_contentful_paint.histogram.bin) AS fcp WHERE form_factor.name = 'desktop' AND fcp.start > 20000 GROUP BY yyyymm ORDER BY yyyymm
With these two queries, we can start making comparisons and predictions about the data before we jump into more in-depth queries.
Some of the questions I had about the data:
- How much do numbers change over the years?
- Is there a significant difference between desktop and mobile values?
Using these questions as a starting point we can dig deeper into the general data or query specific sites for more information. If we want, we can save the data as JSON and use a visualization library like D3 to generate graphical representations of the data or save it as CSV to manipulate on Excel or Google Sheets.
Once you’ve got all the answers you can get out of the CrUX dataset you can move to the HTTP Archive dataset. This dataset is a far more comprehensive both in the breadth of the data it collects and the frequency that it collects the data.
For more information on how to use the HTTP Archive BigQuery dataset see Getting Started Accessing the HTTP Archive with BigQuery by Paul Calvano.