BigQuery allows to define nested and repeated fields
in a table. Although this is very powerful, it makes it much more complex to retrieve the
data if one is not used to such structures. Especially beginners tend to use an
on the nested fields, followed by a huge
GROUP BY statement on the not-originally-repeated fields. Imho, using
is oftentimes the better approach here.
SELECT id, (SELECT value from t.repeated_fields LIMIT 1)
FROM table t
Caution: When using expression subqueries, you need to make sure that the result is a single value (scalar or array), otherwise you will
get the error message
Scalar subquery produced more than one element
In the example code above this is ensured by enforcing one result via
<script src=”https://gist.github.com/paslandau/03c73ee5eef2ce217af82a8f7edcb125.js”><script src=”https://gist.github.com/paslandau/03c73ee5eef2ce217af82a8f7edcb125.js”>
Run on BigQuery
Open in BigQuery Console
The most prominent use case is probably the BigQuery export schema of Google Analytics.
To be honest, I also feel that the schema is not very friendly for newcomers with its ~30 RECORD-type (nested) fields and 300+ columns.
In a nutshell, each row represents one session.
A session consists of multiple hits. Those hits are also available in the nested and repeated
hits field. But wait, there is more…
Each hit can have a number of so called
customDimensions (meta data that can be attached to each hit). So the resulting table structue looks something
- hits - field_1 - field_2 - customDimensions - index - value
The following example uses the public Google Analytics sample dataset for BigQuery and shows
a couple of sample expression subqueries
SELECT fullVisitorId, visitStartTime, TIMESTAMP_SECONDS(visitStartTime) as started_at, TIMESTAMP_SECONDS(visitStartTime + CAST( (SELECT time from t.hits ORDER BY hitNumber DESC LIMIT 1) /1000 AS INT64)) as ended_at, (SELECT COUNT(*) from t.hits) as hit_count, (SELECT page.hostname || page.pagePath from t.hits WHERE isEntrance = TRUE) as landing_page, ( SELECT (SELECT COUNT(*) from h.customDimensions) FROM t.hits h WHERE hitNumber = 1 ) as customDimension_count_of_first_hit,
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` t
ORDER BY visitStartTime asc