Find out what it takes to build a successful enterprise web application.
BigQuery allows to define nested and repeated fields
in a table. Although this is very powerful, it makes it much more complex to retrieve the
data if one is not used to such structures. Especially beginners tend to use an
on the nested fields, followed by a huge
GROUP BY statement on the not-originally-repeated fields. Imho, using
is oftentimes the better approach here.
SELECT id, (SELECT value from t.repeated_fields LIMIT 1) FROM table t
Caution: When using expression subqueries, you need to make sure that the result is a single value (scalar or array), otherwise you will
get the error message
Scalar subquery produced more than one element
In the example code above this is ensured by enforcing one result via
<script src=”https://gist.github.com/paslandau/03c73ee5eef2ce217af82a8f7edcb125.js”><script src=”https://gist.github.com/paslandau/03c73ee5eef2ce217af82a8f7edcb125.js”>
Run on BigQuery
The most prominent use case is probably the BigQuery export schema of Google Analytics.
To be honest, I also feel that the schema is not very friendly for newcomers with its ~30 RECORD-type (nested) fields and 300+ columns.
In a nutshell, each row represents one session.
A session consists of multiple hits. Those hits are also available in the nested and repeated
hits field. But wait, there is more…
Each hit can have a number of so called
customDimensions (meta data that can be attached to each hit). So the resulting table structue looks something
- field_1 - field_2 - hits - field_1 - field_2 - customDimensions - index - value
The following example uses the public Google Analytics sample dataset for BigQuery and shows
a couple of sample expression subqueries
SELECT fullVisitorId, visitStartTime, TIMESTAMP_SECONDS(visitStartTime) as started_at, TIMESTAMP_SECONDS(visitStartTime + CAST( (SELECT time from t.hits ORDER BY hitNumber DESC LIMIT 1) /1000 AS INT64)) as ended_at, (SELECT COUNT(*) from t.hits) as hit_count, (SELECT page.hostname || page.pagePath from t.hits WHERE isEntrance = TRUE) as landing_page, ( SELECT (SELECT COUNT(*) from h.customDimensions) FROM t.hits h WHERE hitNumber = 1 ) as customDimension_count_of_first_hit, FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` t ORDER BY visitStartTime asc
In Google BigQuery we can define named subqueries via
WITH clauses are a very comfortable way to structure complex queries as it allows to reference those queries like actual tables later on.
Note: BigQuery also supports actcual temporary tables via
CREATE TEMPORARY TABLE. See the official documention on
temporary tables for further infos.
This is out of scope for this snippet, though.
WITH filtered_data as ( SELECT id FROM table WHERE id BETWEEN 5 and 10 ) SELECT * FROM filtered_data
<script src=”https://gist.github.com/paslandau/662a42456dc9dc77b6cbdb1d6acb8c99.js”><script src=”https://gist.github.com/paslandau/662a42456dc9dc77b6cbdb1d6acb8c99.js”>
Run on BigQuery
- Gist on Github
- Example on BigQuery
- Answer to “How to create temporary table in Google BigQuery” on Stackoverflow
Named subqueries are a great way to structure complex queries and give sub-results a meaningful name.
When working with partitioned tables, I always use temporary tables via WITH to make sure I restrict the query to
scan only a limited number of partitions.
DECLARE from_date TIMESTAMP DEFAULT "2018-04-09"; DECLARE to_date TIMESTAMP DEFAULT "2018-04-10"; WITH huge_table_partition as( SELECT * FROM huge_table WHERE _PARTITIONTIME BETWEEN from_date AND to_date ) SELECT * FROM huge_table_partition
We can use variables by defining them with a
DECLARE foo STRING DEFAULT "foo"; #DECLARE <variable> <type> DEFAULT <value>;
<type> being one of the BigQuery’s built-in standard-sql data types
This is equivalent to variables of other SQL databases, e.g.
DECLARE foo_var STRING DEFAULT "foo"; SELECT foo_var
<script src=”https://gist.github.com/paslandau/0cb51ba9e532a71fff5108f156afd2f5.js”><script src=”https://gist.github.com/paslandau/0cb51ba9e532a71fff5108f156afd2f5.js”>
Run on BigQuery
- Gist on Github
- Example on BigQuery
- Answer to “Setting Big Query variables like mysql” on Stackoverflow
Hardcoding variables is generally considered a bad practice as it makes it harder to understand and modify a query.
A frequent use case for me is the definition of date ranges (from and to dates) that are used for querying partitioned tables:
DECLARE from_date DATE DEFAULT DATE("2018-04-09"); DECLARE to_date DATE DEFAULT DATE("2018-04-10"); WITH data as( SELECT 1 as id, DATE("2018-04-08") AS date, UNION ALL SELECT 2, DATE("2018-04-09") UNION ALL SELECT 3, DATE("2018-04-10") UNION ALL SELECT 4, DATE("2018-04-11") ) SELECT id, date FROM data WHERE date BETWEEN from_date AND to_date
A PHP interface defines a contract which a class must fulfill. If a PHP class is a blueprint for objects, an interface is a blueprint for classes. Any class implementing a given interface can be expected to have the same behavior in terms of what can be called, how it can be called, and what will be returned.
The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode’s MP3 file, and it’s available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news
- Derick Rethans 0:16
Hi, I’m Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 54. Today I’m talking with Gabriel Caruso about his ensure correct signatures of magic methods RFC. Hello Gabriel, would you please introduce yourself?
- Gabriel Caruso 0:37
Hello Derick and hello to everyone as well. My name is Gabriel. I’m from Brazil, but I’m currently in the Netherlands. I’m working in a company called Usabila, which is basically a feedback company. Yeah, let’s talk about this new RFC for PHP eight.
- Derick Rethans 0:52
Yes, well, starting off at PHP eight. Somebody told me that you also have some other roles to play with PHP eight.
- Gabriel Caruso 0:59
Yeah, I think last week I received the news that I’m going to be the new release manager together with Sara. We’re going to basically take care of PHP eight, ensuring that we have new versions, every month that we have stable versions every month free of bugs, we know that it’s not going to happen.
- Derick Rethans 1:17
That’s why there’s a release cycle with alphas and betas.
- Gabriel Caruso 1:20
- Derick Rethans 1:21
I’ve been through this exactly a year early, of course, because I’m doing a seven four releases.
- Gabriel Caruso 1:25
Oh, nice. Yeah. So I’m gonna ask a lot of questions for you.
- Derick Rethans 1:29
Oh, that’s, that’s fine. It’s also the role of the current latest release manager to actually kickstart the process of getting the PHP, in this case, PHP eight release managers elected. Previously, there were only very few people that wanted to do it. So in for the seven four releases it was Peter and me. But in your case, there were four people that wanted to do it, which meant that for the first time I can ever remember we actually had to hold some form of election process for it. That didn’t go as planned because we ended up having a tie twice, which was interesting. So we had to run a run off election for the second person between you and Ben Ramsey, that’s going to go continuing for you for the next three and a half years likely.
- Gabriel Caruso 2:11
- Derick Rethans 2:12
So good luck with that.
- Gabriel Caruso 2:13
Thank you. Thank you very much.
- Derick Rethans 2:15
In any case, let’s get back to the RFC that we actually wanted to talk about today, which is the ensure correct signatures of magic methods RFC. What are these magic methods?
- Gabriel Caruso 2:24
So PHP, let’s say out of the box, gives the user some magic methods that every single class have it. We can use that those methods for anything, but basically, what magic methods are are just methods that are called by PHP w
Truncated by Planet PHP, read more at the original (another 20573 bytes)