BigQuery: Use expression subqueries for querying nested and repeated fields – Pascal Landau

BigQuery allows to define nested and repeated fields
in a table. Although this is very powerful, it makes it much more complex to retrieve the
data if one is not used to such structures. Especially beginners tend to use an
UNNEST statement
on the nested fields, followed by a huge
GROUP BY statement on the not-originally-repeated fields. Imho, using
expression subqueries
is oftentimes the better approach here.

Code

SELECT id, (SELECT value from t.repeated_fields LIMIT 1)
FROM table t 

Caution: When using expression subqueries, you need to make sure that the result is a single value (scalar or array), otherwise you will
get the error message

Scalar subquery produced more than one element

In the example code above this is ensured by enforcing one result via LIMIT 1.

Working Example

<script src=”https://gist.github.com/paslandau/03c73ee5eef2ce217af82a8f7edcb125.js”><script src=”https://gist.github.com/paslandau/03c73ee5eef2ce217af82a8f7edcb125.js”>

Run on BigQuery

Open in BigQuery Console

BigQuery Console: How to use expression subqueries for nested and repeated fields example

Links

Use cases

The most prominent use case is probably the BigQuery export schema of Google Analytics.
To be honest, I also feel that the schema is not very friendly for newcomers with its ~30 RECORD-type (nested) fields and 300+ columns.

In a nutshell, each row represents one session.
A session consists of multiple hits. Those hits are also available in the nested and repeated hits field. But wait, there is more…
Each hit can have a number of so called customDimensions (meta data that can be attached to each hit). So the resulting table structue looks something
like this:

- field_1
- field_2
- hits - field_1 - field_2 - customDimensions - index - value 

The following example uses the public Google Analytics sample dataset for BigQuery and shows
a couple of sample expression subqueries

SELECT fullVisitorId, visitStartTime, TIMESTAMP_SECONDS(visitStartTime) as started_at, TIMESTAMP_SECONDS(visitStartTime + CAST( (SELECT time from t.hits ORDER BY hitNumber DESC LIMIT 1) /1000 AS INT64)) as ended_at, (SELECT COUNT(*) from t.hits) as hit_count, (SELECT page.hostname || page.pagePath from t.hits WHERE isEntrance = TRUE) as landing_page, ( SELECT (SELECT COUNT(*) from h.customDimensions) FROM t.hits h WHERE hitNumber = 1 ) as customDimension_count_of_first_hit,
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` t
ORDER BY visitStartTime asc

BigQuery: Use “temporary tables” via WITH (named subqueries) – Pascal Landau

In Google BigQuery we can define named subqueries via WITH clauses.
Those WITH clauses are a very comfortable way to structure complex queries as it allows to reference those queries like actual tables later on.

Note: BigQuery also supports actcual temporary tables via CREATE TEMPORARY TABLE. See the official documention on
temporary tables for further infos.
This is out of scope for this snippet, though.

Code

WITH filtered_data as ( SELECT id FROM table WHERE id BETWEEN 5 and 10
)
SELECT *
FROM filtered_data 

Working Example

<script src=”https://gist.github.com/paslandau/662a42456dc9dc77b6cbdb1d6acb8c99.js”><script src=”https://gist.github.com/paslandau/662a42456dc9dc77b6cbdb1d6acb8c99.js”>

Run on BigQuery

Open in BigQuery Console

BigQuery Console: How to use temporay tables via WITH named subqueries example

Links

Use cases

Named subqueries are a great way to structure complex queries and give sub-results a meaningful name.
When working with partitioned tables, I always use temporary tables via WITH to make sure I restrict the query to
scan only a limited number of partitions.

Conceptual example:

DECLARE from_date TIMESTAMP DEFAULT "2018-04-09";
DECLARE to_date TIMESTAMP DEFAULT "2018-04-10"; WITH huge_table_partition as( SELECT * FROM huge_table WHERE _PARTITIONTIME BETWEEN from_date AND to_date
) SELECT *
FROM huge_table_partition

BigQuery: Declare and use Variables – Pascal Landau

We can use variables by defining them with a DECLARE statement,
e.g.

DECLARE foo STRING DEFAULT "foo"; #DECLARE <variable> <type> DEFAULT <value>;

with <type> being one of the BigQuery’s built-in standard-sql data types

This is equivalent to variables of other SQL databases, e.g.

Code

DECLARE foo_var STRING DEFAULT "foo"; SELECT foo_var

Working Example

<script src=”https://gist.github.com/paslandau/0cb51ba9e532a71fff5108f156afd2f5.js”><script src=”https://gist.github.com/paslandau/0cb51ba9e532a71fff5108f156afd2f5.js”>

Run on BigQuery

Open in BigQuery Console

BigQuery Console: How to declare and use variables example

Links

Use cases

Hardcoding variables is generally considered a bad practice as it makes it harder to understand and modify a query.
A frequent use case for me is the definition of date ranges (from and to dates) that are used for querying partitioned tables:

DECLARE from_date DATE DEFAULT DATE("2018-04-09");
DECLARE to_date DATE DEFAULT DATE("2018-04-10"); WITH data as( SELECT 1 as id, DATE("2018-04-08") AS date, UNION ALL SELECT 2, DATE("2018-04-09") UNION ALL SELECT 3, DATE("2018-04-10") UNION ALL SELECT 4, DATE("2018-04-11")
) SELECT id, date
FROM data
WHERE date BETWEEN from_date AND to_date

PHP Internals News: Episode 54: Magic Method Signatures – Derick Rethans

PHP Internals News: Episode 54: Magic Method Signatures

In this episode of “PHP Internals News” I chat with Gabriel Caruso (Twitter, GitHub, LinkedIn) about the “Ensure correct signatures of magic methods” RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode’s MP3 file, and it’s available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I’m Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 54. Today I’m talking with Gabriel Caruso about his ensure correct signatures of magic methods RFC. Hello Gabriel, would you please introduce yourself?

Gabriel Caruso 0:37

Hello Derick and hello to everyone as well. My name is Gabriel. I’m from Brazil, but I’m currently in the Netherlands. I’m working in a company called Usabila, which is basically a feedback company. Yeah, let’s talk about this new RFC for PHP eight.

Derick Rethans 0:52

Yes, well, starting off at PHP eight. Somebody told me that you also have some other roles to play with PHP eight.

Gabriel Caruso 0:59

Yeah, I think last week I received the news that I’m going to be the new release manager together with Sara. We’re going to basically take care of PHP eight, ensuring that we have new versions, every month that we have stable versions every month free of bugs, we know that it’s not going to happen.

Derick Rethans 1:17

That’s why there’s a release cycle with alphas and betas.

Gabriel Caruso 1:20

Yeah.

Derick Rethans 1:21

I’ve been through this exactly a year early, of course, because I’m doing a seven four releases.

Gabriel Caruso 1:25

Oh, nice. Yeah. So I’m gonna ask a lot of questions for you.

Derick Rethans 1:29

Oh, that’s, that’s fine. It’s also the role of the current latest release manager to actually kickstart the process of getting the PHP, in this case, PHP eight release managers elected. Previously, there were only very few people that wanted to do it. So in for the seven four releases it was Peter and me. But in your case, there were four people that wanted to do it, which meant that for the first time I can ever remember we actually had to hold some form of election process for it. That didn’t go as planned because we ended up having a tie twice, which was interesting. So we had to run a run off election for the second person between you and Ben Ramsey, that’s going to go continuing for you for the next three and a half years likely.

Gabriel Caruso 2:11

Yep.

Derick Rethans 2:12

So good luck with that.

Gabriel Caruso 2:13

Thank you. Thank you very much.

Derick Rethans 2:15

In any case, let’s get back to the RFC that we actually wanted to talk about today, which is the ensure correct signatures of magic methods RFC. What are these magic methods?

Gabriel Caruso 2:24

So PHP, let’s say out of the box, gives the user some magic methods that every single class have it. We can use that those methods for anything, but basically, what magic methods are are just methods that are called by PHP w

Truncated by Planet PHP, read more at the original (another 20573 bytes)

Announcing “Thinking Functionally in PHP” – larry@garfieldtech.com


Part 2: Code Is The Most Expensive Solution – Brandon Savage

Editor’s Note: This is the second in a five-part series on refactoring and modernizing PHP applications. If you were to identify the most expensive part of any development project, what would that part be? The planning phase? Acquiring copyright or intellectual property? Marketing and advertising? All of these would be wrong. The most expensive part […]

The post Part 2: Code Is The Most Expensive Solution appeared first on BrandonSavage.net.

Interview with Nicolas Grekas – Voices of the ElePHPant

This episode is sponsored by
Using the WordPress REST API

The post Interview with Nicolas Grekas appeared first on Voices of the ElePHPant.