PHP Internals News: Episode 56: Mixed Type v2 – Derick Rethans

In this episode of “PHP Internals News” I chat with Dan Ackroyd (Twitter, GitHub) about the Mixed Type v2 RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode’s MP3 file, and it’s available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:20

A weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 56. Today I’m talking with Dan Ackroyd about an RFC that he’s made together with Mate Kocsis; it’s called the mixed type version two. Hello, Dan, would you please introduce yourself?

Dan Ackroyd 0:38

Hi Derick. So my name is Dan Ackroyd, also known as Danack online. I maintain the PHP Imagick extension. I also contribute to PHP internals by maintaining some documents called the RFC Codex, which are a set of notes on why certain ideas haven’t reached fruition in PHP core, and occasionally I help other people write RFCs.

Derick Rethans 1:04

PHP’s type system has been improving over the last few releases, and we’ve seen a few more things coming into PHP 8, like union types. For a long time, there has been an issue with PHP’s internal functions: the type they return cannot necessarily be represented in PHP’s type system, because they do strange things. This RFC builds more on top of PHP’s type system. What is it trying to solve?

Dan Ackroyd 1:29

There’s a couple of different problems it’s trying to solve. The one I care more about is userland code; I don’t actually contribute that much to internals code, so I’m not that familiar with all the problems it has. The reason I got involved with doing the mixed RFC was: I had a library for validating parameters, and due to how that library needs to work, the code passes user data around a lot internally, and then back out when the library returns the validator’s result. So I was upgrading that library to PHP 7.4, and that version introduced property types, which are very useful things. What I was finding was that I was going through the code, trying to add types everywhere I could. And there’s a significant number of places where I just couldn’t add a type, because my code was holding user data that could be any type. The mixed type had been discussed before; it was an idea that people had kind of been kicking around, but it had just never really been worked on. That was the motivation for me: I was having this problem where I couldn’t upgrade my library as I wanted to. I kept forgetting: has this bit of code here been upgraded and I just can’t add a type, or is it the case that I haven’t touched this bit of code yet? So coincidentally, I saw that Mate was also looking at picking up the RFC, and he had copied the version that Michael Moravec had been working on previously. As I mentioned earlier, I help people write RFCs. For a lot of people whose first language isn’t English, writing technical documents in English is a difficult thing to do. I also think that writing RFCs in general is slightly harder than people really anticipate. Each RFC needs to present clearly why something’s a problem, why the proposed solution would work, and, at least to some extent, why other solutions wouldn’t work.
Looking at the text from the previous version, although I understood all of the parts of that RFC, I don’t think that it made the case for why mixed was the right thing to do in a very clear way. So I spent some time working with Mate to redraft the RFC, discussing it between ourselves and going through a few of the smaller issues before presenting it to internals, for it to be officially discussed as an RFC.

Derick Rethans 3:51

Where does th

Truncated by Planet PHP, read more at the original (another 24663 bytes)

Kindo Acquired by MyHeritage – Demian Turner

I haven’t had a chance yet to blog about our latest TechCrunching, but Kindo, the startup I co-founded in March 2007, today announced its sale to MyHeritage, the biggest player in the family tree space.

Kindo is a PHP social networking app built on the Seagull framework and other open source software. At peak popularity our users were building 38k profiles/day, and we acquired more than 1m profiles in our first 10 weeks.

Hats off to the Kindo team and to the Kindo devs who don’t appear in the TC photo.

BigQuery: Use expression subqueries for querying nested and repeated fields – Pascal Landau

BigQuery allows you to define nested and repeated fields in a table. Although this is very powerful, it makes retrieving the data much more complex if you are not used to such structures. Beginners especially tend to use an UNNEST statement on the nested fields, followed by a huge GROUP BY statement on the fields that were not originally repeated. IMHO, using expression subqueries is oftentimes the better approach here.

Code

SELECT id, (SELECT value from t.repeated_fields LIMIT 1)
FROM table t 

Caution: When using expression subqueries, you need to make sure that the result is a single value (scalar or array); otherwise you will get the error message

Scalar subquery produced more than one element

In the example code above, this is ensured by LIMIT 1. Without an ORDER BY, though, it is not guaranteed which element is returned.
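To make the difference to the UNNEST + GROUP BY approach more concrete, here is a minimal sketch on a small inline table. The table `t` and its INT64 array column `repeated_fields` are made up for illustration (the snippet above assumes a repeated STRUCT with a `value` field instead), so treat this as a sketch rather than a tested query. It also shows ARRAY(...), which is how a subquery returning several rows can still produce the single array value mentioned above.

```sql
-- Approach 1 (UNNEST + GROUP BY), shown as a comment for comparison:
--   SELECT id, MAX(value) AS max_value, COUNT(*) AS value_count
--   FROM t, UNNEST(t.repeated_fields) AS value
--   GROUP BY id

-- Approach 2: expression subqueries, evaluated once per row of t, no GROUP BY needed.
-- Hypothetical inline table t with an INT64 array column repeated_fields.
WITH t AS (
  SELECT 1 AS id, [10, 20, 30] AS repeated_fields
  UNION ALL SELECT 2, [40]
  UNION ALL SELECT 3, ARRAY<INT64>[]
)
SELECT
  id,
  (SELECT MAX(value) FROM UNNEST(t.repeated_fields) AS value) AS max_value,
  (SELECT COUNT(*) FROM UNNEST(t.repeated_fields)) AS value_count,
  -- ARRAY(...) turns a multi-row subquery into a single array value;
  -- WITH OFFSET + ORDER BY keeps the original element order
  ARRAY(
    SELECT value * 2
    FROM UNNEST(t.repeated_fields) AS value WITH OFFSET AS pos
    ORDER BY pos
  ) AS doubled_values
FROM t
ORDER BY id
```

Note the row with id 3 and its empty array: the comma join with UNNEST in approach 1 drops it entirely, while the expression subqueries keep it and return NULL for MAX, 0 for COUNT(*) and an empty array from ARRAY(...).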

Working Example

https://gist.github.com/paslandau/03c73ee5eef2ce217af82a8f7edcb125

Use cases

The most prominent use case is probably the BigQuery export schema of Google Analytics.
To be honest, I also feel that the schema is not very friendly for newcomers with its ~30 RECORD-type (nested) fields and 300+ columns.

In a nutshell, each row represents one session.
A session consists of multiple hits. Those hits are also available in the nested and repeated hits field. But wait, there is more…
Each hit can have a number of so-called customDimensions (metadata that can be attached to each hit). So the resulting table structure looks something like this:

- field_1
- field_2
- hits
  - field_1
  - field_2
  - customDimensions
    - index
    - value

The following example uses the public Google Analytics sample dataset for BigQuery and shows
a couple of sample expression subqueries:

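SELECT
  fullVisitorId,
  visitStartTime,
  TIMESTAMP_SECONDS(visitStartTime) AS started_at,
  -- hits.time is the number of milliseconds after visitStartTime,
  -- so the last hit (highest hitNumber) marks the end of the session
  TIMESTAMP_SECONDS(
    visitStartTime + CAST((SELECT time FROM t.hits ORDER BY hitNumber DESC LIMIT 1) / 1000 AS INT64)
  ) AS ended_at,
  (SELECT COUNT(*) FROM t.hits) AS hit_count,
  -- assumes at most one entrance hit per session, otherwise the subquery returns more than one element
  (SELECT page.hostname || page.pagePath FROM t.hits WHERE isEntrance = TRUE) AS landing_page,
  -- nested expression subquery: count the customDimensions of the first hit
  (
    SELECT (SELECT COUNT(*) FROM h.customDimensions)
    FROM t.hits AS h
    WHERE hitNumber = 1
  ) AS customDimension_count_of_first_hit
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` AS t
ORDER BY visitStartTime ASC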

BigQuery: Use “temporary tables” via WITH (named subqueries) – Pascal Landau

In Google BigQuery we can define named subqueries via WITH clauses.
Those WITH clauses are a very comfortable way to structure complex queries, as they allow you to reference those queries like actual tables later on.

Note: BigQuery also supports actual temporary tables via CREATE TEMPORARY TABLE. See the official documentation on
temporary tables for further information.
This is out of scope for this snippet, though.

Code

WITH filtered_data AS (
  SELECT id FROM table WHERE id BETWEEN 5 AND 10
)
SELECT *
FROM filtered_data
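Named subqueries can also build on each other: a later one may reference any named subquery defined before it in the same WITH clause. A small sketch of that; `source_data` and `even_ids` are hypothetical names, and GENERATE_ARRAY stands in for the placeholder `table` from the snippet above so the query needs no real data.

```sql
-- Hypothetical inline data standing in for `table`: the ids 1 to 20
WITH source_data AS (
  SELECT id FROM UNNEST(GENERATE_ARRAY(1, 20)) AS id
),
-- Same filter as in the snippet above, now reading from source_data
filtered_data AS (
  SELECT id FROM source_data WHERE id BETWEEN 5 AND 10
),
-- A later named subquery can reference an earlier one like a table
even_ids AS (
  SELECT id FROM filtered_data WHERE MOD(id, 2) = 0
)
SELECT
  (SELECT COUNT(*) FROM filtered_data) AS filtered_count,
  (SELECT COUNT(*) FROM even_ids) AS even_count
```

This should return filtered_count = 6 (ids 5 to 10) and even_count = 3 (ids 6, 8 and 10).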

Working Example

https://gist.github.com/paslandau/662a42456dc9dc77b6cbdb1d6acb8c99

Use cases

Named subqueries are a great way to structure complex queries and give sub-results a meaningful name.
When working with partitioned tables, I always use temporary tables via WITH to make sure I restrict the query to
scan only a limited number of partitions.

Conceptual example:

DECLARE from_date TIMESTAMP DEFAULT "2018-04-09";
DECLARE to_date TIMESTAMP DEFAULT "2018-04-10";

WITH huge_table_partition AS (
  SELECT * FROM huge_table WHERE _PARTITIONTIME BETWEEN from_date AND to_date
)
SELECT *
FROM huge_table_partition

BigQuery: Declare and use Variables – Pascal Landau

We can use variables by defining them with a DECLARE statement, e.g.

DECLARE foo STRING DEFAULT "foo"; #DECLARE <variable> <type> DEFAULT <value>;

with <type> being one of BigQuery’s built-in Standard SQL data types.

This is equivalent to variables in other SQL databases.

Code

DECLARE foo_var STRING DEFAULT "foo";
SELECT foo_var;
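The DECLARE syntax above has a few more options that come in handy. The sketch below is based on my reading of BigQuery's scripting documentation, and the variable names are made up: several variables of one type can share a single DECLARE, the type can be left out when a DEFAULT is given, and a DEFAULT may use variables declared in earlier statements.

```sql
-- Several variables of the same type in one statement, all starting at 0
DECLARE min_id, max_id INT64 DEFAULT 0;
-- Type omitted: it is inferred from the DEFAULT expression (here DATE)
DECLARE report_day DEFAULT CURRENT_DATE();
-- The DEFAULT of a later DECLARE can reference an earlier variable
DECLARE day_before DATE DEFAULT DATE_SUB(report_day, INTERVAL 1 DAY);

SELECT min_id, max_id, report_day, day_before;
```

Here min_id and max_id are both 0, and day_before is always the day before report_day.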

Working Example

https://gist.github.com/paslandau/0cb51ba9e532a71fff5108f156afd2f5

Use cases

Hardcoding values is generally considered bad practice, as it makes a query harder to understand and modify.
A frequent use case for me is the definition of date ranges (from and to dates) that are used for querying partitioned tables:

DECLARE from_date DATE DEFAULT DATE("2018-04-09");
DECLARE to_date DATE DEFAULT DATE("2018-04-10");

WITH data AS (
  SELECT 1 AS id, DATE("2018-04-08") AS date
  UNION ALL SELECT 2, DATE("2018-04-09")
  UNION ALL SELECT 3, DATE("2018-04-10")
  UNION ALL SELECT 4, DATE("2018-04-11")
)
SELECT id, date
FROM data
WHERE date BETWEEN from_date AND to_date