PHP Internals News: Episode 77: fsync: Buffers All The Way Down – Derick Rethans

In this episode of “PHP Internals News” I chat with David Gebler (GitHub) about his suggestion to add the fsync() function to PHP, as well as file and output buffers.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode’s MP3 file, and it’s available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:13

Hi, I’m Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 77. In this episode I’m talking with David Gebler about an RFC that he’s written to add a new function to PHP called fsync. David, would you please introduce yourself?

David Gebler 0:35

Hi, I’m David. I’ve worked with PHP professionally among other languages as a developer of websites and back end services. I’ve been doing that for about 15 years now. I’m a new contributor to PHP core, fsync is my first RFC.

Derick Rethans 0:48

What is the reason why you want to introduce fsync into the PHP language?

David Gebler 0:52

It’s an interesting question. I suppose in one sense, I’ve always felt that the absence of fsync, when some interface to fsync is provided by most other high-level languages, has been something of an oversight in PHP. But the other reason was that it was an exercise for me in familiarizing myself with PHP’s core and getting to learn the source code. It’s a very small contribution, but it’s one that I feel is potentially useful, and it was easy for me to do as a learning exercise.

Derick Rethans 1:16

How did you find learning about PHP’s internals?

David Gebler 1:19

Quite the roller coaster. The PHP internals are very arcane, I suppose I would say; it’s something that’s not particularly well documented. It’s quite an interesting challenge to get into it. I think a lot of it you have to pick up from digging through the source code, looking at what’s already been done, putting together the pieces, but there is a really great community on the internals list, and indeed elsewhere online, and I found a lot of people very helpful in answering questions and again giving feedback when I first opened my initial proof of concept PR.

Derick Rethans 1:48

Did you manage to find room 11 on Stack Overflow chat as well?

David Gebler 1:52

I did not, no.

Derick Rethans 1:53

I’ll make sure to add a link in the show notes and it’s where many of the PHP core contributors hang out quite a bit.

David Gebler 2:00

Sounds good to know for the future.

Derick Rethans 2:02

I read the RFC earlier today. It talks about fsync, but it also talks about flush, or fflush. What is the difference between them and what does fsync actually do?

David Gebler 2:14

That’s the question that will be on everyone’s lips when they hear about this feature being introduced into the language, hopefully. What does fsync do and what does fflush do? To understand that, we have to understand the different types of buffering involved when an application runs on a system. So we have the application buffer, sometimes called the user space buffer, and we have the operating system kernel space buffer,
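As a rough illustration of the two buffer levels mentioned here, the following is a minimal sketch, assuming fsync() is available as the RFC proposes (the helper name durableWrite() is hypothetical and not part of the RFC). fflush() hands PHP's user space buffer to the operating system; fsync() then asks the operating system to write its own buffer to the storage device.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: writes $data to $path and tries to make sure
 * it has left both the user space and the kernel space buffers.
 */
function durableWrite(string $path, string $data): bool
{
    $handle = fopen($path, 'wb');
    if ($handle === false) {
        return false;
    }

    // fwrite() may leave data in PHP's own stream buffer;
    // fflush() pushes that buffer down to the operating system.
    $ok = fwrite($handle, $data) === strlen($data) && fflush($handle);

    // fsync() asks the operating system to commit its buffer to storage.
    // Guarded, because the function only exists where the RFC has landed.
    if ($ok && function_exists('fsync')) {
        $ok = fsync($handle);
    }

    fclose($handle);

    return $ok;
}
```

Without the fsync() step, a successful fflush() only means the operating system has accepted the data, not that it has reached the disk.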

Derick Rethans 2:36

And we’re talking a

Truncated by Planet PHP, read more at the original (another 22440 bytes)

Does it belong in the application or domain layer? – Matthias Noback

Where should it go?

If you’re one of those people who make a separation between an application and a domain layer in their code base (like I do), then a question you’ll often have is: does this service go in the application or in the domain layer? It sometimes makes you wonder if the distinction between these layers is superficial after all. I’m not going to write again about what the layers mean, but here is how I decide if a service goes into Application or Domain:

Is it going to be used in the Infrastructure layer? Then it belongs in the Application layer.

Dependency rule V2

I like to follow the Dependency rule: layers can have only inward dependencies. This ensures that the layers are decoupled. The Infrastructure layer can use services from the Application layer. The Application layer can use Domain layer services. In theory, Infrastructure could use services from Domain, but I’d rather not allow that. I want the Application layer to define a programming interface/API that can be used by the Infrastructure layer. This makes the Domain layer, including the Domain model, an implementation detail of the Application layer. Which I think is rather cool. No need to do anything with aggregates, or domain events; as long as everything can be hidden behind the Application-as-an-interface.

Making this a bit more concrete, consider the use case “purchasing an e-book”. In code it will be represented as a PurchaseEbookController, living in Infrastructure, which creates a PurchaseEbook command object, which it passes to the PurchaseEbookService. This service is an application service, living in the Application layer. The service creates a Purchase entity and saves it using the PurchaseRepository, living in the Domain layer. At runtime, a PurchaseRepositoryUsingSql will be used, living in Infrastructure, which implements the PurchaseRepository interface.

Why is the PurchaseRepository interface in Domain? Because it won’t and shouldn’t be used directly from Infrastructure (e.g. the controller). The same goes for the Purchase entity. It should only be created or manipulated in controlled ways by application services. But from the standpoint of the Infrastructure layer, we don’t care if the application service uses an entity, a repository interface, or any other design pattern. As long as it does its job in a decoupled way. That is, it’s not coupled to specific infrastructure, neither by code nor by the need for it to be available at runtime.

Why is the application service in the Application layer? Because it’s called directly from the controller, which is Infrastructure. The application service itself is part of the API defined by the Application layer.
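The flow described above could be sketched roughly as follows. This is a simplified, single-file illustration rather than the author's actual code: the constructor arguments, the handle() method, and the request array keys are assumptions, and a hypothetical PurchaseRepositoryInMemory stands in for PurchaseRepositoryUsingSql so the example runs without a database.

```php
<?php
declare(strict_types=1);

// Application layer: the command object.
final class PurchaseEbook
{
    public function __construct(public string $ebookId, public string $emailAddress) {}
}

// Domain layer: the entity and the repository interface.
final class Purchase
{
    public function __construct(private string $ebookId, private string $emailAddress) {}

    public function ebookId(): string
    {
        return $this->ebookId;
    }
}

interface PurchaseRepository
{
    public function save(Purchase $purchase): void;
}

// Application layer: the application service, part of the API offered to Infrastructure.
final class PurchaseEbookService
{
    public function __construct(private PurchaseRepository $repository) {}

    public function purchaseEbook(PurchaseEbook $command): void
    {
        $this->repository->save(new Purchase($command->ebookId, $command->emailAddress));
    }
}

// Infrastructure layer: hypothetical stand-in for PurchaseRepositoryUsingSql.
final class PurchaseRepositoryInMemory implements PurchaseRepository
{
    /** @var Purchase[] */
    public array $saved = [];

    public function save(Purchase $purchase): void
    {
        $this->saved[] = $purchase;
    }
}

// Infrastructure layer: the controller only knows the command and the service.
final class PurchaseEbookController
{
    public function __construct(private PurchaseEbookService $service) {}

    /** @param array<string, string> $post hypothetical request data */
    public function handle(array $post): void
    {
        $this->service->purchaseEbook(new PurchaseEbook($post['ebook_id'], $post['email']));
    }
}
```

Note that the controller never mentions Purchase or PurchaseRepository; from its point of view those are implementation details of the application service.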

Application-as-an-interface or ApplicationInterface?

This gets us to an interesting possibility, which is somewhat experimental: we can define the API that the Application layer offers to its surrounding infrastructure as an actual interface. E.g.

    namespace Application;

    interface ApplicationInterface
    {
        public function purchaseEbook(PurchaseEbook $command): void;

        /**
         * @return EbookForList[]
         */
        public function listAvailableEbooks(): array;
    }

That second method, listAvailableEbooks(), returns a view model; it is an example of a query that could be made accessible via the ApplicationInterface as well.

I think the-application-as-an-interface is a nice design trick to force Infrastructure to be decoupled from Application and Domain code. Infrastructure, like controllers, can only invoke Application behavior via this interface. Another advantage is that creating acceptance tests for the application becomes really easy. You only need the ApplicationInterface in your test and you can run commands and queries against it, to prove that it behaves correctly. You can also create better integration tests for your left-side, or input adapters, because you can replace the entire core of your application by mocking a single interface. I’ll leave a discussion of the options for another article.
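To illustrate the point about replacing the entire core by mocking a single interface, here is a minimal sketch. The RecordingApplication test double, the controller's handle() method, and the constructor arguments of PurchaseEbook and EbookForList are all assumptions made for the example; only the interface itself comes from the text above.

```php
<?php
declare(strict_types=1);

final class PurchaseEbook
{
    public function __construct(public string $ebookId) {}
}

final class EbookForList
{
    public function __construct(public string $id, public string $title) {}
}

interface ApplicationInterface
{
    public function purchaseEbook(PurchaseEbook $command): void;

    /** @return EbookForList[] */
    public function listAvailableEbooks(): array;
}

// Hypothetical test double: records commands instead of executing them.
final class RecordingApplication implements ApplicationInterface
{
    /** @var PurchaseEbook[] */
    public array $commands = [];

    public function purchaseEbook(PurchaseEbook $command): void
    {
        $this->commands[] = $command;
    }

    public function listAvailableEbooks(): array
    {
        return [new EbookForList('e1', 'Example title')];
    }
}

// Input adapter under test: it depends on the interface only.
final class PurchaseEbookController
{
    public function __construct(private ApplicationInterface $application) {}

    /** @param array<string, string> $post hypothetical request data */
    public function handle(array $post): void
    {
        $this->application->purchaseEbook(new PurchaseEbook($post['ebook_id']));
    }
}
```

An integration test for the controller then only needs to check which commands ended up in RecordingApplication, without booting any of the Application or Domain code.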

Successful refactoring projects – The Mikado Method – Matthias Noback

You’ve picked a good refactoring goal. You are prepared to stop the project at any time. Now how to determine the steps that lead to the goal?

Bottom-up development

There is an interesting similarity between refactoring projects, and regular projects, where the goal is to add some new feature to the application. When working on a feature, I’m always happy to jump right in and think about what value objects, entities, controllers, etc. I need to build. Once I’ve written all that code and I’m ready to connect the dots, I often realize that I have created building blocks that I don’t even need, or that don’t offer a convenient API. This is the downside of what’s commonly called “bottom-up development”. Starting to build the low-level stuff, you can’t be certain if you’re contributing to the higher-level goal you have in mind.

Refactoring projects often suffer from the same problem. When you start at the bottom, you’ll imagine some basic tasks you need to perform. I find this kind of work very rewarding. I feel I’m capable of creating a value object. I’m capable of cleaning up a bit of old code. But does it bring me any closer to the goal I set? I can’t say for sure.

Top-down development

A great way to improve feature development is to turn the process around: start with defining the feature at a higher level, e.g. as a scenario that describes the desired behavior of the application at large, and at the same time tests if the application exposes this behavior (see Behavior-Driven Development). When you take this top-down approach, you’ll have less rework, because you’ll be constantly working towards the higher-level goal, formulated as a scenario.

The Mikado Method

For refactoring projects the same approach should be taken. Formulate what you want to achieve, and start your project from there. This has been described in great detail in the book The Mikado Method, by Ola Ellnestam and Daniel Brolund. I read it a few years ago, so I might not be completely faithful to the actual method here. What I’ve taken from it is that you have to start at the end of the refactoring project. I’ll give an example from my current project, where the team has decided they want to get rid of Doctrine ORM as a dependency. This seems like a daunting task. But Mikado can certainly help here.

The first thing to do is to actually remove the Doctrine ORM dependency: composer remove doctrine/orm. Commit this change, run the tests and of course you’ll get an error: could not find class Doctrine\ORM\... in file xxx.php on line y. So now you know that before you can remove the doctrine/orm package you have to ensure that file xxx.php does not use that class from the Doctrine\ORM namespace anymore. This is called a prerequisite; something we need to do first, before we can even think about doing the big change. We now have to revert the commit, because it’s not a commit we should keep; we broke the application.

This may seem like a stupid exercise, but it’s still very powerful because it’s an empirical method. Instead of guessing what needs to be done to achieve the end goal, you are now absolutely sure what needs to be done. In this example, it may be quite obvious that removing a package means you can no longer use classes from that package, but in other situations it’s less obvious.

The next step is to rewrite file xxx.php in such a way that it no longer uses those classes from Doctrine\ORM. This may require a bit of work, like rewriting the mapping logic. You can define all those tasks as prerequisites too.

When you’re done with any of the prerequisites, and the tests pass, you can commit and merge your work to the main branch. Everything is good. Of course, you still have all those other classes that use Doctrine\ORM classes, but at least there’s one less. You are closer to your end goal.
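One way to keep track of the prerequisites discovered this way is a simple graph of tasks. The sketch below is an illustration only, not something prescribed by the book: the readyTasks() function and the task descriptions are hypothetical. It lists the tasks whose prerequisites are all done, which are the ones you can safely work on, commit, and merge next.

```php
<?php
declare(strict_types=1);

/**
 * Returns the tasks that can be worked on right now: tasks that are
 * not done yet and whose prerequisites have all been completed.
 *
 * @param array<string, string[]> $prerequisites task => tasks that must be done first
 * @param string[]                $done          tasks already completed
 * @return string[]
 */
function readyTasks(array $prerequisites, array $done): array
{
    $ready = [];
    foreach ($prerequisites as $task => $needs) {
        if (in_array($task, $done, true)) {
            continue;
        }
        if (array_diff($needs, $done) === []) {
            $ready[] = $task;
        }
    }

    return $ready;
}

// Hypothetical graph for the example in the text: the end goal comes first.
$graph = [
    'composer remove doctrine/orm' => ['xxx.php no longer uses Doctrine\ORM'],
    'xxx.php no longer uses Doctrine\ORM' => ['rewrite the mapping logic'],
    'rewrite the mapping logic' => [],
];
```

The end goal only shows up as ready once every prerequisite underneath it has been completed and merged.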

You can stop at any time

Being able to commit and merge smaller bits of work multiple times a day means that Mikado is compatible with the rule that you should be able to stop at any time. You can (and should) use short-lived branches with Mikado. Everything you do is going to get you closer to your goal, but also doesn’t break the project in any way.

With Mikado it actually feels like with every commit you’re gaining XP so you can unlock new levels for your application

Truncated by Planet PHP, read more at the original (another 1515 bytes)

PHP Internals News: Episode 76: Deprecate null, and Array Unpacking – Derick Rethans

In this episode of “PHP Internals News” I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: Deprecate passing null to non-nullable arguments of internal functions, and Array Unpacking with String Keys.

Transcript

Derick Rethans 0:14

Hi, I’m Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 76. In this episode, I’m talking with Nikita Popov about a few more RFCs that he has been working on over the past few months. Nikita, would you please introduce yourself?

Nikita Popov 0:34

Hi, I’m Nikita. I work on PHP core development on behalf of JetBrains.

Derick Rethans 0:39

In the last few PHP releases, PHP’s handling of types with regards to internal functions and userland functions has been getting closer and closer, especially with types now. But there’s still one case where type mismatches behave differently between internal and userland functions. What is this outstanding difference?

Nikita Popov 0:59

Since PHP 8.0, the only remaining difference is the handling of null. PHP 7.0 introduced scalar types for user functions, but scalar types already existed for internal functions at that time. Unfortunately, or maybe pragmatically, we ended up with slightly different behaviour in both cases. The difference is that user functions don’t accept null unless you explicitly allow it using a nullable type or a null default value. This is the case for all user types, regardless of where or how they occur, as parameter types, return values, or property types, and independent of whether it’s an array type or an integer type. For internal functions, there is this one exception: if you have a scalar type like Boolean, integer, float, or string, and you’re not using strict types, then these arguments also accept null values silently right now. So if you have a string argument and you pass null to it, it will simply be converted into an empty string, or for integers into a zero value. At least I assume that the reason why we’re here is that the internal function behaviour existed for a long time, and the user function behaviour was chosen to be consistent with the general behaviour of other types at the time. If you have an array type, it also doesn’t accept null and just convert it to an empty array or something silly like that. So now we are left with this inconsistency.
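The inconsistency can be seen in a short sketch. The function name userLength() is made up for the illustration, and the behaviour of strlen() shown here assumes a PHP 8 release without strict types; a later major version that completes the deprecation would presumably throw instead.

```php
<?php
// No declare(strict_types=1) here: the difference only shows in coercive mode.

// A user function with a non-nullable scalar parameter.
function userLength(string $value): int
{
    return strlen($value);
}

$input = null;

// User function: null is rejected with a TypeError.
try {
    userLength($input);
    $userResult = 'accepted';
} catch (TypeError $e) {
    $userResult = 'TypeError';
}

// Internal function: on PHP 8 null is silently treated as an empty string.
// The @ hides the deprecation notice that newer PHP 8 releases emit.
try {
    $internalResult = @strlen($input);
} catch (TypeError $e) {
    $internalResult = 'TypeError';
}
```

On PHP 8 the two calls therefore disagree: the user function refuses null, while the internal one quietly returns a length of zero.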

Derick Rethans 2:31

Is it also not possible for extensions to check whether null was passed, and then do a different behaviour like picking a default value?

Nikita Popov 2:40

That’s right, but that’s a different case. The one I’m talking about is where you have a type like string, while the one you have in mind is where you effectively have a type like string or null.

Derick Rethans 2:51

Okay.

Nikita Popov 2:52

In that case, of course, accepting null is perfectly fine.

Derick Rethans 2:56

Even though it might actually end up being different defaults.

Nikita Popov 3:01

Yeah. Nowadays we would prefer to instead actually specify a default value, instead of using null. But using null as a default and then a

Truncated by Planet PHP, read more at the original (another 13687 bytes)

Successful refactoring projects – Set the right goal – Matthias Noback

Refactoring is often mentioned in the context of working with legacy code. Maybe you like to define legacy code as code without tests, or code you don’t understand, or even as code you didn’t write. Very often, legacy code is code you just don’t like, whether you wrote it, or someone else did. Since the code was written the team has introduced new and better ways of doing things. Unfortunately, half of the code base still uses the old and deprecated way…

The need to be consistent is very strong in me, and from what I see, in many developers. And so we get started and improve every bit of code everywhere. We become frustrated if we don’t “get” time for this work from our managers (“they don’t get it”), or if we can’t finish it; look at all this code, it’s so outdated!

Consistency is a bad refactoring goal

I’ve had this experience many times, and it made me give up on consistency as a refactoring goal. Not because consistency isn’t good. It is, because the uniformity of a code base makes it easier to contribute to it. Fewer surprises means you’ll make fewer mistakes. The problem is consistency as a refactoring goal:

  1. It’s an all-or-nothing goal. Either the code base is consistent, or it isn’t. When you start the project, you have to finish it, or you’ll feel unsatisfied. This conflicts with what we established in the previous post: prepare to stop at any time.
  2. It doesn’t serve a goal for other stakeholders. Which means it will be very easy for managers to pull the plug on the refactoring project, if they find out it’s costing them valuable development hours.

Higher refactoring goals

When it comes to refactoring projects, the development team (or sometimes just one developer) is often the primary stakeholder. Some common refactoring goals that developers have are:

  • Being able to upgrade to PHP 7
  • Being able to run tests without an actual database
  • Being able to rely on static analysis for support during development

When discussing these goals in a meeting with business stakeholders, they will downplay them because they don’t seem relevant for their own cause. They aren’t right, of course, because what’s great about these “developer-oriented” goals is that they actually serve higher goals, goals that also serve business stakeholders:

  • Being able to keep the software running for years to come
  • Being able to develop new features faster
  • Being able to release fewer mistakes to production

Alignment

Very often when I find my refactoring projects being postponed or blocked, I realize it’s because I didn’t explain the higher goals. What is often really useful is to talk to other stakeholders or decision makers (and pardon the management speak):

  • What are the benefits they’ll notice? Being able to work faster and release fewer mistakes are really good selling points for a refactoring project.
  • Explain that you can work on this for just a few hours each week and still get those benefits. This can take away the fear that this may be one of those endless projects that have to get cancelled in the end. It forces you to think about refactoring steps, and being able to stop (and start) at any time.
  • What are things you’ll unlock for the company, e.g. by deploying your application as a Docker image, deployments will be a single-step process, which finishes in a matter of seconds.

Conclusion

In summary, for a successful refactoring project you need to be able to stop and continue at any time. Establish refactoring goals that serve higher goals. Explain to business stakeholders that your development goals have benefits for them too.

Once the refactoring project gets the green light from the team, the next task is to determine refactoring steps. To be continued!

Successful refactoring projects – Prepare to stop at any time – Matthias Noback

Refactoring projects

A common case of refactoring-gone-wrong is when refactoring becomes a large project in a branch that can never be merged because the refactoring project is never completed. The refactoring project is considered a separate project, and soon starts to feel like “The Big Rewrite That Always Fails” from programming literature.

The work happens in a branch because people actually fear the change. They want to see it before they believe it, and review every single part of it before it can be merged. This process may take months. Meanwhile, other developers keep making changes to the main branch, so merging the refactoring branch is going to be a very tedious, if not dangerous thing to do. A task that, on its own, can cause the failure of the refactoring project itself.

Short-lived branches

So can’t we use a branch for refactoring? Of course we can. But it has to be a short-lived branch. How can you ensure that a branch is short-lived?

  1. It has small commits, created within small time intervals (e.g. minutes, not hours)
  2. Each commit passes all the tests (meaning the actual tests pass, and static analysis yields no errors)
  3. The branch can be merged and deployed at all times (and actually, should be merged regularly)

Following this set of rules is a great idea for any branch, not just refactoring branches. But it’s even more important there, since the changes are likely to span many, and remote, parts of the code base, which makes the risk of merge problems bigger.

What often happens is that we change a method in a way that requires updating all its clients. It takes a lot of time to do this work, and so we end up with either a very large commit, or a commit that just takes a lot of time to make, meaning that we don’t follow the first rule of short-lived branches.

Something else that could happen is that we are just viciously updating code all around the code base, and we commit the changes because everything seems alright, but then our quality assurance tools tell us something is wrong. When we get the results back from CI, we add another commit that “Fixes tests” or “Makes PHPStan happy”. When working with short-lived branches, ensure that everything is okay before committing (or set up a pre-commit hook so you can’t forget to do this).
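A pre-commit hook can be any executable script, so it could be written in PHP as well. The following is a minimal sketch under stated assumptions: the function name firstFailingCheck() is hypothetical, and the commands in the usage comment assume PHPUnit and PHPStan are installed via Composer in vendor/bin.

```php
<?php
declare(strict_types=1);

/**
 * Runs each check command in order and returns the first command that
 * exits with a non-zero status, or null when all of them pass.
 *
 * @param string[] $commands
 */
function firstFailingCheck(array $commands): ?string
{
    foreach ($commands as $command) {
        $output = [];
        exec($command, $output, $exitCode);
        if ($exitCode !== 0) {
            // Show what the failing tool reported before giving up.
            echo implode(PHP_EOL, $output), PHP_EOL;

            return $command;
        }
    }

    return null;
}

// Possible usage inside .git/hooks/pre-commit (assumed tool paths):
//
//   $failed = firstFailingCheck(['vendor/bin/phpunit', 'vendor/bin/phpstan analyse']);
//   if ($failed !== null) {
//       fwrite(STDERR, "Commit blocked, check failed: $failed" . PHP_EOL);
//       exit(1);
//   }
```

Git aborts the commit when the hook exits with a non-zero status, so a failing check never ends up in a commit that needs a follow-up "Fixes tests".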

What if we have to stop now?

If we create small commits that pass all the tests, the result should indeed be that our branch can be merged at all times. This for me is closely aligned to a thought I always have in mind when programming: what if someone pulls the plug on this project today? I don’t want my effort to be wasted, I don’t want my branch to be deleted without merging. So when I work on something I always aim for it to be useful for the team, the company, its users, etc.

One way to make sure that you always add value to the project is to establish goals for which the following is true:

  • The bigger goal can be reached in a number of smaller steps
  • Each step is useful when considered on its own

We’ll take a closer look at refactoring goals in the next article.

Conclusion

Refactoring projects require short-lived branches, where every commit can be merged in the main branch immediately. You should be able to stop the refactoring project at any time, while still leaving the project in a better state.

Should we use a framework? – Matthias Noback

Since I’ve been writing a lot about decoupled application development it made sense that one of my readers asked the following question: “Why should we use a framework?” The quick answer is: because you need it. A summary of the reasons:

  • It would be too much work to replace all the work that the framework does for you with code written by yourself. Software development is too costly for this.
  • Framework maintainers have fixed many issues before you even encountered them. They have done everything to make the code secure, and when a new security issue pops up, they fix it so you can just pull the latest version of the framework.
  • By not using a framework you will be decoupled from Symfony, Laravel, etc. but you will be coupled to Your Own Framework, which is a bigger problem since you’re the maintainer and it’s likely that you won’t actually maintain it (in my experience, this is what often happens to projects that use their own home-grown framework).

So, yes, you/we need a framework. At the same time you may want to write framework-decoupled code whenever possible.

Here’s a summary of the reasons. If all of your code is coupled to the framework:

  • It will be hard to keep up with the framework’s changes. When their API changes, or when their conventions or best practices change, it takes just too much time to update the code base.
  • It’s hard to test any business logic without going through the front controller, that is, by making fake or real web requests to your application, analyzing the response HTML, or peeking into the database.
  • It’s hard to test anything at all, because nothing allows itself to be tested in isolation. You always have to set up a database schema, populate it with data, or boot a service container of some kind.

Pushing for a big and strong core of decoupled code, that isn’t tied to the database technology, or a particular web framework, will give you a lot of freedom, and prevents all of the above problems. How to write decoupled code? There’s no need to reinvent the wheel there either. You can rely on a catalog of design patterns, like:

  • Application services and command objects
  • Entities and repository interfaces
  • Domain events and domain event subscribers

None of these classes will use framework-specific things like:

  • Request, Response, Session, Token storage, or security User classes,
  • Service locators, configuration helpers, dependency resolvers,
  • Database connections, query builders, relation mappers, or whatever your framework calls them.

For me good rules of thumb to test the “decoupledness” of my business logic are:

  1. Can I migrate this application from a web to a CLI application without touching any of the core classes?
  2. Can I instantiate all the classes in the core of my application without preparing some special context or setting up external services?
  3. Can I migrate this application from an SQL database to a document database without touching any of the core classes?

Rules 1 and 2 should be unconditionally true; rule 3 allows some room for coupling due to the age-old problem of mapping entities to their stored format. For instance, you can have some mapping logic in your entity (i.e. instructions for your ORM on how to save the entities). But at least there shouldn’t be any service dependencies that are specific to your choice of persistence, e.g. you can’t inject an EntityManagerInterface or use a QueryBuilder anywhere in your core classes. Also, calling methods should never trigger actual calls to a database, even if it’s an SQLite one.
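The first two rules of thumb can be illustrated with a small sketch. All names here (RegisterUser, RegisterUserService, UserRepository, and the two adapter functions) are hypothetical. The core classes are instantiated without any framework context, and the same service is called from a web-style and a CLI-style entry point.

```php
<?php
declare(strict_types=1);

// Core: no Request, Response, console, or database classes in sight.
final class RegisterUser
{
    public function __construct(public string $email) {}
}

interface UserRepository
{
    public function add(string $email): void;
}

final class RegisterUserService
{
    public function __construct(private UserRepository $users) {}

    public function register(RegisterUser $command): void
    {
        if (filter_var($command->email, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException('Invalid email address');
        }
        $this->users->add($command->email);
    }
}

// Stand-in for whatever persistence the framework layer provides.
final class InMemoryUserRepository implements UserRepository
{
    /** @var string[] */
    public array $emails = [];

    public function add(string $email): void
    {
        $this->emails[] = $email;
    }
}

// Web adapter: translates request data into a command.
function handleWebRequest(RegisterUserService $service, array $post): void
{
    $service->register(new RegisterUser($post['email'] ?? ''));
}

// CLI adapter: translates command-line arguments into the same command.
function handleCliCommand(RegisterUserService $service, array $argv): void
{
    $service->register(new RegisterUser($argv[1] ?? ''));
}
```

Switching from web to CLI only means swapping the adapter function; RegisterUserService and RegisterUser stay untouched.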

If you do all of this, your framework will be like a layer wrapped around your decoupled core.

This layer contains all the technical stuff. This is where you find the acronyms: SQL, ORM, AMQP, HTTP, and so on. This is where we shouldn’t do everything on our own. We leverage the power of many frameworks and libraries that save us from dealing with all the low-level concerns, so we can focus on business logic and user experience.

A framework should help you:

  • Make a smooth jump from an incoming HTTP request to a call to one of your controllers.
  • Load, parse, and validate application configuration.
  • Instantiate any service needed to let you do your work.
  • Translate your data to queued messages that can be consumed by external workers.
  • Parse command-line arguments and pass them as ready-to-consume primitive-type values.
  • Turn your application’s data in

Truncated by Planet PHP, read more at the original (another 1854 bytes)

PHP Internals News: Episode 75: Globals, and Phasing Out Serializable – Derick Rethans

In this episode of “PHP Internals News” I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: Restrict Globals Usage, and Phase Out Serializable.

Transcript

Derick Rethans 0:14

Hi, I’m Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 75. In this episode, I’m talking with Nikita Popov about a few RFCs that he has been working on over the past few months. Nikita, would you please introduce yourself?

Nikita Popov 0:34

Hi, I’m Nikita. I work at JetBrains on PHP core development, and as such I occasionally get to write PHP proposals, RFCs, and then talk with Derick about them.

Derick Rethans 0:47

The main idea behind you working on RFCs is that PHP gets new features, not that you end up talking to me.

Nikita Popov 0:53

I mean that’s a side benefit,

Derick Rethans 0:55

In any case we have a few to go this time. The first RFC is titled phasing out Serializable, it’s a fairly small RFC. What is it about?

Nikita Popov 1:04

That finishes up a bit of work from PHP 7.4, where we introduced a new serialization mechanism, actually the third one we have. So we have a few too many of them, and this removes the most problematic one.

Derick Rethans 1:19

Which three Serializable methods or ways of doing things currently exist?

Nikita Popov 1:24

The first one, which doesn’t really count, is just what you get if you don’t do anything: all the object properties get serialized, and also unserialized. Then we have a number of hooks you can use to modify that. The first pair is __sleep and __wakeup. __sleep specifies which properties you want to serialize, so you can filter out some of them, and __wakeup allows you to run some code after unserialization, so you can do some kind of fix-up afterwards.

Derick Rethans 1:52

From what I remember, if you use unserialize with __wakeup, the constructor doesn’t get called?

Nikita Popov 1:59

During unserialization the constructor never gets called.
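A small sketch of this first pair of hooks, using a made-up Report class: __sleep() filters out a derived property, __wakeup() rebuilds it after unserialization, and a static counter shows that the constructor only runs for the original object.

```php
<?php
declare(strict_types=1);

// Hypothetical class, used only to illustrate __sleep() and __wakeup().
final class Report
{
    public static int $constructed = 0;

    private string $title;

    /** @var array<string, string> derived data that is not worth serializing */
    private array $cache = [];

    public function __construct(string $title)
    {
        self::$constructed++;
        $this->title = $title;
        $this->cache = ['rendered' => strtoupper($title)];
    }

    // Only the listed properties end up in the serialized string.
    public function __sleep(): array
    {
        return ['title'];
    }

    // Runs after unserialization; the constructor is not called.
    public function __wakeup(): void
    {
        $this->cache = ['rendered' => strtoupper($this->title)];
    }

    public function rendered(): string
    {
        return $this->cache['rendered'];
    }
}

$serialized = serialize(new Report('fsync'));
$copy = unserialize($serialized);
```

After this runs, the constructor has been called once, for the original object, while the copy got its cache back through __wakeup().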

Derick Rethans 2:03

So __wakeup is sort of a static factory method to rehydrate the objects.

Nikita Popov 2:08

Exactly.

Derick Rethans 2:08

So that’s number one,

Nikita Popov 2:10

Then number two is the Serializable interface, which gives you more control. Namely, you have to actually return the serialized representation of your object. What it looks like is completely unspecified; you could return whatever you want, though in practice what people actually do is recursively call serialize. And then on the other side, when unserializing, you usually do the same: you call unserialize on the string you receive, and then populate your properties based on that. The problem with this mechanism is exactly this recursive serialization call, because it has to share state with the main serialization. And the

Truncated by Planet PHP, read more at the original (another 22227 bytes)