Local Whispers – Derick Rethans

Tuesday, May 7th 2024, 15:15 BST

For most of the videos that I make, I also like to have subtitles, because sometimes it’s easier to just read along.

I used to make these subtitles with an online service called Otter.io, but they stopped allowing uploading of video files.

And then I found Whisper, which allows me to upload audio files to create subtitles. Whisper is an API from OpenAI, mainly known for ChatGPT.

I didn’t like having to upload everything to them either, as that means that they could train their model with my original video audio.

Whisper never really worked that well, because it broke up the sentences in weird places, and I had to make lots of edits. It took a long time to make subtitles.

I recently found out that it’s actually possible to run Whisper locally, with an open source project on GitHub. I started looking into this to see whether I could use this to create subtitles instead.

The first thing that their documentation tells you to do is to run: pip install openai-whisper.

But I am on a Debian machine, and here Python is installed through distribution packages, and I don’t really want to mess that up. apt-get actually suggests creating a virtual environment for Python.

In a virtual environment, you can install packages without affecting your system setup. Once you’ve made this virtual environment, there are Python binaries symlinked in there that you can then use for installing things.

You create the virtual environment with:

python3 -m venv `pwd`/whisper-local
cd whisper-local

In the bin directory you then have python and pip. The pip in there is the one you then use for installing packages.

Now let me run pip again, with the same options as before to install Whisper:

bin/pip install -U openai-whisper

It takes quite some time to download. Once it is done, there is a new whisper binary in our bin directory.

You also need to install ffmpeg:

sudo apt-get install ffmpeg

Now we can run Whisper on a video I had made earlier:

./bin/whisper ~/media/movie/xdebug33-from-exception.webm

The first time I ran this, I had some errors.

My video card does not have enough memory (2GB only). I don’t actually have a very good video card at all, and was better off disabling it, by instructing “Torch” that I do not have one:

export CUDA_VISIBLE_DEVICES=""

And then run Whisper again:

./bin/whisper ~/media/movie/xdebug33-from-exception.webm
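
If you would rather not export the variable for the whole shell session, the same setting can be scoped to a single Whisper run. The following is a minimal Python sketch using only the standard library. The function names are made up for illustration; the binary path and the --language option are the ones used in this post.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value hides every GPU, the same as the export above.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_whisper_on_cpu(video_path, whisper_bin="./bin/whisper"):
    """Run the whisper command line tool for one file, without a GPU."""
    command = [whisper_bin, video_path, "--language", "English"]
    # check=True raises CalledProcessError if whisper exits with an error.
    return subprocess.run(command, env=cpu_only_env(), check=True)
```

Note that subprocess does not expand ~ the way a shell does, so a path such as ~/media/movie/xdebug33-from-exception.webm should go through os.path.expanduser() before it is passed to run_whisper_on_cpu().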

It first detects the language, which you can pre-empt by using --language English.

While it runs, it starts showing information in the console. I quickly noticed it was misspelling lots of things, such as my name Derick as Derek, and Xdebug as XDbook.

I also noticed that it starts breaking up sentences in an odd way after a while. Just like what the online version was doing.

I did not get a good result this first time.

It did create a JSON file, xdebug33-from-exception.json, but it is all in one line.

I reformatted it by installing the yajl-tools package with apt-get, and flowing the data through json_reformat:

sudo apt-get install yajl-tools
cat xdebug33-from-exception.json | json_reformat >xdebug33-from-exception_reformat.json
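
Since the virtual environment already has Python, the same reformatting can also be done without an extra package. This is a small sketch with the standard json module; the function names are made up for this example. From the command line, python3 -m json.tool does much the same thing.

```python
import json


def reformat_json_text(text, indent=4):
    """Parse a JSON document and return an indented, multi-line version."""
    data = json.loads(text)
    # ensure_ascii=False keeps non-ASCII characters readable instead of \u escapes.
    return json.dumps(data, indent=indent, ensure_ascii=False) + "\n"


def reformat_json_file(source_path, target_path):
    """Read source_path and write the indented version to target_path."""
    with open(source_path, encoding="utf-8") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-8") as target:
        target.write(reformat_json_text(text))
```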

The reformatted file still has our full text in a line, but then a segments section follows, which looks like:

"segments": [ { "id": 0, "seek": 0, "start": 3.6400000000000006, "end": 11.8, "text": " Hi, I'm Derick. For most of the videos that I make, I also like to have subtitles, because", "tokens": [ 50363, 15902, 11, 314, 1101, 9626, 624, 13, 1114, 749, 286, 262, 5861, 326, 314, 787, 11, 314, 635, 588, 284, 423, 44344, 11, 780, 50960 ], "temperature": 0.0, "avg_logpro

Truncated by Planet PHP, read more at the original (another 4617 bytes)
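
The start, end and text fields of each segment are what a subtitle format needs. As a rough sketch, assuming the JSON file has a top-level segments list with those three fields as in the excerpt above, the segments can be turned into SubRip (SRT) cues like this. The function names are made up for this example and are not part of Whisper.

```python
def srt_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from a list of dicts with start, end and text keys."""
    cues = []
    for number, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        # Whisper's text values start with a space, so strip them.
        text = segment["text"].strip()
        cues.append(f"{number}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

With the file loaded through json.load(), segments_to_srt(data["segments"]) returns the subtitle text, which can then be corrected by hand for names such as Derick and Xdebug.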

Xdebug Update: May 2024 – Derick Rethans

Monday, May 6th 2024, 16:30 BST

London, UK

I have not written an update like this for a while. I am sorry.

In the last months I have not spent a lot of time on Xdebug due to a set of other commitments.

Since my last update in November a few things have happened though.

Xdebug 3.3

I released Xdebug 3.3, and the patch releases 3.3.1 and 3.3.2.

Xdebug 3.3 brings a bunch of new features into Xdebug, such as flamegraphs.

The debugger has significant performance improvements in relation to breakpoints. And it can now also show the contents of ArrayIterator, SplDoublyLinkedList, SplPriorityQueue objects, and information about thrown exceptions.

A few bugs were present in 3.3.0, which have been addressed in 3.3.1 and 3.3.2. There is currently still an outstanding issue (or more than one), where Xdebug crashes. There are a lot of confusing reports about this, and I have not yet managed to reproduce any of them.

If you’re running into a crash bug, please reach out to me.

There is also a new experimental feature: control sockets. These allow a client to instruct Xdebug to either initiate a debugging connection, or instigate a breakpoint out of band: i.e., when no debugging session is active. More about this in a later update.

Funding Platform

Last year, I made a prototype as part of a talk that I gave at NeosCon.io. In this talk I demonstrated native path mapping — configuring path mapping in/through Xdebug, without an IDE’s assistance.

In collaboration with Robert from NEOS, and Luca from theAverageDev, we defined a project plan that explains all the necessary functionality and work.

Adding this to Xdebug is a huge effort, and therefore I decided to set up a way for projects like this to be funded.

There is now a dedicated Projects section linked from the home page, with a list of all the projects. The page itself lists a short description for each project.

For each project, there is a full description and a list of its generous contributors. The Native Xdebug Path Mapping project is currently 85% funded. Once it is fully funded, I will start the work to get this included in Xdebug 3.4. You could be part of this too!

Xdebug Videos

I have created several videos since November.

Two for Xdebug:

And several for writing PHP extensions, as part of a new series:

If you have any suggestions, feel free to reach out to me on Mastodon or via email.

Truncated by Planet PHP, read more at the original (another 1644 bytes)

PHP Security and Compliance: Trends to Watch in 2024

PHP security and compliance are top concerns in 2024. Explore upcoming PHP security trends as we discuss findings from our most recent PHP Landscape Report.

Statement on glibc/iconv Vulnerability – PHP: Hypertext Preprocessor

EDIT 2024-04-25: Clarified when a PHP application is vulnerable to this bug.

Recently, a bug in glibc version 2.39 and older (CVE-2024-2961) was uncovered where a buffer overflow in character set conversions to the ISO-2022-CN-EXT character set can result in remote code execution.

This specific buffer overflow in glibc is exploitable through PHP, which exposes the iconv functionality of glibc to do character set conversions via the iconv extension. Although the bug is exploitable in the context of the PHP Engine, the bug is not in PHP. It is also not directly exploitable remotely.

The bug is exploitable if, and only if, the PHP application calls iconv functions or filters with user-supplied character sets.

Applications are not vulnerable if:

Glibc security updates from the distribution have been installed
Or the iconv extension is not loaded
Or the vulnerable character set has been removed from gconv-modules-extra.conf
Or the application passes only specifically allowed character sets to iconv.

Moreover, when using a user-supplied character set, it is good practice for applications to accept only specific charsets that have been explicitly allowed by the application. One example of how this can be done is by using an allow-list and the array_search() function to check the encoding before passing it to iconv. For example:

array_search($charset, $allowed_list, true)

There are numerous reports online with titles like “Mitigating the iconv Vulnerability for PHP (CVE-2024-2961)” or “PHP Under Attack”. These titles are misleading as this is not a bug in PHP itself.

If your PHP application is vulnerable, we first recommend checking whether your Linux distribution has already published patched variants of glibc. Debian, CentOS, and others have already done so; please upgrade as soon as possible.

Once an update is available in glibc, updating that package on your Linux machine will be enough to alleviate the issue. You do not need to update PHP, as glibc is a dynamically linked library.

If your Linux distribution has not published a patched version of glibc, there is no fix for this issue. However, there exists a workaround described in GLIBC Vulnerability on Servers Serving PHP, which explains how to remove the problematic character set from glibc. Perform this procedure for every gconv-modules-extra.conf file that is available on your system.

PHP users on Windows are not affected.

There will therefore also not be a new version of PHP for this vulnerability.
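
The allow-list check is not specific to PHP. As an illustration of the pattern only, here is a minimal sketch in Python, where the list of allowed character sets and the function name are hypothetical. Python's own codecs do the conversion here just to keep the example runnable; the point is the check that happens before any conversion is attempted.

```python
# Hypothetical allow-list; pick the character sets your application really needs.
ALLOWED_CHARSETS = frozenset({"UTF-8", "ISO-8859-1", "WINDOWS-1252"})


def to_utf8(data, charset):
    """Convert bytes in a user-supplied charset to UTF-8, if the charset is allowed."""
    # Exact match against the allow-list, comparable to a strict array_search().
    if charset not in ALLOWED_CHARSETS:
        raise ValueError(f"Character set not allowed: {charset!r}")
    return data.decode(charset).encode("utf-8")
```

In PHP itself, keep in mind that array_search() returns the matching key, which can be 0, so the result should be compared with !== false rather than used directly as a condition.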

Moving on from Mocha, Chai and nyc. – Evert Pot

I’m a maintainer of several small open-source libraries. It’s a fun activity.
If the scope of the library is small enough, the maintenance burden is
typically fairly low. They’re usually mostly ‘done’, and I occasionally just need to
answer a few questions per year, and do the occasional release to bring it
back up to the current ‘meta’ of the ecosystem.

Also even though it’s ‘done’, in use by a bunch of people and well tested,
it’s also good to do a release from time to time to not give the impression
of abandonment.

This weekend I released a 2.0 version of my bigint-money library, which
is a fast library for currency math.

I originally wrote this in 2018, so the big BC break was switching everything
over to ESM. For a while I tried to support both CommonJS and ESM
builds for my packages, but only a year after all that effort it frankly no
longer feels needed. I was worried the ecosystem was going to
split, but people stuck on (unsupported) versions of Node that don’t
support ESM aren’t going to proactively keep their other dependencies updated,
so CommonJS is for me (and many others) in the past now. (yay!)

Probably the single best way to keep maintenance burden for packages low is
to have few dependencies. Many of my packages have 0 dependencies.

Reducing devDependencies also helps. If you didn’t know, node now has a
built-in testrunner. I’ve been using Mocha + Chai for many many
years. They were awesome and I want to thank the maintainers, but node --test
is pretty good now and has pretty output.

It also:

Is much faster (about twice as fast with Typescript and code coverage
reporting, but I suspect the difference will grow with larger code bases).
Is easier to configure (especially when you’re also using Typescript: just use tsx --test).
Can output test coverage (with --experimental-test-coverage).

Furthermore, while node:assert doesn’t have all features of Chai, it has
the important ones (deep compare) and adds better Promise support.

All in all this reduced my node_modules directory from a surprising 159M
to 97M, most of which is now Typescript and ESLint, and my total dependency
count from 335 to 141 (almost all of which is ESLint).

Make sure that Node’s test library, coverage and assertion library are right
for you. It may not have all the features you expect, but I keep my testing
setup relatively simple, so the switch was easy.

Concealing Cacophony – Derick Rethans

Tuesday, April 16th 2024, 14:30 BST

London, UK

Over the last few weeks I have been publishing a series of videos on writing PHP extensions.

I record these videos through OBS, and then slice and dice them with Kdenlive. This editing is necessary to make up for my mistakes, shorten the time we wait for things to compile, and to remove the noise of me hammering away on my keyboard.

Editing takes a lot of time, and I still wasn’t always pleased with the result as there was still a fair amount of noise while I was talking.

For the PHP Internals News podcast, I used a set of noise cancellation filters, which worked wonders. But it turns out that Kdenlive does not come with one built in.

I had a look around on the Internet, and learned that there is a LADSPA Noise Suppressor for Voice plugin. LADSPA is an open API for audio filters and audio signal processing effects. LADSPA plugins can be used with Kdenlive.

Some Linux distributions have a package for this LADSPA Noise Suppressor for Voice, but my Debian distribution bookworm does not.

I found instructions that explain how to build the plugin from source. These instructions worked after some tweaks. I ended up creating the following script:

#!/bin/bash
sudo apt install cmake ninja-build pkg-config libfreetype-dev libx11-dev libxrandr-dev libxcursor-dev
git clone https://github.com/werman/noise-suppression-for-voice /tmp/noise
cd /tmp/noise
cmake -Bbuild-x64 -H. -GNinja -DCMAKE_BUILD_TYPE=Release
sudo ninja -C build-x64 install

After running this script, and restarting Kdenlive, I found the installed plugin when I searched for it.

With the plugin loaded, I now have much clearer sound, and I also don’t have to edit the sections where I am typing, as the plugin automatically handles this.

I will still have to edit out my mistakes.

I then also had a look at how it worked. It turns out that this plugin uses neural networks to cancel the noise.

In the background, it uses the RNNoise library which implements an algorithm by Jean-Marc Valin, as outlined in this paper. There is an easier to read version of how the algorithm works on his website.

The data to train the model is also freely available, and uses resources from the OpenSLR project. Noise data is also available there. From what I can tell, all this data was contributed under reasonable conditions, and not scraped from the internet without consent. That is important to me.

Hopefully, from the third video in the series, you will find the sound quality much better.

Tukio 2.0 released – Event Dispatcher for PHP – Larry Garfield

I’ve just released version 2.0 of Crell/Tukio! Available now from your favorite Packagist.org. Tukio is a feature-complete, easy to use, robust Event Dispatcher for PHP, following PSR-14. It began life as the PSR-14 reference implementation.

Tukio 2.0 is almost a rewrite, given the amount of cleanup that was done. But the final result is a library that is vastly more robust and vastly easier to use than version 1, while still producing near-instant listener lookups.

Some of the major improvements include:

Larry
14 April 2024 – 2:24pm

Read more about Tukio 2.0 released – Event Dispatcher for PHP

When to Rewrite vs. Refactor Your Web App

Need to decide between refactoring vs. rewriting your PHP app? Get a breakdown of the benefits and drawbacks of each approach in this blog.

Check licenses of composer dependencies – Rob Allen

With some commercial projects, it can be useful to know that all your dependencies have licences that your organisation deems acceptable.

I have had this requirement for a few clients now, and came up with this script, which we ran as part of our CI and which would fail if a dependency used a license that wasn’t allowed.

This proved to be reasonably easy as composer licenses will provide a list of all packages with their license, and more usefully, the -f json switch will output the list as JSON. With a machine-readable format, the script just came together!

At some point, we discovered that we needed to allow exceptions for specifically authorised packages, so I added that and haven’t changed it since.

check-licenses.php

<?php

$allowedLicenses = ['Apache-2.0', 'BSD-2-Clause', 'BSD-3-Clause', 'ISC', 'MIT', 'MPL-2.0', 'OSL-3.0'];
$allowedExceptions = [
    'some-provider/some-provider-php', // Proprietary license used by SMS provider
];

$licences = shell_exec('composer licenses -f json');
if ($licences === null || $licences === false) {
    echo "Failed to retrieve licenses\n";
    exit(1);
}

try {
    $data = json_decode($licences, true, 512, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    echo "Failed to decode licenses JSON: " . $e->getMessage() . "\n";
    exit(1);
}

// Filter out all dependencies that have an allowed license or exception.
// A package is flagged as soon as any of its licenses is outside the allow-list,
// so a package with two disallowed licenses is caught as well.
$disallowed = array_filter(
    $data['dependencies'],
    fn(array $info, $name) => ! in_array($name, $allowedExceptions)
        && count(array_diff($info['license'], $allowedLicenses)) > 0,
    ARRAY_FILTER_USE_BOTH
);

if (count($disallowed)) {
    // Build a "package (license,license)" entry for each offending dependency.
    $disallowedList = array_map(
        fn(string $k, array $info) => sprintf("$k (%s)", implode(',', $info['license'])),
        array_keys($disallowed),
        $disallowed
    );

    printf("Disallowed licenses found in PHP dependencies: %s\n", implode(', ', $disallowedList));
    exit(1);
}

exit(0);

Running check-licenses.php

If all dependencies are allowed, then check-licenses will output nothing and exit with status code 0:

$ php bin/check-licenses.php
$ echo $?
0

If at least one dependency is not allowed, then check-licenses will list the packages that have licenses that are not allowed and exit with status code 1:

$ php bin/check-licenses.php
Disallowed licenses found in PHP dependencies: foo/bar (GPL-3.0)
$ echo $?
1
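
To see how such a filter treats packages that list more than one license, here is the same kind of check as a short Python sketch over hand-written sample data. The package names, versions and the sample itself are made up, and the shape of the data is an assumption based on the dependencies key the script reads. This version flags a package when any of its licenses is outside the allow-list, which is only one possible policy for multi-licensed packages.

```python
ALLOWED_LICENSES = {"Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC", "MIT", "MPL-2.0", "OSL-3.0"}
ALLOWED_EXCEPTIONS = {"some-provider/some-provider-php"}


def find_disallowed(dependencies):
    """Return {package: licenses} for packages with a license outside the allow-list."""
    return {
        name: info["license"]
        for name, info in dependencies.items()
        if name not in ALLOWED_EXCEPTIONS
        # A non-empty set difference means at least one license is not allowed.
        and set(info["license"]) - ALLOWED_LICENSES
    }


# Made-up sample in the assumed shape of the "dependencies" key.
sample = {
    "foo/bar": {"version": "1.0.0", "license": ["GPL-3.0"]},
    "foo/baz": {"version": "2.1.0", "license": ["MIT"]},
    "foo/dual": {"version": "0.3.0", "license": ["GPL-2.0", "GPL-3.0"]},
    "some-provider/some-provider-php": {"version": "4.0.0", "license": ["proprietary"]},
}

for name, licenses in find_disallowed(sample).items():
    print(f"{name} ({','.join(licenses)})")
```

Here foo/bar and foo/dual are reported, foo/baz passes, and the SMS provider package is skipped because it is listed as an exception.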

Maybe it’s useful to others too. If you use it, put it in your CI system.

How to Install (and Configure) Ansible to Deploy a PHP Application

Get an overview of how to install and configure Ansible to deploy a PHP application in this blog.