Welcome to Philippe’s collection of texts. This is where I put ideas and information with a long shelf life that I’d like to have a permalink for. The goal is also to host them outside of any platform.
In the spirit of free software, information here is provided as-is in the hope that it’ll be useful. Opinions my own.
By design, this site does not allow commenting. You can reach me on Mastodon and LinkedIn.
Posts
-
SELinux Starter - The missing manuals
SELinux is a very fine-grained access control system for Linux, which can be used to significantly increase the security of the OS. I had some difficulty building a mental model of SELinux, and I’m simply documenting this for myself and hopefully others here.
What to expect: SELinux’s basic concepts, some examples, and also links to in-depth documentation to go further. After reading the text, you should be able to understand how the access control works, and how to figure out why a particular access is authorized or denied.
-
Despair and action
Today I’m writing to share the techniques I’ve been using to deal with the rapid changes in the world’s situation, in the hope that they can help other people. It’s written from the perspective of a privileged European person, and is very focused on what I personally see as major issues: the possible comeback of fascism and authoritarianism in Western countries, and all kinds of threats to individual and collective freedom, especially mediated through digital tools.
If you don’t share my situation or my views I hope you’ll still find something useful in what I wrote.
-
Phones and Sovereignty
The battle over application stores has been raging for a while now, and I personally use the legal battle between Epic Games and Apple as a starting point. The original grievance from Epic was that they weren’t allowed to use any payment system other than Apple’s in their iOS version of Fortnite, meaning that Apple was automatically getting a 30% cut of all of Epic’s revenue in Fortnite. The same was happening on the Android platform, which Epic also challenged. In both cases, Epic won in court, setting a useful legal precedent.
-
The Very Basics Of Cryptography
Cryptography underpins most of today’s internet infrastructure. This post goes over the very basics of modern symmetric and asymmetric cryptography, giving enough information to understand the technical implications of cryptography on other topics. It is meant to be understandable by someone with no technical knowledge.
A good part of cryptography is encryption: a process by which a message is translated into a form that can only be read by its intended recipient. We’ll talk mostly about that.
-
Some Comments on ATIH's article on Safety Laws
I recently read Online Safety Regulation: The Duty of Care Framework and Implementation Blind Spots from All Tech Is Human, which, through the writing of Emma Hatheway, gives us a balanced take on the state of online safety laws in the UK and the US. The article is way too well written for my humble prose to do it justice: go read it. I simply point out a few arguments that I found particularly enlightening.
-
Ansible Starter - The missing manuals
I found it difficult to get into Ansible by reading the official documentation. This post is an attempt at addressing that by:
- providing trimmed-down examples to focus on what’s essential
- explaining the core concepts that are needed to understand what is going on
- highlighting the important entry points in the documentation to go further
It is not complete documentation! It only provides a minimal (hopefully solid) foundation to reduce the confusion and speed up further learning. It assumes a basic knowledge of YAML. You can look at YAML Starter if you need a refresher.
-
YAML Starter - The missing manuals
This page describes the very basics of YAML, as a quick reference.
-
A Personal Take on Tech Debt
In software engineering the concept of “technical debt” is ubiquitous. Everybody who has worked on the same codebase for more than a few months knows how it feels: this impression of wasting time because you have to rewrite some part of the code, or because you need to put in some effort just to keep the lights on. At the same time, my experience is that most software engineers disagree on the exact definition of technical debt. This post is an attempt at understanding why that is.
-
The Rise Of Version Control Systems
The history of VCS tells us interesting things about usability and why a piece of software is successful. I’ve seen a few generations of version control systems so far, each one improving on its predecessor in major ways, with sometimes surprising consequences.
Disclaimer: This piece has no intention of being comprehensive or particularly accurate, though I researched it a bit. It’s a bit of history seen from my personal experience.
-
A Personal Take On Data Engineering
Most of the posts I’ve written so far talk about one aspect or another of data engineering. My definition can differ a bit from other people’s so I thought I’d write my understanding of it.
In short: data engineering is how you make data useful: it enables other people to do their jobs. In my case it’s mostly been training ML models, but it can also be to get some insights from the data itself - for example for science. The job involves all the steps from acquiring data to having usable datasets. Read on.
-
There's Nothing Like Real Usage
Prototypes are great tools to derisk an idea: their purpose is to quickly make sure your idea can work, without getting into the details. Often, though, there is a big gap between that and a great product, because you need to answer the question: is the product actually useful?
If, regardless of the answer, you’re facing months of development - for example you’re building a video call app and need a very scalable backend - you’d better make sure as early as possible that the end result will be usable from the user’s perspective.
-
Datasets are never finished
So, you’ve launched a great ML model in production, and it works great. What’s left? Quite a lot actually.
Obviously you want to improve the model so that it works even better. So you spend a lot of time trying new model architectures, new hyperparameters, etc., to improve your metrics (since you have a great test set).
But after a while, improvements on the metrics don’t seem to correlate with a better product. What could be going wrong?
-
A Tooling Tidbit
A good recipe for a successful product is to reduce friction, that is, to reduce the mental load on users. It’s something to be taken seriously: if your product requires too much mental effort, users will simply not use it.
This post is about a very small example that I stumbled upon while configuring my own environment: I wanted to solve a minor but vexing problem (to me): how to create a directory using the shell and immediately cd into it.
-
Test Set and Random Sampling
Last time I talked about the test set defining your metric. I’ll point out some interesting consequences.
If you search on the web for “train/valid/test splitting”, you’ll essentially find elaborate explanations of how to use np.randn(). It’s clear for everyone that once you have a big pile of data points, you just split them randomly, let’s say 10% each in the validation and test sets, and 80% in the training set. End of the story, let’s get to the interesting part, that is: training a model.
Wait wait wait wait. If you just randomly split your data points, it means that the data distribution in the test set is determined by whichever distribution you happen to have in the data you started from… hence you have no control over your metric.
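To make the point concrete, here is a minimal sketch of such a random split, using only Python’s standard library. The function name random_split, the 10%/10%/80% fractions, and the “day”/“night” labels are assumptions made up for this illustration; they do not come from the post itself. The sketch only shows that a random split hands the test set roughly the same imbalance as the starting data.

```python
import random
from collections import Counter


def random_split(items, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of items and cut it into train/valid/test lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test


# Hypothetical imbalanced data: 900 "day" examples and 100 "night" examples.
data = ["day"] * 900 + ["night"] * 100
train, valid, test = random_split(data)

# The test set inherits whatever distribution the data started with.
test_counts = Counter(test)
print(len(train), len(valid), len(test))
print(test_counts)
```

If night-time performance is what matters for the product, a test set drawn this way will barely measure it, which is the lack of control over the metric described above.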
-
Your Test Set Is Your Metric
Your test dataset is your model’s performance metric.
Wait, isn’t that obvious? It’s one of those obvious things that can have serious consequences if overlooked.
-
On Static, Dynamic Typing and Ways of Thinking
Despite the title, this post is not about the never-ending war between proponents of static typing and dynamic typing. It’s an attempt at going beyond that and understanding what is really going on below the worn-out technical arguments. I stand firmly on the dynamic-typing side here: I’ve always had more fun with languages like Python, JavaScript, and Lua than with C++, Go or Java. I’m competent in all styles, and have spent a lot of time writing war-ready code in statically-typed languages, but the mental effort is very different from when I write Python or JavaScript.
Why is it so?
-
GenAI is a new world
Why I think recent GenAI / LLMs are a step into a new (computer) world.
My LinkedIn feed is full of posts from excited software engineers talking about LLMs in relation to programming languages, APIs, algorithms, etc. The scientific literature regularly claims to have proved “guarantees” about such and such training algorithm or modeling technique. A lot of effort is put towards making LLMs reliable, consistent, etc.
I think all of this misses a very important point: I believe LLMs are fundamentally of a different nature than algorithms.
Yoyonax