development: Figure out versioning scheme #14

Open Jookia opened this issue on 24 Aug - 10 comments

@Jookia Jookia commented on 24 Aug

This project is never going to be complete, but to release at all we need to figure out some versioning scheme.

My proposal is this: Release using the date as the software version number, and have a system of stability promises. A promise has a unique name and version, starting from 1. The promise can be increased to add new functionality or features, but it can never be reduced to remove functionality or change behavior. In the case we ever need to break or remove functionality, we create a new promise and start from version 1 again. To retain backwards compatibility we keep the old promise for a specific amount of time.
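A minimal sketch of how a promise check could look, assuming a promise is just a name plus an integer version. The names `Promise`, `satisfies`, `parser` and `parser2` are made up for illustration and are not part of NewLang.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promise:
    """A named stability promise (hypothetical structure)."""
    name: str
    version: int  # starts at 1 and only ever increases


def satisfies(provided: list[Promise], required: Promise) -> bool:
    """True if a provided promise has the same name and at least the required version."""
    return any(
        p.name == required.name and p.version >= required.version
        for p in provided
    )


# A release that keeps an old promise alive next to its breaking replacement.
release = [Promise("parser", 3), Promise("parser2", 1)]

print(satisfies(release, Promise("parser", 2)))   # True: same name, newer version
print(satisfies(release, Promise("parser2", 2)))  # False: only version 1 is provided
print(satisfies(release, Promise("interp", 1)))   # False: no promise by that name
```

Since the check only looks at names and versions, the list of provided promises could just as well come from a compatibility layer or another implementation.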

Compared to standard versioning like semantic versioning or just major.minor.patch this is a little strange, but it tries to work around the reality that most software can only have one version installed at a time in any given system. So when backwards compatibility is broken suddenly you can't use software that requires guarantees or behavior from the older software. By versioning promises instead of software, promises can be provided by third-party software like compatibility layers or other implementations entirely.

Maybe this is overly complicated, it might just be better to version interfaces separately from the software itself and then support those interfaces.

Right now NewLang doesn't provide any stable interfaces. But when things settle down or become useful, it will have to do so.

@Jookia Jookia commented on 9 Sep

Okay, so I've thought about this for way too long. I still haven't fully figured this out, but I might as well throw these thoughts out here and come to some kind of conclusion.

Version numbers are intended to convey some information. With projects like Linux, Chrome, Firefox, systemd, etc., it's mostly just to compare whether a version is newer and identify which version a user is running to know if there's a bug or fix in that version. A lot of projects go a step further and use the version number to convey some information about compatibility.

So let's look at some use cases: Use case one, figuring out if your software is up to date.

This should be easy, just seeing if one number is bigger than another. But sometimes there are alphas, sometimes there are betas, sometimes there are stable branches, sometimes there are long-term support branches. On top of this, there are sometimes downstream modifications by distributions.

So let's look at some versions:

  • systemd 249 (249.4-1-arch)
  • Python 3.9.6
  • Mozilla Firefox 91.0.2

The Python and Firefox versions don't indicate which Arch package I have installed, which means they no longer uniquely identify the software build.

systemd's latest is 249, but it has a stable release 249.4. Python's latest is 3.9.7, but 3.6.15 was also released the same day. Firefox's latest is 92.0 and 91.1.0.

There's a cycle of development, distribution and downstream modification that makes it hard to know what exact version of software you have, let alone if it's the most up to date version or correct version.

Even then you can't really find out if the software is up to date without researching online. So version numbers aren't very helpful for this.

Use case two: Figuring out if your software is compatible with some other software.

Often times you want to know if there's been some breaking change that means you have to wait to update. Like with Python 2 to 3. In that case Python used the major version to indicate there was a breaking change. A lot of software actually does this.

Usually they have at least two numbers in their version: major and minor. So with like Python 3.9.6, 3 is the major and 9 is the minor. It works with these rules:

  • If the major version differs, old software may not work
  • If the minor version differs, new software may not work
  • If anything else differs it should be fine

So a program that needs Python 2.7 or Python 3.2 may not work with 3.1.
This is a bit sketchy at best, and it works badly in practice.
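The rules above can be sketched as a small check. This is only an illustration of the major/minor convention as described here, not how any particular packaging tool does it; `parse` and `may_work` are made-up names.

```python
def parse(version: str) -> tuple[int, int]:
    """Return (major, minor) from a dotted version string, ignoring the rest."""
    major, minor, *_ = (int(part) for part in version.split("."))
    return major, minor


def may_work(needed: str, installed: str) -> bool:
    """Apply the rules: the major must match, the installed minor must not be older."""
    need_major, need_minor = parse(needed)
    have_major, have_minor = parse(installed)
    return have_major == need_major and have_minor >= need_minor


print(may_work("3.2", "3.1"))     # False: the installed minor is too old
print(may_work("2.7", "3.1"))     # False: the major differs
print(may_work("3.9", "3.10.1"))  # True: same major, newer minor
```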

Say I'm working on a new Python package to make sounds or something. I name it 'Sounds For Python', and since I developed it using Python 3.9.6 I say it will work with Python versions between 3.9 and 4.0. So 3.10, 3.11 and whatever will work. But 3.8 and 4.0 may not work.

A ton of people really like my package so they use it. Then Python 4 comes out and it makes one backwards-incompatible change: it removes the ability to eat all of a computer's RAM or something. So now I have to manually review the change and bump my supported versions to 'between 3.9 and 5.0'.

If I don't change my version support, people might be stuck with using Python 3 just to use my package. But what if one of the other packages they use now requires Python 4? Now they have to pick between using my package or the other package, or using out of date packages which might require older packages themselves. Multiply that by hundreds of packages all depending on each other, things get very confusing very fast.
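To make the conflict concrete, here is a rough sketch that treats each package's supported Python versions as a half-open range of (major, minor) tuples. The ranges and the candidate list are invented for the example.

```python
Version = tuple[int, int]
Range = tuple[Version, Version]  # (inclusive lower bound, exclusive upper bound)


def allows(supported: Range, version: Version) -> bool:
    """True if the version falls inside the supported range."""
    lower, upper = supported
    return lower <= version < upper


def usable(ranges: list[Range], candidates: list[Version]) -> list[Version]:
    """Return the candidate versions that every package accepts."""
    return [v for v in candidates if all(allows(r, v) for r in ranges)]


candidates = [(3, 9), (3, 10), (4, 0), (4, 1)]
sounds_for_python = ((3, 9), (4, 0))  # 'between 3.9 and 4.0'
other_package = ((4, 0), (5, 0))      # a package that now requires Python 4

print(usable([sounds_for_python, other_package], candidates))  # []

# After manually widening the range to 'between 3.9 and 5.0':
sounds_for_python = ((3, 9), (5, 0))
print(usable([sounds_for_python, other_package], candidates))  # [(4, 0), (4, 1)]
```

With only two packages the empty result is easy to spot. With hundreds of interdependent packages the same intersection has to hold across all of them at once.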

Something even worse with this scheme is that it only specifies some part of the program's compatibility. So an example is the python-cryptography project which updated and stayed compatible, but started to require Rust to run its tests. librsvg had something similar where they rewrote the program in Rust. Programs would still work with them, but they never promised to run on certain platforms so that promise wasn't part of the version.

Another thing on top of that is sometimes programs build with optional features. These are often added to version strings, like if you run curl --version you get a listing like this:

Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets zstd

So I run version 7.78 but that doesn't tell me if it's compatible with say, https.

So what to do? I don't really know. I think we need to drop the idea of using a version number to encode anything more than which version a piece of software is. So what I'd like to see is something like this:

NewLang 2021-09-09-1-arch-1 Build e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Packaged by Arch Linux. Run 'pacman -Syu' to get the latest version
Features: Platform_Linux_amd64 Parser_0 Interp_0

The first line gives version information:

  • The first word is the program name, NewLang
  • The second word is a unique human-readable source code identifier. It can be anything, it's meaningless outside the context of the distributor. This must be changed when any modification happens to the source code that differs from the mainline version
  • The third word is the word Build
  • The fourth word is a unique identifier of the final build or package of the program. This is used to compare builds of the program, again within the context of the distributor.

The second line is a human-readable note about who distributed the program and how to get a newer version.

The third line lists every single feature the build provides. In this case it supports running on 64-bit Linux, version 0 of the parser and version 0 of the interpreter. Software that needs a feature must check this before using it to ensure compatibility.
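As a rough illustration of how another program might consume this output, here is a sketch that parses the three lines and checks for a feature. It assumes exactly the layout shown above; `parse_banner` and the dictionary keys are invented for the example.

```python
BANNER = """\
NewLang 2021-09-09-1-arch-1 Build e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Packaged by Arch Linux. Run 'pacman -Syu' to get the latest version
Features: Platform_Linux_amd64 Parser_0 Interp_0
"""


def parse_banner(text: str) -> dict:
    """Split the three-line banner into its parts (hypothetical helper)."""
    first, note, features_line = text.strip().splitlines()
    name, source_id, build_keyword, build_id = first.split()
    if build_keyword != "Build":
        raise ValueError("third word of the first line must be 'Build'")
    label, _, features = features_line.partition(":")
    if label != "Features":
        raise ValueError("third line must start with 'Features:'")
    return {
        "name": name,
        "source": source_id,
        "build": build_id,
        "note": note,
        "features": set(features.split()),
    }


info = parse_banner(BANNER)
print(info["source"])                  # 2021-09-09-1-arch-1
print("Parser_0" in info["features"])  # True
print("Parser_1" in info["features"])  # False
```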

@xogium xogium commented on 9 Sep

I think the arch-1 is just to indicate how many times the arch pkgbuild was bumped.

Other than this, how do you figure if you've got what it takes? And that long commit id, what does it mean? I kind of don't get this beyond the version number as a date.

@Jookia Jookia commented on 9 Sep

Yes, the pkgbuild gets bumped sometimes and that changes the overall version.

The long ID isn't a commit ID. It's a build ID generated during the build. So you might build the same package twice and get multiple versions, like if users build our software. They'd each have a unique build ID. It should probably be a UUID instead of a hash.

The ideas here are that we need to know three things:

  • What source code the program/package was built from
  • Which build of the code the program/package is
  • What features it supports so other programs/packages can check if they're compatible

@Jookia Jookia commented on 9 Sep

After some discussion in private, it seems the build ID is a bit confusing. Specifically, what its purpose is and why it's a hash.

All it is is a unique way to identify the build of the program- this is useful when troubleshooting, to confirm you have the same build instead of wasting time assuming it. It's also very useful during development for changes that aren't marked by a commit. When copying a build to somewhere like a device you can check to make sure the version matches, or take notes about specific build IDs.

A more user friendly way for this might be build date. But that might be confusing since the version already includes the date. Maybe some other unique identifier that gets appended to the version might help?

After more discussion with Xogium, it seems like the build ID is more trouble than it's worth, especially with a screen reader.

Re-examining the problem, a simpler solution might be some sort of 'build word'- a random word that gets added to each build.
This solves the issue of checking if you actually have the same build as someone. Words will collide given enough builds to compare (globally, perhaps), but for cases where you have a limited set of builds to check (like package variants from a single distributor or different development builds from a friend) it should be more than adequate.

It's also a little confusing to have a separate build version. So stapling it to the version would seem fine.

The new format would look a bit like this:

NewLang 2021-09-09-1-arch-1-irritate
Packaged by Arch Linux. Run 'pacman -Syu' to get the latest version
Features: Platform_Linux_amd64 Parser_0 Interp_0

The source version would be 2021-09-09-1-arch-1 and the build would append a random word, such as irritate.

I can imagine concerns coming from this from people who want reproducible builds, since it introduces randomness to the build. Maybe this could be based off of a hash of the build instead. Builds identical down to the bit would get the same build word added.
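A minimal sketch of the hash-based variant, assuming SHA-256 over the build's bytes and a fixed word list. The list here is a tiny made-up placeholder and `build_word` is an invented name; a real list would need to be much longer.

```python
import hashlib

# Placeholder word list for illustration only.
WORDS = ["irritate", "lantern", "pebble", "orchard",
         "velvet", "compass", "thimble", "harbor"]


def build_word(build_bytes: bytes) -> str:
    """Pick a word deterministically from the hash of the build's contents."""
    digest = hashlib.sha256(build_bytes).digest()
    index = int.from_bytes(digest[:4], "big") % len(WORDS)
    return WORDS[index]


# Builds that are identical down to the bit always get the same word.
first = build_word(b"example build contents")
second = build_word(b"example build contents")
print(first == second)  # True
```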

Okay, after some more thought about this, it might just be better to use a random number instead of a word. Four digits gives 9999 combinations (excluding 0000), which gives around a 40% collision rate at 100 builds, which is way more than this should be used for (say, comparing 10 builds at most).
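The collision figure is a birthday-problem estimate and can be checked with a few lines, assuming each build picks its number uniformly at random from the 9999 values. `collision_probability` is just a name for this example.

```python
def collision_probability(builds: int, combinations: int = 9999) -> float:
    """Chance that at least two of the builds end up with the same number."""
    p_all_unique = 1.0
    for already_taken in range(builds):
        p_all_unique *= (combinations - already_taken) / combinations
    return 1.0 - p_all_unique


print(f"{collision_probability(100):.1%}")  # 39.1%
print(f"{collision_probability(10):.2%}")   # 0.45%
```

So at the intended scale of around 10 builds, a clash is unlikely.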

So the version would be something like NewLang 2021-09-09-1-arch-1-4826 and disabling this feature would just set the random number to 0000. That way people wanting deterministic builds are satisfied.

Worth noting that sometimes the software won't be built but under development. So in that case it can just be 0000 too.
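A sketch of how the suffix might be generated, assuming the build system passes a flag to turn randomness off. `build_suffix` and `full_version` are hypothetical names.

```python
import secrets


def build_suffix(randomize: bool = True) -> str:
    """Return a four-digit build number, or 0000 when randomness is disabled."""
    if not randomize:
        return "0000"
    # randbelow(9999) yields 0..9998, so adding 1 gives 0001..9999 and never 0000.
    return f"{secrets.randbelow(9999) + 1:04d}"


def full_version(source_version: str, randomize: bool = True) -> str:
    """Append the build number to the source version."""
    return f"{source_version}-{build_suffix(randomize)}"


print(full_version("2021-09-09-1-arch-1", randomize=False))  # 2021-09-09-1-arch-1-0000
print(full_version("2021-09-09-1-arch-1"))                   # suffix varies per build
```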

After spending a few months away from this, I think this might just be over-complicating things. Version numbers are meant to tell you if you have a newer version than another, and often indicate some major change for marketing purposes.

For developers, version numbers are basically used to know if your software is compatible with some version. You can't really predict that using a number and it quickly gets into dependency hell. Querying actual functionality in a system is a good idea with feature flags, but ideally you never have to do that.

Reading a random pick on my system:

systemd 251 (251-1-arch)

This convention seems fine and I think we should just copy it: have a version number, plus a variant that downstream can use. In this case we see that the variant is 251-1-arch, which means it's based on systemd-stable 251.1 with Arch Linux patches.

Then it has a bunch of feature flags.
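For reference, a version line in this style is easy to pull apart. This is a sketch that assumes the exact "name version (variant)" layout shown above; the pattern and `parse_version_line` are written for this example only.

```python
import re

# Matches lines of the form: name version (variant)
VERSION_LINE = re.compile(r"^(?P<name>\S+) (?P<version>\S+) \((?P<variant>[^)]+)\)$")


def parse_version_line(line: str) -> dict[str, str]:
    """Split a version line into its name, version and downstream variant."""
    match = VERSION_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised version line: {line!r}")
    return match.groupdict()


info = parse_version_line("systemd 251 (251-1-arch)")
print(info)  # {'name': 'systemd', 'version': '251', 'variant': '251-1-arch'}
```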

As for figuring out exact versions down to the build, having builds be reproducible solves this as you can just check the hashes against a list of builds you've made. So that problem solves itself.
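A rough sketch of that check, assuming SHA-256 and a hand-kept table of builds you have made. The build contents and labels below are invented, and `identify` is a made-up helper.

```python
import hashlib
from typing import Optional


def build_hash(data: bytes) -> str:
    """Hash the bytes of a build artifact."""
    return hashlib.sha256(data).hexdigest()


def identify(data: bytes, known_builds: dict[str, str]) -> Optional[str]:
    """Return the label of a known build with the same hash, if there is one."""
    return known_builds.get(build_hash(data))


# A list of builds you've made, keyed by hash (contents are placeholders).
known_builds = {
    build_hash(b"contents of build A"): "build A",
    build_hash(b"contents of build B"): "build B",
}

print(identify(b"contents of build A", known_builds))  # build A
print(identify(b"some other binary", known_builds))    # None
```

In practice the bytes would be read from the built file. This only works if building the same source twice really does produce identical output.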

