Improve boot speed #9

Open Jookia opened this issue on 9 Aug 2020 - 3 comments

@Jookia Jookia commented on 9 Aug 2020

The boot speed to login prompt is currently 17 seconds. Things to try:

Userspace:

  • Drop the initrd (saves 6.5 seconds)
  • Loosen ordering of systemd units (saves 2.7 seconds, use systemd-analyze critical-chain dbus.service)
  • Investigate why udev is extremely busy setting up our very basic hardware
  • Figure out if generators running before the root mount are slowing us down

Kernel:

  • Don't compress kernel or try using LZO compression
  • Drop the built-in serial driver from the kernel, or turn it into a module?

Bootloader:

  • Staple device tree to kernel somehow to avoid talking to MMC again
  • Drop watchdog, PSCI, ST PMIC and mmc0 (SD card) support from Barebox
  • Turn off ATF and barebox printing?

I think at this stage we're paying a lot for compression and probing hardware before we get to the login shell.

Xogium's done some work optimizing the bootloader, which has saved at least 0.56 seconds. So the current theoretical boot is around 7.5 seconds. The kernel itself reaches init at around 1.9 seconds from power on. So the remaining work will probably have to be starting the shell within two seconds of systemd starting, to get the 4 second boot we want.
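As a rough sanity check on those numbers, here is a small sketch that simply subtracts the savings quoted in this issue from the 17 second baseline and works out the userspace budget. The assumption that the savings add up independently is mine; a naive sum lands near 7.2 seconds, in the same ballpark as the "around 7.5" above.

```python
# Figures quoted in this issue, in seconds.
BOOT_TO_LOGIN = 17.0
SAVINGS = {
    "drop initrd": 6.5,
    "loosen systemd unit ordering": 2.7,
    "bootloader tuning (lower bound)": 0.56,
}
KERNEL_REACHES_INIT = 1.9
TARGET_BOOT = 4.0

# Assumption: the savings are independent and simply add up.
projected = BOOT_TO_LOGIN - sum(SAVINGS.values())

# Time left for userspace between the kernel reaching init and the target.
userspace_budget = TARGET_BOOT - KERNEL_REACHES_INIT

print(f"projected boot: {projected:.2f} s")
print(f"userspace budget: {userspace_budget:.2f} s")
```

The second figure is where "within two seconds" comes from: 4 seconds minus the 1.9 seconds the kernel needs.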

So over the past while Xogium has spent some time trying to reduce boot time, with the goal of booting to whatever shell we make and being ready to play or record audio in under 4 seconds. There are a lot of obstacles here, such as checking filesystems, mounting filesystems, probing devices using udev, logging things to storage, etc. A lot of it can be done in parallel.

But unfortunately it looks like systemd isn't the right tool for the job here. systemd really wants to read all units at boot, which wastes a second. The earliest we can run a unit is around 3 seconds, with actual login happening around 8 seconds into boot. In theory maybe we could save a second if systemd fixes the 'read all units at boot' bug. But even then the system is spending a lot of IO on systemd-journald and feels a bit sluggish.

Looking at alternative boot systems, OpenRC takes longer and isn't an improvement. busybox and runit have shown promise in the ability to tune what happens at boot. Mounting the root filesystem and opening a shell takes 1.7 seconds, with device probing and networking done afterwards in the background.

However, we're hitting problems with this setup:

  • busybox init doesn't support ordering starting scripts and services
  • We can't specify service dependencies in runit
  • We can't have runit or busybox init report failures to rauc
  • We don't have a way to check and mount filesystems

The last one is especially troublesome as it requires pivoting to a new root. All stock boot options I can find (systemd, busybox) require restarting all system services when pivoting, which negates any benefit of parallel startup. It's a missed opportunity, since ideally we could start the shell and a logging daemon and begin probing devices, all while we set up the rootfs.

Ideally the system management would go like this:

  • Load root as a ramdisk or from a read-only partition
  • Start the logger and wait for it
  • Start udev, filesystem mounts/checks, rauc and possibly the shell
  • Start any services on the new filesystem
  • If things look good, mark the system good with RAUC
  • Look after and restart the services, but fail really broken ones
  • Regularly check that services are okay and restart them if needed
  • On shutdown, stop all services except for the logger
  • Stop the logger
  • Unmount and sync filesystems and reboot
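To make the ordering above concrete, here is a minimal sketch that treats the stages as a dependency graph and groups them into waves that could start in parallel. The stage names and the exact dependencies are my reading of the list, not a settled design, and it uses Python's standard graphlib (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each maps to the stages it must wait for.
DEPENDS_ON = {
    "logger": set(),
    "udev": {"logger"},
    "filesystems": {"logger"},
    "rauc": {"logger"},
    "shell": {"logger"},
    "services": {"filesystems"},
    "mark-good": {"services", "rauc"},
}


def startup_waves(graph):
    """Group stages into waves; everything in one wave can start in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = startup_waves(DEPENDS_ON)
for number, wave in enumerate(waves, start=1):
    print(number, ", ".join(wave))
```

The point is that only the logger is a hard serial step; udev, filesystem work, rauc and the shell all sit in the same wave.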

This might be doable with shell scripts and a modified runit. It would go like this:

  • Run init script
  • Get runit to start the logger
  • Get runit to start udev, filesystem mounts/checks, rauc and possibly shell
  • Filesystem mounts will get runit to start services from new filesystem
  • These new services will be run in pivoted root, existing services stay in old root
  • rauc will mark the system good if all services come up and are stable
  • A watchdog program we write will test and kill services if they don't work
  • On shutdown, shutdown script will stop all services
  • shutdown script will stop logger
  • shutdown script will unmount and sync filesystems
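The watchdog mentioned in the list could start out as something like the sketch below: run a health-check command per service and hand any failures to a kill callback. The service names and check commands are stand-ins, and the idea that the callback would wrap something like `sv kill <service>` on the real system is an assumption I haven't tested.

```python
import subprocess
import sys


def service_is_healthy(check_cmd, timeout=5.0):
    """Healthy means the check command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            check_cmd,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def watchdog_pass(checks, kill):
    """Probe every service once and call kill(name) for each one that fails."""
    killed = []
    for name, check_cmd in checks.items():
        if not service_is_healthy(check_cmd):
            kill(name)
            killed.append(name)
    return killed


# Demo with stand-in checks: the Python interpreter exiting 0 or 1.
demo_checks = {
    "logger": [sys.executable, "-c", "raise SystemExit(0)"],
    "audio": [sys.executable, "-c", "raise SystemExit(1)"],
}
demo_killed = watchdog_pass(demo_checks, kill=lambda name: print("would kill", name))
print(demo_killed)
```

Keeping the kill action as a callback means the same loop works whether the supervisor ends up being runit or something custom.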

Some specific implementation snags and how to fix:

  • /var/log will need to be a tmpfs at first, then moved over to the new root after pivot_root, copying the boot logs across
  • runit should only restart services 3 times so rauc can know if the system is stable
  • runit will need to support 'post-start' scripts for things like udev settling
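For the /var/log snag, the copy step might look roughly like this. It's a sketch that assumes early logs are plain files in a tmpfs directory and that they should land in a "boot" subdirectory under the new root's /var/log; both the layout and the function name are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def carry_over_boot_logs(tmpfs_log_dir, new_root, subdir="boot"):
    """Copy log files collected on the tmpfs into <new_root>/var/log/<subdir>."""
    source = Path(tmpfs_log_dir)
    target = Path(new_root) / "var" / "log" / subdir
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(source.iterdir()):
        if entry.is_file():
            shutil.copy2(entry, target / entry.name)  # keeps timestamps
            copied.append(entry.name)
    return copied


# Demo in a scratch directory instead of a real tmpfs and root filesystem.
with tempfile.TemporaryDirectory() as scratch:
    early = Path(scratch) / "early-log"
    early.mkdir()
    (early / "messages").write_text("udev settled\n")
    new_root = Path(scratch) / "newroot"
    copied = carry_over_boot_logs(early, new_root)
    preserved = (new_root / "var" / "log" / "boot" / "messages").read_text()
print(copied)
```

The logger would still need to be told to reopen its files on the new root afterwards, which this doesn't cover.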

As a prototype we might be able to avoid modifying runit by having a bit more code in the 'run' files that:

  • Increment a counter and self-disable the service if it has been restarted too many times
  • Fork off a post-start script that checks and runs a program
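Here's a sketch of what that extra 'run' file logic could look like. The counter file name, the limit of 3 restarts (taken from the snag list) and the use of `sv down` to self-disable are assumptions, and a real 'run' file would more likely be shell; this just pins down the logic.

```python
import os
import subprocess
import tempfile
from pathlib import Path

MAX_RESTARTS = 3  # limit taken from the list of snags in this issue


def record_start(counter_file):
    """Increment the start counter stored in counter_file and return it."""
    path = Path(counter_file)
    try:
        count = int(path.read_text().strip() or 0)
    except (FileNotFoundError, ValueError):
        count = 0
    count += 1
    path.write_text(f"{count}\n")
    return count


def should_give_up(starts, max_restarts=MAX_RESTARTS):
    """The first start is free; give up once restarts exceed the limit."""
    return starts - 1 > max_restarts


def run(service_dir, daemon_argv, post_start_argv=None):
    """Body of a hypothetical 'run' file; never returns because it execs."""
    starts = record_start(Path(service_dir) / "starts")  # hypothetical file name
    if should_give_up(starts):
        # Assumption: 'sv down <dir>' asks runsv to stop restarting the service.
        os.execvp("sv", ["sv", "down", str(service_dir)])
    if post_start_argv:
        subprocess.Popen(post_start_argv)  # runs alongside the daemon
    os.execvp(daemon_argv[0], daemon_argv)


# Demo of the counter only; run() is not called because it replaces the process.
with tempfile.TemporaryDirectory() as scratch:
    counter = Path(scratch) / "starts"
    history = [record_start(counter) for _ in range(5)]
print(history, [should_give_up(n) for n in history])
```

The counter would need to live somewhere writable that is cleared on each boot, otherwise a service disabled once stays disabled forever.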

If this turns out to work well it might be a better idea to roll this into our own custom init written in Lua that can handle all this properly.

Okay, so after doing some prototyping with runit I found out that it's slow to do something basic like start a getty.

I also spent a while trying to re-clock SDMMC2 to 104 MHz to see if that would help with DDR52. It didn't seem to do anything.
