this post was submitted on 22 Jul 2024
55 points (95.1% liked)


In light of the CrowdStrike-Microsoft outage/disaster that has been wreaking havoc on corporate Windows systems around the world since Friday, systemd lead developer Lennart Poettering pointed out how such a situation on Linux systems could be averted by leveraging systemd's Automatic Boot Assessment functionality.

systemd's Automatic Boot Assessment feature can automatically revert to a previous version of the OS or kernel when a system consistently fails to boot. Using the systemd-boot bootloader, the related tooling within systemd, and the Boot Loader Specification, Automatic Boot Assessment would make for much easier recovery from an incident like the one that hit Microsoft Windows systems running CrowdStrike software last week.
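
For readers who have not seen the mechanism: boot counting is tracked in the boot entry's file name, as a `+LEFT-DONE` suffix before `.conf`. The following is a minimal Python simulation of that renaming scheme, not systemd's actual code. The function names and the `new-kernel` entry name are made up for illustration; only the suffix convention is taken from the systemd documentation.

```python
import re

# Hypothetical helpers. Only the "+LEFT-DONE" suffix mirrors the documented
# boot counting convention; everything else is illustrative.
COUNTER = re.compile(r"^(?P<name>.+?)\+(?P<left>\d+)(?:-(?P<done>\d+))?\.conf$")


def on_boot_attempt(filename):
    """Rename an entry the way a boot loader would just before starting it."""
    m = COUNTER.match(filename)
    if m is None:
        return filename  # no counter: the entry was already marked good
    left = int(m["left"])
    done = int(m["done"] or 0)
    if left == 0:
        return filename  # no tries left: the entry is considered bad
    return f"{m['name']}+{left - 1}-{done + 1}.conf"


def on_boot_success(filename):
    """Drop the counter, which is what marking a boot as good amounts to."""
    m = COUNTER.match(filename)
    return f"{m['name']}.conf" if m else filename


def is_bad(filename):
    """An entry whose tries-left counter reached zero is treated as bad."""
    m = COUNTER.match(filename)
    return bool(m) and int(m["left"]) == 0
```

Starting from `new-kernel+3.conf`, three failed attempts walk the name down to a zero tries-left counter, at which point the loader would prefer another entry. A single successful boot at any point strips the counter instead.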

top 21 comments
[–] [email protected] 14 points 4 months ago* (last edited 4 months ago) (2 children)

Lennart Poettering is no doubt smart, but learning all the ins and outs of systemd with terrible documentation and half-baked solutions, and just "trusting" it to do everything from UEFI booting, immutable partitions, system imaging, networking, home directory and resource management, init and daemon processes, sockets, etc. using "INI-like" files... hmm, I'd almost prefer another global outage.

[–] [email protected] 18 points 4 months ago (2 children)

Because documentation was so great for sysv and everything else back in the day…

[–] [email protected] 16 points 4 months ago (1 children)

Sysv didn't have to have a lot of documentation. It was simple to understand what it did, and the underlying system was mostly shell scripting. It didn't try to be and do everything.

I don't hate systemd. I prefer it now for the most part. I really do think Lennart Poettering is incredibly skilled and intelligent. I am just frustrated that so much gets pushed without adequate resources and support to weigh what is production-ready, and what is bleeding edge. I've already had systemd bite me in the ass at least once where they made a significant unannounced change to systemd-cryptsetup. I had to go find answers by reading through pull request and GitHub issue comments, and it wasn't easy to find either. The community acted like it wasn't a big deal that it caused systems to no longer boot. Move fast & break things isn't the message that will win over larger companies.

[–] [email protected] 3 points 4 months ago

You are looking for an LTS

[–] [email protected] 8 points 4 months ago (1 children)

If you wanted documentation, you read the init scripts.

[–] onlinepersona 2 points 3 weeks ago* (last edited 3 weeks ago)

The "self-documenting" crowd is back in, boys.

Anti Commercial-AI license

[–] [email protected] 5 points 4 months ago

It doesn't do everything. Systemd is broken down into parts. You don't need to use systemd-boot to use systemd-resolved.

[–] [email protected] 8 points 4 months ago (2 children)

It was a config file. The CrowdStrike code would already have been in the kernel for quite some time. Would you not need a previous version of the system without those kmods (or whatever they're using)? That is unlikely.

[–] [email protected] 6 points 4 months ago (1 children)

Yeah, you'd need to snapshot their data directory and roll that back. The previous kernel module may well have had the bug already, just not a malformed config file to trip it.

Also, if the driver booted ok, but then panicked soon after, would that count as a bad boot? The description seems to indicate the boot counters get reset as soon as a boot succeeds.
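
One way to think about that question: a boot only gets marked good once whatever checks the system defines have passed, so a late panic is caught only if the check waits long enough. Here is a rough Python sketch of such a gate. The function, its parameters, and the 300-second grace period are all assumptions for illustration, not anything systemd ships.

```python
def should_mark_boot_good(uptime_seconds, failed_units, grace_seconds=300):
    """Hypothetical health gate: only bless a boot that stayed up and clean.

    uptime_seconds: how long the system has been running since boot.
    failed_units:   names of services that failed during this boot.
    grace_seconds:  assumed waiting period before declaring success.
    """
    if failed_units:
        return False  # something already broke: do not reset the counter
    return uptime_seconds >= grace_seconds
```

With a gate like this, a driver that panics two minutes after boot never gets its counter reset, so the failed attempt still counts against the entry.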

[–] [email protected] 5 points 4 months ago* (last edited 4 months ago) (1 children)

As someone else pointed out, even on an immutable system where you can swap out the system layer, the update would have likely been somewhere in the mutable /var directory in the userspace, since it was some kind of definition update.

I believe SteamOS uses ABA to ensure continuous operation in the case of a bad update, but an image rollback would only work if you could include the offending file/directory for anything that's not in the system layer.

[–] [email protected] 2 points 4 months ago (1 children)

I think having an A partition and a B partition (I'm assuming that's how SteamOS works) wouldn't help in this case. If the A partition downloaded the definition file, crashed and failed to reboot, the bootloader could fail over to the B partition, which would then download the definition file, crash and fail to reboot. It would have to keep rolling back to a last known good snapshot until the update got withdrawn.

You could have an ephemeral setup that wipes /var and /etc and recreates them every boot. I don't think these EDR tools would like that very much though.
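
To make the A/B loop concrete, here is a toy Python simulation. It assumes the crash is caused purely by the downloaded definition file and that each slot re-downloads it after failover. The names `boots_ok`, `simulate`, and `update_withdrawn_after` are invented for this sketch.

```python
def boots_ok(fetched_definitions):
    # Assumption: the OS image is fine; only the definition file is at fault.
    return "bad-definition" not in fetched_definitions


def simulate(slots, update_withdrawn_after):
    """Return (attempt, slot, ok) tuples up to the first successful boot."""
    log = []
    attempt = 0
    while True:
        slot = slots[attempt % len(slots)]
        # Every slot fetches the current definitions again once it is running.
        withdrawn = attempt >= update_withdrawn_after
        fetched = [] if withdrawn else ["bad-definition"]
        ok = boots_ok(fetched)
        log.append((attempt, slot, ok))
        if ok:
            return log
        attempt += 1
```

Calling `simulate(["A", "B"], 3)` shows the slots alternating and failing until the vendor pulls the update. Switching partitions alone never breaks the cycle.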

[–] [email protected] 1 points 4 months ago (1 children)

You could potentially block your network by disabling your router or something, so it couldn't download the bad update, but you'd have to know that was a step to prevent it (which most people didn't until it was too late).

Ostree-based systems are handy for replacing the system layer, but configs live (mostly) in userspace, and they persist.

[–] [email protected] 2 points 4 months ago

Well at that point, just don't install any kernel mode EDR software at all.

NixOS can be set up for impermanence where all config is recreated every boot and nothing persists besides the nix store. There are helpers for an ephemeral home too, so you can have something like TailsOS. I'm sure you could do that with other distros but you'd need absolute discipline to have everything the machine needs provisioned at boot.

[–] [email protected] 2 points 4 months ago

Maybe it could run a recovery program on repeated boot failure

[–] [email protected] 8 points 4 months ago (1 children)

That would be really good especially for immutable systems

[–] [email protected] 10 points 4 months ago

Already exists in SteamOS (which is immutable-ish). Each update is downloaded onto a second inactive partition, and the system switches at boot. If it can't boot, it switches back to the previous system partition and blacklists that update. It tries again when the next update rolls out.

Users never know the difference and should theoretically always have a bootable system that way.
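
Roughly, the flow described above looks like this in Python. It is a sketch of the behaviour as described in this comment, not SteamOS's real updater. The class name, method names, and the `boots` callable (a stand-in for an actual boot attempt) are hypothetical.

```python
class ABUpdater:
    """Toy model of an A/B update scheme with a blacklist for bad updates."""

    def __init__(self, current_version):
        self.slots = {"A": current_version, "B": None}
        self.active = "A"
        self.blacklist = set()

    def inactive(self):
        return "B" if self.active == "A" else "A"

    def apply_update(self, version, boots):
        """Write `version` to the inactive slot and try to boot it.

        `boots` is a callable taking a version and returning True if the
        system came up; it stands in for a real boot attempt.
        """
        if version in self.blacklist:
            return False  # this update already failed once, skip it
        target = self.inactive()
        self.slots[target] = version
        if boots(version):
            self.active = target  # switch over to the updated slot
            return True
        self.blacklist.add(version)  # stay on the old slot, remember the failure
        return False
```

The key design point is that the running slot is never written to, so a failed update leaves the previous system untouched.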

[–] [email protected] 4 points 4 months ago

Or any of another three systems that don't depend on systemd.

Snapper and timeshift both make bootable backups when paired with grub-btrfs or refind-btrfs (depending on which booter you're using).

[–] [email protected] 2 points 4 months ago (1 children)
[–] technohacker 4 points 4 months ago

Just gotta invoke skynetctl

[–] [email protected] 0 points 4 months ago

the year of ...

ahh fuck it, gonna take another 20 years haha

[–] [email protected] -1 points 4 months ago

Please, please let this dude implement systemd in Microsoft Windows :-D

And when systemd is implemented in Windows, the Linux community can then rip out systemd and allow SysVInit, runit and other init systems to become modular across all of Linux again. If systemd gets purged in the process then all the better for choice in Linux.