this post was submitted on 16 Sep 2024
FreeAssembly
I have ~~two~~ three stories.
Company X: Our testbed server room was supported by redundant rooftop AC units, many yards apart. During a storm, a lightning bolt forked: one tip of the bolt hit AC unit one and the other hit AC unit two, killing both cooling units. To make things worse, the server manufacturer had not added a temperature safety shutdown to the servers and instead configured their fans to spin faster the hotter they got. By the time I got there, the cable management was warping and melting from the heat.
Company Y: The main datacenter was in tower 2 and the backup datacenter was in tower 1. Most of the IT staff were present when the planes hit.
EDIT:
Company Z: I started work at a company where they gave me access to a "test" BigIP (unit 3) to use as my own little playground. Prior to my joining, the company was run by devs doubling as IT. I wanted to clear out the old spaghetti-code rules so that I could start from scratch. So, after verifying that no automation was running on my unit (unit 3), I deleted the old rules. Unfortunately, the devs/admins had forgotten to disengage replication on "unit 2" when they gave me "unit 3". So production "unit 2" deleted its rules and told production "unit 1" to do the same. Poof... production down and units offline. I had to drive four hours to the datacenter and code the entire BigIP from scratch, under duress. I quit that job months after starting. Some shops are run so poorly that they end up fostering a toxic environment.
Well, that one dark "redundancy" story...
I don't understand why they had redundancy so physically close.
Whatever affects one has a high risk of affecting the other.
Different regions is a thing for a reason.
It's probably good to situate yourself in time when thinking about these things. The twin towers were how a lot of companies became examples of what location redundancy really means. These days people keep that lesson well in mind, but back then, not so much.