this post was submitted on 08 Dec 2024
313 points (98.5% liked)

A Comm for Historymemes

A place to share history memes!

Rules:

  1. No sexism, racism, homophobia, transphobia, assorted bigotry, etc.

  2. No fascism, atrocity denial, etc.

  3. Tag NSFW pics as NSFW.

  4. Follow all Lemmy.world rules.

founded 2 years ago
all 28 comments
[–] [email protected] 170 points 3 weeks ago (4 children)

Let's be clear, this isn't the single programmer's fault. Everybody will eventually make a mistake. The fact that it wasn't caught by mitigating measures such as reviews, tests, and audits is the real error we can learn from here.

[–] [email protected] 86 points 3 weeks ago (1 children)

A Proton-M booster carrying a GLONASS satellite crashed shortly after takeoff at Baikonur in 2013. The failure was caused by a gyroscope package that had been installed upside down. The receptacle had a metal indexing pin that should've prevented the incorrect installation. The worker simply pushed so hard that it bent out of the way.

When you make a foolproof design, God makes a better fool.

[–] [email protected] 3 points 3 weeks ago (4 children)

How did someone like this land a job at NASA?

[–] [email protected] 18 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

Ah yes, it's on the internet, so it must be American.

  • Kosmodrom Baikonur (located in Kazakhstan) is the primary launch site of Roskosmos (Russia)
  • The Proton is a Soviet-made heavy launch rocket, still used today (not related to Rocket Lab's Electron and Neutron families (which are also not American))
  • GLONASS is the Soviet/Russian equivalent of the GPS

I think it's safe to say that the guy did not land a job at NASA.

[–] gens 1 points 3 weeks ago (1 children)

Didn't NASA make the same mistake? I remember that they put arrows on the slots because someone had installed a sensor upside down.

[–] [email protected] 2 points 3 weeks ago (1 children)

I can't recall anything like that. The only other crash I remember that was caused by a sensor was the Schiaparelli lander, and it was an ESA mission.

[–] gens 1 points 3 weeks ago* (last edited 3 weeks ago)

I remember it from a YouTube video from one of those engineering channels (might have been "Real Engineering"), probably a year ago. I only remember it because I thought "wow, they have to have so many safeties" and that it is good to draw on parts and such instead of just relying on technical drawings.

I don't remember, but it might not have crashed (multiple sensors), and it might not have had a latch/notch. But it was a long time ago.

Edit: I still remember the big yellow arrow.

[–] [email protected] 9 points 3 weeks ago* (last edited 3 weeks ago)

I know a story about a certain fighter jet we built in the United States. The programmers for the radar had everything set, and they ran the tests over and over, and the radar kept fucking up. I don't want to put in too many details, but the end result was about $100m in research losses to find out that the mechanic who installed the antenna on the front of the fighter had turned it a quarter turn too far, which must have stripped the threads and bent the antenna slightly. It took over a month for them to catch it. They just kept assuming the programming was wrong because the antenna looked right to the eye from as close as the average person got.

[–] [email protected] 4 points 3 weeks ago

Probably by being qualified, and also by being a human being who sometimes makes mistakes and had a bad day.

[–] [email protected] 36 points 3 weeks ago

I think it was a different era, to borrow an awful phrase. In 1962 they were still figuring out best practices for reviews, tests, and audits. Even today, lone hero outputs can get pretty far when processes aren't followed.

[–] towerful 18 points 3 weeks ago (1 children)

Which they did learn from!
I guarantee every mistake like this at any good company leads to a leap forward in tooling for simulation, testing, code building, review, merging, local dev environments etc.
The good companies share their work (by open sourcing their solution or blogging their learnings) or contribute to existing solutions.
NASA's ROI cannot be measured. The number of industries their R&D has touched is massive.

[–] [email protected] 4 points 3 weeks ago

But did leadership recognize that, or did the programmer catch the blame?

[–] [email protected] 41 points 3 weeks ago (4 children)

Don’t forget this one https://en.m.wikipedia.org/wiki/Mars_Climate_Orbiter

An investigation attributed the failure to a measurement mismatch between two measurement systems: SI units (metric) by NASA and US customary units by spacecraft builder Lockheed Martin.

Oops
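A minimal sketch of how that kind of mismatch slips through, and one way to guard against it. Everything here is hypothetical and for illustration only (the `Impulse` class, the function names, the sample numbers); it is not the actual flight or ground software. The only hard fact used is the conversion factor: 1 pound-force second is 4.4482216152605 newton-seconds.

```python
from dataclasses import dataclass

# Exact conversion factor: 1 lbf*s expressed in N*s.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical unit-tagged value, always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The unit is stated at the boundary, so the conversion cannot be skipped.
        return cls(value * LBF_S_TO_N_S)


def total_impulse(readings: list[Impulse]) -> Impulse:
    """Sum readings that have already been normalized to SI."""
    return Impulse(sum(r.newton_seconds for r in readings))


def naive_total(raw_values: list[float]) -> float:
    """The failure mode: bare floats carry no unit, so nothing stops a bad sum."""
    return sum(raw_values)


if __name__ == "__main__":
    # One team reports 10.0 in lbf*s, the other reports 10.0 in N*s (made-up numbers).
    print(naive_total([10.0, 10.0]))  # silently mixes two unit systems
    safe = total_impulse([
        Impulse.from_pound_force_seconds(10.0),
        Impulse.from_newton_seconds(10.0),
    ])
    print(safe.newton_seconds)
```

The design choice is simply to make the unit part of the constructor name, so a reader or reviewer sees which system a number came from at the point where it enters the program.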

[–] [email protected] 25 points 3 weeks ago (1 children)

I think the Mars Climate Orbiter holds the record for a spacecraft failure caused by a coding problem. That one cost $460m.

A great runner-up would be the loss of the maiden flight of the new Ariane 5 rocket at $370m:

"On June 4th, 1996, the very first Ariane 5 rocket ignited its engines and began speeding away from the coast of French Guiana. 37 seconds later, the rocket flipped 90 degrees in the wrong direction, and less than two seconds later, aerodynamic forces ripped the boosters apart from the main stage at a height of 4km. This caused the self-destruct mechanism to trigger, and the spacecraft was consumed in a gigantic fireball of liquid hydrogen.

The disastrous launch cost approximately $370m, led to a public inquiry, and through the destruction of the rocket’s payload, delayed scientific research into workings of the Earth’s magnetosphere for almost 4 years. The Ariane 5 launch is widely acknowledged as one of the most expensive software failures in history. What went wrong?

The fault was quickly identified as a software bug in the rocket’s Inertial Reference System. The rocket used this system to determine whether it was pointing up or down, which is formally known as the horizontal bias, or informally as a BH value. This value was represented by a 64-bit floating variable, which was perfectly adequate.

However, problems began to occur when the software attempted to stuff this 64-bit variable, which can represent billions of potential values, into a 16-bit integer, which can only represent 65,535 potential values. For the first few seconds of flight, the rocket’s acceleration was low, so the conversion between these two values was successful. However, as the rocket’s velocity increased, the 64-bit variable exceeded 65k, and became too large to fit in a 16-bit variable. It was at this point that the processor encountered an operand error, and populated the BH variable with a diagnostic value."

source

The kicker on this one was the bug was copied from the previous successful Ariane 4 rocket code, but the Ariane 4 never experienced it because the Ariane 4 first stage was dropped in each flight before the bug would show itself, so it was never an issue there. Because the Ariane 5 had a slightly different flight profile it was in the air a longer period of time...enough time to experience the bug and cause a loss of the rocket in flight.
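For anyone who wants to see the narrowing problem from the quote in code, here is a rough sketch. It assumes a signed 16-bit target, which holds values from -32768 to 32767. The function names are made up for illustration and this is not the rocket's actual code: the quote describes the processor raising an operand error, which is closest to the checked variant below, while the wrapping and saturating variants show other ways an out-of-range conversion can be handled.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_wrapping(value: float) -> int:
    """Keep only the low 16 bits, as an unchecked conversion might."""
    low_bits = int(value) & 0xFFFF
    # Reinterpret the unsigned 16-bit pattern as a signed 16-bit integer.
    return struct.unpack("<h", struct.pack("<H", low_bits))[0]


def to_int16_checked(value: float) -> int:
    """Refuse out-of-range values instead of producing garbage."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Clamp to the nearest representable value."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    for sample in (1234.9, 40000.0):  # made-up sample readings
        print(sample, to_int16_wrapping(sample), to_int16_saturating(sample))
        try:
            print(to_int16_checked(sample))
        except OverflowError as exc:
            print("rejected:", exc)
```

Which behavior is right depends on what the caller can do with the failure; the point is only that the choice has to be made deliberately somewhere.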

[–] towerful 16 points 3 weeks ago

Static type checking ftw

[–] [email protected] 16 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

I'll keep it going:

Don't forget about the time Initech had its credit union hacked with a virus that was supposed to only take a negligible percentage of each transaction, but the programmer figured he must have "put the decimal in the wrong place or something."

The group got away under pretty mysterious circumstances...

[–] [email protected] 8 points 3 weeks ago

Didn't their corporate office burn down afterwards? Suspicious indeed...

[–] [email protected] 3 points 3 weeks ago (2 children)

Why the fuck is/was NASA using the US customary system? Science is always done in metric, even in the US.

[–] [email protected] 15 points 3 weeks ago* (last edited 3 weeks ago)

It says NASA was using metric, Lockheed Martin used imperial. Read it again.

[–] [email protected] 6 points 3 weeks ago

IIRC they had outsourced to a contractor and that contractor was using imperial

[–] [email protected] 1 points 3 weeks ago* (last edited 3 weeks ago)

~~It was just a simple transposition right? 2.45 (wrong) vs 2.54 (right)~~

E: never mind, I was wrong

[–] [email protected] 17 points 3 weeks ago

Always loved the story of what they saw in the source code of software they used in historic NASA missions from decades past.

https://interestingengineering.com/science/code-moon-landings-released-surprising-hilarious

Turns out, the programmers back then were just as unsure about what they were doing as programmers are today ... except the guys back then had computers less powerful than a modern smart watch controlling a missile that was aimed at the moon.

[–] [email protected] 6 points 3 weeks ago (1 children)

I also heard about a fuckup at the European Space Agency, which had hired an American to work on a particular bit of the project. He used an imperial measurement somewhere and it caused the whole thing to fail.

[–] [email protected] 3 points 3 weeks ago

That's why there are SI units.

[–] [email protected] 4 points 3 weeks ago

That man's name? Filbert Einstein. No relation.

[–] [email protected] 2 points 3 weeks ago

this is why I hate working with hardware.