Let's be clear, this isn't the single programmer's fault. Everybody will eventually make a mistake. The fact that it wasn't caught by mitigating measures such as reviews, tests, and audits is the real error we can learn from here.
A Proton-M booster carrying a GLONASS satellite crashed shortly after takeoff at Baikonur in 2013. The failure was caused by a gyroscope package that had been installed upside down. The receptacle had a metal indexing pin that should've prevented the incorrect installation. The worker simply pushed so hard that it bent out of the way.
When you make a foolproof design, God makes a better fool.
How did someone like this land a job at NASA?
Ah yes, it's on the internet, so it must be American.
- Kosmodrom Baikonur (located in Kazakhstan) is the primary launch site of Roskosmos (Russia)
- The Proton is a Soviet-made heavy launch rocket, still used today (not related to Rocket Lab's Electron and Neutron families (which are also not American))
- GLONASS is the Soviet/Russian equivalent of the GPS
I think it's safe to say that the guy did not land a job at NASA.
Didn't NASA make the same mistake? I remember that they put arrows on the slots because someone had put a sensor in upside down.
I can't recall anything like that. The only other crash I remember that was caused by a sensor was the Schiaparelli lander, and it was an ESA mission.
I remember it from a YouTube video from one of those engineering channels (might have been "Real Engineering"), probably a year ago. I only remember it because I thought "wow, they have to have so many safeties" and that it is good to draw on parts and such instead of just relying on technical drawings.
I don't remember, but it might not have crashed (multiple sensors), and it might not have had a latch/notch. But it was a long time ago.
Edit: I still remember the big yellow arrow.
I know a story about a certain fighter jet we built in the United States. Programmers for the radar had everything set, and they ran the tests over and over, and the radar was fucking up. Don't want to put in too many details, but the end result was about $100m in research losses to find out that the mechanic who installed the antenna on the front of the fighter had turned it a quarter turn too far, which must have stripped the threads and bent the antenna slightly. Took over a month for them to catch it. They just kept assuming the programming was wrong, because the antenna looked right to the eye from as close as the standard person got.
“Baikonur”
Probably not NASA…
https://www.space.com/21811-russian-rocket-crash-details-revealed.html
Probably by being qualified, and also by being a human being who sometimes makes mistakes and had a bad day.
I think it was a different era, to borrow an awful phrase. In 1962 they were still figuring out best practices for reviews, tests, and audits. Even today, lone hero outputs can get pretty far when processes aren't followed.
Which they did learn from!
I guarantee every mistake like this at any good company leads to a leap forward in tooling for simulation, testing, code building, review, merging, local dev environments etc.
The good companies share their work (by open sourcing their solution or blogging about what they learned) or contribute to existing solutions.
NASA's ROI cannot be measured. The number of industries their R&D has touched is massive.
But did leadership recognize that, or did the programmer catch the blame?
Don’t forget this one https://en.m.wikipedia.org/wiki/Mars_Climate_Orbiter
An investigation attributed the failure to a measurement mismatch between two measurement systems: SI units (metric) by NASA and US customary units by spacecraft builder Lockheed Martin.
Oops
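For what it's worth, this kind of mismatch is the sort of thing you can guard against by making the unit part of the type instead of passing bare numbers around. A minimal sketch in Python, assuming the quantity being handed over is an impulse; the `Impulse` class and its constructor names are made up for illustration and have nothing to do with the actual NASA or Lockheed Martin software:

```python
from dataclasses import dataclass

# Conversion factor: newton-seconds per pound-force second.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical wrapper that always stores impulse in SI units (N*s)."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The caller has to say which unit the number is in,
        # so the conversion cannot be silently skipped.
        return cls(value * LBF_S_TO_N_S)


def total_impulse(burns):
    """Sum a sequence of Impulse values; bare floats are rejected."""
    total = 0.0
    for burn in burns:
        if not isinstance(burn, Impulse):
            raise TypeError("expected Impulse, got a bare number without units")
        total += burn.newton_seconds
    return Impulse(total)


burns = [
    Impulse.from_newton_seconds(10.0),
    Impulse.from_pound_force_seconds(2.0),
]
print(total_impulse(burns).newton_seconds)
```

The point is only that a raw `2.0` never crosses the interface, so whoever produces the number is forced to state its unit.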
Mars Climate Orbiter holds the record, I think, for a coding problem causing a spacecraft failure. That one cost $460m.
A great runner-up would be the loss of the maiden flight of the new Ariane 5 rocket at $370m:
"On June 4th, 1996, the very first Ariane 5 rocket ignited its engines and began speeding away from the coast of French Guiana. 37 seconds later, the rocket flipped 90 degrees in the wrong direction, and less than two seconds later, aerodynamic forces ripped the boosters apart from the main stage at a height of 4km. This caused the self-destruct mechanism to trigger, and the spacecraft was consumed in a gigantic fireball of liquid hydrogen.
The disastrous launch cost approximately $370m, led to a public inquiry, and through the destruction of the rocket’s payload, delayed scientific research into workings of the Earth’s magnetosphere for almost 4 years. The Ariane 5 launch is widely acknowledged as one of the most expensive software failures in history. What went wrong?
The fault was quickly identified as a software bug in the rocket’s Inertial Reference System. The rocket used this system to determine whether it was pointing up or down, which is formally known as the horizontal bias, or informally as a BH value. This value was represented by a 64-bit floating variable, which was perfectly adequate.
However, problems began to occur when the software attempted to stuff this 64-bit variable, which can represent billions of potential values, into a 16-bit integer, which can only represent 65,535 potential values. For the first few seconds of flight, the rocket’s acceleration was low, so the conversion between these two values was successful. However, as the rocket’s velocity increased, the 64-bit variable exceeded 65k, and became too large to fit in a 16-bit variable. It was at this point that the processor encountered an operand error, and populated the BH variable with a diagnostic value."
The kicker on this one was that the bug was copied from the previous successful Ariane 4 rocket code, but the Ariane 4 never experienced it because the Ariane 4 first stage was dropped in each flight before the bug would show itself, so it was never an issue there. Because the Ariane 5 had a slightly different flight profile, it was in the air for longer, long enough to hit the bug and lose the rocket in flight.
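To make the conversion problem concrete, here is a rough sketch in Python of squeezing a float into a signed 16-bit integer with an explicit range check. This is only an illustration of the general technique, not the actual flight code, and the function names are hypothetical. Note that a signed 16-bit integer tops out at 32,767, so the real limit is lower than the 65k figure in the quote:

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Truncate a float to a signed 16-bit integer, raising if it does not fit."""
    truncated = int(value)  # drops the fractional part
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Alternative policy: clamp to the nearest representable value."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


print(to_int16_checked(1234.9))
print(to_int16_saturating(40000.0))
```

Whether to raise or clamp is a design decision; the important part is that the out-of-range case is handled on purpose rather than left to whatever the processor does.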
Static type checking ftw
I'll keep it going:
Don't forget about the time Initech had its credit union hacked with a virus that was supposed to only take a negligible percentage of each transaction, but the programmer figured he must have "put the decimal in the wrong place or something."
The group got away under pretty mysterious circumstances...
Didn't their corporate office burn down afterwards? Suspicious indeed...
Why the fuck is/was NASA using the US customary system? Science is always done in metric, even in the US.
It says NASA was using metric, Lockheed Martin used imperial. Read it again.
IIRC they had outsourced to a contractor and that contractor was using imperial
~~It was just a simple transposition right? 2.45 (wrong) vs 2.54 (right)~~
E: never mind, I was wrong
Always loved the story of what they saw in the source code of software they used in historic NASA missions from decades past.
https://interestingengineering.com/science/code-moon-landings-released-surprising-hilarious
Turns out, the programmers back then were just as unsure about what they were doing as programmers are today ... except the guys back then had computers less powerful than a modern smart watch controlling a missile that was aimed at the moon.
I also heard about a fuckup at the European Space Agency, which had hired an American to work on a particular bit of the project. He used an imperial measurement somewhere and it caused the whole thing to fail.
That's why there are SI-units.
That man's name? Filbert Einstein. No relation.
this is why I hate working with hardware.