this post was submitted on 06 Oct 2024
50 points (96.3% liked)

Technology

I've never run a big system like this, but like the lead character in the story, I always figured exponential backoff would be enough. Turns out there's more.
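For reference, exponential backoff is normally combined with random jitter. Below is a minimal Python sketch. It assumes "full jitter", meaning each wait is a uniformly random time up to the exponential ceiling, and it treats only ConnectionError as retryable. The function names and defaults are illustrative and not taken from the linked story:

```python
import random
import time


def full_jitter_delay(attempt, base=0.1, cap=10.0):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles each attempt (exponential backoff) up to `cap`;
    the actual wait is uniform in [0, ceiling] ("full jitter") so clients
    that failed at the same moment don't all retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


def call_with_retries(operation, max_attempts=5, base=0.1, cap=10.0):
    """Call `operation()` up to `max_attempts` times, backing off between tries."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:  # assumption: only these errors are transient
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(full_jitter_delay(attempt, base, cap))


# Usage: a hypothetical call that fails twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(call_with_retries(flaky, base=0.01))  # prints "ok" after two backoffs
```

Backoff only spreads retries out in time. It does not cap how many extra requests the retries add while a dependency is already overloaded.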

top 4 comments
[–] [email protected] 3 points 1 month ago
[–] [email protected] 3 points 1 month ago

Very interesting, thanks for this article. It's funny how I keep noticing the same phenomena repeat across different branches of engineering; metastable failure caused by feedback loops exists in both mechanical and electrical engineering. It's just named differently there: resonance and ringing, respectively.

[–] RagnarokOnline 2 points 1 month ago

Loved this read, thanks for sharing. A good illustration of how chasing an issue with a quick solution can lead to bigger issues.

[–] [email protected] 2 points 1 month ago* (last edited 1 month ago)

A circuit breaker could prematurely cut off all requests to a service, even if only one shard was failing.
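The quoted failure mode comes from one breaker being shared by every shard. Here is a minimal Python sketch of the obvious alternative, which keys breaker state by shard. The class name, thresholds, and shard names are hypothetical, and the recovery step is simplified:

```python
import time
from collections import defaultdict


class PerShardBreaker:
    """Consecutive-failure breaker with independent state for each shard."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = defaultdict(int)  # shard -> consecutive failures
        self.opened_at = {}               # shard -> time its breaker opened

    def allow(self, shard):
        opened = self.opened_at.get(shard)
        if opened is None:
            return True  # breaker closed for this shard
        # After the cooldown, let requests through again as a trial. The failure
        # count is not reset, so one more failure re-opens the breaker and a
        # success closes it. (Simplification: a stricter half-open state would
        # admit only a single probe request.)
        return self.clock() - opened >= self.reset_timeout

    def record_success(self, shard):
        self.failures[shard] = 0
        self.opened_at.pop(shard, None)

    def record_failure(self, shard):
        self.failures[shard] += 1
        if self.failures[shard] >= self.failure_threshold:
            self.opened_at[shard] = self.clock()


breaker = PerShardBreaker(failure_threshold=3)
for _ in range(3):
    breaker.record_failure("shard-7")
print(breaker.allow("shard-7"), breaker.allow("shard-2"))  # False True
```

Only the failing shard is cut off; requests to healthy shards keep flowing.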

They only circuit-break retries?
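One common way to "circuit-break only retries" is a retry budget. First attempts always go out, but retries are allowed only while they stay below a fixed fraction of requests. Here is a minimal Python sketch. It assumes a simple ratio plus a small flat allowance, and it uses cumulative counts rather than a time window. The class and parameter names are hypothetical:

```python
class RetryBudget:
    """Permit retries only while they stay a small fraction of requests.

    First attempts are never blocked; only retries are cut off, so during an
    outage the backend sees roughly (1 + ratio) times its normal traffic
    instead of (1 + max_retries) times.
    """

    def __init__(self, ratio=0.1, min_retries=10):
        self.ratio = ratio
        self.min_retries = min_retries  # flat allowance so low-traffic clients can still retry
        self.requests = 0
        self.retries = 0

    def record_request(self):
        """Call once per first attempt."""
        self.requests += 1

    def try_spend_retry(self):
        """Return True (and count the retry) if the budget allows one more."""
        # Counts are cumulative to keep the sketch short; a real budget would use
        # a sliding time window so credit from quiet periods doesn't pile up.
        if self.retries < self.min_retries + self.ratio * self.requests:
            self.retries += 1
            return True
        return False


budget = RetryBudget(ratio=0.5, min_retries=0)
for _ in range(20):
    budget.record_request()
print(sum(budget.try_spend_retry() for _ in range(50)))  # 10
```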

If a single node is down, then it shouldn't receive traffic in the first place: k8s (or whatever you use for routing) takes it out of rotation based on its health checks, i.e. the readiness probe.

Why does your software need to retry anyway? I prefer not to implement live retries; stuff breaks sometimes, and tasks will retry themselves.

You can circuit-break the connection to other services so that you stop contacting them while they are down, giving them some breathing room.

The Wikipedia implementation looks simple and good enough to me: https://en.m.wikipedia.org/wiki/Circuit_breaker_design_pattern
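For comparison, here is a minimal Python sketch of a breaker with the closed, open, and half-open states the linked article describes. It is not the article's code. The class, exception name, and default thresholds are assumptions, and the sketch is not thread-safe:

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    """closed -> open after `failure_threshold` consecutive failures;
    open -> half-open once `reset_timeout` seconds have passed;
    half-open -> closed on a success, or back to open on a failure."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; not calling the service")
            self.state = "half-open"  # cooldown over: allow a trial call
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.state = "closed"
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


# Usage: two failures trip the breaker; further calls fail fast.
cb = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)


def down():
    raise ConnectionError("service unavailable")


for _ in range(2):
    try:
        cb.call(down)
    except ConnectionError:
        pass
print(cb.state)  # open
```

While the breaker is open, callers get CircuitOpenError immediately instead of adding load to the struggling service.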