this post was submitted on 01 Apr 2025
Software Reliability Engineering
👋 Welcome to the SRE Community!
Whether you're just learning about Site Reliability Engineering/Software Reliability Engineering or you're a seasoned on-call warrior, you're in the right place.
SRE (Site Reliability Engineering - AKA Software Reliability Engineering) is a discipline that uses software engineering principles to ensure that systems are reliable, scalable, and resilient. It’s about balancing feature velocity with system stability—keeping things running even when they shouldn’t.
💬 What can you post here?
Here are a few ideas to get started:
- War stories from production incidents and what you learned
- Cool tools for observability, monitoring, alerting, and automation
- Best practices around on-call, SLOs, blameless postmortems, and chaos engineering
- Questions about reliability engineering and career advice
- Infrastructure as code, CI/CD pipelines, and deployment strategies
- Memes. Tasteful ones. SREs need to laugh too 😅
🌐 Be excellent to each other
This community is part of the programming.dev network. Please make sure to:
- Read and follow the programming.dev code of conduct
- Keep discussions respectful and inclusive
- Assume good faith, and be generous in your interpretations
This week (and likely most of the month) I'm rebuilding a backup process that currently appears to rely on Syncthing and hasn't been tested in years. Everybody got to put in their opinion, so I need to design something that has a single source of truth, follows the 3-2-1 rule (three copies of the data, on two different media, with one copy offsite), takes snapshots of the production DBs without degrading their performance, and burns a monthly DVD on a workstation in the office. Once that's done, I get to look at the office VPN's performance problems.
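The 3-2-1 requirement is concrete enough to check mechanically against an inventory of where copies live. Below is a minimal Python sketch under stated assumptions, not the design described above: `BackupCopy`, `check_3_2_1`, and all location and media names are hypothetical. It only inspects a list of copies; it does not verify that any of them actually restores, which still needs a real restore test.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data. Field values used below are hypothetical examples."""
    location: str  # where the copy lives, e.g. "office-nas-snapshot"
    media: str     # storage media type, e.g. "disk", "optical", "object-storage"
    offsite: bool  # True if the copy is stored away from the primary site


def check_3_2_1(copies: List[BackupCopy]) -> List[str]:
    """Return a list of 3-2-1 violations; an empty list means the rule is met.

    3-2-1: at least 3 copies of the data (counting production),
    on at least 2 different media types, with at least 1 copy offsite.
    """
    problems = []
    # Count distinct locations so the same copy listed twice is not double-counted.
    if len({c.location for c in copies}) < 3:
        problems.append("fewer than 3 copies")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than 2 media types")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


if __name__ == "__main__":
    # Hypothetical layout: production DB, a snapshot on an office NAS,
    # and a monthly DVD that stays in the office.
    copies = [
        BackupCopy("production-db", "disk", offsite=False),
        BackupCopy("office-nas-snapshot", "disk", offsite=False),
        BackupCopy("monthly-dvd", "optical", offsite=False),
    ]
    print(check_3_2_1(copies))  # ['no offsite copy']
```

In this hypothetical layout the DVD adds a second media type, but because it stays in the office, the rule still needs at least one offsite copy.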