awesome-sre

by dastergon

A curated list of Site Reliability and Production Engineering resources.

5.6K stars, 710 forks, 533 commits. License: Creative Commons Zero v1.0 Universal.

Awesome Site Reliability Engineering

A curated list of awesome Site Reliability and Production Engineering resources.

What is Site Reliability Engineering?

"Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP Google Engineering, founder of Google SRE

Contributing

Please take a look at the contribution guidelines first. Contributions are always welcome!

Contents

Culture

Education

Books

Hiring

Reliability

Monitoring & Observability & Alerting

On-Call

Post-Mortem

Capacity Planning

Service Level Agreement

Performance

Programming

Misc Articles

Real-time Messaging

Blogs

  • Brendan Gregg's Blog - Highly Technical Blog Posts About Systems Internals, Performance and SRE.
  • Everything Sysadmin - Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
  • High Scalability - Technical Blog Posts About Systems Architecture.
  • rachelbythebay - Technical Blog Posts.
  • Production Ready - A mailing list about building resilient infrastructure and tools.
  • Susan J. Fowler - Various blog posts about SRE, Software Engineering and Microservices.
  • SysAdvent - One article for each day of December, ending on the 25th.
  • Operations for Developers - A collection of resources for developers to strengthen their Ops skills.
  • Stephen Thorne's Blog - Blog Posts About SRE.
  • Increment - A digital magazine about how teams build and operate software systems at scale.
  • GopherSRE - Blog Posts about Go and SRE.
  • Cindy Sridharan - Blog posts about distributed systems and their management.
  • Blameless Blog - Blog posts about SRE culture and practices.
  • Resilience Roundup - Weekly analysis of Resilience Engineering and Human Factors research, aimed at people who work on software systems.
  • Squadcast Blog - Blog posts about SRE best practices, reliability, on-call and incident management.

Newsletters

  • DevOpsLinks - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
  • KubeWeekly - A weekly newsletter for all things Kubernetes, curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
  • SRE Weekly - Weekly Site Reliability Newsletter.
  • O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
  • ChaosEngineering.news - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!

Conferences & Meetups

Twitter

SRE Tools
