Site Reliability Engineering

Site Reliability Engineering (SRE) is a practice, now common at large tech companies, in which an organization's operations problems are treated as software engineering problems. Put another way, it is what happens when a developer is assigned to solve operations problems. SREs are software engineers who build software to make systems more reliable. This raises two questions: isn't that just DevOps, and which is better, SRE or DevOps?

History :

The term was coined by Ben Treynor, a software engineer at Google, in 2003, so the practice started well before the DevOps movement. After implementing SRE internally, Treynor's team later published an SRE book to make the industry aware of the practice.

Key principles of Site Reliability Engineering:

1. Service Level Objectives (SLOs):

SLOs specify the level of reliability that a service needs to achieve. They are precise, quantifiable targets that help align technical and business goals. They help prioritize the work needed to meet customer expectations and serve as a basis for decision-making.
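
Because an SLO is a quantifiable target, checking it reduces to comparing a measured value against that target. The following is a minimal sketch, assuming availability is measured as the fraction of successful requests; the class name `SloCheck`, its methods, and the 99.9% target are hypothetical and chosen only for illustration.

```java
// Hypothetical helper for illustration; names and numbers are assumptions.
final class SloCheck {
    /** Fraction of requests that succeeded (the measured availability). */
    static double availability(long goodRequests, long totalRequests) {
        if (totalRequests <= 0) {
            throw new IllegalArgumentException("totalRequests must be positive");
        }
        return (double) goodRequests / totalRequests;
    }

    /** True when the measured availability meets or exceeds the SLO target. */
    static boolean meetsSlo(long goodRequests, long totalRequests, double sloTarget) {
        return availability(goodRequests, totalRequests) >= sloTarget;
    }

    public static void main(String[] args) {
        double slo = 0.999; // assumed target: 99.9% of requests succeed
        System.out.println(meetsSlo(999_500, 1_000_000, slo)); // 0.9995 is above the target
        System.out.println(meetsSlo(998_000, 1_000_000, slo)); // 0.998 is below the target
    }
}
```

A real service would usually evaluate this over a rolling time window, which the sketch leaves out.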

2. Error budgets:

Error budgets are linked to SLOs: they state the maximum amount of errors or downtime permitted in a specified period of time. They give a numerical measure of how much unreliability the system can tolerate while still meeting its target. Error budgets let SREs decide when to invest in new features and when to concentrate on improving reliability.
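
The arithmetic behind an error budget is short enough to sketch. The example below assumes a downtime-based budget: the budget is the share of the window that the SLO does not cover. The class `ErrorBudget`, its method names, and the "ship only while budget remains" rule are hypothetical simplifications, not a prescribed policy.

```java
// Hypothetical sketch; the policy and the numbers are assumptions for illustration.
final class ErrorBudget {
    /** Downtime (in minutes) that the SLO permits over the given window. */
    static double allowedDowntimeMinutes(double sloTarget, int windowDays) {
        double windowMinutes = windowDays * 24.0 * 60.0;
        return (1.0 - sloTarget) * windowMinutes;
    }

    /** Budget left after the downtime observed so far; negative means overspent. */
    static double remainingMinutes(double sloTarget, int windowDays, double downtimeSoFar) {
        return allowedDowntimeMinutes(sloTarget, windowDays) - downtimeSoFar;
    }

    /** Simplified decision rule: keep shipping features only while budget remains. */
    static boolean canShipFeatures(double sloTarget, int windowDays, double downtimeSoFar) {
        return remainingMinutes(sloTarget, windowDays, downtimeSoFar) > 0.0;
    }

    public static void main(String[] args) {
        // A 99.9% SLO over 30 days leaves about 43.2 minutes of downtime.
        System.out.println(allowedDowntimeMinutes(0.999, 30));
        System.out.println(canShipFeatures(0.999, 30, 10.0)); // budget remains
        System.out.println(canShipFeatures(0.999, 30, 50.0)); // budget exhausted
    }
}
```

Once the budget is spent, the same numbers argue for pausing feature work and concentrating on reliability until the window resets.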

3. Monitoring and alerting:

Prompt detection of and response to issues depend heavily on effective monitoring and alerting. SREs make sure that monitoring systems gather the relevant information and that alerts are clear, actionable, and as free from false positives as possible.

4. Automation:

Automation is a core SRE concept, with an emphasis on cutting down manual labor and boosting operational efficiency. By automating tedious, error-prone chores, SREs free teams to concentrate on more strategic and creative work. To keep the system dependable and scalable, this involves automating monitoring, incident response, and deployment procedures.

5. A culture of reliability:

SRE practices succeed only when they create a culture of reliability. This involves developing an attitude that values dependability and a willingness to learn from mistakes.

Responsibilities Of Site Reliability Engineers (SREs) :

SRE vs DevOps : Which is better?

A useful analogy helps to distinguish the two terms. Think of DevOps as an interface, that is, something like an abstract class that declares methods without defining them, and of SRE as a concrete class implementing DevOps.

// DevOps as an interface: it names the practices but leaves the "how" undefined.
interface DevOps {
    void reduceOrganizationalSilos();
    void acceptFailures();
    void implementGradualChanges();
    void leverageAutomation();
    void measureEverything();
}

SRE, as a concrete class, then implements DevOps by providing a definition for each of these methods.
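
The analogy can be written out as compilable Java. The sketch below is one possible reading, not a definition taken from this article: the interface is repeated so the example stands on its own, and each method body in the hypothetical `SRE` class is an assumption that maps the DevOps practice onto the SRE principles described earlier (error budgets, automation, monitoring, learning from mistakes).

```java
// Repeated here so the sketch compiles on its own.
interface DevOps {
    void reduceOrganizationalSilos();
    void acceptFailures();
    void implementGradualChanges();
    void leverageAutomation();
    void measureEverything();
}

// Hypothetical concrete class; the method bodies are illustrative assumptions.
class SRE implements DevOps {
    @Override
    public void reduceOrganizationalSilos() {
        System.out.println("Developers and SREs share ownership of reliability.");
    }

    @Override
    public void acceptFailures() {
        System.out.println("Use SLOs and error budgets to quantify acceptable failure, and learn from mistakes.");
    }

    @Override
    public void implementGradualChanges() {
        System.out.println("Roll out small changes gradually to limit the cost of failure.");
    }

    @Override
    public void leverageAutomation() {
        System.out.println("Automate tedious, error-prone manual work.");
    }

    @Override
    public void measureEverything() {
        System.out.println("Monitor reliability and alert on what matters.");
    }

    public static void main(String[] args) {
        DevOps practice = new SRE(); // code against the interface, run the SRE implementation
        practice.reduceOrganizationalSilos();
        practice.acceptFailures();
        practice.implementGradualChanges();
        practice.leverageAutomation();
        practice.measureEverything();
    }
}
```

Seen this way, the question of which is "better" loses much of its force: the interface states what should happen, and the class is one way of making it happen.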