Site Reliability Engineering
Site Reliability Engineering (SRE) is a practice, now common at large technology companies, in which an organization's operations problems are treated as software engineering problems. In other words, developers are assigned to solve operations problems. SREs are essentially software engineers who build software to make systems more reliable. This raises two questions: isn't that DevOps? And which is better, SRE or DevOps?
History :
The term was coined in 2003 by Ben Treynor, a software engineer at Google, so the practice predates the DevOps movement. After adopting SRE internally, Google later published a book on SRE to make the wider industry aware of the practice.
Responsibilities Of Site Reliability Engineers (SREs) :
- SREs are accountable for the systems running in production and take on-call duty for them.
- SREs are responsible for developing software that improves the reliability of systems.
- They are responsible for conducting post-incident reviews of systems that fail.
SRE vs DevOps : Which is better?
A useful analogy helps distinguish the two terms. Think of DevOps as an interface, similar to an abstract class that declares methods without defining them, and SRE as a concrete class that implements DevOps.
interface DevOps {
    void reduceOrganizationalSilos();
    void acceptFailures();
    void implementGradualChanges();
    void leverageAutomation();
    void measureEverything();
}
Now SRE, as the concrete class, implements DevOps by defining each of its methods as follows:
- Reducing organizational silos: ownership is shared among software engineers, the product team, and SREs, who all use the same set of tools.
- Accepting failures: no system is 100% reliable, so faults will occur. SREs conduct blameless postmortems of failures and record the findings.
- Implementing gradual changes: the smaller a change is, the easier it is to identify a problem and the faster it is to fix or roll back, which reduces the cost of failure.
- Leveraging automation: manual tasks on production systems, such as user creation, package installation, alerting, and logging, are automated wherever possible.
- Measuring everything: monitor the right things for what has been implemented, so that at the end of the day you have numbers or clear metrics that demonstrate success.
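The analogy can be written out as a small Java sketch. This is only an illustration: the names Sre and SreDemo, the method names, and the choice to have each method return a short description string (so the result is easy to inspect) are assumptions made for this example and do not come from any real library.

```java
import java.util.List;

// Hypothetical interface: one method per DevOps principle from the list above.
interface DevOps {
    String reduceOrganizationalSilos();
    String acceptFailures();
    String implementGradualChanges();
    String leverageAutomation();
    String measureEverything();
}

// SRE as the concrete class: it supplies a definition for every declared method.
final class Sre implements DevOps {
    @Override
    public String reduceOrganizationalSilos() {
        return "Share ownership and tooling across developers, product teams, and SREs";
    }

    @Override
    public String acceptFailures() {
        return "Run blameless postmortems and record what each failure teaches";
    }

    @Override
    public String implementGradualChanges() {
        return "Ship small changes that are quick to diagnose, fix, or roll back";
    }

    @Override
    public String leverageAutomation() {
        return "Automate manual production tasks such as user creation and alerting";
    }

    @Override
    public String measureEverything() {
        return "Monitor the right things and back success with clear metrics";
    }
}

final class SreDemo {
    // Collects the five practices in the same order as the list above.
    static List<String> practices(DevOps devOps) {
        return List.of(
                devOps.reduceOrganizationalSilos(),
                devOps.acceptFailures(),
                devOps.implementGradualChanges(),
                devOps.leverageAutomation(),
                devOps.measureEverything());
    }

    public static void main(String[] args) {
        // The caller depends only on the DevOps interface; Sre is one implementation.
        practices(new Sre()).forEach(System.out::println);
    }
}
```

Because practices accepts any DevOps, another concrete class could implement the same principles differently, which is the point of the analogy: DevOps states what should be done, and SRE is one way of doing it.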
So SRE and DevOps are not competing standards; they go hand in hand. It is SRE with DevOps, not SRE versus DevOps.