SRE Teams #9: Delivery Center
👋 Hello and welcome to this week’s edition of SRE Teams — a newsletter where I share how interesting companies are implementing Site Reliability Engineering and DevOps practices.
Illustration from undraw.co
I got the chance to speak with Daniel Minella. He manages the SRE and security teams at Delivery Center and has been with the team since the beginning. The team led the company's first steps toward a reliability culture.
Company
Delivery Center has about 600 employees. Its primary mission is to be the OneStepToSell for restaurants and marketplaces: it connects restaurants with many food apps and lets them manage everything through a single platform. One hundred twenty people work in the experience area, split across product, technology, data, and growth.
Team
They have one hundred people in engineering, and the SRE team has nine people, which works out to roughly one SRE for every eleven engineers. Product engineering and SRE are part of the same organizational unit, and the SRE roadmap is oriented around business needs and scalability.
Stack
They use GCP as their cloud provider and Kubernetes for orchestration. Their main language and framework is Ruby on Rails; some apps use Elixir and Phoenix, and a few use NodeJS and Golang. They use Google Stackdriver and New Relic for observability. The engineering team has reached a good level of maturity in making technology decisions, always grounding the adoption of new technologies in business needs. The architecture and reliability teams take part in every such discussion.
Delivery
In the beginning, developers owned CI and CD. The process was often poorly implemented and relied on manual deployments. As the SRE team grew, it improved the process by automating it and making it more transparent. Today they have an automated pipeline using GitHub Actions for CI and ArgoCD for CD. Product teams now understand every step of the delivery process and are fully in charge of it, with the SRE team providing support when necessary.
Operations
Operations has been their primary area of focus. They are currently structuring how teams handle on-call shifts. Their goal is for product teams to be the first responders, escalating to the reliability team when needed. The SRE team also works with product engineers to improve the observability of their applications, which has greatly improved the observability of the platform as a whole. Their vision is for observability to be a process of continuous improvement, not a one-time effort.
Recent Success
The SRE team structure played a central role in moving the company toward a DevOps culture. Their goal has always been to improve the reliability of the platform, and it has been a big success. Today the SRE team creates standards and policies, focusing on making sure product teams adopt good reliability practices by default.
Recent Challenge
Worrying too much about the state of the art. They used to spend a lot of time chasing the latest ideas in SRE, such as observability and DevSecOps, and working out how to put them in place: the paradigm of tomorrow. They learned that one implemented practice is worth more than two designed solutions. So they now focus on getting guides, examples, and presentations out and on adopting new standards quickly. This approach is generating good results, and it is one of their key goals for the quarter.
Advice
Site Reliability Engineering is a cultural state cultivated through automation and processes. As with DevOps, alignment with the business is critical: investments in SRE must align with the company's business goals for SRE to succeed.
If these challenges sound interesting, Delivery Center has many open positions for SRE and other engineering roles.
Thanks
If you're enjoying SRE Teams, I'd love it if you shared it with a friend or two. I try to make it one of the best emails you get in the week, and I hope you're enjoying it.
That’s it for this week! Hit me up if you have any thoughts, feedback, or insights to share. Otherwise, see you next week!