SRE Teams #11: Natura
How to empower developers to ship and operate code at scale without depending on the Platform team.
👋 Hello and welcome to this week’s edition of SRE Teams — a newsletter where I share how interesting companies are implementing Site Reliability Engineering and DevOps practices.
It’s been a while since I last mentioned Runops here. We released a ton of new features and improvements. So, if your team is struggling to manage access to databases, Kubernetes, AWS, and others, we can help. Let’s chat.
Now, on to today’s team.
I got the chance to speak with Renzo, Rafael, and Marcelo (Head of Platform, Tech Lead, and SRE). We talked about how they created an internal NoOps platform that empowers developers to ship and operate code at scale without depending on the Platform team.
Company
Natura & Co is the largest beauty group in the world. They serve more than 100 million consumers across many channels. The direct sales channel alone counts 1.7 million sales consultants spread across Latam.
Team
The platform team has ten people and supports 350 developers. It collaborates with DevOps engineers allocated within the product squads. These engineers help developers on their day-to-day journey and collect requirements, problems, and opportunities. They then bring this information to the COE team, which turns these needs into features of the NoOps platform.
Stack
They built the COE platform using Node.js for the backend API and React for the web application. The NoOps platform enables any engineer to manage the stack components used by their applications.
They use Terraform/Terragrunt and ArgoCD for Infrastructure as Code and GitOps; Elastic and Kibana for logs; Grafana and Prometheus for metrics; DetectSecret and Sonar for vulnerability scans on containers and code; and Jenkins for CI/CD. Applications run on Kubernetes using AWS EKS.
Nobody has write access to cloud accounts. Instead, they invested heavily in abstractions that let developers follow "you build it, you run it" with the tools the platform provides.
Delivery
Product teams can build and launch new applications from scratch without the platform team. The NoOps platform lets developers define environments, cloud providers, ingress rules, and other properties. A Jenkins pipeline then bootstraps the application: it creates Git repositories and sets up monitoring, logging, and alerting. Finally, the pipeline commits Kubernetes definitions to Git, and ArgoCD rolls out those definitions. They have more than 1,000 applications in ArgoCD.
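To make the last step of that flow more concrete, here is a minimal sketch of how a bootstrap step could turn a few application properties into an Argo CD Application manifest that points at the app's Git repository. This is not Natura's implementation. The `AppSpec` fields, the `environments/<env>` repository layout, the `argocd` namespace, the `default` project, and the example URL are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class AppSpec:
    """Hypothetical properties a developer might fill in on the platform."""
    name: str
    environment: str
    repo_url: str  # Git repository holding the Kubernetes definitions


def argocd_application(spec: AppSpec) -> dict:
    """Build an Argo CD Application manifest for one app and environment."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": f"{spec.name}-{spec.environment}",
            "namespace": "argocd",  # assumed Argo CD install namespace
        },
        "spec": {
            "project": "default",
            "source": {
                "repoURL": spec.repo_url,
                # Assumed repo layout: one directory per environment.
                "path": f"environments/{spec.environment}",
                "targetRevision": "HEAD",
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": spec.name,
            },
            # Let Argo CD sync automatically whenever Git changes.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    spec = AppSpec(
        name="voice-commerce",
        environment="staging",
        repo_url="https://git.example.com/voice-commerce.git",  # placeholder URL
    )
    # Kubernetes accepts JSON manifests, so no YAML library is needed here.
    print(json.dumps(argocd_application(spec), indent=2))
```

In a setup like this, the pipeline would commit the generated manifest to Git and never touch the cluster directly, which fits the rule that nobody has write access to cloud accounts.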
Ops
The team invests in documentation and communication to drive adoption of the platform. Engineers write docs for every feature they add and create video tutorials for critical components. The DevOps Trail is a week-long program designed to onboard new engineers onto the platform. It covers cost management, monitoring, and troubleshooting tools. The team also gathers feedback from developers and runs one-off workshops on topics in high demand.
Alerts page the SRE team; they are the guardians of the operation. The SRE team sets alert baselines for all applications, and product engineers refine those alerts with business-specific requirements. They use PagerDuty to notify engineers. Dynatrace, Grafana, and Elastic help them troubleshoot problems.
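The baseline-plus-refinement model can be pictured as a simple merge: every application starts from the SRE defaults, and a product team overrides or adds only what its business needs. The sketch below is an illustration under assumptions, not the team's actual configuration. The alert names, thresholds, and the `effective_alerts` helper are hypothetical.

```python
from copy import deepcopy

# Hypothetical baseline the SRE team could apply to every application.
BASELINE_ALERTS = {
    "high_error_rate": {"threshold": 0.05, "for_minutes": 5, "severity": "page"},
    "high_latency_p99_ms": {"threshold": 2000, "for_minutes": 10, "severity": "page"},
}


def effective_alerts(baseline: dict, overrides: dict) -> dict:
    """Merge product-team overrides on top of the SRE baseline.

    Settings not mentioned in the overrides keep their baseline values,
    and the baseline itself is never modified.
    """
    merged = deepcopy(baseline)
    for name, settings in overrides.items():
        merged.setdefault(name, {}).update(settings)
    return merged


if __name__ == "__main__":
    # Hypothetical checkout team: stricter error budget plus a business alert.
    checkout_overrides = {
        "high_error_rate": {"threshold": 0.01},
        "abandoned_carts_per_min": {"threshold": 50, "for_minutes": 15, "severity": "notify"},
    }
    for name, settings in effective_alerts(BASELINE_ALERTS, checkout_overrides).items():
        print(name, settings)
```

The appeal of a design like this is that a new application is covered by alerts from day one, while product engineers only maintain the small set of rules that differ from the default.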
Success
The new platform generated excellent business outcomes in recent months. One key project was Voice Commerce, a product that lets consumers buy products using Google Assistant. They built and delivered it from scratch, 100% on the NoOps platform. Another recent project was automating the deployment of Machine Learning workloads. The data team was spending hours manually setting up infrastructure to run ML jobs. Now those jobs run on EKS, fully automated by the platform.
Challenge
Adapting the platform to new business requirements and technologies is a big challenge. The business is moving fast; new team members join every week and add new technologies to the stack. The COE platform must adapt and support these technologies to empower these teams to move fast. The platform can't be a bottleneck.
Advice
Keeping a solid team is critical to keeping up with these challenges. The ability to get everyone on board with the constant inflow of new challenges is fundamental. The group breathes innovation and leverages new technologies to solve problems. One of the critical things that powers this environment is not compromising on culture fit. They have had a few positions open for quite some time, but they haven't been able to fill them because it is hard to find candidates with the right technical and cultural fit.
If these challenges sound interesting to you, reach out to Renzo or Rafael to learn more about the open positions. You can find their available jobs here.
Thanks
If you're enjoying SRE Teams, I'd love it if you shared it with a friend or two. I try to make it one of the best emails you get in the week, and I hope you're enjoying it.
That’s it for this week! Hit me up if you have any thoughts, feedback, or insights to share. Otherwise, see you later!