SRE Teams #6: Leroy Merlin
Data-driven technology decisions and how communication is critical for moving a company towards DevOps.
👋 Hello and welcome to this week’s edition of SRE Teams — a weekly email where I share how interesting companies are implementing Site Reliability Engineering and DevOps practices.
Last week I published an article touching on some of the problems we are solving with Runops. Let me know what you think:
One-off scripts: DevOps last mile
Note: sorry for the delay with this week's interview. It was supposed to go out yesterday, but there is a lot going on at Runops.
I got the chance to speak with Vidal and Padilha, who are part of the SRE team at Leroy Merlin that runs the company's E-commerce operations in Brazil. We talked about making data-driven decisions when adopting technologies and why communication is critical for moving a company towards DevOps.
Leroy Merlin is a French-headquartered home improvement and gardening retailer. It serves several countries in Europe, Asia, South America, and Africa. The company focuses on six main sectors: DIY, building, gardening, sanitary equipment, renewable energy, and interior decoration.
Stack
The team uses a business-oriented process to make technology decisions. SRE plays a vital role, running experiments with many tools and technologies. A rigorous process ensures that product teams only adopt tools that improve speed, quality, or communication. The target is always a better business outcome.
The E-commerce platform runs on a single application that is being broken down into smaller services. They are not doing it to follow the microservices trend, but because a single codebase is slowing down delivery. The company is growing, and engineering has to keep up with the pace. They understand that microservices come with trade-offs, including increased system complexity. Microservices solve the problem of organizing people to build software. They don't make software better; they allow a bigger group of people to work on the same software problem.
The main application's backend is written in PHP, making it one of the largest Laravel use cases in Latam. The frontend uses React, with some initiatives for server-side rendering using TypeScript and Next.js. The SRE team uses Golang to build automations. Applications use NoSQL databases, with MongoDB as the primary storage mechanism, but they also rely on Elasticsearch and Redis for some functionality. Messaging uses SQS, and they are starting a migration to Apache Kafka. Applications run on AWS Elastic Beanstalk.
Team
The SRE team is one year old. They assembled a multi-disciplinary group to avoid biases: the team's engineers make decisions that impact the whole company, and a diverse team ensures those decisions draw on different perspectives. SRE has seven people, a mix of engineers with infrastructure, backend, and frontend backgrounds. Most were product engineers before and wanted a change: building products that serve their peers instead of customers. This team supports 40 product engineers building the company's E-commerce platform. They expect to double in size next year.
CI/CD
Jenkins automates delivery pipelines. They even automated some Jenkins maintenance tasks: whenever a new update is available, a Slack bot notifies the team and proceeds with the update after approval. They are investigating other tools, as Jenkins carries a significant maintenance overhead. They experimented with Drone, but it didn't work out for their use case. They are now evaluating GitHub Actions.
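The Jenkins update flow described above (detect a new version, ask the team in Slack, update only after approval) can be sketched as a small approval-gated function. This is a minimal sketch, not the team's bot: `notify`, `wait_for_approval`, and `apply_update` are hypothetical stand-ins for a Slack integration and the real Jenkins upgrade step, and versions are assumed to be purely numeric dotted strings.

```python
from typing import Callable


def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted version like '2.401.3' into a comparable tuple (numeric parts only)."""
    return tuple(int(part) for part in v.split("."))


def maybe_update(
    current: str,
    latest: str,
    notify: Callable[[str], None],          # hypothetical: posts a message to a Slack channel
    wait_for_approval: Callable[[], bool],  # hypothetical: blocks until someone approves or rejects
    apply_update: Callable[[str], None],    # hypothetical: performs the actual Jenkins upgrade
) -> str:
    """Return 'up-to-date', 'rejected', or 'updated'."""
    if parse_version(latest) <= parse_version(current):
        return "up-to-date"
    notify(f"Jenkins {latest} is available (running {current}). Approve the update?")
    if not wait_for_approval():
        notify(f"Update to {latest} was not approved; staying on {current}.")
        return "rejected"
    apply_update(latest)
    notify(f"Jenkins updated to {latest}.")
    return "updated"


if __name__ == "__main__":
    # Usage with stubbed integrations: print instead of Slack, auto-approve.
    outcome = maybe_update(
        "2.401.2", "2.401.3",
        notify=print,
        wait_for_approval=lambda: True,
        apply_update=lambda v: print(f"(pretend) upgrading to {v}"),
    )
    print(outcome)
```

Comparing versions as integer tuples avoids the classic string-comparison bug where "2.10" sorts before "2.9".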
They built many automations to improve developers' experience with Elastic Beanstalk. One of the most interesting was the integration with HashiCorp Vault. The E-commerce app started hitting EB limits on environment variables. To improve security and get around the limit, they decided to store secrets in Vault. They created an EB extension that injects secrets into environment variables whenever the application starts.
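The interview doesn't describe the extension's internals, so here is a minimal sketch of what such a startup hook might do, assuming secrets live in a Vault KV version 2 engine and the hook already has a Vault token. It reads a secret through Vault's HTTP API (`GET /v1/<mount>/data/<path>` with an `X-Vault-Token` header) and turns it into `export` lines that a startup wrapper could source before launching the app. The function names are hypothetical.

```python
import json
import re
import shlex
import urllib.request

# Shell variable names: a letter or underscore, then letters, digits, or underscores.
ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def fetch_kv2_secret(addr: str, token: str, mount: str, path: str) -> dict:
    """Read one secret from a Vault KV v2 engine over Vault's HTTP API."""
    url = f"{addr.rstrip('/')}/v1/{mount}/data/{path}"
    req = urllib.request.Request(url, headers={"X-Vault-Token": token})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def to_env_exports(response: dict) -> list[str]:
    """Turn a KV v2 response body into shell-safe `export KEY=value` lines."""
    secrets = response["data"]["data"]  # KV v2 nests the key/value pairs under data.data
    lines = []
    for key, value in sorted(secrets.items()):
        if not ENV_NAME.fullmatch(key):
            raise ValueError(f"not a valid environment variable name: {key!r}")
        lines.append(f"export {key}={shlex.quote(str(value))}")
    return lines
```

A hook could print the result of to_env_exports(fetch_kv2_secret(addr, token, "secret", "ecommerce/app")) and source it before starting the application; the "secret" mount and "ecommerce/app" path are hypothetical. Quoting each value with shlex.quote keeps secrets that contain spaces or `$` from being mangled by the shell, and keeping the secrets out of the EB environment configuration is what sidesteps the variable limit.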
Ops
A big focus of the team is improving developers' experience. They do it by providing chat interfaces for everyday operations, from onboarding new engineers to creating EB images in AWS. They are also considering building an internal CLI to make it easier to bootstrap new resources and access environments.
They are migrating the monitoring, logging, and metrics stack to Datadog. After many experiments with different tools, Datadog offered the best developer experience for their use case. The easier it is for developers to use operational tooling, the more they take part in operations. This approach has allowed them to increase developers' ownership of operations over time.
Teams are in charge of creating and supporting alerts and metrics for their apps, and logging has improved a lot in recent months. Developers are taking more and more ownership of operations, and SRE plays a crucial role in this change. The team runs initiatives to bridge knowledge gaps across teams, from dojos to tooling presentations. They work closely with product teams to help them leverage tools for increased speed and reliability.
Recent Success
In one year, the team has had quite a few successes. The biggest was the monitoring overhaul. The company needed a reliable solution in place before it could start creating microservices, because it had to measure existing systems to inform decisions about how to split them. Building microservices is almost impossible without good observability. Another area of success is communication: evangelizing DevOps culture among product engineers, marketing internal tooling to increase adoption, and explaining the SRE role to other areas of the company.
Recent Challenge
Breaking up the E-commerce application into smaller services is challenging. They use feature toggles and automation to separate deployment from release and rollout. But the application's size started to slow down the teams building the product, to the point where different groups depended on one another to release new features. Breaking up the monolith became necessary.
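Separating deployment from release usually means new code ships "dark" and a toggle decides who actually runs it. The interview doesn't name their toggle tooling, so this is a minimal sketch of a percentage-based toggle, assuming users are bucketed by hashing the flag name and user ID so each user gets a stable decision. The flag name, the in-memory store, and the checkout example are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Toggle:
    enabled: bool = False
    rollout_percent: int = 0  # 0-100: share of users routed to the new code path


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100) so decisions don't flip between requests."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(toggles: dict[str, Toggle], flag: str, user_id: str) -> bool:
    toggle = toggles.get(flag)
    if toggle is None or not toggle.enabled:
        return False  # unknown or switched-off flags keep the existing behavior
    return bucket(flag, user_id) < toggle.rollout_percent


# Hypothetical flag: a new checkout service is deployed but released to only 10% of users.
TOGGLES = {"new-checkout-service": Toggle(enabled=True, rollout_percent=10)}


def checkout(user_id: str) -> str:
    if is_enabled(TOGGLES, "new-checkout-service", user_id):
        return "checkout-service"  # new microservice path
    return "monolith"  # existing code path
```

Raising rollout_percent widens the release without a new deployment, and setting enabled to False acts as a kill switch. Including the flag name in the hash means different flags don't all select the same 10% of users.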
Advice
Their advice is to bet on an SRE team with multi-disciplinary backgrounds. The diversity makes for a team with greater empathy for more areas of the company. Communication is the basis for everything, and bringing in people from different areas helps the team communicate and solve problems more effectively. Solving other people's problems is a very different thing from solving your own.
The SRE Team @ Leroy is hiring! Reach out to Padilha if you want to join this great team and project.
Thanks
If you're enjoying SRE Teams, I'd love it if you shared it with a friend or two. I try to make it one of the best emails you get each week, and I hope you're enjoying it.
That’s it for this week! Hit me up if you have any thoughts, feedback, or insights to share. Otherwise, see you next week!