Hi! Before we start, I wanted to welcome everyone who joined last week. We are growing fast, thanks to all of you who are sharing. As more people subscribe, I get more energy to keep doing it and improving the content. Thank you.
On with the interview.
I got the chance to speak with Daniel Serodio. He is leading the Production Engineering team at Creditas. We talked about how the team's tooling enabled the company to scale fast without breaking things.
Creditas is the leading 100% digital lending and consumer solutions platform in Latin America. It is built around three ecosystems: real estate, automotive, and employee salaries & benefits. They are changing the credit market in Brazil and Mexico. The team has more than 1,600 people, with offices in Brazil, Spain, and Mexico, and the company was valued at $750 million in its Series D round.
Team
The company has around 300 developers across product teams. The Production Engineering team has 10 people. They are in charge of the platform developers use to build and run applications. Nucleon, the latest version of the platform, is a group of tools that boosts developers’ experience.
Tooling
A key component of the platform is cliditas, their internal CLI. Its goal is to group developers' interactions in a single place, reducing the cognitive load of learning a different tool for each job. Building, deploying to many environments, getting logs and metrics, and more are all done using the CLI.
Nucleon is an evolution of the platform. Before, developers would define their needs using Terraform. The platform team provided modules to simplify the interface and automations to apply definitions.
Developer Experience
One interesting fact about Nucleon is how smooth it made the migration process. Changing platforms is challenging. Adopting a new tool takes extra time, and getting the product side to absorb that time creates friction. But with a better developer experience, engineers started using the new platform on their own.
The architecture of cliditas is interesting. It's written in Go and uses a mix of client-side logic and remote calls to APIs built by the team. An add-on component wraps each new command using DLCs. On the client, it uses kapp, AWS CLI, kubectl, and kustomize.
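To make the idea of one CLI wrapping several client-side tools more concrete, here is a minimal sketch in Go. It is my own illustration, not code from cliditas: the `devcli` name, the `logs` and `render` subcommands, the one-namespace-per-environment convention, and the `apps/<app>/overlays/<env>` directory layout are all assumptions. It only relies on `kubectl logs` and `kustomize build`, which exist in those tools.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// buildCommand translates a hypothetical wrapper subcommand into the
// underlying tool invocation. Subcommand names, the namespace-per-environment
// convention, and the overlay layout are assumptions for illustration.
func buildCommand(sub, app, env string) ([]string, error) {
	switch sub {
	case "logs":
		// Tail the last 100 log lines of the app's deployment.
		return []string{"kubectl", "logs", "deployment/" + app, "-n", env, "--tail=100"}, nil
	case "render":
		// Render the manifests for one environment with kustomize.
		return []string{"kustomize", "build", "apps/" + app + "/overlays/" + env}, nil
	default:
		return nil, fmt.Errorf("unknown subcommand %q", sub)
	}
}

func main() {
	if len(os.Args) != 4 {
		fmt.Println("usage: devcli <logs|render> <app> <env>")
		return
	}
	args, err := buildCommand(os.Args[1], os.Args[2], os.Args[3])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Run the wrapped tool and stream its output straight to the terminal.
	cmd := exec.Command(args[0], args[1:]...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping the translation step in a pure function like `buildCommand` means the mapping can be tested without a cluster, and developers only need to remember one entry point instead of the flags of each tool.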
Stack
Containers run on Kubernetes with EKS. The team manages monitoring and logging tools in the cluster: Prometheus, Grafana, and EKK. The company has a few apps in Ruby but adopted Kotlin as the main language. They found Kotlin to be a better alternative to Ruby; the need to scale both the systems and the engineering teams motivated the move. They use a template repository with Spring Boot to bootstrap every new service. In total, they have around 170 services.
Delivery & Ops
Engineers have full autonomy to ship new code within their squads. Developers also operate applications; the support teams reach out to product teams when necessary. Incidents escalate to Production Engineering for platform-specific issues or when dev teams need help with critical production incidents.
Developers use New Relic to troubleshoot incidents. For apps using Nucleon, Prometheus and Grafana are also available. Apps created with cliditas are born with pre-configured dashboards and alerts. Kibana is the tool for querying logs, but they can also use cliditas.
Recent Success
The rapid and organic adoption of Nucleon is a big success. Developers are not only using it, but they also help each other in the journey. This made adoption even faster. The tool marketed itself and spread across the company. Making sure developers know the tools exist and can use them is hard. Marketing of platform tools is important. Using developer UX to drive adoption was key.
Recent Challenge
The recent challenge is scaling the platform to keep up with the fast growth of the company. The growth of the last months forced the team to adapt with little time, from the systems to the processes and interfaces with developers. Aligning the strategies of the platform with the business is critical to ensure growth is well supported.
Advice
Invest in the developer experience. Treat developers as actual end-users of a product. Talk to them, understand their workflows and problems. The team got a fantastic result from putting in the time to nail the experience with internal tooling, and it paid off. Rapid adoption of the new model means less time supporting the previous version.
If you want to learn more, they have a blog post (in Portuguese) detailing their journey through different versions of the platform. And they are also hiring for many engineering positions. Apply here if you liked the challenges the Production Engineering team is solving. Or here for other engineering positions.
Recap
If you are a new subscriber, make sure you check the previous 2 SRE Teams:
Reducer
Check out Reducer if you want to get a brief analysis of the bonus content. It's a newsletter where I share everything I learned in the week, including articles and book analysis. The next issue will be out in a few hours; subscribe here.
See you next week!