Site Reliability Engineer in Madrid or Remote

Tinybird

Salary: 0 - €90,000
Workplace: Remote
Hours: Full-Time
Internship: No

Job Description

What are we looking for?

We are looking for someone to help us scale while keeping our software and infrastructure reliable and elastic. Someone who knows how to make hardware and software play together, and who will take part in the on-call team to understand not only our product, but also the issues our clients face.

We run our stack on Linux. We try to keep things simple. Technologies we use:

  • OpenResty: SSL termination and load balancing.
  • Varnish: load balancing and, sometimes, caching.
  • Redis: metadata store.
  • Python: most of our backend uses Python except some small bits that rely on C++ for hot paths.
  • ClickHouse: our main data store.
  • ZooKeeper: coordination of ClickHouse replicas.
  • Grafana, Loki and Mimir: monitoring and alerting.
  • Terraform: Cloud provisioning (virtual machines, networks, Kubernetes clusters).
  • Ansible: deployments, plus software and configuration provisioning.

Our number of machines is still manageable, but the number keeps growing as we keep adding customers.

This is not about managing infrastructure but about making sure that our software uses the hardware resources wisely and flexibly. This means you will not only have to worry about automating machines, but also about helping the product team design and develop the architecture of the system as a whole. That will require you to work with our backend code and to understand how ClickHouse works.

Some challenges and things we want to improve:

  • High-availability and elasticity: as we keep adding customers, we need to architect our system to be more efficient and flexible.
  • Observability: from specific resource usage to a bird's eye view of the whole platform. This requires good knowledge of storage, networking, and computing.
  • Disaster recovery: improving our tooling to manage and discover problems, but also improving our on-call procedures.

As a specific challenge: when our customers grow, we need to upgrade their accounts. Today we do it manually. Not in the traditional sense of manual, because we have tools that automate much of the process, but we still have to handle one customer at a time: deciding which machines to spin up, how much compute capacity to provision, etc. Ideally, our architecture should let customers upgrade themselves and have more resources assigned to them dynamically, in the safest and most transparent way possible.

What will we value?

  • Experience designing, building and running distributed cloud architectures and large-scale web-based applications. That is, in so many words, what you will be responsible for at Tinybird.
  • Programming skills and willingness to dive into our codebase, ClickHouse source code, or any other software we use in order to figure out how things work. At Tinybird, we work mostly with Python and C++.
  • Accountability and enthusiasm for taking on the responsibility of designing and managing the platform, and an urge to take on things that may be broken. Unafraid to break stuff because you own it and can fix it if need be.
  • Bias for action, iteration and delivery. Conscious that often decisions can be reversed quickly and that speed is of the essence in business and technology.
  • Thinking in terms of systems and being attuned to edge cases, failure modes, behaviors and specific implementations.
  • Comfortable collaborating and communicating asynchronously, but expect direct communication within the team on a daily basis.
  • Build software with empathy, ensuring it's intuitive and maintainable. Document key insights and solutions to make it easy for everyone to understand and use without needing extensive documentation.
  • Experience with OpenResty, Varnish, Redis, Terraform or Ansible would be great for you to get up and running quickly, but we don’t bring you here to tell you what the right technologies are: rather we expect you to recommend the right one for each challenge.
  • Experience with ClickHouse and/or rolling out database systems at scale would be a huge plus.

Some bits about the way we work

  • We are a fully remote company and have worked like that for many years. All of our previous companies were remote-friendly.
  • We will provide you with up to €2400 to get the right setup at home if you need it.
  • We are just starting up so your work will impact everything we do. We also believe in full transparency and you will always know what is going on.

You can also read our company principles.

A bit more about the hiring process

  • Selected candidates will be invited to schedule a screening call with our tech team.
  • Next, you will be invited to schedule a second interview.
  • Following successful interviews, you will be invited to schedule a final meeting with at least one member of the founding team.
  • Successful candidates will subsequently be made an offer via phone or video call.

Compensation

  • A competitive package, including Stock Options.
  • Up to €90,000 depending on experience.
  • 22 days of holiday a year (plus your birthday and public holidays).
  • Freedom to work from wherever suits you best. This time, we are looking for people based in time zones close to UTC.
 

