Job description

We are looking for a Senior Site Reliability Engineer to join Stuart and help us make our platform more robust, handle failures gracefully, and detect issues early by means of automation, meaningful alarms, and chaos engineering.

🚀 The SRE mission is to make the platform as reliable as possible by reducing the number and severity of incidents that affect it. We need to make sure that all services are efficiently monitored, with thresholds set so that alarms are meaningful, and that most remediation work is automated rather than manual. We further strengthen the platform by injecting controlled failures into it (chaos engineering principles) and by testing different disaster recovery scenarios. SREs are the stewards of reliability, and they provide the tooling and documentation other Engineering teams need to build reliable software.

🤝 The SRE team is a new team at Stuart, so you will have the opportunity to see it grow and have a say in how it does so. You will be part of the Infrastructure department under the Reliability area, together with the Engineering Support team. Other areas of the department are Cloud Engineering, Security, and IT.

What will I be doing? 🤓

  • Help the other engineering teams to build reliable, observable, and performant products.
  • Drive and help other teams to set SLOs and SLAs and track them via SLIs.
  • Design the Stuart observability stack, implement it and guide other teams to adopt it.
  • Contribute to Stuart systems reliability and performance.
  • Write playbooks for alarms, and then automate them so manual intervention is not required.
  • Document knowledge and practices in a clear way, so other departments can benefit from it.
  • Collaborate with the Engineering Support team on incident management.
  • Conduct and lead post-mortem meetings; follow up on the action items.
  • Lead the adoption of chaos engineering.

What do we need from you? 😎

  • 4+ years of experience in a similar position in an always-up, always-available mission-critical service.
  • You can come from a Systems or Software Engineering background; we will like you exactly the same!
  • Love for automation: you don’t want to repeat the same job twice.
  • You have defined and implemented metrics to monitor software components.
  • You think a false alarm is a harmful alarm.
  • Good Terraform experience.
  • You have working experience in at least one programming language.
  • Deep Linux knowledge.
  • Kubernetes power user. Bonus point: you managed a K8s production cluster.
  • Very good knowledge of cloud environments (we use AWS).
  • You like teaching and passing best practices on to others, and you write thorough documentation.
  • Fluency in both written and spoken English.

Don’t worry, we don’t expect you to tick every single item here! But it should give you a sense of the kind of experience we are looking for.

The stuff you wanna know 😉

  • Family-friendly work-life balance - work from home and flexible hours 
  • Option to work remotely anywhere in Portugal 🇵🇹
  • Ticket Restaurant by Edenred (€7 daily) 🥗
  • Unlimited access to Udemy for all your learning and development needs 
  • Stuart Academy with regular workshops, Stu-Classes, and Stu-Talks 
  • Stuart is putting Mental Health Awareness first! Wellness Allowance (€40 monthly) to use in any gym or sport class 🧘
  • Private healthcare - Medicare🧑‍⚕️
  • Work in an international, dynamic and passionate environment with a company culture focused on learning and development 

At Stuart, we believe that employees today want to evolve in collaborative, high-growth environments where they can demonstrate their abilities and thrive both professionally and personally. We are convinced that employees need to find alignment between their inner values and their company’s culture and mission to unlock their full potential. We work to create a culture of empowerment, continuous learning and growth where everyone can bring expertise, own projects and easily measure their impact 🙌

Stuart is proud to be an equal opportunity workplace dedicated to promoting diversity. We don’t discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status or disability status 💙

Please note: Our Talent Acquisition Team is international, with members from across the world 🌍 We kindly ask you to submit your CV and application in English so that they can be reviewed correctly (unless the job posting is in a language other than English). Thank you 🤗

    Want to learn more about us? Visit 