Site Reliability Engineer
The Discipline ❤️
Site Reliability Engineering at MoonPay is responsible for providing a resilient, secure, production-ready platform that enables MoonPay to safely deploy applications and services in a self-serve, repeatable manner. We believe that SRE should support both our product delivery and operational teams by surfacing data from our production environment and driving meaningful change based upon what we learn from it.
Current Tech Stack 💻
- TypeScript as our programming language of choice
- Node.js as our backend platform
- TypeORM, TypeDI, TypeGraphQL and routing-controllers as our backend libraries
- React and Next.js hosted on Vercel as our frontend
- Google Cloud Platform to host our services
- Postgres as our core database
- Redis for caching
- Bull to manage background tasks
- Datadog for logging and monitoring
- ArgoCD for continuous deployment on Kubernetes
- GitHub to manage our source code
- Jest to run our tests ✅
What you’ll do 👀
In the short term, we need to increase the resilience and reliability of our current PaaS solution by:
- Improving the maintainability of our infrastructure as code
- Building dashboards, monitoring & alerting mechanisms with Datadog
- Load testing and performance tuning our production services
- Managing the lifecycle and maintenance of our Kubernetes clusters
In the medium to long term you’ll get to:
- Implement new and shiny technologies on top of Kubernetes as you see fit to ensure our tech can scale with the business.
- Develop and integrate solutions with a bias for automation in order to improve and maintain reliability across the production estate and make recovery easier.
- Design and track metrics for site uptime and performance, ensuring high levels of visibility are maintained.
- Own the deployment pipelines and continuously improve our monitoring and alerting capabilities.
- Collaborate closely with all other engineering functions to provide timely feedback from our environments.
- Support Engineering on their journey to deliver better software, faster and more safely (think “It’s OK to deploy on Fridays” 😎).
You should apply if ✅
- You have strong systems administration skills, know the difference between a container and a virtual machine, and know your way around a Linux terminal
- You have platform engineering/SRE experience at leading startups or fast-growing tech companies
- You have either had experience with some of our tech stack or are confident you can cross-train and upskill quickly
- You have experience working in a regulated industry
- You are confident working with and guiding developers on monitoring and logging of complex systems at scale
- You have worked on complex projects
- You can work collaboratively with different teams, e.g. Security, Data, Engineering
- You want to forge and own MoonPay’s reliability & recovery processes
- You’ve got at least a basic understanding of complex reliability structures, theories, principles, and best practices
Research has shown that women are less likely than men to apply for a role if they do not have solid experience in 100% of the listed areas. Please know that this list is indicative and that we would still love to hear from you even if you feel you are only a 75% match. Skills can be learnt; diversity cannot.
We promote a diverse and inclusive culture at MoonPay.
Unfortunately, we are unable to offer visas of any kind at this time!
Our interview process takes place on Google Hangouts and tends to consist of the following stages:
- Recruiter Call (20-30 minutes)
- Hiring Manager Screen (30-45 minutes)
- System Design (45 minutes)
- Technical Deep Dive (45 minutes)
- Values Interview (30 minutes)
Please let us know if you require any accommodations for the interview process, and we’ll do our best to provide assistance.