Site Reliability Engineer - Data Platform
at Kraken Digital Asset Exchange
Nov 09
As one of the largest and most trusted digital asset platforms globally, we are empowering people to experience the life-changing potential of crypto.
Trusted by over 8 million consumer and pro traders, institutions, and authorities worldwide - our unique combination of products, services, and global expertise is helping tip the scales towards mass crypto adoption.
But we’re only just getting started. We want to be pioneers in crypto and add value to the everyday lives of billions.
Kraken is backed by investors including Money Partners Group, Hummingbird Ventures, Blockchain Capital, and Digital Currency Group, among others.
"We are empowering people to live simply, efficiently and more connected to others. We put our clients’ best interests first and foremost. They are at the heart of our company and drive everything we do. We believe in having a laser focus when pursuing our strategic goals and participate only in markets where we can make a significant contribution. We believe in complete transparency, deep collaboration and we never forget that people come first. Having this mindset allows us to grow and advance at a rate which others cannot." Jesse Powell, CEO of Kraken
Did you know that Kraken now has more than 2,500 fully remote people around the world? And we're hiring hundreds more in 2022. #werehiring #cryptocareers
Building the Future of Crypto
Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.
What makes us different? Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.
Before you apply, please read the Kraken Culture page to learn more about our internal culture, values, and mission.
As a fully remote company, we have Krakenites in 60+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Kraken NFT, and Cryptowatch.
Become a Krakenite and build the future of crypto!
Proof of work
The team
Join our Data Infrastructure team and play a pivotal role in upholding the reliability, scalability, and efficiency of our robust Data platform. As a Senior Site Reliability Engineer (SRE) specialized in Data Infrastructure, you will collaborate closely with diverse cross-functional teams to conceive, execute, and oversee the foundational data infrastructure that empowers our array of applications and services.
As a key member of our Data Infrastructure team, you will be at the forefront of ensuring the unfaltering availability and performance of our platform.
Your profound proficiency in cloud technologies, infrastructure as code, automation, monitoring/alerting, logging, user and machine AuthNZ, and certificate management will be instrumental in upholding the exceptional operational standards we set for our services.
This role is intended for candidates based in the Americas.
The Opportunity
- Architect and implement self-service data infrastructure solutions that support the needs of 10+ business units and more than 100 engineers and data analysts.
- Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform
- Collaborate with teams to ensure seamless integration of data-related services with existing systems.
- Develop and maintain automation scripts using bash/shell scripting to automate operational tasks and deployments.
- Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure.
- Enable engineering self-service under tight security requirements using ChatOps and GitOps methodologies
- Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues.
- Manage user and machine authentication and authorization mechanisms to ensure secure access to data and resources.
- Evangelize and implement role-based access control (RBAC) and permissions for a multitude of user groups and machine workflows across different environments
- Design and deploy MLOps platforms using AWS SageMaker and GitOps methodologies.
- Manage and maintain real-time streaming data architecture using technologies like Kafka and Debezium Change Data Capture (CDC).
- Ensure the timely and accurate processing of streaming data, enabling data analysts and engineers to gain insights from up-to-date information.
- Utilize Kubernetes to manage containerized applications within the data infrastructure, ensuring efficient deployment, scaling, and orchestration.
- Implement effective incident response procedures and participate in on-call rotations.
- Troubleshoot and resolve incidents promptly to minimize downtime and impact.
- Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement appropriate solutions.
- Document architecture, processes, and best practices to enable knowledge sharing and support continuous improvement.
- Enable environments for ML experimentation
- Create and manage MLOps flows for training, validation and deployment of models
- Implement efficient, reproducible production deployment of ML models for inference
Skills you should HODL
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Proven experience (5+ years) working as a Site Reliability Engineer, Infrastructure Engineer, or in a similar role, with a focus on data infrastructure and security.
- Experience with real-time data processing technologies, such as Kafka and Debezium
- Strong expertise in cloud technologies, particularly AWS (HashiCorp experience is a nice-to-have).
- Proficiency in Infrastructure as Code tools such as Terraform and Atlantis.
- Experience with containerization and orchestration tools, particularly Kubernetes.
- Solid understanding of bash/shell scripting and proficiency in at least one programming language.
- Familiarity with CI/CD deployment pipelines and related tools.
- Knowledge of HashiCorp products like Vault, Nomad, and Consul is a plus.
- Strong problem-solving skills and the ability to troubleshoot complex systems.
- Expertise in zero-trust architecture and service meshes is a plus
- Experience with data-related technologies (databases, Airflow, data warehousing, data lakes) is a plus.