Senior Site Reliability Engineer (Remote, APAC)
Shopify is a leading global commerce company, providing trusted tools to start, grow, market, and manage a retail business of any size. Shopify makes commerce better for everyone with a platform and services that are engineered for reliability, while delivering a better shopping experience for consumers everywhere. Shopify powers millions of businesses in more than 175 countries and is trusted by brands such as Allbirds, Gymshark, PepsiCo, Staples, and many more.
The Site Reliability team is part of the Infrastructure organization that builds, operates, and improves the heart of Shopify’s technical platform, and unlocks the power of planet-scale infrastructure for all of Shopify’s merchants, buyers, and developers.
Shopify has many critical components, and sometimes they fail. Members of our Site Reliability team are the ones ensuring we can get back to normal operation as fast as possible when that happens. Site Reliability sets the foundation for building and running resilient systems at Shopify. This is a team of engineers with both in-depth operational knowledge of the entire Shopify stack and strong programming fundamentals, who act as first responders and leaders during an incident.
Our goal is to drive incidents to resolution as quickly as possible, and guide teams to build a more resilient Shopify. We build whatever systems and tools are necessary to ensure that Shopify is resilient and that incident response and resolution are fast and reliable. We continuously seek out ways to automate away the manual toil involved in keeping Shopify running.
Commerce happens 24/7, and we have built out a globally distributed team that can respond whenever necessary. Our team hires across four regions: Asia-Pacific (APAC), North America West, North America East, and Europe, the Middle East, and Africa (EMEA). Together they form a follow-the-sun support model that provides 24/7 coverage for incident management.
This is a remote position available in Australia, Japan, and Singapore.
Shopify is now permanently remote and working towards a future that is digital by default. Learn more about what this can mean for you.
What we can offer you:
- The opportunity to run Shopify’s planet-scale systems by enabling engineering teams to create resilient systems.
- Work focusing on a unique set of interesting and challenging problems that can’t be easily found elsewhere.
- The flexibility to define what resiliency and site reliability engineering mean for Shopify.
- The means to grow the capacity of our worldwide distributed site reliability engineering teams, and consult with other engineering groups on how to build low-latency, highly resilient systems.
- A direct impact on our millions of merchants’ ability to generate revenue for their livelihood, their families, and their employees through the business they’ve built from the ground up on our platform.
You’ll work on things like:
- Collaborating with high-caliber engineering teams across Shopify to help them create resilient systems.
- Acting as a force multiplier across and within engineering departments.
- Managing ongoing incidents, using your understanding of Shopify to involve the right teams, and to resolve issues as quickly as possible.
- Cleaning up the noise in our signals, ensuring we can get an understanding of our systems and debug problems easily.
- Responding to automated alerts and executing playbooks.
- Setting standards with teams for building resilient, debuggable systems.
- Ensuring we never fail for the same reason twice.
- Following up on each meaningful incident to learn and to extract appropriate action items so teams know what to do next.
- Helping teams build tools to automate the toil of on-call duties.
What we're looking for:
- You are based in Australia, Japan, or Singapore.
- Experience handling multiple on-call shifts for mission-critical systems, and responsibility for the tools and processes used to debug and correct failures.
- You've navigated more than one incident through to the retrospective process.
- You know what good observability looks like, but more importantly, how to get there.
- Strong programming fundamentals—ideally in a variety of languages—primarily in backend software development.
- Comfort with hands-on development, navigating through multiple programming languages, digging deep in the stack, and using cloud infrastructure (for example, Google Cloud Platform, Amazon Web Services, Azure, Kubernetes, Docker).
- Experience with mentorship and helping teammates level up their craft and technical skills.
- You understand the meaning of continuous improvement and evolving systems.
- You reject the idea that on-call rotations have to be a terrible, disruptive experience.
- You understand how to improve difficult situations through short and iterative projects.
- A commitment to, and drive for, quality, technical excellence, and results.
- If you don’t know all this stuff, don’t worry, we’ll teach you!
- Experience working with a variety of open-source software, including Nginx, Redis, Memcached and MySQL.
- Familiarity with network and web protocols, from IP to HTTP.
Our belief is that a strong commitment to diversity & inclusion enables us to truly make commerce better for everyone. We encourage applications from Indigenous peoples, racialized people, people with disabilities, people from gender and sexually diverse communities, and/or people with intersectional identities. Please take a look at our Sustainability Reports to learn more about Shopify’s commitments to our communities, and our planet.
At Shopify, we understand that experience comes in many forms. We’re dedicated to adding new perspectives to the team, so if your experience is this close to what we’re looking for, please consider applying.