Staff Software Engineer (Site Reliability Engineering)
As the world's leader in digital payments technology, Visa's mission is to connect the world through the most innovative, reliable and secure payment network - enabling individuals, businesses, and economies to thrive. Our advanced global processing network, VisaNet, provides secure and reliable payments around the world, and is capable of handling more than 65,000 transaction messages a second. The company's dedication to innovation drives the rapid growth of connected commerce on any device, and fuels the dream of a cashless future for everyone, everywhere. As the world moves from analog to digital, Visa is applying our brand, products, people, network and scale to reshape the future of commerce.
At Visa, your individuality fits right in. Working here gives you an opportunity to impact the world, invest in your career growth, and be part of an inclusive and diverse workplace. We are a global team of disruptors, trailblazers, innovators and risk-takers who are helping drive economic growth in even the most remote parts of the world, creatively moving the industry forward, and doing meaningful work that brings financial literacy and digital commerce to millions of unbanked and underserved consumers.
You're an Individual. We're the team for you. Together, let's transform the way the world pays.
Due to the COVID-19 pandemic and the evolving visa/travel restrictions in place, we are currently only able to extend offers to candidates with the right to work in Singapore. We are keeping the situation under close review and will adjust accordingly should the restrictive measures be lifted.
The Digital and Developer Platform group at Visa works toward next-generation payments and believes in the slogan "It's Everywhere You Want to Be": making payments accessible everywhere and for everyone. This group builds technology for the payment ecosystem that improves the lives of millions of people around the world. The successful candidate will join the team on this journey and contribute to that goal. This role is in the Site Reliability Engineering (SRE) team, which focuses on the reliability, availability, performance, and efficiency of the digital products.
- Understand the end-to-end product topology from an infrastructure and application perspective. Identify risks early and, whenever possible, ensure they are addressed before they become actual problems.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Measure and optimize performance, and solve issues across the entire stack: hardware, software, application, and network.
- Identify parts of the system that do not scale or are unstable, provide mitigating measures, and drive long-term resolution of these problems.
- Become a subject matter expert (SME) on DDP products and analyze complex systems from a reliability and resilience perspective.
- Engage regularly with stakeholders to discuss roadmaps and supportability. Drive the agenda for better operational functioning, and work in an agile way on tasks and initiatives.
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services. Fix code bugs in production and recommend architectural improvements during issue and incident analysis.
- Actively look for opportunities to improve the availability, reliability, and performance of the system by applying lessons learned from monitoring and observation.
- Design and implement creative solutions to operations problems, incidents, and outages so that these problems stay fixed, driving down the burden of toil.
- Provide technical assistance in running blameless root cause analyses of incidents and outages, aggressively looking for answers that will prevent each incident from happening again.
- Provide Level 3 on-call support (within working hours only, over weekends once in a quarter)
- Spread SRE culture and create standard SRE documentation and report templates. Provide guidance and technical expertise to junior team members, encourage a learning culture within the group, and foster innovation.
- 5 or more years of relevant work experience in a Java-based, large-scale, highly available environment with a Bachelor’s Degree, or at least 2 years of work experience with an Advanced degree (e.g. Masters, MBA, JD, MD), or 0 years of work experience with a PhD, in Computer Science or another technology field
- Experience supporting production Windows and/or Linux environments, including process management, user management, distilling log files, and debugging performance issues
- Ability to develop tools and scripts to support automation needs.
- 6 or more years of relevant work experience in a Java-based, large-scale, highly available environment with a Bachelor’s Degree, or at least 4 years of work experience with an Advanced degree (e.g. Masters, MBA, JD, MD), or 3 years of work experience with a PhD, in Computer Science or another technology field
- 5+ years of development experience with Java, SQL, automation, and bug fixing, and in handling production and application operations
- Experience working with log analysis tools and observability applications such as Grafana, Tableau, and Splunk.
- Excellent knowledge of Docker and Kubernetes, including design, build and maintenance of k8s environments.
- Knowledge of one or more platforms such as Kafka, NGINX, CDN, Redis/Hazelcast, middleware software, Elastic, and various SQL and NoSQL database platforms is also expected.
- Good working knowledge of TCP/IP, routing, and data centers.
- Linux systems engineering capabilities and network analysis expertise are great to have.
- Strong work ethic, leadership skills, excellent judgment, good time management in prioritizing work, and the ability to work in a fast-paced, team-oriented environment.
- Outstanding analytical and problem-solving skills, willingness to investigate complex problems, and a proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
- Strong critical and strategic thinking skills to handle both the big picture and crucial technical decisions
- Ability to read and understand production code in any language so that you have a deeper understanding of our technology and ways to optimize it
- Experience in designing, integrating, developing web services and REST/JSON APIs.
- Knowledge of Java or related technologies, which helps in fixing bugs, understanding the supported products, and resolving integration-related issues.
- Excellent understanding of systems and product architecture, covering application components and infrastructure such as network, load balancer, firewall, and gateway services.
- Experience supporting and working on web and mobile applications and troubleshooting problems in a cross-functional environment.
- Strong collaboration skills and ability to take ownership of problems when navigating ambiguity
- Willingness to work outside with a strong technical aptitude and excellent communication skills.
Our engineers do more than just write and test code:
- We count on your curiosity and creativity: you want to understand customer requirements and our processes, and to come up with creative solutions.
- While you’ll have the skill to see and understand the big picture, you’re able to stay focused on the task at hand to achieve immediate goals.
- You’re great at systematic and accurate research and want to uncover the smallest detail.
- You have a strong work ethic that will help us all work extremely well together.
- You have the passion to understand people and always strive harder to improve our products and services!
- You have excellent interpersonal skills and above all, you are a team player!
Additional Information
Work Hours: This position requires the incumbent to provide 6 hours of on-call support on weekdays (not more than 2 days in a week) between 9am and 9pm on a rotational basis. In either case, the incumbent's work time does not exceed 9 hours in a day. This also includes weekend on-call support, which generally comes up once in 2-3 months. Weekend on-call support is eligible for compensatory leave.
Travel Requirements: This position requires the incumbent to travel for work less than 10% of the time.
Mental/Physical Requirements: This position will be performed in an office setting. The position will require the incumbent to sit and stand at a desk, communicate in person and by telephone, frequently operate standard office equipment, such as telephones and computers, and reach with hands and arms.