Senior Software Engineer, Cluster Management System
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.
The Cluster Management System team is responsible for critical software that configures and runs Google data centers.
As part of the team, you will be responsible for analysing, designing and implementing features affecting all services at Google. Our typical projects span distributed computing, node operating system development, new features and platform support.
As a core infrastructure team, our work has an impact on the entire company and reduces operational toil for Google engineers.
The Cluster Management team operates as a group of teams, each executing on a different aspect of the system. Together they handle a range of challenges, from machine-level agents and autoscaling applications to spatial flexibility solutions for cluster management systems.
You will focus on cluster management system prime, applying data analysis and parallel programming techniques to scale up to support workloads.
Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.
Responsibilities
- Develop tools to measure, quantify and fine-tune the production load.
- Identify and drive optimizations and improvements to the cluster management system user experience, and implement, debug and enhance various software (SW) components for efficient workload scheduling.
- Design large-scale systems, making the right trade-offs for reliability and maintainability.
- Communicate with partners across Alphabet to gather requirements and drive adoption efforts.
- Work closely with engineers and teams, and provide mentorship to junior engineers on the team.
Minimum qualifications:
- Bachelor’s degree or equivalent practical experience.
- 5 years of experience with software development in one or more programming languages.
- 3 years of experience testing, maintaining, or launching software products, and 1 year of experience with software design and architecture.
- 3 years of experience with developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage or hardware architecture.
- Experience developing and debugging multithreaded software applications.
- Experience refactoring production-grade software systems.
Preferred qualifications:
- Master's degree or PhD in Computer Science or related technical field.
- 5 years of experience with data structures/algorithms.
- 1 year of experience in a technical leadership role.
- Experience coding in C/C++.
- Experience developing accessible technologies.
- Experience in concurrency, multi-threading and synchronization.