Storage Infrastructure Engineer, Machine Learning
Hydra is the next-generation storage system for ML post-training data. It is currently used mainly by Gemini, and its adoption is growing rapidly. It powers major Gemini tools such as Gemini Data Studio, Gemini Atlas Leaderboard, and critical OneRecipe/Gemini researcher workflows.
Gemini and Google AI research are evolving very quickly, which translates into exponentially increasing data sizes and rapidly changing user requirements. Our work includes improving the performance and reliability of existing Critical User Journeys (CUJs) to speed up Gemini AI research, supporting new features to meet new AI research requirements, refining Hydra’s APIs and data models to accommodate rapidly growing data sizes, and designing the system to be extensible so that new features can be introduced faster and with less development churn.
The US base salary range for this full-time position is $141,000-$202,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Improve Hydra’s read/query/write patterns throughout its storage layer and serving stack. This includes analyzing Spanner hotspotting, lock contention, resource usage, and related issues; improving Hydra’s Spanner/Lightning schema design; optimizing Hydra’s Spanner/F1 queries; and supporting new F1 Query features (e.g., full-text search predicate pushdown) for Hydra’s use cases.
- Implement data isolation models that allow Hydra to be used by all Google Product Areas for their post-training needs.
- Collaborate with various DeepMind/CoreML/CloudML/Search teams and AI researchers to gather requirements for feature development and performance optimization.
- Implement new features to meet rapidly evolving Google AI research needs. For instance, Hydra is currently working on better support for ML model evaluation, LLM distillation, and semantic search.
- Improve the system’s extensibility by leveraging technologies such as Spanner remote user-defined functions (UDFs).
Minimum qualifications:
- Bachelor’s degree or equivalent practical experience.
- 2 years of experience coding in C++, Python, and g3Python.
Preferred qualifications:
- Master's degree or PhD in Computer Science, or a related technical field.
- 2 years of experience in software architecture.
- 2 years of experience in customer engagement.
- 2 years of experience in project execution.