Research Scientist Intern, PyTorch Distributed (PhD)

Description

Meta is seeking a Research Scientist Intern to join our Meta PyTorch Distributed team. The team's mission is to make PyTorch faster and easier to use, creating and maintaining a state-of-the-art machine learning framework used across Meta and the broader industry. The team's key challenges are composing multiple distributed training features to support growing model complexity, jointly optimizing computation and communication to maximize hardware utilization, and automating parallelization to improve usability. Our internships are twelve (12) to twenty-four (24) weeks long, and we have various start dates throughout the year.

Responsibilities

- Apply relevant AI and machine learning techniques to advance the state of the art in machine learning frameworks.
- Collaborate with users of PyTorch to enable new use cases for the framework, both inside and outside Meta.
- Develop novel, accurate AI algorithms and advanced systems for large-scale distributed training and inference.
- Leverage graph-based and compiler-based technologies to optimize distributed training and distributed inference use cases.

Qualifications

- Currently has, or is in the process of obtaining, a PhD in Computer Science or a related STEM field
- Experience in one or more of the following machine learning/deep learning domains: large-scale training and inference, ML systems research
  - ML theory: basic knowledge of ML models across modalities, such as LLMs (Large Language Models), vision (ViTs, MViTs), and multimodal models, and how scale impacts performance
  - ML systems: AI infrastructure, machine learning accelerators, high-performance computing, machine learning compilers, GPU architecture, machine learning frameworks, distributed systems, on-device optimization
- Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
- Experience or knowledge in training models at scale using PyTorch/TensorFlow/JAX
- Experience or knowledge in working with a distributed GPU cluster
- Intent to return to the degree program after completion of the internship/co-op
- Proven track record of achieving significant results, as demonstrated by grants, fellowships, patents, and first-authored publications at leading workshops or conferences such as NeurIPS, MLSys, ASPLOS, PLDI, CGO, PACT, ICML, or similar
- Experience working and communicating cross-functionally in a team environment

Compensation: $7,650/month to $12,134/month + benefits