Software Engineer, Infra PyTorch (PhD)

Meta · Published 2 months ago · First seen 1 hour ago

Description

This role is about developing the core PyTorch 2.0 technologies, innovating and advancing the state of the art in ML compilers, and accelerating PT2 adoption through direct engagements with OSS and industry users.

The PyTorch Compiler team is dedicated to making PyTorch faster and more resource-efficient without sacrificing its flexibility and ease of use. The team is the driving force behind PT2, a step-function change in PyTorch's history that brought compiler technologies to the core of PyTorch. PT2 technologies have gained industry-wide recognition since their first release in March 2023. The team is committed to building a PT2 compiler that withstands the test of time while striving to become the #1 ML framework compiler in the industry. Our work is open source, cutting-edge, and industry-leading.

Responsibilities

- Develop the PT2 compiler (e.g., TorchDynamo, TorchInductor, PyTorch Distributed, PyTorch Core)
- Improve PyTorch performance via systematic solutions for the entire community
- Explore the intersection of the PyTorch compiler and PyTorch distributed
- Optimize Generative AI models across the stack (pre-training, fine-tuning, and inference)
- Collaborate with users of PyTorch to enable new use cases of PT2 technologies both inside and outside Meta

Qualifications

- Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
- Currently has, or is in the process of obtaining, a PhD in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
- Research or industry experience in developing compilers, ML systems, ML accelerators, GPU performance, or similar
- Advanced Python or C++ programming skills
- Experience developing PyTorch/PT2, Triton, MLIR, JAX, XLA, or TVM is a huge plus
- Knowledge of GPU architecture, ML accelerator performance, and developing high-performance kernels
- Experience building OSS communities and an extensive social media presence in the ML Sys domain
- Experience with training models, end-to-end model optimization, or applying ML to systems
- Knowledge of communication collectives, PyTorch distributed, and parallelism
- Experience developing in other ML frameworks such as Caffe2, TensorFlow, ONNX, or TensorRT
- First-authored publications at peer-reviewed conferences (e.g., NeurIPS, MLSys, ASPLOS, PLDI, ICML, or similar)

Compensation: $58.65/hour to $181,000/year + bonus + equity + benefits