ML Compiler and Performance Engineer

Meta · Published 1 day ago · First seen 23 hours ago

Description

Meta's Training and Inference Accelerators (MTIA) team is developing novel hardware to enable efficient execution of AI training and inference workloads. In this role, you will have end-to-end responsibility for the performance of in-production AI models as they transition from stock hardware to MTIA chips, with a focus on models that require multi-node compute. To learn more about MTIA, explore the links below:

- https://ai.meta.com/blog/meta-training-inference-accelerator-AI-MTIA/
- https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/
- https://dl.acm.org/doi/full/10.1145/3695053.3731409

Responsibilities

- Identify bottlenecks and quantify opportunities for improving performance
- Perform in-depth, end-to-end performance analysis and reporting
- Develop optimizations to address identified bottlenecks
- Optimize compute/communication overlap
- Work closely with other compiler teams as well as client teams (Recommendation Systems, Generative AI, etc.)

Qualifications

- Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience
- Experience developing and deploying optimizations at the level of PyTorch/ATen or comparable stacks
- A Master's degree and 4+ years of in-domain experience, or a PhD and 2+ years of in-domain experience
- Experience optimizing multi-node distributed compute
- Experience optimizing runtimes and/or kernels for accelerator platforms

Compensation: $74.04/hour to $217,000/year + bonus + equity + benefits