Research Co-Op Intern

Dell

What you’ll achieve

As a Research Intern in Dell Technologies' Office of the CTO (OCTO), you will be part of a world-class research team delving deep into the software and hardware aspects of next-generation generative AI techniques, developing high-performance systems that push the state of the art. For this specific project, you will use your expertise in machine learning-based analytics to develop characterization and performance-forecasting techniques for high-end graphics processing units (GPUs) used for inference serving at high levels of concurrency. You will be responsible for developing techniques that use AI/ML approaches to estimate GPU performance for inference tasks with state-of-the-art (SOTA) generative AI models. Specifically, you will work with the AI Techniques Research team on the Future of AI Inferencing project.

You will:

  • Design, evaluate, and optimize advanced inference serving frameworks built on proprietary or open-source serving engines for transformers, multimodal encoders, diffusion models, etc., emphasizing high inference efficiency and user SLO attainment.

  • Explore and apply model optimization techniques such as quantization, distillation, and LoRA-based adapters to meet inference performance and scalability targets.

  • Use your expertise to model the performance of various high-performance GPUs used to serve inference to several hundred to thousands of concurrent users.

  • Work with a team that believes in fostering growth and learning and encourages innovation and excellence in research.

Take the first step towards your dream career.  Every Dell Technologies team member brings something unique to the table. Here’s what we are looking for with this role:

Essential Requirements:

  • Pursuing a Ph.D. or Master's degree in Computer Science or a related field, with a focus on AI/ML techniques applied to LLM training and inference.

  • Proficiency in machine learning techniques with experience applying various techniques to real-world problems.

  • Experience with generative AI approaches, especially architectural aspects of transformer-based LLMs, multimodal models, and SOTA architectures such as MoE.

  • Solid understanding of inference serving optimization approaches such as KV cache management, prefill optimization, and batching techniques.

  • In-depth knowledge of one or more inference serving engines such as TensorRT-LLM, vLLM, SGLang, or comparable.

Desirable Requirements:

  • Good grasp of the architecture of multimodal and reasoning models and ways to optimize their performance.

  • Familiarity with CUDA libraries and their internals for optimizing inference.

Compensation

  • Dell is committed to fair and equitable compensation practices. This role offers an hourly rate of $39.02 to $42.54.

The Government of Canada makes this program available only to Canadian citizens. It is not open to international students.



Here’s our story; now tell us yours

We believe that each of us has the power to make an impact. That’s why we put our team members at the center of everything we do. If you’re looking for an opportunity to grow your career with some of the best minds and most advanced tech in the industry, we’re looking for you.

Dell Technologies is a unique family of businesses that helps individuals and organizations transform how they work, live and play. Join us to build a future that works for everyone because Progress Takes All of Us.

Dell Technologies is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. Read the full Equal Employment Opportunity Policy here.