Wednesday, October 9 • 11:30am - 11:55am
PRO FEATURED SESSION (AI): Distributed Training Platform At Facebook

Large-scale distributed training has become essential to scaling the productivity of ML engineers. Today, ML models are getting larger and more complex in terms of compute and memory requirements, and the amount of data we train on at Facebook is huge. In this talk, we will learn about the Distributed Training Platform, built to support large-scale data and model parallelism. We will touch on distributed training support for PyTorch and on how we are offering a flexible training platform that helps ML engineers increase their productivity at Facebook scale.

AI DevWorld 2019 Speakers

Kutta Srinivasan

Technical Lead for AI Infrastructure, Facebook
Kutta is a Technical Lead for AI Infrastructure at Facebook, supporting Machine Learning efforts across the company with a focus on the end-to-end platform for ML Training. The team's mission is to enable product groups to scale ML / Deep Learning systems to solve problems in ranking...

Mohamed Fawzy

Senior Software Engineering Manager - Facebook AI Infrastructure, Facebook
Mohamed Fawzy leads the Distributed AI Group at Facebook. The Distributed AI Group is focused on scaling and productionizing machine learning training at Facebook scale by building the platform and infrastructure to support large-scale distributed training on heterogeneous hardware...


Wednesday October 9, 2019 11:30am - 11:55am PDT
AI DevWorld -- Main Stage Theater