Accelerating ML Inference at Scale with ONNX, Triton and Seldon | PyData Global 2021

Speaker: Alejandro Saucedo

Summary

Identifying the right tools for high-performance production machine learning can be overwhelming as the ecosystem continues to grow at breakneck speed. In this session we showcase how practitioners can optimize and productionise ML models in scalable ecosystems without having to deal with the underlying infrastructure. We'll be optimizing a GPT-2 model with ONNX and deploying it to Triton using Seldon and Tempo.

Description

Identifying the right tools for high-performance production machine learning can be overwhelming as the ecosystem continues to grow at breakneck speed. In this session we aim to provide a hands-on guide to how practitioners can productionise optimized machine learning models in scalable ecosystems using production-ready open source tools and frameworks. We will dive into a practical use case, deploying the renowned GPT-2 NLP machine learning model using the Tempo SDK, whi
