Serving BERT Models in Production with TorchServe
Speakers: Adway Dhillon, Nidhin Pattaniyil
Summary
This talk is for a data scientist or ML engineer looking to serve their PyTorch models in production.
It will cover post-training steps that should be taken to optimize the model, such as quantization and conversion to TorchScript.
It will also walk the user through packaging and serving the model with Facebook’s TorchServe.
Description
Intro (10 mins).
Introduce the deep learning BERT model.
Walk through the notebook setup on Google Colab.
Show the end model served along with sample inference.
Review Some Deep Learning Concepts (10 mins)
- Review sample trained PyTorch model code
- Review sample model transformer architecture
- Tokenization / pre and post processing
Optimizing the model (30 mins)
- Two modes of PyTorch: eager vs script mode
- Benefits of script mode and PyTorch JIT
- Post-training optimization methods: static and dynamic quantization, distillation
- Hands on:
  - Quantizing the model
  - Converting the BERT model with TorchScript
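The two hands-on optimization steps can be sketched roughly as follows. This is a minimal illustration, not the speakers' notebook code: `TinyClassifier` is a hypothetical stand-in for a BERT classifier (so the snippet runs without downloading weights), and the sizes and the `model.pt` file name are assumptions. It applies dynamic quantization to the linear layers and then converts the result to TorchScript by tracing.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a BERT classifier: embed, mean-pool, classify."""

    def __init__(self, vocab_size=100, hidden=16, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids):
        return self.head(self.embed(input_ids).mean(dim=1))


torch.manual_seed(0)
model = TinyClassifier().eval()
example_ids = torch.randint(0, 100, (1, 8))  # one sequence of 8 token ids

# Dynamic quantization: nn.Linear weights are stored as int8 and
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    # Script mode via tracing: record the ops executed on an example input.
    traced = torch.jit.trace(quantized, example_ids)
    eager_logits = quantized(example_ids)
    script_logits = traced(example_ids)

# The traced module could then be saved for serving, for example:
# torch.jit.save(traced, "model.pt")  # file name is an assumption
```

Tracing only records the operations run for the example input, so a model with data-dependent control flow would likely need `torch.jit.script` instead.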
Deploying the model (30 mins)
- Overview of deployment options: pure Flask app vs model servers like TorchServe / TF-Serving
- Benefits of TorchServe: high performance serving, multi-model serving, model versioning for A/B testing, server-side batching, support for pre and post processing
- Exploring the built-in model handlers and how to write your own
- Managing the model through the Management API
- Exploring built-in and custom metrics provided by TorchServe
- Hands on:
  - Package the given model using Torch Model Archiver
  - Write a custom handler to support pre processing and post processing
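A custom handler with pre and post processing might look something like the sketch below. It is an illustration under stated assumptions, not the handler from the talk: it uses TorchServe's class-level entry point (a single class exposing `initialize` and `handle`), and the class name, the `model.pt` file name, the two-label set, and the `{"text": ...}` JSON payload shape are all hypothetical. It also assumes the archived TorchScript model takes `(input_ids, attention_mask)` and returns a tuple whose first item is the logits.

```python
import math
import os


class BertSentimentHandler:
    """Hypothetical custom handler using TorchServe's class-level entry point."""

    LABELS = ["negative", "positive"]  # assumed label order of the model head

    def __init__(self):
        self.initialized = False
        self.model = None
        self.tokenizer = None

    def initialize(self, context):
        # Heavy dependencies are imported lazily so the pre and post
        # processing steps can be exercised on their own.
        import torch
        from transformers import AutoTokenizer

        model_dir = context.system_properties.get("model_dir")
        # Assumed archive contents: a TorchScript file plus tokenizer files.
        self.model = torch.jit.load(
            os.path.join(model_dir, "model.pt"), map_location="cpu"
        )
        self.model.eval()
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.initialized = True

    def preprocess(self, requests):
        # Each request row carries its payload under "data" or "body".
        texts = []
        for row in requests:
            payload = row.get("data") or row.get("body")
            if isinstance(payload, (bytes, bytearray)):
                payload = payload.decode("utf-8")
            elif isinstance(payload, dict):
                payload = payload.get("text", "")
            texts.append(payload)
        return texts

    def inference(self, texts):
        import torch

        enc = self.tokenizer(
            texts, padding=True, truncation=True, return_tensors="pt"
        )
        with torch.no_grad():
            logits = self.model(enc["input_ids"], enc["attention_mask"])[0]
        return logits.tolist()

    def postprocess(self, logits_batch):
        # Softmax over each row of logits, then report the top label.
        results = []
        for logits in logits_batch:
            peak = max(logits)
            exps = [math.exp(x - peak) for x in logits]
            total = sum(exps)
            probs = [e / total for e in exps]
            best = probs.index(max(probs))
            results.append({"label": self.LABELS[best], "score": probs[best]})
        return results

    def handle(self, data, context):
        # Must return one result per request in the batch.
        return self.postprocess(self.inference(self.preprocess(data)))
```

Such a handler would typically be packaged together with the serialized model by running `torch-model-archiver` with its `--model-name`, `--version`, `--serialized-file`, `--handler`, `--extra-files`, and `--export-path` options; the specific file names passed to those options depend on the project.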
Lessons Learned (10 mins)
- Share some performance benchmarks of the model served at Walmart Search
- Future next steps
Q&A (5 mins)
Adway Dhillon's Bio
Software and Machine Learning Engineer at Walmart Labs
GitHub: https://github.com/adwaydhillon/
LinkedIn: / adwaydhillon
Nidhin Pattaniyil's Bio
Senior Machine Learning Engineer at Walmart Labs
GitHub: https://github.com/npatta01/
Twitter: / npatta01
LinkedIn: / nidhinpattaniyil
Website: https://npatta01.github.io//
PyData Global 2021
Website: https://pydata.org/global2021/
LinkedIn: / pydata-global
Twitter: / pydata
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVi...
Video: Serving BERT Models in Production with TorchServe | PyData Global 2021, uploaded to the PyData channel on 20 January 2022.