BASELINE • September 2020

An ML Newsletter from Novetta

Welcome to the September 2020 BASELINE, Novetta’s Machine Learning Newsletter, where we share thoughts on important advances in machine learning technologies. This month we cover the following topics:

  • Smaller NLP models
  • Big developments for major hardware and cloud companies
  • New model deployment options for AWS

Better, faster, smaller NLP models

As new Natural Language Processing (NLP) models boast performance gains over their predecessors, models continue to grow larger. OpenAI’s GPT-3, empirically the current leader among NLP models, comprises 175 billion parameters, dwarfing Microsoft’s T-NLG model (17 billion) and Google’s famous BERT model (340 million). Because such extremely large models are inaccessible to most users, Google developed PRADO, a model architecture that uses fewer than 200K parameters to achieve state-of-the-art performance on a variety of text classification tasks. Google has now expanded on this concept with pQRNN, further increasing model performance while keeping model size small.

PRADO and pQRNN are unique in that they compute token representations on the fly rather than storing large embedding tables, and they keep only the contextual knowledge relevant to the task within the model. The result is models tailored to their training data and associated tasks, achieving state-of-the-art performance while reducing model size by orders of magnitude. With PRADO recently open sourced, state-of-the-art NLP performance on internet-of-things (IoT) devices and in network-constrained environments is now a very real possibility.
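To make the on-the-fly representation idea concrete, here is a minimal sketch of a hash-based ternary token projection, the general technique PRADO-style models use in place of an embedding table. The function name, hash choice, and dimensions are our own illustrative assumptions, not Google’s released code:

    import hashlib
    from typing import List

    PROJECTION_DIM = 64  # illustrative; real models size this to fit their budget

    def project_token(token: str, dim: int = PROJECTION_DIM) -> List[int]:
        """Map a token to a fixed-size ternary vector in {-1, 0, 1} via hashing.

        Unlike an embedding table, this stores no per-token parameters:
        the representation is computed on the fly, so vocabulary size
        no longer drives model size.
        """
        digest = hashlib.md5(token.encode("utf-8")).digest()  # 128 bits
        bits = "".join(f"{byte:08b}" for byte in digest)
        ternary = {"00": 0, "01": 1, "10": -1, "11": 0}
        # Consume two hash bits per feature (64 features x 2 bits = 128 bits).
        return [ternary[bits[2 * i : 2 * i + 2]] for i in range(dim)]

    # Two tokens map to distinct, fixed-size vectors with zero stored parameters.
    print(project_token("machine")[:8])
    print(project_token("learning")[:8])

In the full architectures, projected features like these feed a small trainable encoder (a quasi-RNN in pQRNN’s case), which is where the model’s few learned parameters actually live.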

NVIDIA’s proposed acquisition of Arm

If successful, NVIDIA’s proposed acquisition of Arm would combine two of the largest companies in computing and extend NVIDIA’s reach beyond AI and GPUs. While NVIDIA is a leader in AI training hardware, Arm’s specialty is edge and IoT computing. It is easy to forget that NVIDIA’s dominance in AI is a relatively recent development. Combining forces with Arm would give NVIDIA a pathway into IoT and mobile computing, and if NVIDIA can replicate its success there, the deal could have a major impact on the use of deep learning models in applications with limited computing power.

AWS Inferentia Expanded

Launched at AWS re:Invent 2019, AWS Inferentia is a high-performance chip designed to run machine learning models after they have been trained, providing a cost-effective way to deploy them. Inferentia-powered Inf1 instances expanded to a handful of new regions this month, including US East and US West, bringing the capability closer to “home” for many. These instances are a sensible alternative to more costly GPU instances for keeping high-performance models running in production.
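For teams weighing the switch from GPU instances, deploying to Inferentia typically means compiling a trained model with the AWS Neuron SDK before serving it on an Inf1 instance. The sketch below shows the general shape of that workflow using the SDK’s PyTorch integration; the model choice and file name are illustrative, and the exact calls should be checked against current Neuron documentation:

    import torch
    import torch_neuron  # AWS Neuron SDK plugin; registers the torch.neuron namespace
    from torchvision import models

    # Start from an ordinary trained PyTorch model (ResNet-50 as a stand-in here).
    model = models.resnet50(pretrained=True)
    model.eval()

    # Compile the model for the Inferentia chip by tracing it with an example input.
    example = torch.zeros([1, 3, 224, 224])
    model_neuron = torch.neuron.trace(model, example_inputs=[example])

    # Save the compiled artifact; on an Inf1 instance it loads and runs
    # like any TorchScript model via torch.jit.load.
    model_neuron.save("resnet50_neuron.pt")

Because the compiled model behaves like standard TorchScript at serving time, existing inference code often needs little modification beyond the compilation step itself.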

This research was performed under the Novetta Machine Learning Center of Excellence.


Authors:

Brandon Dubbs
Brian Sacash