BASELINE • November 2021

Newsletter from Novetta

Welcome to the November 2021 BASELINE, Novetta’s Machine Learning Newsletter, where we share thoughts on important advances in machine learning technologies. This month we cover the following topics:

  • Solving math word problems
  • Training machine learning models on any GPU
  • A hierarchical transformer model that improves efficiency
  • AI Ethics: Algorithmic amplification of political content on Twitter

Solving Math Word Problems with Verification Models

Large language models have proven successful at learning many natural language tasks, but they still struggle to reliably solve grade school math word problems, which often require multiple steps. One reason is that a single error in a multi-step solution can cascade, sending the solution off course with no means of correction. Researchers at OpenAI have proposed training verifier models alongside the models that generate solutions, since verification is generally a simpler task than solution generation. Verifiers are trained to output the probability that a given candidate solution is correct, with training labels based on whether or not the solution reaches the correct final result. At test time, the generator samples many candidate solutions and the one the verifier ranks highest is selected. The researchers showed that the use of verifiers significantly improved performance, with a gain roughly equivalent to increasing model size by 30x. An example math problem is shown in the figure below. This methodology shows the promise of deep learning methods for more complex reasoning tasks, and it suggests that verification models are one way to boost the performance and scalability of models.

Figure: An example math problem on the left, and a solution that the model generated and evaluated for correctness on the right (source).
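The generate-then-verify idea can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's implementation: the `Candidate` class, the hard-coded candidate solutions, and `toy_verifier` (which simply re-checks each step's arithmetic) are hypothetical stand-ins for a generator model and a trained verifier.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    steps: List[str]  # intermediate reasoning, e.g. "3 * 4 = 12"
    answer: int       # final result the candidate arrives at


def label_for_training(candidate: Candidate, gold_answer: int) -> float:
    """Verifier training label: 1.0 only if the final answer is correct."""
    return 1.0 if candidate.answer == gold_answer else 0.0


def toy_verifier(candidate: Candidate) -> float:
    """Hypothetical stand-in for a learned verifier.

    Returns the fraction of steps whose arithmetic checks out. A real
    verifier would be a trained model that outputs a probability.
    """
    consistent = 0
    for step in candidate.steps:
        lhs, rhs = step.split("=")
        a, op, b = lhs.split()
        a, b = int(a), int(b)
        value = {"+": a + b, "-": a - b, "*": a * b}[op]
        consistent += int(value == int(rhs))
    return consistent / len(candidate.steps)


def select_best(candidates: List[Candidate],
                verifier: Callable[[Candidate], float]) -> Candidate:
    """Test-time selection: keep the candidate the verifier scores highest."""
    return max(candidates, key=verifier)


# Problem: 3 bags hold 4 apples each and 2 apples are eaten. How many remain?
gold_answer = 10
candidates = [
    Candidate(["3 * 4 = 12", "12 - 2 = 10"], 10),  # correct
    Candidate(["3 * 4 = 14", "14 - 2 = 12"], 12),  # early slip cascades
    Candidate(["3 * 4 = 12", "12 - 2 = 9"], 9),    # slip in the last step
]

labels = [label_for_training(c, gold_answer) for c in candidates]
best = select_best(candidates, toy_verifier)
print(labels, best.answer)
```

The second candidate shows the cascading failure described above: one wrong intermediate value makes the final answer wrong even though the later step is internally consistent.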

Training Machine Learning Models on Non-NVIDIA GPUs

Traditionally, NVIDIA’s GPUs have been the hardware of choice for training deep learning models. This is largely due to NVIDIA’s proprietary Compute Unified Device Architecture (CUDA) platform, which TensorFlow and PyTorch support directly. However, non-NVIDIA graphics cards are becoming more appealing to ML researchers with Microsoft’s release of PyTorch-DirectML, a package that allows training on any GPU that supports DirectX 12. Using PyTorch-DirectML requires only one change to existing PyTorch code after importing the library, as shown below. Without such libraries, training models on these GPUs requires considerable time and subject matter expertise. DirectML expands the toolkit of ML researchers, enabling them to more easily train models on a wider variety of hardware.

Figure: The only change that needs to be made from regular PyTorch code is by adding “”.
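As a rough sketch of what the device change looks like in practice, the snippet below assumes the later `torch-directml` package, in which `torch_directml.device()` returns a device object; the preview release discussed here may have exposed the device differently. The helper `pick_device` is a hypothetical name, and the fallback to `"cpu"` is added only so the example runs on machines without DirectML.

```python
import importlib.util


def pick_device():
    """Return a DirectML device if torch_directml is installed, else "cpu"."""
    if importlib.util.find_spec("torch_directml") is not None:
        import torch_directml  # assumption: the later torch-directml package
        return torch_directml.device()
    return "cpu"


device = pick_device()

# The rest of the code is ordinary PyTorch: only the device passed to
# .to(...) differs from a CUDA or CPU setup.
if importlib.util.find_spec("torch") is not None:
    import torch

    model = torch.nn.Linear(4, 2).to(device)
    batch = torch.randn(8, 4).to(device)
    output = model(batch)
    print(output.shape)
```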

Hourglass: A New Transformer Model Optimized for Efficiency

Transformer architectures have become state-of-the-art in natural language processing (NLP). However, this superior performance comes with a large computational cost, since the time and memory required grow quadratically with the length of the input. Researchers from OpenAI, Google Research, and the University of Warsaw recently developed Hourglass, a hierarchical transformer model, to address this problem. They tried several combinations of upsampling and downsampling traditional transformer activation layers in order to create a hierarchical architecture that would match the performance of state-of-the-art transformer models while lowering the computational cost. The resulting model performed well on both language modeling and image generation tasks, demonstrating its success across different domains. Improving the tradeoff between efficiency and accuracy makes large NLP and computer vision models more accessible and easier to apply in production. In this way, Hourglass paves the way for using transformer architectures more efficiently.

Figure: A comparison of maximum memory used while training the Transformer-XL and Hourglass transformer models on the Enwik8 dataset.
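To illustrate why shortening the sequence helps, here is a minimal sketch of one possible downsampling and upsampling pair: average pooling and simple repetition. These specific choices, the shortening factor `k`, and the function names are assumptions made for illustration. The researchers compared several variants, and a real autoregressive model needs additional care (for example, shifting) that is omitted here.

```python
def downsample(tokens, k):
    """Average-pool each group of k consecutive token vectors into one."""
    assert len(tokens) % k == 0, "sequence length must be divisible by k"
    pooled = []
    for start in range(0, len(tokens), k):
        group = tokens[start:start + k]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / k for d in range(dim)])
    return pooled


def upsample(short, k):
    """Repeat each shortened vector k times to restore the original length."""
    return [list(vec) for vec in short for _ in range(k)]


def attention_pairs(n):
    """Number of token pairs that full self-attention compares: n squared."""
    return n * n


# Eight toy token vectors of dimension 2.
tokens = [[float(i), float(i) * 2.0] for i in range(8)]
k = 4

short = downsample(tokens, k)   # middle layers would run on 2 tokens
restored = upsample(short, k)   # back to 8 positions

# Residual connection: combine full-resolution and upsampled activations.
combined = [[a + b for a, b in zip(t, r)] for t, r in zip(tokens, restored)]

saving = attention_pairs(len(tokens)) // attention_pairs(len(short))
print(len(short), len(restored), saving)
```

Because attention cost grows quadratically with length, shortening the sequence by a factor of `k` reduces the pairwise comparisons in the shortened layers by a factor of `k` squared, which is 16 in this toy case.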

AI Ethics Feature
Algorithmic Amplification of Political Content on Twitter

There has recently been debate over the role of social media in shaping culture and politics, as increased polarization has been observed on popular platforms. Twitter recently published a study examining whether its recommendation algorithms amplify political content in users’ feeds. On Twitter, users have access to a home feed and a timeline. The home feed displays a stream of tweets based on the accounts a user follows and interacts with. The content that appears on the home feed is algorithmically determined, while the tweets displayed on the timeline are not. The study looked at algorithmic amplification across users’ feeds between April 1 and August 15, 2020, in seven countries (Canada, France, Germany, Japan, Spain, the UK, and the US). The researchers examined whether tweets from elected officials and news outlets were amplified differently on users’ home feeds than on their timelines, and whether this amplification favored a particular political party. The study found that the algorithm amplified tweets containing political content regardless of party association. In every country but Germany, the algorithm amplified tweets from the political right more than tweets from the political left. Twitter’s goal in publishing these findings is not to fight algorithmic amplification, which the company states is not problematic in itself. Instead, its intention is to open a discussion on how best to eliminate algorithmic bias and promote equal amplification of content from both sides of the political spectrum. This is especially relevant for machine learning engineers because it demonstrates how the algorithms they design can have unintended effects with far-reaching impact.

This research was performed under the Novetta Machine Learning Center of Excellence.

Shauna Revay, PhD
Brian Sacash
Carlos Martinez