BASELINE • June 2021

An ML Newsletter from Novetta

Welcome to the June 2021 BASELINE, Novetta’s Machine Learning Newsletter, where we share thoughts on important advances in machine learning technologies. This month is the first of two special editions, as we’ve invited each member of our summer ML intern class to contribute an article. In this edition we cover the following topics:

  • An open-source dataset for evaluating multilingual translation systems
  • A tiny machine learning algorithm that can process real-time events on IoT devices
  • A platform that teaches reinforcement learning agents to use Android devices
  • A new approach to reinforcement learning that uses transformers

Facebook Releases the FLORES-101 Dataset for Public Use

Creating effective multilingual translation systems has been a long-standing open problem in Natural Language Processing (NLP) for data scientists and machine learning researchers. Until now, however, there has been no standard way to evaluate the effectiveness and quality of these models, since accuracy can vary widely across language pairs. With the release of the FLORES-101 dataset by Facebook AI, researchers now have a better way to evaluate their systems. FLORES-101 contains the same set of sentences translated into 101 different languages, with a particular focus on low-resource languages that are underrepresented in most existing datasets. Because every language shares the same sentences, researchers can evaluate their models across all 10,100 language pairs by translating from one language to another and scoring the results. The dataset also offers text from multiple domains, accompanying metadata, and server-side evaluation, all of which will help researchers assess the quality of their models and accelerate progress on translation systems.
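
To make the evaluation setup concrete, here is a minimal sketch of scoring a translation system on FLORES-style aligned files. It is not an official FLORES-101 tool: the file paths and the translate() function are placeholders, and it assumes one UTF-8 text file per language with the same sentences line-aligned.

```python
# Minimal sketch: scoring translations on FLORES-style aligned files (not an
# official FLORES-101 tool). Assumes one UTF-8 file per language with the same
# sentences line-aligned; file paths and translate() are placeholders.
from itertools import permutations
import sacrebleu

def load_sentences(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

def translate(sentences, src_lang, tgt_lang):
    # Placeholder: call your own translation system here. The identity
    # "translation" below exists only so the sketch runs end to end.
    return list(sentences)

# Two languages shown here; FLORES-101 covers 101 languages, giving
# 101 * 100 = 10,100 ordered language pairs to evaluate.
corpora = {
    "eng": load_sentences("flores101.devtest.eng"),
    "fra": load_sentences("flores101.devtest.fra"),
}

for src, tgt in permutations(corpora, 2):
    hypotheses = translate(corpora[src], src, tgt)
    bleu = sacrebleu.corpus_bleu(hypotheses, [corpora[tgt]])
    print(f"{src}->{tgt}: BLEU = {bleu.score:.1f}")
```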

Big Problems Need Tiny Solutions: Introducing Tiny ML

Making sense of real-time events from hardware sensors often depends on complex event processing (CEP), a manual and often inaccurate process of defining the patterns a system should look for. ML methods can enhance CEP by enabling automated, incremental updates to CEP algorithms, but traditional ML models are too large to run on low-powered devices in real time. Researchers have proposed a novel method of combining ML with CEP to better detect patterns in real time on low-power devices at the edge. In their application, they created an algorithm, TinyOL, that performs online learning at the edge to detect conditions that could cause wear on industrial machines. Compared to a traditional CEP system, TinyOL reduced communication costs, lowered latency enough to allow real-time processing, and protected privacy because data was never relayed to the cloud. Despite its small size, the system was able to detect and analyze up to two thousand events per second and identified the environmental conditions that caused machine breakdowns. This approach has the potential to be applied to many Internet of Things (IoT) challenges and can help reduce the information latency that many current IoT applications face. That matters most where conditions must be detected in real time with low power consumption, such as in autonomous vehicles or tracking devices.

This figure illustrates the use case this experiment was designed to address, where environmental conditions are monitored to determine the likelihood of machine breakdown.
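
The sketch below illustrates the core idea of online learning on a streaming sensor feed. It is not the authors’ TinyOL implementation: the feature names, thresholds, and logistic-regression model are assumptions chosen to show how a model can be updated one event at a time on-device.

```python
# Minimal sketch of online learning on a streaming sensor feed (not the authors'
# TinyOL implementation). A tiny logistic-regression classifier is updated one
# event at a time, so memory stays constant and raw data never leaves the device.
import math
import random

NUM_FEATURES = 3        # e.g. vibration, temperature, acoustic level (assumed)
LEARNING_RATE = 0.05

weights = [0.0] * NUM_FEATURES
bias = 0.0

def predict(features):
    # Probability that the current event indicates wear-inducing conditions.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def update(features, label):
    # One stochastic-gradient step on a single event; nothing is stored afterward.
    global bias
    error = predict(features) - label
    for i, x in enumerate(features):
        weights[i] -= LEARNING_RATE * error * x
    bias -= LEARNING_RATE * error

# Simulated event stream: events with high combined readings are labelled risky.
for _ in range(2000):
    event = [random.random() for _ in range(NUM_FEATURES)]
    label = 1.0 if sum(event) > 1.8 else 0.0
    update(event, label)

print("learned weights:", [round(w, 2) for w in weights])
```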

AndroidEnv: A Reinforcement Learning Platform for Android

Reinforcement Learning (RL) is a branch of machine learning that trains agents through trial and error to maximize total reward. AndroidEnv is an open-source platform for RL research created by DeepMind and built on top of the Android ecosystem. RL agents interact with the environment, a simulated Android device, by sending localized touch and lift events in real time through a universal touchscreen interface. The goal is to train RL agents to interact and make decisions within the OS to execute real tasks, where the reward function can be based on actions such as successfully scrolling or sending an email. DeepMind showed that agents trained with AndroidEnv were able to browse the internet, open the YouTube app, set an alarm, and play games. AndroidEnv will be a useful tool for RL researchers, as it is a centralized platform that offers a virtually unlimited range of tasks and provides real-time feedback.

This graphic shows the RL agent calling a phone number and subsequently creating a new contact through touch and lift events and gestures.
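
For a sense of how an agent drives the environment, here is a schematic random-agent loop in the dm_env style that AndroidEnv follows. The load() arguments, task file, and action-spec field names are illustrative assumptions rather than the exact AndroidEnv API; consult the AndroidEnv repository for the real configuration.

```python
# Schematic random-agent loop for AndroidEnv, which follows the dm_env interface.
# The load() arguments, task file, and action-spec field names are illustrative
# assumptions; consult the AndroidEnv repository for the exact configuration.
import numpy as np
import android_env  # open-source package released by DeepMind

env = android_env.load(
    task_path="my_task.textproto",  # placeholder task definition
    avd_name="my_avd",              # placeholder Android Virtual Device
)

action_spec = env.action_spec()     # e.g. an action type plus a touch position
timestep = env.reset()

for _ in range(100):
    # Sample a random action: touch / lift / repeat plus an (x, y) screen location.
    action = {
        name: np.random.uniform(spec.minimum, spec.maximum,
                                size=spec.shape).astype(spec.dtype)
        for name, spec in action_spec.items()
    }
    timestep = env.step(action)
    if timestep.last():             # episode finished; start a new one
        timestep = env.reset()

env.close()
```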

Decision Transformer: Actions Speak Louder than Words

Transformer-based models have revolutionized the field of natural language processing (NLP) with their ability to learn from sequential data. Researchers have now applied transformers to reinforcement learning (RL) with a new method they call the Decision Transformer. Unlike traditional RL approaches, which compute optimal actions based on reward functions, the Decision Transformer learns patterns directly from sub-optimal sequences of actions. For example, the researchers trained a GPT model (an architecture traditionally used for text generation) to find the shortest path between two points by training only on random walks. When tested on both discrete and continuous tasks, the Decision Transformer’s performance was comparable to or better than common RL methods such as Temporal Difference learning. The approach was especially effective at recognizing long-term patterns, such as picking up a treasure key at the beginning of a long video game quest and using it much later. The researchers further tested the model on a broader range of RL settings, such as imitation learning and goal reaching, where it also performed well, highlighting its adaptability to a wide range of problems. By applying transformer technology to RL, the researchers have created a model that is more stable to train than traditional RL methods and can strategically plan long-term actions. This has the potential to improve RL training for applications where predicting future events is important, such as logistics, AI-controlled Air Force combat systems, or stock market investment.

This figure shows the Decision Transformer’s ability to correctly predict the motion of a human-like figure compared to that of a standard neural network.
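
To show the kind of sequence the Decision Transformer consumes, here is a minimal sketch of its input layout: returns-to-go, states, and actions are embedded, interleaved into one token sequence, and passed through a causal (GPT-style) transformer that predicts the next action. The dimensions and module choices are assumptions for illustration, not the authors’ code.

```python
# Minimal sketch of the Decision Transformer input layout (not the authors' code).
# Returns-to-go, states, and actions are embedded, interleaved into one sequence,
# and passed through a causal transformer that predicts the next action.
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, embed_dim=64, max_timestep=100):
        super().__init__()
        self.embed_return = nn.Linear(1, embed_dim)
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_action = nn.Linear(act_dim, embed_dim)
        self.embed_timestep = nn.Embedding(max_timestep, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.predict_action = nn.Linear(embed_dim, act_dim)

    def forward(self, returns_to_go, states, actions, timesteps):
        # Each modality gets its own embedding plus a shared timestep embedding.
        t = self.embed_timestep(timesteps)
        r = self.embed_return(returns_to_go) + t
        s = self.embed_state(states) + t
        a = self.embed_action(actions) + t
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        B, T, D = s.shape
        tokens = torch.stack([r, s, a], dim=2).reshape(B, 3 * T, D)
        # Causal mask: each token may only attend to earlier tokens.
        mask = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
        hidden = self.transformer(tokens, mask=mask)
        # Predict the next action from the state token at each timestep.
        return self.predict_action(hidden[:, 1::3])

# Toy usage with random data; at inference time the return-to-go acts as the
# target reward the model is conditioned to achieve.
B, T, state_dim, act_dim = 2, 10, 4, 2
model = TinyDecisionTransformer(state_dim, act_dim)
predicted_actions = model(
    torch.randn(B, T, 1),            # returns-to-go
    torch.randn(B, T, state_dim),    # states
    torch.randn(B, T, act_dim),      # previous actions
    torch.arange(T).expand(B, T),    # timesteps
)
print(predicted_actions.shape)       # torch.Size([2, 10, 2])
```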

This research was performed under the Novetta Machine Learning Center of Excellence.


Authors:
Jamiel Capatayan
Kathryn Hill
Elliott Pryor
Shreya Ramesh