BASELINE • April 2021

An ML Newsletter from Novetta

Welcome to the April 2021 BASELINE, Novetta’s Machine Learning Newsletter, where we share thoughts on important advances in machine learning technologies. This month we cover the following topics:

  • Pre-training of Vision Transformers without natural images
  • Teaching a robot to walk using a reinforcement learning algorithm trained in simulation
  • Querying databases for human-readable summaries
  • Extracting human-interpretable reports from log files

Vision Transformer Pre-Training without Natural Images

Pre-trained models have lowered the amount of data needed to fine-tune a successful computer vision model. Recently, the same transformer models that helped revolutionize natural language processing have been applied to computer vision tasks in the form of Vision Transformers (ViT). Most ViT models rely heavily on large datasets of natural images for pre-training, and these datasets often require human annotation, which takes time and can introduce biases or incorrect labels. In a recent paper, researchers instead pre-trained a ViT on generated fractal images (see picture) whose labels are determined by the mathematical formulas used to generate them. The ViT pre-trained on fractal images achieved accuracy similar to models pre-trained on ImageNet and surpassed self-supervised training methods such as SimCLRv2 and MoCov2. A main benefit of removing the need for large natural image datasets is sidestepping the ethical concerns that have arisen from their use. For example, the large Instagram-3.5B dataset raises privacy concerns, and such uncurated datasets do nothing to correct for biases that may be present in human-related imagery. Training a ViT without natural images also removes the need for time-consuming human annotation. This approach therefore improves model ethics and lowers pre-training costs without sacrificing performance.
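To make the idea of formula-defined labels concrete, here is a minimal Python sketch in the spirit of fractal pre-training, assuming each class is defined by a randomly sampled iterated function system (IFS): a small set of affine maps whose attractor is a fractal. The function names, parameter ranges, image size, and map-weighting rule are illustrative assumptions, not the paper's settings, and in this sketch the images within a class differ only by the random walk used to draw them.

```python
import random


def random_ifs(num_maps, rng):
    """Sample one hypothetical fractal 'class': a set of 2-D affine maps.

    Each map is (a, b, c, d, e, f) for x' = a*x + b*y + e, y' = c*x + d*y + f.
    Keeping |a|, |b|, |c|, |d| below 0.45 makes every map a contraction, so the
    iterated points stay bounded (a simplifying assumption, not the paper's rule).
    """
    maps = [
        tuple(rng.uniform(-0.45, 0.45) for _ in range(4))
        + tuple(rng.uniform(-1.0, 1.0) for _ in range(2))
        for _ in range(num_maps)
    ]
    # Pick maps in proportion to |determinant| (area scaling); the small epsilon
    # avoids an all-zero weight vector.
    weights = [abs(a * d - b * c) + 1e-6 for a, b, c, d, _, _ in maps]
    return maps, weights


def render(maps, weights, rng, size=32, num_points=5000, burn_in=20):
    """Draw the IFS attractor with the 'chaos game' into a binary size x size grid."""
    x, y = 0.0, 0.0
    points = []
    for i in range(num_points + burn_in):
        a, b, c, d, e, f = rng.choices(maps, weights=weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= burn_in:  # skip early points that have not yet reached the attractor
            points.append((x, y))
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    span = max(max(p[0] for p in points) - min_x, max(p[1] for p in points) - min_y) or 1.0
    image = [[0] * size for _ in range(size)]
    for px, py in points:
        col = int((px - min_x) / span * (size - 1))
        row = int((py - min_y) / span * (size - 1))
        image[row][col] = 1
    return image


def make_dataset(num_classes=3, images_per_class=2, seed=0):
    """Build (image, label) pairs; the label is simply the index of the IFS that drew it."""
    rng = random.Random(seed)
    dataset = []
    for label in range(num_classes):
        maps, weights = random_ifs(rng.randint(2, 4), rng)
        for _ in range(images_per_class):
            dataset.append((render(maps, weights, rng), label))
    return dataset
```

Because each label is just the index of the formula that produced the image, a dataset like this can grow arbitrarily large with no human annotation; a real pipeline would render much larger images and feed them to a ViT.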

Reinforcement Learning Algorithm from UC Berkeley Crosses the Sim2Reality Gap

Teaching a robot to perform a task involves training a reinforcement learning (RL) algorithm that acts as a controller for the robotic system. Doing this on physical hardware can be expensive and potentially dangerous, because the robot’s actions in the early stages of training are unpredictable. Researchers therefore usually train the RL algorithm in a simulated environment. The downside of simulation is a gap between the training environment and the real world, known in robotics as the Sim2Reality gap, so the challenge is to develop training mechanisms that allow skills learned in simulation to transfer to real systems. Researchers at UC Berkeley have recently developed an RL controller for bipedal locomotion with a demonstrated ability to cross the Sim2Reality gap. They trained the system to walk, turn, and squat in a simulated environment, using a Hybrid Zero Dynamics (HZD) gait library to increase skill diversity, and domain randomization techniques, which vary the simulated environment during training to make the learned controller more robust. When the software was transferred to a bipedal robot (‘Cassie’ by Agility Robotics), the controller showed better performance, increased robustness, and more sophisticated recoveries than a baseline walking controller. This advance provides a framework that others can use to better bridge the gap between simulated training environments and real-life robotic systems.

Below is a demonstration of the controller’s ability to actuate the Cassie robot in real life, using the skills it gained from being trained in simulated environments.

Watch original video for full demonstration.
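The domain randomization described above can be illustrated with a short Python sketch, assuming a simulator whose physical properties can be reset before each training episode. The parameter names and ranges (ground friction, mass scale, motor latency, sensor noise) and the three-entry gait list are hypothetical and are not taken from the Berkeley controller; the simulator and the policy update are deliberately left out.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Physical properties re-sampled for every simulated training episode."""
    ground_friction: float
    mass_scale: float
    motor_latency_s: float
    sensor_noise_std: float


# Hypothetical ranges for illustration only, not the values used by the Berkeley team.
RANGES = {
    "ground_friction": (0.5, 1.2),
    "mass_scale": (0.9, 1.1),
    "motor_latency_s": (0.0, 0.02),
    "sensor_noise_std": (0.0, 0.05),
}

# Toy stand-in for a gait library: one reference command per skill. A real HZD
# gait library stores optimized reference trajectories, not single numbers.
GAIT_LIBRARY = [
    {"skill": "walk", "forward_speed": 0.5, "turn_rate": 0.0, "pelvis_height": 1.0},
    {"skill": "turn", "forward_speed": 0.2, "turn_rate": 0.5, "pelvis_height": 1.0},
    {"skill": "squat", "forward_speed": 0.0, "turn_rate": 0.0, "pelvis_height": 0.7},
]


def sample_episode(rng):
    """Draw a randomized simulator configuration and a skill to practice."""
    params = SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})
    return params, rng.choice(GAIT_LIBRARY)


def plan_training(num_episodes, seed=0):
    """Return the per-episode settings; the simulator and policy update are omitted."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(num_episodes):
        params, gait = sample_episode(rng)
        # A real pipeline would reset the simulator with `params`, roll out the
        # policy while it tracks `gait`, and update the policy from the rollout.
        episodes.append((params, gait))
    return episodes
```

Because the controller never trains twice in exactly the same simulated world, it is pushed toward behavior that tolerates the kind of mismatch it will meet on real hardware.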

Let your database write its own report

Despite the success of OpenAI’s GPT-2 language model in text generation tasks, generating text from structured data (such as summaries of database entries) has remained a challenge. The Amazon Alexa AI team has developed a two-step process called DataTuner to address this problem. First, the input data is given extra structure in the form of tags representing subjects, predicates, and objects, and this tagged input is fed to GPT-2, which generates candidate texts. In the second step, a semantic classifier compares each candidate with the original input to determine whether any information has been added, repeated, or omitted, and reranks the candidates by this accuracy score to produce the final result. In an evaluation presented at the 2020 International Conference on Computational Linguistics (COLING), human annotators judged DataTuner’s output to exceed human-written text in some cases. With data-to-text processes like DataTuner, analysts could feasibly query databases for human-readable summaries, lowering the barrier to extracting value from highly structured datastores.
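The two DataTuner steps can be sketched in Python as a tagging function and a reranker. This is a minimal illustration under stated assumptions: the tag strings `<subject>`, `<predicate>`, and `<object>` are placeholders rather than DataTuner’s exact tokens, the GPT-2 generation step is replaced by a fixed list of candidate sentences, and `fidelity_score` is a hypothetical string-matching stand-in for the trained semantic classifier.

```python
def linearize(triples):
    """Flatten (subject, predicate, object) triples into one tagged string for the language model."""
    return " ".join(
        f"<subject> {s} <predicate> {p} <object> {o}" for s, p, o in triples
    )


def fidelity_score(triples, text):
    """Toy stand-in for the semantic classifier: penalize omitted or repeated entities.

    Unlike a trained classifier, this string match cannot detect added (hallucinated) facts.
    """
    lowered = text.lower()
    entities = {value for s, _, o in triples for value in (s, o)}
    score = 0
    for entity in entities:
        count = lowered.count(entity.lower())
        if count == 0:
            score -= 2  # omitted information
        elif count > 1:
            score -= 1  # repeated information
    return score


def rerank(triples, candidates):
    """Order generated candidates from most to least faithful to the input (stable on ties)."""
    return sorted(candidates, key=lambda text: fidelity_score(triples, text), reverse=True)


# Hypothetical input and candidates standing in for GPT-2 output.
triples = [
    ("Alan Bean", "occupation", "test pilot"),
    ("Alan Bean", "birthPlace", "Wheeler, Texas"),
]
candidates = [
    "Alan Bean was a test pilot.",
    "Alan Bean, born in Wheeler, Texas, was a test pilot.",
    "Alan Bean was a test pilot. Alan Bean was a test pilot from Wheeler, Texas.",
]
print(linearize(triples))
print(rerank(triples, candidates)[0])  # → Alan Bean, born in Wheeler, Texas, was a test pilot.
```

The first candidate drops the birthplace and the third repeats facts, so the reranker promotes the second; a trained classifier plays the same role with far better judgment.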

The picture below is an example of a knowledge graph where the relationships have been translated via the DataTuner model to a snippet of generated text.

Computer Logs: Use GPT-3 to get to the point already

Many network analysts have probably found themselves scrolling through what can feel like endless log files, trying to determine the cause of a program or system error. A company that focuses on root cause analysis has used OpenAI’s GPT-3 language model to read through computer-generated log events and automatically generate human-readable reports summarizing the root cause of an issue. Because GPT-3 was trained on a large corpus of Internet text, the model can draw on public discussions of similar incidents to produce human-readable descriptions. While the model sometimes merely restates key log events and is less accurate for topics that lack discussion in the public domain, it is frequently able to write human-interpretable descriptions of most common issues. As seen in this month’s article on DataTuner (above), powerful language models are starting to act as intermediaries between humans and computers. Beyond saving time spent wrangling log events, similar models could plausibly help in cybersecurity someday, warning of vulnerabilities before they are exploited.

GPT-3 generated summaries describing various issues after analyzing log files.
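A log-to-report pipeline of this kind needs some way to pick out the relevant log events and phrase them as a request the language model can complete. The Python sketch below shows one hypothetical way to do that: the keyword filter, the `select_key_events` and `build_prompt` names, and the prompt wording are assumptions for illustration, not the company’s method, and the call to GPT-3 itself is omitted.

```python
# Hypothetical error keywords; how the real system selects events is not described here.
KEYWORDS = ("error", "fatal", "panic", "exception")


def select_key_events(log_lines, keywords=KEYWORDS, limit=5):
    """Keep up to `limit` log lines that mention one of the keywords (case-insensitive)."""
    hits = [line for line in log_lines if any(k in line.lower() for k in keywords)]
    return hits[:limit]


def build_prompt(events):
    """Wrap the selected events in a plain-language instruction for a language model."""
    body = "\n".join(events)
    return (
        "The following log events were recorded just before a software failure:\n"
        f"{body}\n\n"
        "In plain English, the most likely root cause is:"
    )


# Made-up log lines for illustration.
logs = [
    "2021-04-01 12:00:00 INFO service started",
    "2021-04-01 12:00:05 ERROR disk /dev/sda1 is full",
    "2021-04-01 12:00:06 FATAL postgres: could not write to file",
    "2021-04-01 12:00:07 INFO retrying",
]
print(build_prompt(select_key_events(logs)))
```

Ending the prompt with an open sentence invites the model to complete it with an explanation rather than echo the logs back, although, as noted above, models can still restate key log events.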

This research was performed under the Novetta Machine Learning Center of Excellence.


Mady Fredriksz
Jefferson Ridgeway
Brian Sacash
Shauna Revay, PhD