BASELINE • August 2021

An ML Newsletter from Novetta

Welcome to the August 2021 BASELINE, Novetta’s Machine Learning Newsletter, where we share thoughts on important advances in machine learning technologies. This month also marks the start of our new AI Ethics feature: in each edition, we will cover a topic in AI Ethics in addition to the latest advances in machine learning. This month we cover the following topics:

  • Two new benchmark datasets for evaluating conversational AI systems
  • A generative model that produces portraits with greater control over the features
  • A general deep learning architecture for multimodal and multitask use cases
  • Where to start with AI Ethics

Two New Datasets for Conversational NLP: TimeDial and Disfl-QA

Conversational AI systems struggle with the nuances of natural language. Modern systems often fail to account for temporal world knowledge, or they rely too heavily on keywords in the surrounding text. Moreover, human speech is often disfluent: prone to mistakes, corrections, or thoughts abandoned mid-sentence in favor of new ones. Without challenge sets designed around these phenomena, it is difficult to measure how well modern models handle temporal reasoning or disfluent speech. A Google research team addresses this with two new benchmark datasets, TimeDial and Disfl-QA. TimeDial contains over 1,000 dialogs paired with fill-in-the-blank, multiple-choice questions that test temporal reasoning within the conversation. Disfl-QA contains questions with disfluencies that models must understand and answer. Using these datasets, the researchers were able to quantify how ML models perform on these tasks compared to humans. This work should help drive the development of models that better handle the nuances of natural language, which could benefit a variety of applications, such as more natural chatbot interactions or more accurate sentiment analysis of social media text.

The first image above is an example of a multiple-choice task from TimeDial which requires the model to understand the current time in order to answer correctly. The second image is an example of the disfluent forms of questions that occur in Disfl-QA.

SofGAN: A Better Portrait Generator

Generative Adversarial Networks (GANs) are often used to generate realistic portrait-like images. Recent models such as StyleGAN let users control features of the output, such as the subject’s gender, race, and hair color. However, many GANs offering this kind of customization suffer from feature entanglement, where changing one feature of an image also changes other, unrelated features. For example, changing a subject’s age might also make their earrings longer. A recent paper addresses this with a new model, SofGAN. This portrait generator learns the geometry and the texture (fine detail) of images independently, rather than jointly as most standard GANs do, which reduces feature entanglement and gives users finer control over individual features. SofGAN also generates images segment by segment rather than all at once, which helps avoid mode collapse, a common GAN training failure in which the generator gets stuck producing overly similar images. The authors report that SofGAN outperforms the current state of the art on measures of image quality and diversity. Overall, SofGAN’s improved feature disentanglement gives practitioners more control over the images they generate and more diverse output, without complicating training or sacrificing image quality.

The image above shows how SofGAN can generate an image piece by piece from a user-drawn segmentation map, using an interactive interface created by the authors.

Perceiver IO: A General Deep Learning Architecture

Most existing state-of-the-art (SOTA) model architectures are highly specialized: they have been engineered for a specific use case or type of data. As a result, practitioners hoping to apply a SOTA model to a custom use case often have to adapt their data to fit its specialized architecture, and even then, the model may be too specialized to solve their problem adequately. The problem is compounded when more than one type of data needs to be fed into the model. To address this, researchers at DeepMind developed Perceiver IO, a transformer-based model that can handle many different types of input data, including text, images, video, audio, and multimodal combinations of these. It does this by using attention to encode the input, whatever its size, into a small, fixed-size array of latent vectors that a general-purpose architecture then processes; outputs are decoded by querying this latent array with attention. Because the size of the latent array does not depend on the input, the model controls its effective input size, so its computational cost grows linearly with input size rather than quadratically as in a standard transformer. This general architecture also lets Perceiver IO perform well on a wide variety of tasks, such as masked language modeling, optical flow estimation, multimodal autoencoding, image classification, and even game playing (StarCraft II). This work brings us a step closer to true generality in deep learning models and demonstrates how attention, normally considered a computationally expensive mechanism, can be used to increase efficiency.

Above is an example of Perceiver IO’s output from optical flow estimation. Each pixel’s color represents the speed and direction of the motion at that point in the video (as shown in the included legend).
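To make the efficiency argument concrete, below is a minimal NumPy sketch of the encode, process, decode pattern described above. It is not DeepMind’s implementation: it assumes single-head attention, random untrained weights, and no layer norms, MLPs, or position encodings, and every name in it (`attend`, `perceiver_io_sketch`, `NUM_LATENTS`, and the dimension constants) is hypothetical. The point it illustrates is that the input of size M only ever meets the N latents in one (N × M) cross-attention, so cost is linear in M, while the output size is set by however many output queries you supply.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (simplified: no norms, no MLP).

    queries: (Q, d_q), keys_values: (K, d_kv) -> output (Q, d_v).
    The score matrix is (Q, K), so the cost is O(Q * K).
    """
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def make_weights(rng, d_in, d_out):
    # Random projection matrix; a real model would learn these weights.
    return rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))

rng = np.random.default_rng(0)
D_IN, D_LAT, D_OUT = 16, 32, 8   # hypothetical feature sizes
NUM_LATENTS = 64                 # fixed, independent of the input size

latents = rng.normal(size=(NUM_LATENTS, D_LAT))
enc = [make_weights(rng, D_LAT, D_LAT), make_weights(rng, D_IN, D_LAT),
       make_weights(rng, D_IN, D_LAT)]
proc = [make_weights(rng, D_LAT, D_LAT) for _ in range(3)]
dec = [make_weights(rng, D_OUT, D_LAT), make_weights(rng, D_LAT, D_LAT),
       make_weights(rng, D_LAT, D_OUT)]

def perceiver_io_sketch(inputs, output_queries):
    # Encode: latents attend to the M inputs -> (NUM_LATENTS, D_LAT), cost O(M * N).
    z = attend(latents, inputs, *enc)
    # Process: self-attention among latents only, cost O(N^2), independent of M.
    z = z + attend(z, z, *proc)
    # Decode: each output query attends to the latents -> (O, D_OUT), cost O(O * N).
    return attend(output_queries, z, *dec)

for num_inputs in (1_000, 20_000):
    x = rng.normal(size=(num_inputs, D_IN))
    queries = rng.normal(size=(10, D_OUT))
    y = perceiver_io_sketch(x, queries)
    print(num_inputs, y.shape)  # (10, 8) regardless of num_inputs
```

Growing the input from 1,000 to 20,000 tokens only enlarges the encoder’s score matrix (64 × M); the latent self-attention and the decoder are unchanged. In a standard transformer, self-attention over the raw input would instead build an M × M matrix.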

AI Ethics: Where to Start?

As more questions arise about the ethical implications of machine learning models, it becomes increasingly important to clearly define the kinds of questions ML practitioners should ask about their models. Rachel Thomas, one of the co-founders of fast.ai, recently released a series of videos on ML ethics in which she describes two sets of questions for ML practitioners to reflect on when developing and deploying their models: deontological questions and consequentialist questions. Deontological questions judge an action by whether it respects the rights of others and fulfills our duties to them, while consequentialist questions judge an action on the basis of its consequences. These lines of questioning provide a starting point for evaluating the ethical implications of ML projects. Below are some examples of deontological and consequentialist questions Rachel Thomas provides:

Deontological questions:

  1. What rights of others & duties to others must we respect?
  2. How might the dignity & autonomy of each stakeholder be impacted by this project?
  3. What considerations of trust & of justice are relevant to this design/project?
  4. Does this project involve any conflicting moral duties to others, or conflicting stakeholder rights? How can we prioritize these?

Consequentialist questions:

  1. Who will be directly affected by this project? Who will be indirectly affected?
  2. Will the effects in aggregate likely create more good than harm, and what types of good and harm?
  3. Are we thinking about all relevant types of harm/benefit (psychological, political, environmental, moral, cognitive, emotional, institutional, cultural)?
  4. Do the risks of harm from this project fall disproportionately on the least powerful in society?

The framework laid out by Rachel Thomas offers a way to begin defining and categorizing the questions ML practitioners ask about their models. These questions can help shape how we evaluate our models and their ethical implications.

This research was performed under the Novetta Machine Learning Center of Excellence.

Xena Grant
Annie Ghrist
Mady Fredriksz
Carlos Martinez