BASELINE • December 2019

An ML Newsletter from Novetta

Welcome to the December 2019 installment of BASELINE, Novetta’s Machine Learning Newsletter, where we share thoughts on important advances in machine learning technologies. This month we discuss the following areas:

  • AWS announced new services and features at re:Invent that are relevant to our data science, image processing, and IoT teams.
  • Netflix open-sourced a platform to simplify the engineering side of data science.
  • The quality of deepfakes continues to improve, while Facebook and Kaggle launched a competition to detect them.

SageMaker Improvements and AutoML

At re:Invent, AWS announced new Amazon SageMaker capabilities and features intended to address the needs of both experienced and junior data scientists. We have been investigating good solutions for tracking model training, so the introduction of model performance monitoring and experiment tracking is welcome. Additionally, we do a lot of work with PyTorch, so the potential 20% speed increase from the new optimized version of PyTorch could help with our rapid prototyping. However, the most interesting release is Autopilot, Amazon’s version of automated machine learning (AutoML) for structured or tabular data, which trains 50 different models in parallel. Being able to easily compare the results of so many models will help us quickly identify the best solution, balancing the tradeoff between inference time and accuracy.
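To make the tradeoff between inference time and accuracy concrete, here is a minimal Python sketch of how one might choose among a set of trained candidates. It does not use the SageMaker or Autopilot APIs. The Candidate class, the pick_best function, the model names, and all of the numbers are hypothetical and made up purely for illustration. The rule assumed here is to keep only the models that fit a latency budget and then take the most accurate of those.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One trained model and its measured results (hypothetical structure)."""
    name: str
    accuracy: float    # validation accuracy, between 0 and 1
    latency_ms: float  # average inference time per request, in milliseconds


def pick_best(candidates: List[Candidate], max_latency_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate within the latency budget.

    Ties on accuracy are broken in favor of the faster model.
    Returns None when no candidate meets the budget.
    """
    eligible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.accuracy, -c.latency_ms))


# Illustrative values only; these are not real benchmark results.
candidates = [
    Candidate("model_a", accuracy=0.94, latency_ms=120.0),
    Candidate("model_b", accuracy=0.92, latency_ms=35.0),
    Candidate("model_c", accuracy=0.89, latency_ms=8.0),
]

best = pick_best(candidates, max_latency_ms=50.0)
print(best.name if best else "no candidate meets the latency budget")
```

With a 50 ms budget, the most accurate candidate is excluded for being too slow, so the sketch selects the next most accurate one. Other selection rules, such as a weighted score of accuracy and latency, would be equally reasonable depending on the use case.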

AWS Announces Amazon Rekognition Custom Labels

Amazon Rekognition is an AWS service that performs object and scene detection and face recognition on images and videos. AWS didn’t save all of its best announcements for re:Invent. In November, AWS announced Amazon Rekognition Custom Labels, which allows users to train their own machine learning models tailored to their specific use cases. While we have always appreciated how easy Rekognition is to use, it didn’t always meet the needs of customers who wanted to detect objects that Rekognition did not cover. With Custom Labels, our customers will be able to train models for their use cases much more easily.

Metaflow Open-Sourced by Netflix

Because many data scientists do not come from a software engineering background, data science projects at many organizations lack the structure often found in software development projects. Netflix is looking to help address this by releasing Metaflow, which simplifies tasks such as tracking the results of experiments, debugging, and scaling machine learning models. As the field of machine learning advances, deploying models at scale is a larger focus for practitioners than ever before. To ease adoption, the open-source Python library is built upon tools data scientists are already familiar with, such as Jupyter notebooks. Metaflow is also tightly integrated with AWS, which gives users access to the compute resources and storage they need.

NVIDIA Released StyleGAN2

NVIDIA has released an upgraded version of StyleGAN, appropriately titled StyleGAN2, which achieves state-of-the-art performance on face image generation. While version 2 is still a generative adversarial network (GAN), the NVIDIA team has introduced improvements such as faster training, smoother interpolation, and fewer artifacts. Check out the paper to see some of the very realistic generated images. The code is also open-sourced, enabling other researchers to explore the improvements on their own.

Facebook Announces Deepfake Detection Challenge

Detecting whether an image or video is authentic or simulated is becoming an increasingly important and challenging task. To combat deepfakes, Facebook AI has partnered with the data science competition site Kaggle to host the Deepfake Detection Challenge. The competition uses a new dataset of 100,000 videos, which includes real videos and deepfakes of the same person, and competitors must design models that distinguish the deepfakes from the authentic videos. The winner of the competition is required to open-source their solution to help advance the field.

This research was performed under the Novetta Machine Learning Center of Excellence.


Authors:

Shauna Revay, PhD
Brian Sacash
Matt Teschke