An ML Newsletter from Novetta
Welcome to the October 2021 BASELINE, Novetta’s Machine Learning Newsletter, where we share thoughts on important advances in machine learning technologies. This month we cover the following topics:
- Detecting DeepFakes with emotions
- OpenAI’s approach to summarizing entire books
- Facial analysis in the wild using entirely synthetic face images
- AI Ethics: Singapore’s robot watchmen
Detecting DeepFakes with Emotions
With the incredible success of DeepFakes in recent years, the possibility of bad actors spreading misinformation through highly realistic forged videos is turning into a reality. As a result, researchers are racing to find robust and generalizable techniques for detecting fake videos. A team from Drexel University and the Milan Polytechnic Institute presented a new detection technique at CVPR (June 2021) that leverages the unrealistic emotions conveyed by faces and voices in DeepFake videos. The fundamental idea driving their technique is that emotions are conveyed in both facial expressions and vocal intonations. The team analyzed emotion features extracted from the audio and video of DeepFakes and found that features capturing emotion evolution (that is, changes in emotion over time) were distinct from those extracted from real videos. The team trained three DeepFake detectors, using temporal audio, temporal video, and combined temporal audio/video features respectively, that detected DeepFakes from the DeepFake Detection Challenge dataset with over 95% accuracy. This suggests that DeepFakes do not imitate realistic changes in emotion over time. These findings are significant because today’s most successful DeepFake techniques synthesize video on a frame-by-frame basis and do not take temporal changes in emotion into consideration. This work exposes a weakness in DeepFake technology that can be leveraged for detection in even the most convincing DeepFake videos.
Proposed workflow for using emotions conveyed by audio and video to detect DeepFakes
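To make the notion of "emotion evolution" concrete, the following is a minimal Python sketch of how changes in emotion over time could be turned into features. It is an illustration only, not the authors' method: the function name `temporal_emotion_features`, the use of a single per-frame emotion score, and the three summary statistics are all assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def temporal_emotion_features(scores: Sequence[float]) -> List[float]:
    """Summarize how a per-frame emotion score changes over time.

    `scores` is assumed to hold one emotion score per time step, produced by
    some upstream audio or video emotion model (not shown here).
    Returns [mean absolute change, std of change, max absolute change].
    """
    if len(scores) < 2:
        raise ValueError("need at least two time steps to measure change")
    # Step-to-step differences capture the evolution of the score over time.
    deltas = [later - earlier for earlier, later in zip(scores, scores[1:])]
    abs_deltas = [abs(d) for d in deltas]
    return [mean(abs_deltas), pstdev(deltas), max(abs_deltas)]


# A score that flips every step versus one that stays constant.
erratic = temporal_emotion_features([0, 1, 0, 1])
steady = temporal_emotion_features([2, 2, 2])
```

In a sketch like this, feature vectors computed from the audio track, the video track, or both could be passed to an ordinary classifier. The choice of statistics here is arbitrary and would need to be validated on real data.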
How does AI Summarize an Entire Book?
Natural Language Processing (NLP) tasks often aim to increase efficiency for humans, assisting with tasks such as detecting text sentiment at scale or deriving the topic of an article. To tackle another manually intensive task, OpenAI has developed a model that can summarize books of any length. Traditional NLP summarization models are unable to summarize documents much longer than most news articles. To overcome this limitation, OpenAI fine-tuned their GPT-3 model to break large texts into small portions and then recursively summarize those portions until a full-text summary is achieved. By combining this approach with reinforcement learning, which rewards or penalizes the model according to how well its behavior aligns with human summary preferences, OpenAI’s model was able to generate “book-level summaries”. The end results are more similar to a list of detailed events than a traditional book synopsis, but OpenAI aims to perfect this approach. While there is currently no plan to open source their summarization model, Novetta has independently been exploring similar approaches. Researchers in the Machine Learning Center of Excellence have worked closely with our Novetta Mission Analytics team to develop document summarization on clusters of related text to deliver high-level summaries. This approach enables analysts to quickly get an overview of a news event while retaining the ability to dive deeper into individual news articles if desired. The development of these types of large-scale summary systems can augment traditional information retrieval applications, effectively lowering the burden of institutional knowledge required to operate in legacy data systems.
Recursive summarization producing a final summary for “Romeo and Juliet”
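The recursive idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not OpenAI's implementation: `chunk_text`, `recursive_summarize`, and `toy_summarizer` are hypothetical names, chunking is done by word count, and the toy summarizer simply keeps the first ten words where a real system would call a fine-tuned language model.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def recursive_summarize(
    text: str, summarize: Callable[[str], str], max_words: int = 200
) -> str:
    """Summarize each chunk, join the summaries, and repeat until one chunk remains."""
    chunks = chunk_text(text, max_words)
    if len(chunks) <= 1:
        # Base case: the text fits in a single chunk, so summarize it directly.
        return summarize(text)
    combined = " ".join(summarize(chunk) for chunk in chunks)
    if len(combined.split()) >= len(text.split()):
        # Guard against infinite recursion if the summarizer does not shorten the text.
        return summarize(combined)
    return recursive_summarize(combined, summarize, max_words)


def toy_summarizer(chunk: str) -> str:
    """Hypothetical stand-in for a model call: keep the first ten words."""
    return " ".join(chunk.split()[:10])


# A 1,000-word "book" is reduced to 10 chunk summaries, then to one final summary.
book = " ".join(f"w{i}" for i in range(1000))
final_summary = recursive_summarize(book, toy_summarizer, max_words=100)
```

Swapping `toy_summarizer` for a call to a real summarization model leaves the recursion unchanged, which is what lets this kind of approach handle inputs far longer than the model can read at once.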
Can a Synthetic Face Dataset Perform Well in the Wild?
The idea of using entirely synthetic images to train facial computer vision models is appealing because gathering and labeling real data is often time-consuming, expensive, and prone to bias. Synthetic data, by contrast, ensures perfect ground-truth labels and facilitates the creation of more complex labels than humans can produce in a reasonable amount of time. It also gives practitioners full control over the diversity and composition of their datasets. However, previous attempts to use synthetic images during model training were unsuccessful because of the domain gap between synthetic and real images, which led to poor performance on real images. A recent breakthrough from Microsoft minimizes this gap by combining “a procedurally-generated parametric 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism and diversity.” Their method generates synthetic faces by starting with a template face and randomly choosing the identity, emotional expression, texture (skin and fine details), hair, clothing, accessories, and image background. To evaluate the dataset’s ability to cross the domain gap, they used it to train state-of-the-art (SOTA) face segmentation and landmark localization models and tested the models on real images. The performance of both models was competitive with the current SOTA models, even though they were trained on entirely synthetic data. Although this is a successful proof of concept, developing the method required significant investment and domain expertise, which limits its broader applicability. However, Microsoft has made the dataset publicly available. This work marks the first time that models trained on entirely synthetic face data have been able to compete with traditional models, creating a pathway toward datasets that better represent the diversity of the human population.
An example of the image creation pipeline starting with a template face and randomizing each level of features until a realistic synthetic image has been rendered
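The randomization step can be illustrated with a small Python sketch that samples one attribute configuration per face. Everything here is hypothetical: the `ASSETS` library, its entries, and the `FaceSpec` and `sample_face_spec` names are placeholders for this example and do not reflect Microsoft's actual assets or code. Rendering the sampled configuration into an image is out of scope.

```python
import random
from dataclasses import dataclass

# Hypothetical asset library: placeholder names, one list of options per feature level.
ASSETS = {
    "identity": ["id_000", "id_001", "id_002"],
    "expression": ["neutral", "smile", "surprise"],
    "texture": ["skin_a", "skin_b"],
    "hair": ["short", "long", "none"],
    "clothing": ["t_shirt", "jacket"],
    "accessory": ["none", "glasses", "hat"],
    "background": ["office", "street", "park"],
}


@dataclass(frozen=True)
class FaceSpec:
    """One sampled configuration; it doubles as an exact label for the rendered image."""
    identity: str
    expression: str
    texture: str
    hair: str
    clothing: str
    accessory: str
    background: str


def sample_face_spec(rng: random.Random) -> FaceSpec:
    """Pick one option per feature level, independently and uniformly at random."""
    return FaceSpec(**{name: rng.choice(options) for name, options in ASSETS.items()})


# Seeding the generator makes the sampled dataset reproducible.
spec = sample_face_spec(random.Random(0))
```

Because every attribute is chosen by the program, the labels are known exactly, and changing the option lists or sampling weights is one way a practitioner could control dataset composition.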
AI Ethics Feature
Singapore’s Robot Watchmen
Deploying machine learning models to monitor the public brings AI ethics concerns to the forefront and has implications for privacy and civil liberties. In September, Singapore began a three-week trial in which two robots were sent out in central Singapore to deter poor social behavior. The robots were equipped with cameras to film behavior considered bad for the community, such as smoking in public places, disobeying COVID safety rules, or parking improperly. When they detected such behavior, the robots alerted their control center and displayed messages educating the public about proper behavior. While such an approach might be appealing to some, it raises significant fairness and bias concerns, such as whether people of certain genders or races are more likely to be flagged for “bad” behavior. This trial highlights the importance of robust AI ethics frameworks that can help determine fair use and provide guidance for balancing privacy against the public good.
This research was performed under the Novetta Machine Learning Center of Excellence.