
kaiwang960112

#### Description

This is our collected datasets for challenge condition facial expression recognition


# Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

Kai Wang, Xiaojiang Peng, Jianfei Yang, Debin Meng, and Yu Qiao
{kai.wang, xj.peng, db.meng, yu.qiao}@siat.ac.cn

## Abstract

Occlusion and pose variations, which can change facial appearance significantly, are two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progress in the past few decades, occlusion-robust and pose-invariant issues of FER have received relatively less attention, especially in real-world scenarios. This paper addresses the real-world pose and occlusion robust FER problem with three-fold contributions. First, to stimulate research on FER under real-world occlusions and variant poses, we build several in-the-wild facial expression datasets with manual annotations for the community. Second, we propose a novel Region Attention Network (RAN) to adaptively capture the importance of facial regions for occlusion and pose variant FER. The RAN aggregates and embeds a varied number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region biased loss to encourage high attention weights for the most important regions. We examine our RAN and region biased loss on both our built test datasets and four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region biased loss largely improve the performance of FER with occlusion and variant pose. Our methods also achieve state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW.

## Region Attention Network

We propose the Region Attention Network (RAN) to capture the importance of facial regions for occlusion and pose robust FER. The RAN comprises a feature extraction module, a self-attention module, and a relation-attention module, and operates in two stages. The first stage coarsely estimates the importance of each region with an FC layer applied to the region's own feature (the self-attention module). The second stage seeks more accurate attention weights by modeling the relation between each region feature and the aggregated content representation from the first stage (the relation-attention module). Given a number of facial regions, the RAN learns attention weights for each region in an end-to-end manner and aggregates their CNN features into a compact fixed-length representation. Besides its attention mechanism, the RAN has two auxiliary effects on the face images. On one hand, cropping regions enlarges the training data, which is important for challenging samples that are scarce. On the other hand, rescaling the crops to the size of the original image highlights fine-grained facial features.
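The two attention stages can be sketched in NumPy. This is a minimal illustrative sketch, not the released implementation: the layer weights are random placeholders for what the real RAN learns, and the feature dimension, region count, and all variable names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Assume a backbone CNN produced one 512-d feature per region
# (original face plus 5 crops -> 6 regions). Values are random here.
num_regions, dim = 6, 512
region_feats = rng.standard_normal((num_regions, dim))

# --- Stage 1: self-attention ---
# A single FC layer scores each region from its own feature.
w_self = rng.standard_normal(dim) * 0.01
mu = sigmoid(region_feats @ w_self)            # coarse weight per region

# Weighted aggregation into a global content representation.
global_feat = (mu[:, None] * region_feats).sum(0) / mu.sum()

# --- Stage 2: relation-attention ---
# Each region is rescored from [its own feature ; global representation].
w_rel = rng.standard_normal(2 * dim) * 0.01
concat = np.concatenate(
    [region_feats, np.tile(global_feat, (num_regions, 1))], axis=1)
nu = sigmoid(concat @ w_rel)                   # refined weight per region

# Final compact fixed-length representation combines both weightings.
final = ((mu * nu)[:, None] * region_feats).sum(0) / (mu * nu).sum()
print(final.shape)  # (512,)
```

Note the output size is fixed at the backbone feature dimension regardless of how many regions are fed in, which is what lets the RAN accept a varied number of crops.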

### Region Biased Loss

Inspired by the observation that different facial expressions are mainly defined by different facial regions, we place a straightforward constraint on the attention weights of the self-attention module, i.e. the region biased loss (RB-Loss). This constraint enforces that the largest attention weight among the facial crops should exceed that of the original face image by a margin. Formally, the RB-Loss is defined as

$$\mathcal{L}_{RB} = \max\{0,\ \alpha - (\mu_{\max} - \mu_0)\},$$

where $\alpha$ is a hyper-parameter serving as a margin, $\mu_0$ is the attention weight of the copied full face image, and $\mu_{\max}$ denotes the maximum weight over all facial crops.
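The constraint above amounts to a hinge on the gap between the best crop weight and the full-face weight; a minimal sketch, with an illustrative margin and made-up weight values:

```python
def rb_loss(crop_weights, face_weight, alpha=0.02):
    """Region biased loss: max(0, alpha - (max crop weight - face weight)).

    The loss is positive whenever no facial crop out-weighs the
    whole-face copy by at least the margin alpha.
    """
    gap = max(crop_weights) - face_weight
    return max(0.0, alpha - gap)

# A crop already dominates the full face by more than the margin:
print(rb_loss([0.9, 0.4, 0.3], face_weight=0.5))  # 0.0
# The full face dominates, so the constraint is violated and the loss is positive:
print(rb_loss([0.4, 0.3], face_weight=0.6))
```

In training this term would be added to the classification loss, pushing the self-attention stage to favor informative crops over the raw face.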

## Confusion Matrices

The confusion matrices of baseline methods and our RAN on the Occlusion- and Pose-FERPlus test sets.

The confusion matrices of baseline methods and our RAN on the Occlusion- and Pose-AffectNet test sets.

## What is learned for occlusion and pose variant faces?

Illustration of learned attention weights for different regions along with the original faces. $s(\cdot)$ denotes the softmax function. Red-filled boxes indicate the highest weights while blue-filled ones indicate the lowest weights. From left to right, the columns show the original face and regions $I_1$ to $I_5$. The left and right figures show the weights with and without the RB-Loss, respectively. Occlusion examples are shown in the first two rows, and pose examples in the last two rows.
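The $s(\cdot)$ normalization used for these visualizations can be sketched as follows; the weight values are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Illustrative raw attention weights for the original face and crops I1..I5.
weights = np.array([0.55, 0.91, 0.32, 0.40, 0.73, 0.28])
s = softmax(weights)
print(round(s.sum(), 6))  # 1.0
```

After softmax the weights sum to one, so the red/blue boxes in the figure compare relative importance across regions of the same face.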

## Comparison with the state-of-the-art methods

We compare our best results with several state-of-the-art methods on FERPlus, AffectNet, SFEW, and RAF-DB.

## Our collected datasets and state-of-the-art models

You can find the occlusion lists (ferplusocclusion, affectnetocclusion), the pose (>30) lists (ferpluspose30, affectnetpose30), and the pose (>45) lists (ferpluspose45, affectnetpose45).

The state-of-the-art models will be updated at this link.