Researchers have developed a new, lighter convolutional neural network (CNN) model for facial expression recognition. The work describes an Xception-based model that balances training speed, memory usage, and recognition accuracy.
The original Xception model has 71 layers, and a version pre-trained on more than a million images from the ImageNet database is widely available. That pre-trained network can classify images into a thousand different object categories. Like Xception itself, the improved model differs from earlier CNN architectures in that it employs depth-wise separable convolutions.
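To make the classification stage concrete, here is a minimal sketch of what the network's final step does: it turns a vector of raw scores (one per category) into a predicted label via a softmax. The logits, label names, and `classify` function below are illustrative assumptions, not part of the study or of the Keras Xception API.

```python
import numpy as np

def classify(logits, labels):
    """Turn raw per-category scores into a predicted label and probability.

    logits: 1-D array of raw scores, one entry per category
            (1000 entries for an ImageNet-trained Xception).
    labels: list of category names, same length as logits.
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    idx = int(np.argmax(probs))
    return labels[idx], float(probs[idx])

# Toy example with 3 categories instead of 1000.
label, prob = classify(np.array([1.0, 3.0, 2.0]), ["cat", "dog", "car"])
```

With the toy scores above, `classify` returns `"dog"` as the most probable category; the same mechanism scales unchanged to the thousand ImageNet classes.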
Convolutions are the fundamental operation performed at each layer of a CNN. The depth-wise separable convolution used in the study differs from a standard convolution in that it first filters each channel of the input image (such as the R, G, and B channels) independently, and only afterwards combines the per-channel results with a 1×1 "pointwise" convolution.
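The two-step process can be sketched directly in NumPy. This is not the study's implementation, just a naive illustration of the idea: a standard convolution needs kh·kw·C_in·C_out weights, while the separable version needs only kh·kw·C_in (depthwise) plus C_in·C_out (pointwise), which is where the savings come from.

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_w):
    """Naive depth-wise separable convolution (no padding, stride 1).

    x:           input image, shape (H, W, C)
    depthwise_k: one kh x kw filter per channel, shape (kh, kw, C)
    pointwise_w: 1x1 convolution weights, shape (C, C_out)
    """
    H, W, C = x.shape
    kh, kw, _ = depthwise_k.shape
    oh, ow = H - kh + 1, W - kw + 1

    # Step 1 (depthwise): filter every channel independently.
    dw = np.zeros((oh, ow, C))
    for c in range(C):
        for i in range(oh):
            for j in range(ow):
                dw[i, j, c] = np.sum(x[i:i + kh, j:j + kw, c] * depthwise_k[:, :, c])

    # Step 2 (pointwise): a 1x1 convolution mixes the channels,
    # combining the independent results at the end.
    return dw @ pointwise_w

# Tiny example: a 3x3 image with 2 channels, 2x2 filters, 1 output channel.
out = depthwise_separable_conv(
    np.ones((3, 3, 2)), np.ones((2, 2, 2)), np.ones((2, 1))
)
```

On this all-ones input, each depthwise window sums to 4 per channel and the pointwise step adds the two channels, so every output entry is 8.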
Furthermore, the facial expression recognition model combines this convolution type with a technique known as "pre-activated residual blocks," which significantly reduces both the computational cost and the number of parameters required for accurate classification. With as few as 58,000 parameters, the researchers were able to build a model with good generalization ability. They then evaluated the new model against several other facial expression recognition algorithms in a classroom setting. All models in the experiment were trained and tested on the "Extended Cohn-Kanade dataset," which contains over 35,000 labeled images of faces expressing common emotions.
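The idea behind a pre-activated residual block can be sketched in a few lines. Assuming fully connected transforms in place of convolutions and omitting batch normalization for brevity (so this is a simplified illustration, not the paper's architecture), the key points are that the nonlinearity is applied *before* each transform rather than after, and that the block's input is added back via an identity shortcut:

```python
import numpy as np

def pre_activated_residual_block(x, w1, w2):
    """Simplified pre-activation residual block.

    In a pre-activated block the activation (here ReLU; batch norm
    omitted) comes BEFORE each weight layer, unlike the original
    residual design where it comes after.
    """
    h = np.maximum(x, 0) @ w1   # pre-activation, then first transform
    h = np.maximum(h, 0) @ w2   # pre-activation, then second transform
    return x + h                # identity shortcut: add the input back

# Toy example: identity weights make the residual branch easy to trace.
x = np.array([[1.0, -2.0]])
out = pre_activated_residual_block(x, np.eye(2), np.eye(2))
```

With identity weights, the residual branch passes `[1, 0]` (the ReLU of the input) through unchanged, so the output is `[[2, -2]]`: the shortcut preserves the raw input while the branch adds only its activated part.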