Chehre: An Emoji-Prompted Video Dataset for Perceptually Diverse Facial Expression Recognition


Abstract

Facial expressions are nonverbal social signals used in human interaction, but facial expression recognition datasets often focus on static images, basic emotion categories, or single deterministic annotations. We introduce Chehre, an emoji-prompted video dataset for analyzing dynamic facial expressions, human perception across a wide range of expressions, and human perceptual diversity. In Chehre, participants were prompted to express 40 facial emojis, recording one video per emoji. Their facial motions were then transferred onto synthetic faces to preserve privacy. A separate group of annotators rated the anonymized videos using emoji and label annotations, resulting in 2,111 high-quality videos collected from 203 performers and validated by 902 annotators. We define two benchmark tasks: dominant expression recognition, which tests whether models recover the top human-rated labels, and distributional expression recognition, which tests whether models capture the diversity of human responses. We benchmark recent vision-language models using random sampling and persona prompting to generate multiple predictions per video. Results show that both tasks are challenging: models often struggle to recover the dominant human labels, and their generated rating distributions remain far from the human distributions. Chehre provides a benchmark for evaluating socially grounded, dynamic, and distributional facial expression recognition.
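To make the two benchmark tasks concrete, the following is a minimal sketch of how a single video might be scored. It is illustrative only: the label vocabulary, the vote lists, the top-k hit rule, and the use of Jensen-Shannon divergence as the distance between distributions are all assumptions made for this example, not the metrics or data defined in the paper.

```python
import math
from collections import Counter

def rating_distribution(votes, vocabulary):
    """Normalize a list of label votes into a probability distribution."""
    counts = Counter(votes)
    total = sum(counts[label] for label in vocabulary)
    if total == 0:
        raise ValueError("no votes fall inside the vocabulary")
    return [counts[label] / total for label in vocabulary]

def top_k_labels(dist, vocabulary, k=1):
    """Return the k most probable labels (ties broken alphabetically)."""
    ranked = sorted(zip(vocabulary, dist), key=lambda pair: (-pair[1], pair[0]))
    return [label for label, _ in ranked[:k]]

def dominant_hit(human, model, vocabulary, k=1):
    """Assumed rule: the model's top label must be among the top-k human labels."""
    return top_k_labels(model, vocabulary, 1)[0] in top_k_labels(human, vocabulary, k)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint support."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical vocabulary and votes for one video (not taken from the dataset).
vocabulary = ["confused", "happy", "surprised"]
human_votes = ["happy"] * 6 + ["surprised"] * 3 + ["confused"]
model_votes = ["surprised"] * 4 + ["happy"]  # e.g. five sampled model predictions

human = rating_distribution(human_votes, vocabulary)
model = rating_distribution(model_votes, vocabulary)

hit_at_1 = dominant_hit(human, model, vocabulary, k=1)
hit_at_2 = dominant_hit(human, model, vocabulary, k=2)
distance = js_divergence(human, model)
```

In this toy case the model misses the single dominant human label but lands within the top two, while the divergence separately quantifies how far its sampled distribution is from the human one. Averaging such per-video scores would give dataset-level numbers under these assumed metrics.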

Dataset Samples

Anonymized video samples and rating plots, grouped by emoji.