The authors investigated this question by reproducing Cleveland and McGill’s seminal 1984 experiments, which measured the efficiency of human perception across different visual encodings and defined elementary perceptual tasks for visualization. They measured the graphical perception capabilities of four network architectures on five visualization tasks and compared the results to existing and new human performance baselines. While CNNs can meet or outperform human task performance under limited circumstances, the authors find that CNNs are not currently a good model for human graphical perception. They also visualize CNN activation maps to better understand how the networks process visualizations.
In visualization, there is increasing research interest in the computational analysis of graphs, charts, and visual encodings [15,23,34], for applications like information extraction and classification, visual question answering (“computer, which category is greater?”), or even design analysis and generation [45].
However, computational analysis of visualizations is a more complex task than natural image classification [24], requiring the identification, estimation, and relation of visual marks to extract information.
Our goal is to better understand the abilities of CNNs for visualization analysis, so we investigate the performance of current off-the-shelf CNNs on visualization tasks and show what they can and cannot accomplish.
As computational visualization analysis is predicated upon an understanding of elementary perceptual tasks, we consider the seminal graphical perception settings of Cleveland and McGill.
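To make the experimental setting concrete, the Cleveland-McGill position-length task can be framed as a regression problem: given a rasterized chart, estimate the ratio of the shorter bar to the longer one. The sketch below is a minimal, hypothetical stimulus generator in this spirit; the function name, image size, and bar placement are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def make_bar_stimulus(h1, h2, size=100):
    """Rasterize a two-bar chart as a binary image (hypothetical generator,
    loosely modeled on Cleveland-McGill position-length stimuli).

    The regression target a CNN would be trained to predict is the ratio
    of the shorter bar to the longer one.
    """
    img = np.zeros((size, size), dtype=np.float32)
    img[size - h1:, 30:40] = 1.0  # left bar, height h1, width 10
    img[size - h2:, 60:70] = 1.0  # right bar, height h2, width 10
    target = min(h1, h2) / max(h1, h2)
    return img, target

# Example: an 80-pixel bar next to a 40-pixel bar gives a target ratio of 0.5.
img, y = make_bar_stimulus(80, 40)
```

A network would then be trained on many such (image, ratio) pairs, and its estimation error compared against the human error rates Cleveland and McGill reported for the same elementary task.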