
Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks


Abstract:

Rapidly creating effective visualizations using expressive grammars is challenging for users who have limited time and limited skills in statistics and data visualization. Even high-level, dedicated visualization tools often require users to manually select among data attributes, decide which transformations to apply, and specify mappings between visual encoding variables and raw or transformed attributes. In this paper we introduce Data2Vis, an end-to-end trainable neural translation model for automatically generating visualizations from given datasets. We formulate visualization generation as a language translation problem, where data specifications are mapped to visualization specifications in a declarative language (Vega-Lite). To this end, we train a multilayered attention-based encoder–decoder network with long short-term memory (LSTM) units on a corpus of visualization specifications. Qualitative results show that our model learns the vocabulary and syntax for a valid visualization specification, appropriate transformations (count, bins, mean), and how to use common data selection patterns that occur within data visualizations. We introduce two metrics for evaluating the task of automated visualization generation (language syntax validity, visualization grammar syntax validity) and demonstrate the efficacy of bidirectional models with attention mechanisms for this task. Data2Vis generates visualizations that are comparable to manually created visualizations in a fraction of the time, with potential to learn more complex visualization strategies at scale.
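To make the formulation concrete, the following sketch pairs a bidirectional LSTM encoder with an attention-based LSTM decoder in PyTorch, mirroring the architecture the abstract describes. It is an illustrative approximation rather than the authors' implementation: the layer sizes, the dot-product attention, and the token vocabulary are assumptions.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Bidirectional multilayer LSTM encoder over source tokens."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=layers,
                            bidirectional=True, batch_first=True)

    def forward(self, src):                  # src: (batch, src_len) token ids
        out, _ = self.lstm(self.embed(src))
        return out                           # (batch, src_len, 2 * hidden_dim)

class AttnDecoder(nn.Module):
    """LSTM decoder with dot-product attention over encoder states."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=layers,
                            batch_first=True)
        self.key_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.out = nn.Linear(3 * hidden_dim, vocab_size)

    def forward(self, tgt, enc_out, state=None):
        dec_out, state = self.lstm(self.embed(tgt), state)   # (B, T, H)
        keys = self.key_proj(enc_out)                        # (B, S, H)
        scores = torch.bmm(dec_out, keys.transpose(1, 2))    # (B, T, S)
        weights = torch.softmax(scores, dim=-1)              # attention weights
        context = torch.bmm(weights, enc_out)                # (B, T, 2H)
        logits = self.out(torch.cat([dec_out, context], dim=-1))
        return logits, state

# Toy usage with random token ids standing in for tokenized JSON strings.
enc, dec = Encoder(vocab_size=100), AttnDecoder(vocab_size=100)
src = torch.randint(0, 100, (2, 40))   # two serialized data specifications
tgt = torch.randint(0, 100, (2, 30))   # shifted Vega-Lite target sequences
logits, _ = dec(tgt, enc(src))         # (2, 30, 100) next-token scores

In this framing, the source sequence is a serialized data specification, the target sequence is a Vega-Lite specification, and the model is trained with a standard next-token cross-entropy loss.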
Published in: IEEE Computer Graphics and Applications (Volume: 39, Issue: 5, Sept.-Oct. 2019)
Page(s): 33 - 46
Date of Publication: 24 June 2019

PubMed ID: 31247545

Users create data visualizations using a range of tools with a range of characteristics (see Figure 1). Some of these tools are more expressive, giving expert users greater control, while others are easier to learn and faster for creating visualizations, appealing to general audiences. For instance, imperative APIs such as OpenGL and HTML Canvas provide greater expressivity and flexibility but require significant programming skills and effort. On the other hand, dedicated visual analysis tools and spreadsheet applications (e.g., Microsoft Excel, Google Spreadsheets) offer ease of use and speed in creating standard charts based on templates but afford limited expressivity and customization. Declarative specification grammars such as ggplot2, D3, and Vega provide a tradeoff between speed and expressivity. However, these grammars come with steep learning curves and can be tedious to specify. In fact, little is known about the developer experience with visualization grammars beyond the degree of their adoption by users. For example, ggplot2 can be difficult for users who are not familiar with R, and Vega, which is based on a JSON schema, can be tedious even for those who are familiar with JSON. Higher level abstractions such as chart templates still require the user to manually select among data attributes, decide which statistical computations to apply, and specify mappings between visual encoding variables and either the raw data or the computational summaries. This task can be daunting with complex datasets, especially for typical users who have limited time and limited skills in statistics and data visualization. To address these challenges, researchers have proposed models, techniques, and tools to automate the design of effective visualizations and guide users in visual data exploration.
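To illustrate the kind of declarative specification these grammars require, and the manual decisions enumerated above, the short Python sketch below assembles a minimal Vega-Lite specification as a plain dictionary. The structure follows the Vega-Lite schema; the dataset URL and field name are hypothetical placeholders.

import json

# Each comment marks a decision the user must make by hand when writing
# a declarative specification; the data URL and field name are hypothetical.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "data/cars.json"},              # select the dataset
    "mark": "bar",                                  # choose a mark type
    "encoding": {
        # map a data attribute to the x channel
        "x": {"field": "Origin", "type": "nominal"},
        # decide which statistical computation to apply (a count aggregate)
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}

print(json.dumps(spec, indent=2))                   # the spec a renderer consumes

Even this trivial bar chart requires choosing an attribute, a transformation, and an encoding for each channel; Data2Vis aims to learn such choices from a corpus of example specifications.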
