Visualizing the Scripts of Data Wrangling With Somnus


Abstract:

Data workers use various scripting languages for data transformation, such as SAS, R, and Python. However, understanding intricate code pieces requires advanced programming skills, which hinders data workers from grasping the idea of a data transformation with ease. Program visualization is beneficial for debugging and education and has the potential to illustrate transformations intuitively and interactively. In this article, we explore visualization design for demonstrating the semantics of code pieces in the context of data transformation. First, to depict individual data transformations, we structure a design space along two primary dimensions, i.e., the key parameters to encode and the visual channels they can be mapped to. Then, we derive a collection of 23 glyphs that visualize the semantics of transformations. Next, we design a pipeline, named Somnus, that provides an overview of the creation and evolution of data tables using a provenance graph. At the same time, it allows detailed investigation of individual transformations. User feedback on Somnus is positive. Our study participants achieved higher accuracy in less time using Somnus, and preferred it over carefully crafted textual descriptions. Further, we provide two example applications to demonstrate the utility and versatility of Somnus.
Published in: IEEE Transactions on Visualization and Computer Graphics (Volume: 29, Issue: 6, 01 June 2023)
Page(s): 2950 - 2964
Date of Publication: 25 January 2022

PubMed ID: 35077364

1 Introduction

Scripting languages such as SAS, R, and Python are widely used by data workers for data transformation. Data workers often need to understand the semantics of scripts in various scenarios. For example, validation (also called double-checking in some companies and laboratories) is important for data scientists: a data scientist might need to understand code pieces written by others, then locate and correct possible mistakes. Understanding the semantics of an intricate script, however, requires advanced programming skills, and the process can be tedious and error-prone [48], [62], [71].
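To make the setting concrete, the following is a minimal Python sketch, not taken from the paper, of a short wrangling script together with a table-level provenance record of the kind the abstract describes (which table was derived from which, and by what transformation). The table contents, the transformation labels, and the `record` helper are all hypothetical and serve only as illustration; Somnus itself may represent provenance quite differently.

```python
from collections import defaultdict

# Hypothetical toy table: each row is a dict (not data from the paper).
sales = [
    {"region": "east", "units": 3, "price": 2.0},
    {"region": "west", "units": 0, "price": 5.0},
    {"region": "east", "units": 1, "price": 4.0},
]

# Provenance edges: (input table names, transformation label, output table name).
provenance = []


def record(inputs, label, output):
    """Append one table-level provenance edge (hypothetical helper)."""
    provenance.append((tuple(inputs), label, output))


# Step 1: delete rows whose units value is zero.
nonzero = [row for row in sales if row["units"] > 0]
record(["sales"], "delete rows (units == 0)", "nonzero")

# Step 2: create a new column from two existing columns.
with_revenue = [{**row, "revenue": row["units"] * row["price"]} for row in nonzero]
record(["nonzero"], "create column (revenue)", "with_revenue")

# Step 3: group by region and sum the revenue column.
totals = defaultdict(float)
for row in with_revenue:
    totals[row["region"]] += row["revenue"]
summary = [{"region": r, "revenue": v} for r, v in sorted(totals.items())]
record(["with_revenue"], "summarize (sum revenue by region)", "summary")

# Print the chain of tables as a simple textual provenance graph.
for inputs, label, output in provenance:
    print(f"{', '.join(inputs)} --[{label}]--> {output}")
```

Even in this three-step script, a reader has to trace every intermediate table by hand to see that the "west" rows disappear before aggregation; an explicit record of each step is one way to surface that.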

Cites in Papers - IEEE (9)

1. Andreas Walch, Attila Szabo, Harald Steinlechner, Thomas Ortner, Eduard Gröller, Johanna Schmidt, "BEMTrace: Visualization-Driven Approach for Deriving Building Energy Models from BIM", IEEE Transactions on Visualization and Computer Graphics, vol.31, no.1, pp.240-250, 2025.
2. Zhongsu Luo, Kai Xiong, Jiajun Zhu, Ran Chen, Xinhuan Shu, Di Weng, Yingcai Wu, "Ferry: Toward Better Understanding of Input/Output Space for Data Wrangling Scripts", IEEE Transactions on Visualization and Computer Graphics, vol.31, no.1, pp.1202-1212, 2025.
3. Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, Yingcai Wu, "ChartGPT: Leveraging LLMs to Generate Charts From Abstract Natural Language", IEEE Transactions on Visualization and Computer Graphics, vol.31, no.3, pp.1731-1745, 2025.
4. Kai Xiong, Xinyi Xu, Siwei Fu, Di Weng, Yongheng Wang, Yingcai Wu, "JsonCurer: Data Quality Management for JSON Based on an Aggregated Schema", IEEE Transactions on Visualization and Computer Graphics, vol.30, no.6, pp.3008-3021, 2024.
5. Zhen Wen, Yihan Liu, Siwei Tan, Jieyi Chen, Minfeng Zhu, Dongming Han, Jianwei Yin, Mingliang Xu, Wei Chen, "Quantivine: A Visualization Approach for Large-Scale Quantum Circuit Representation and Analysis", IEEE Transactions on Visualization and Computer Graphics, vol.30, no.1, pp.573-583, 2024.
6. Sungwon In, Tica Lin, Chris North, Hanspeter Pfister, Yalong Yang, "This is the Table I Want! Interactive Data Transformation on Desktop and in Virtual Reality", IEEE Transactions on Visualization and Computer Graphics, vol.30, no.8, pp.5635-5650, 2024.
7. Yingchaojie Feng, Xingbo Wang, Bo Pan, Kam Kwai Wong, Yi Ren, Shi Liu, Zihan Yan, Yuxin Ma, Huamin Qu, Wei Chen, "XNLI: Explaining and Diagnosing NLI-Based Visual Data Analysis", IEEE Transactions on Visualization and Computer Graphics, vol.30, no.7, pp.3813-3827, 2024.
8. Ran Chen, Di Weng, Yanwei Huang, Xinhuan Shu, Jiayi Zhou, Guodao Sun, Yingcai Wu, "Rigel: Transforming Tabular Data by Declarative Mapping", IEEE Transactions on Visualization and Computer Graphics, vol.29, no.1, pp.128-138, 2023.
9. Kai Xiong, Zhongsu Luo, Siwei Fu, Yongheng Wang, Mingliang Xu, Yingcai Wu, "Revealing the Semantics of Data Wrangling Scripts With Comantics", IEEE Transactions on Visualization and Computer Graphics, vol.29, no.1, pp.117-127, 2023.

Cites in Papers - Other Publishers (3)

1. Liqiong Chen, Lei Yunjie, Sun Huaiying, "Visual software defect prediction method based on improved recurrent criss-cross residual network", International Journal of Web Information Systems, 2024.
2. Chang Liu, Arif Usta, Jian Zhao, Semih Salihoglu, "Governor: Turning Open Government Data Portals into Interactive Databases", Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp.1, 2023.
3. Tao Jianhua, Gong Jiangtao, Gao Nan, Fu Siwei, Liang Shan, Yu Chun, "Human-computer interaction for virtual-real fusion", Journal of Image and Graphics, vol.28, no.6, pp.1513, 2023.