Visualizing the Scripts of Data Wrangling With Somnus


Abstract:

Data workers use various scripting languages for data transformation, such as SAS, R, and Python. However, understanding intricate code pieces requires advanced programming skills, which hinders data workers from grasping the idea of a data transformation with ease. Program visualization is beneficial for debugging and education and has the potential to illustrate transformations intuitively and interactively. In this article, we explore visualization design for demonstrating the semantics of code pieces in the context of data transformation. First, to depict individual data transformations, we structure a design space along two primary dimensions, i.e., the key parameters to encode and the visual channels they can be mapped to. Then, we derive a collection of 23 glyphs that visualize the semantics of transformations. Next, we design a pipeline, named Somnus, that provides an overview of the creation and evolution of data tables using a provenance graph. At the same time, it allows detailed investigation of individual transformations. User feedback on Somnus is positive. Our study participants achieved better accuracy in less time using Somnus, and preferred it over carefully crafted textual descriptions. Further, we provide two example applications to demonstrate the utility and versatility of Somnus.
Published in: IEEE Transactions on Visualization and Computer Graphics (Volume: 29, Issue: 6, 01 June 2023)
Page(s): 2950 - 2964
Date of Publication: 25 January 2022

PubMed ID: 35077364

1 Introduction

Scripting languages such as SAS, R, and Python are widely used by data workers for data transformation. Data workers often need to understand the semantics of scripts in various scenarios. For example, validation (also called double-checking in some companies and laboratories) is important for data scientists: a data scientist might need to understand code pieces written by others, and then locate and correct possible mistakes. Understanding the semantics of an intricate script, however, requires advanced programming skills, and the process can be tedious and error-prone [48], [62], [71].
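
To make the idea of tracing how data tables are created and evolve more concrete, the following is a minimal sketch in plain Python. It is not Somnus's implementation; the class `ProvenanceGraph`, its methods, and the sample table are hypothetical names introduced only for illustration. It assumes every transformation has one input table and writes its result to a new table name.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical sketch: tables are nodes, transformations are edges."""
    tables: dict = field(default_factory=dict)  # table name -> list of row dicts
    edges: list = field(default_factory=list)   # (source name, label, target name)

    def add_source(self, name, rows):
        """Register an original input table."""
        self.tables[name] = rows

    def apply(self, source, target, label, func):
        """Run func on the source table, store the result, and log the step."""
        self.tables[target] = func(self.tables[source])
        self.edges.append((source, label, target))
        return self.tables[target]

    def lineage(self, name):
        """Walk edges backwards from a table to its original source.

        Assumes each step writes to a new table name, so there are no cycles.
        """
        steps = []
        while True:
            incoming = [e for e in self.edges if e[2] == name]
            if not incoming:
                return list(reversed(steps))
            steps.append(incoming[0])
            name = incoming[0][0]


graph = ProvenanceGraph()
graph.add_source("raw", [
    {"name": "Ann", "score": 82},
    {"name": "Bob", "score": None},
    {"name": "Cy", "score": 47},
])

# Step 1: drop rows with a missing score.
graph.apply("raw", "no_missing", "delete rows where score is missing",
            lambda rows: [r for r in rows if r["score"] is not None])

# Step 2: derive a new column from an existing one.
graph.apply("no_missing", "graded", "create column 'passed' from 'score'",
            lambda rows: [{**r, "passed": r["score"] >= 60} for r in rows])

# Print the chain of transformations that produced the final table.
for source, label, target in graph.lineage("graded"):
    print(f"{source} -> {target}: {label}")
```

Even in this toy form, the recorded edges give a reader a step-by-step account of which table came from which, without having to read the lambda bodies themselves.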
