Big Data issues in Computational Chemistry | IEEE Conference Publication | IEEE Xplore

Big Data issues in Computational Chemistry


Abstract:

Digital data have become a torrent engulfing every area of business, science and engineering. In the age of Big Data, deriving value and insight from large amounts of data using rich analytics has become an important differentiating capability for competitiveness, success and leadership in every field. Scientists and engineers in many domains are increasingly clamouring for mechanisms to manage and analyse the massive quantities of information now available, in order to obtain new answers and extract maximum value from it. Computational modelling and simulation is a central technology in many of these domains. Molecular Dynamics (MD) is a computational simulation technique that describes the physical forces and movements of interacting microscopic elements such as atoms and molecules. MD has important applications in chemistry, biotechnology, the pharmaceutical industry, energy, climate and materials science, among others. Advanced MD algorithms include not only Molecular Mechanics (MM) but also Quantum Mechanics (QM) approaches, raising important big data challenges that remain unresolved. MD simulations are iterative processes that generate large amounts of streaming data. Current software technology is far from being able to manage, analyze and visualize the extremely large and complex data sets generated by important molecular processes. This paper analyzes the current big data limits in the Computational Chemistry field, especially in MD processes. To overcome these challenges, this work provides guidance for future research, including advances in scalable algorithms for data analysis, dynamic query technology, data models and storage strategies, parallel execution, I/O optimization, and interactive visual exploration and analysis of MD data.
Date of Conference: 27-29 August 2014
Date Added to IEEE Xplore: 15 December 2014
Electronic ISBN: 978-1-4799-4357-9
Conference Location: Barcelona, Spain

I. Introduction

The concept of Big Data mainly refers to data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast and/or does not fit into classical database-based architectures [1]. Addressing these challenges requires research and innovation in elastic, parallel and scalable algorithms [2]. Computational modelling and simulation are central to numerous scientific and engineering domains and are a good example of Big Data generation and analysis [3]. Basic simulation data is often 4D (three spatial dimensions and time), but additional variable types such as vector or tensor fields, multiple variables, multiple spatial scales, parameter studies, and uncertainty analysis can increase the dimensionality. Workflows and systems for interacting with, storing, managing, visualizing and analysing this data are already at breaking point [4]. As computations grow in complexity and fidelity and run on larger computers and clusters, analysing the data they generate will become more challenging still [5].
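To give a sense of scale for the 4D data described above, the following minimal sketch estimates the raw size of a simulation trajectory. The function name `trajectory_bytes`, the atom and frame counts, and the single-precision storage are illustrative assumptions, not figures taken from the paper.

```python
def trajectory_bytes(n_atoms, n_frames, n_fields=3, bytes_per_value=4):
    """Raw size in bytes of a trajectory storing n_fields values per atom per frame.

    n_fields=3 corresponds to x, y, z positions only; bytes_per_value=4 assumes
    single-precision floats. All defaults are illustrative assumptions.
    """
    return n_atoms * n_frames * n_fields * bytes_per_value


# Hypothetical run: 1 million atoms, 100,000 saved frames, positions only.
positions_only = trajectory_bytes(1_000_000, 100_000)

# Storing velocities and forces as well raises the per-atom payload from 3 to 9 values.
with_velocities_forces = trajectory_bytes(1_000_000, 100_000, n_fields=9)

print(f"positions only: {positions_only / 1e12:.1f} TB")
print(f"positions + velocities + forces: {with_velocities_forces / 1e12:.1f} TB")
```

Under these assumptions, positions alone already occupy about 1.2 TB, and each additional per-atom vector field adds the same amount again, which illustrates how extra variables increase both dimensionality and volume.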

[1] R. Casado, "The three generations of Big Data processing," in Big Data Spain, 2013.
[2] A. G. W. Paper, "'Big Data:' Big Challenge, Big Opportunity," pp. 1-6.
[3] H. Ode, M. Nakashima, S. Kitamura, W. Sugiura, and H. Sato, "Molecular Dynamics Simulation in Virus Research," Front. Microbiol., vol. 3, 2012.
[4] Y. Demchenko, Z. Zhao, P. Grosso, A. Wibisono, C. De Laat, and N. E. Group, "Big Data Challenges for e-Science Infrastructure."
[5] I. O'Reilly, Big Data Now: 2012 Edition. O'Reilly Media, 2012.
[6] D. Frenkel, B. Smit, and M. A. Ratner, "Understanding Molecular Simulation: From Algorithms to Applications," Phys. Today, vol. 50, no. 7, p. 66, 1997.
[7] P.-E. Bernard, T. Gautier, and D. Trystram, "Large scale simulation of parallel molecular dynamics," in Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999, pp. 638-644.
[8] M. Zhou, U. States, J. Grimmer, G. King, and Q. S. Science, "The Age of Big Data," 2012.
[9] B. C. Gibb, "Big (chemistry) data," Nat. Chem., vol. 5, no. 4, pp. 248-9, Apr. 2013.
[10] T. Tiankai, C. A. Rendleman, D. W. Borhani, R. O. Dror, J. Gullingsrud, M. O. Jensen, J. L. Klepeis, P. Maragakis, P. Miller, K. A. Stafford, D. E. Shaw, and T. Tu, "A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories," in High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for, 2008, no. 1, pp. 1-12.
[11] S. Jiao, C. He, Y. Dou, and H. Tang, "Molecular dynamics simulation: Implementation and optimization based on Hadoop," in 2012 8th International Conference on Natural Computation, 2012, pp. 1203-1207.
[12] N. Allsopp, G. Ruocco, and A. Fratalocchi, "Molecular dynamics beyonds the limits: massive scaling on 72 racks of a BlueGene/P and supercooled glass transition of a 1 billion particles system," Cogn. Sci., vol. cond-mat.s, no. 8, p. 14, Apr. 2011.
[13] S. Kumar, V. Pascucci, V. Vishwanath, P. Carns, M. Hereld, R. Latham, T. Peterka, M. E. Papka, and R. Ross, "Towards parallel access of multi-dimensional, multi-resolution scientific data," Petascale Data Storage Work. PDSW 2010 5th, vol. 1, no. c, pp. 1-5, 2010.
[14] Y. Duan, C. Wu, S. Chowdhury, M. C. Lee, G. Xiong, W. Zhang, R. Yang, P. Cieplak, R. Luo, T. Lee, J. Caldwell, J. Wang, and P. Kollman, "A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations," J. Comput. Chem., vol. 24, no. 16, pp. 1999-2012, 2003.
[15] F. Dehne and H. Zaboli, "Parallel Real-Time OLAP on Multi-core Processors," in 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), 2012, pp. 588-594.
[16] J. E. Stone, K. L. Vandivort, and K. Schulten, "GPU-accelerated molecular visualization on petascale supercomputing platforms," in Proceedings of the 8th International Workshop on Ultrascale Visualization-UltraVis '13, 2013, pp. 1-8.
[17] P. J. Sadalage and M. Fowler, NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. 2012.
[18] T. Estrada, B. Zhang, P. Cicotti, R. S. Armen, and M. Taufer, "A scalable and accurate method for classifying protein-ligand binding geometries using a MapReduce approach," Comput. Biol. Med., vol. 42, no. 7, pp. 758-71, Jul. 2012.
[19] "Quantum molecular modeling with simulated annealing-A distributed processing and visualization application," in Proceedings SUPERCOMPUTING '90, pp. 816-825.
