Balazs Gerofi - IEEE Xplore Author Profile

Showing 1-25 of 25 results


Full-sequence program (FSP) can program multiple bits simultaneously, thus completing a multiple-page write in a single operation and naturally enhancing the write performance of high-density 3-D solid-state drives (SSDs). This article proposes an FSP scheduling approach for 3-D quad-level cell (QLC) SSDs to further boost their read responsiveness. Considering that each FSP operation in QLC SSDs spans four di...
Most flash-based solid-state drives (SSDs) adopt an onboard dynamic random access memory (DRAM) to buffer hot write data. Write and overwrite operations can then be absorbed by the DRAM cache, provided there is sufficient locality in the applications’ I/O access pattern, thereby avoiding flushes of the write data to the underlying SSD cells. After analyzing typical real-world workloads over...
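The buffering behavior described above can be sketched as a toy write-back buffer (illustrative only, not the paper's design): overwrites of a hot page are absorbed in DRAM, and only the final version is programmed to flash.

```python
class WriteBuffer:
    """Toy DRAM write buffer: absorbs overwrites, flushes each dirty page once."""

    def __init__(self):
        self.buf = {}          # page -> latest data (held in DRAM)
        self.flash_writes = 0  # count of actual flash program operations

    def write(self, page, data):
        self.buf[page] = data  # overwrite is absorbed in DRAM, no flash I/O

    def flush(self):
        for page in list(self.buf):
            self.flash_writes += 1  # one flash program per dirty page
            del self.buf[page]


wb = WriteBuffer()
for i in range(100):
    wb.write(7, f"v{i}")  # 100 overwrites of the same hot page
wb.flush()
# only 1 flash write instead of 100
```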
Stochastic gradient descent (SGD) is the most prevalent algorithm for training deep neural networks (DNNs). SGD iterates over the input data set in each training epoch, processing data samples in a random-access fashion. Because this puts enormous pressure on the I/O subsystem, the most common approach to distributed SGD in HPC environments is to replicate the entire dataset to node-local SSDs. However, ...
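The access pattern in question can be sketched as follows (a minimal illustration; `read_sample` and `update_model` are hypothetical callbacks standing in for the storage read and the gradient step):

```python
import random

def sgd_epochs(dataset, num_epochs, read_sample, update_model):
    """Iterate the dataset once per epoch, visiting samples in random order."""
    indices = list(range(len(dataset)))
    for epoch in range(num_epochs):
        random.shuffle(indices)       # random-access visit order each epoch
        for i in indices:             # every index triggers a storage read
            x = read_sample(dataset, i)
            update_model(x)
```

Every epoch touches every sample exactly once, but in a freshly shuffled order, which is why sequential read-ahead in the I/O subsystem is largely ineffective.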
The long-standing consensus in the High-Performance Computing (HPC) Operating Systems (OS) community is that lightweight kernel (LWK) based OSes have the potential to outperform Linux at extreme scale. To explore whether LWKs live up to this expectation, we developed IHK/McKernel, a lightweight multi-kernel OS designed for HPC, and deployed it on two high-end supercomputers to compare its performance a...
Multi-component workflows play a significant role in High-Performance Computing and Big Data applications. They usually contain multiple, independently developed components that execute side by side to perform sophisticated computation and exchange data through file I/O over a parallel file system. However, file I/O can become an impediment in such systems and cause undesirable performance degradati...
Emerging workloads on supercomputing platforms are pushing the limits of traditional high-performance computing software environments. Multi-physics, coupled simulations, big data processing and machine learning frameworks, and multi-component workloads pose serious challenges to system and application developers. At the heart of the problem is the lack of cross-stack coordination to enable flexib...
There is a wide range of implementation approaches to multi-threading. User-level threads are efficient because they can be scheduled by a user-defined scheduling policy that suits the needs of the specific application. However, user-level threads are unable to handle blocking system calls efficiently. In contrast, kernel-level threads incur a large overhead during context switching. Kernel-l...
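The trade-off above can be illustrated with a minimal cooperative user-level scheduler (a sketch, not the paper's mechanism): threads are generators that yield at well-defined points, so the scheduling policy is entirely user-defined, but a thread that makes a blocking call would stall the single underlying kernel thread and with it the whole scheduler.

```python
from collections import deque

def run(threads):
    """Round-robin user-level scheduler over a list of generators.

    Each generator is a user-level thread; a `yield` is a voluntary context
    switch. A blocking system call inside a thread would block the one
    kernel thread running this loop -- the limitation noted above.
    """
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # run the thread until it yields
            ready.append(t)  # still alive: back to the run queue
        except StopIteration:
            pass             # thread finished

trace = []
def worker(name, steps):
    for i in range(steps):
        trace.append((name, i))
        yield                # cooperative context switch

run([worker("A", 2), worker("B", 2)])
# trace interleaves A and B: [("A", 0), ("B", 0), ("A", 1), ("B", 1)]
```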
The parallel multigrid method is expected to play an important role in large-scale scientific computing on exa-scale supercomputer systems. Previously we proposed Hierarchical Coarse Grid Aggregation (hCGA), which dramatically improved the performance of the parallel multigrid solver when the number of MPI processes was O(10^4) or more. Because hCGA can handle only two layers of parallel hierarchic...
Provides an abstract of the invited presentation and may include a brief professional biography of the presenter. The complete presentation was not made available for publication as part of the conference proceedings.
Multi-kernels leverage today's multi-core chips to run multiple operating system (OS) kernels, typically a Light Weight Kernel (LWK) and a Linux kernel, simultaneously. The LWK provides high performance and scalability, while the Linux kernel provides compatibility. Multi-kernels show the promise of being able to meet tomorrow's extreme-scale computing needs while providing strong isolation, yield...
Upcoming high-performance computing (HPC) platforms will have more complex memory hierarchies with high-bandwidth on-package memory and in the future also non-volatile memory. How to use such deep memory hierarchies effectively remains an open research question. In this paper we evaluate the performance implications of a scheme based on a software-managed scratchpad with coarse-grained memory-copy...
Following the invention of the telegraph, the electronic computer, and remote sensing, “big data” is bringing another revolution to weather prediction. As sensor and computer technologies advance, orders-of-magnitude bigger data are produced by new sensors and by high-precision computer simulation, or “big simulation.” Data assimilation (DA) is key to numerical weather prediction (NWP), integrating th...
The extreme degree of parallelism in high-end computing requires low operating system noise so that large-scale, bulk-synchronous parallel applications can run efficiently. Noiseless execution has historically been achieved by deploying lightweight kernels (LWKs), which, on the other hand, provide only a restricted subset of the POSIX API in exchange for scalability. However, the increasing prevale...
Distributed file systems have been widely deployed as back-end storage systems to offer I/O services for parallel/distributed applications that process large amounts of data. Data prefetching in distributed file systems is a well-known optimization technique which can mask both network and disk latency and consequently boost I/O performance. Traditionally, data prefetching is initiated by the clie...
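Prefetching of the kind described can be sketched as a simple client-side sequential read-ahead policy (hypothetical names; real file systems use far more elaborate heuristics, and `fetch` stands in for a network or disk read from the storage server): once the client observes consecutive block accesses, it fetches the next few blocks into a local cache before they are requested.

```python
class ReadAheadClient:
    """Toy client-side cache with sequential read-ahead."""

    def __init__(self, fetch, window=2):
        self.fetch = fetch    # callback performing the remote read
        self.window = window  # blocks to prefetch ahead of the reader
        self.cache = {}
        self.last = None

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.fetch(block)  # demand miss
        if self.last is not None and block == self.last + 1:
            # sequential pattern detected: prefetch the next blocks
            for b in range(block + 1, block + 1 + self.window):
                if b not in self.cache:
                    self.cache[b] = self.fetch(b)
        self.last = block
        return self.cache[block]


fetches = []
def fetch(b):
    fetches.append(b)
    return f"data{b}"

c = ReadAheadClient(fetch)
c.read(0)   # miss: fetch block 0
c.read(1)   # miss, then sequential: prefetch blocks 2 and 3
c.read(2)   # served from the cache -- the fetch latency was masked
```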
Turning towards exascale systems and beyond, it has been widely argued that currently available systems software will not be feasible, due to various requirements such as the ability to deal with heterogeneous architectures, the need for systems-level optimization targeting specific applications, the elimination of OS noise, and, at the same time, compatibility with legacy applications. To co...
Heterogeneous architectures, where a multicore processor is accompanied with a large number of simpler, but more power-efficient CPU cores optimized for parallel workloads, are receiving a lot of attention recently. At present, these co-processors, such as the Intel Xeon Phi product family, come with limited on-board memory, which requires partitioning computational problems manually into pieces t...
Heterogeneous architectures, where a multicore processor is accompanied with a large number of simpler, but more power-efficient CPU cores optimized for parallel workloads, are receiving a lot of attention these days. Currently, these co-processors come with a limited on-board memory, which requires partitioning computational problems manually into pieces that can fit into the device's RAM as well...
The Intel Many Integrated Core (Intel MIC) architecture is Intel's latest design targeted for processing highly parallel workloads. The Intel MIC architecture is implemented on a PCI card, and has its own on-board memory, connected to the host memory through PCI DMA operations. The on-board memory is faster than the one in the host, but it is significantly smaller, requiring the programmer to part...
Heterogeneous architectures, where a multicore processor, which is optimized for fast single-thread performance, is accompanied by a large number of simpler but more power-efficient cores optimized for parallel workloads, such as NVIDIA's GPUs or Intel's Many Integrated Core (MIC), have been receiving a lot of attention recently. Although NVIDIA's GPUs include built-in support for parallelism cont...
Checkpoint-recovery based Virtual Machine (VM) replication is an emerging approach to accommodating VM installations with high availability, especially due to its inherent capability of handling symmetric multiprocessing (SMP) virtual machines, i.e., VMs with multiple virtual CPUs (vCPUs). However, it comes at the price of significant performance degradation of the application executed...
Checkpoint-recovery based Virtual Machine (VM) replication is an emerging approach to accommodating VM installations with high availability. However, it comes at the price of significant performance degradation of the application executed in the VM, due to the large amount of state that needs to be synchronized between the primary and the backup machines. It is therefore critical to find new...
With the growing prevalence of cloud computing and the increasing number of CPU cores in modern processors, symmetric multiprocessing (SMP) Virtual Machines (VM), i.e. virtual machines with multiple virtual CPUs, are gaining significance. However, accommodating SMP virtual machines with high availability at low overhead is still an open problem. Checkpoint-recovery based VM replication is an emerg...
Distributed virtual environments (DVEs), such as multi-player online games and distributed simulations, may involve a massive number of concurrent clients. Deploying distributed server architectures is currently the most prevalent way of providing such large-scale services, where typically the virtual space is divided into several distinct regions, requiring each server to handle only part of the vir...
With the advent of multi- and many-core architectures, new opportunities in fault-tolerant computing have become available. In this paper we propose a novel process replication method that provides transparent failover of non-deterministic TCP services by utilizing spare CPU cores. Our method does not require any changes to the TCP protocol, does not require any changes to the client software, and...