Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs | IEEE Conference Publication