Batched LU Factorization With Fast Row Interchanges for Small Matrices on GPUs | IEEE Conference Publication