GPNPU: Enabling Efficient Hardware-Based Direct Convolution with Multi-Precision Support in GPU Tensor Cores (IEEE Conference Publication, IEEE Xplore)