US-Byte: An Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning | IEEE Journals & Magazine | IEEE Xplore