Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis | IEEE Conference Publication | IEEE Xplore