Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model | IEEE Conference Publication | IEEE Xplore