Quantization and Hardware Architecture Co-Design for Matrix-Vector Multiplications of Large Language Models

IEEE Journals & Magazine | IEEE Xplore