I. Introduction
There has been much recent work on developing FPGA implementations of Convolutional Neural Networks (CNNs). While significant progress has been made in optimising the inference process of general CNN models on FPGAs, training and optimising CNNs for various domain-specific applications remains a demanding task. CNN models for domain-specific applications only need to detect or classify objects from a narrow range of classes. Recent findings in transfer learning [1], a research topic focusing on exploiting features that can be reused from one task to another, show that CNN models pre-trained on general datasets can be efficiently fine-tuned [2] for specific domains. This approach works well for medical image analysis: a pre-trained CNN with adequate fine-tuning can perform as well as, or better than, a CNN trained from scratch [3].