Optimizing the Microarchitecture
To achieve efficient instruction execution without excessive design complexity, previous implementations of the ST20-C2 architecture employed relatively short pipelines. Short pipelines reduce branch delays and operand or result feedback penalties, and so produced acceptable IPC over many applications. In iCore's case, however, the aggressive frequency target dictated a longer pipeline to minimize stage-to-stage combinatorial delays. The problem was how to add the extra pipeline stages without decreasing instruction execution efficiency, since there would be little point in increasing raw clock frequency at the expense of IPC. Following an analysis using a C-based performance model, we chose a relatively conventional pipeline structure with some important variations targeted at optimizing instruction flow for the unique ST20-C2 architecture.
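The frequency-versus-IPC trade-off can be illustrated with a first-order model in which delivered performance is clock frequency divided by cycles per instruction, and each branch adds a fixed penalty. The sketch below is only an illustration and is not the C-based performance model used in the analysis. The function name `mips` and every number in it (clock rates, branch fraction, penalty cycles) are hypothetical values chosen to make the effect visible.

```python
def mips(freq_mhz, base_cpi, branch_fraction, branch_penalty_cycles):
    """Millions of instructions per second under a first-order CPI model.

    CPI = base CPI + (fraction of instructions that are branches)
                     * (penalty cycles paid per branch)
    """
    cpi = base_cpi + branch_fraction * branch_penalty_cycles
    return freq_mhz / cpi


# All figures below are hypothetical, not measurements of any ST20-C2 design.

# Short pipeline: modest clock, one-cycle branch penalty.
short_pipe = mips(freq_mhz=200, base_cpi=1.0,
                  branch_fraction=0.2, branch_penalty_cycles=1)

# Longer pipeline: 50% faster clock, but a four-cycle branch penalty.
long_pipe = mips(freq_mhz=300, base_cpi=1.0,
                 branch_fraction=0.2, branch_penalty_cycles=4)

# Longer pipeline whose branch penalty has been cut to two cycles.
long_pipe_tuned = mips(freq_mhz=300, base_cpi=1.0,
                       branch_fraction=0.2, branch_penalty_cycles=2)

print(f"short pipeline:       {short_pipe:.1f} MIPS")
print(f"long pipeline:        {long_pipe:.1f} MIPS")
print(f"long pipeline, tuned: {long_pipe_tuned:.1f} MIPS")
```

Under these assumed numbers the 50 percent clock increase is cancelled entirely by the larger branch penalty, and the deeper pipeline only pays off once that penalty is reduced. A real model would also account for operand and result feedback penalties, which this sketch folds into `base_cpi`.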