FPGA architectures
To understand the barriers to mainstream usage, we must first understand how FPGA architectures differ from multicore and GPU architectures. The most fundamental difference is that multicores and GPUs provide general-purpose or specialized processors that execute parallel threads, whereas FPGAs implement digital circuits directly by providing numerous lookup tables implemented in small RAMs. A lookup table implements combinational logic by storing the corresponding truth table and using the logic inputs as the address into that table. Figure 1 shows a simple example of a 32-bit adder decomposed into 32 full-adder circuits, which synthesis tools might map onto 32 lookup tables. Similarly, FPGAs enable sequential circuits by providing flip-flops at the lookup table outputs. By providing hundreds of thousands of lookup tables and flip-flops, FPGAs can implement massively parallel circuits.
Figure 1. Field-programmable gate arrays (FPGAs) perform computation by implementing the corresponding logic (for example, a 32-bit adder) using lookup tables (LUTs). For this example, synthesis tools divide the 32-bit adder into smaller circuits (for example, 32 full adders) and then map each of those circuits onto a lookup table with at least as many inputs and outputs.
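The lookup-table idea can be illustrated with a small software model. The sketch below is only an illustration, not how any synthesis tool or FPGA device works internally: the names `make_lut`, `lut_read`, `FULL_ADDER_LUT`, and `add32` are hypothetical, and the model assumes a lookup table with three inputs and two outputs (sum and carry-out), which simplifies what real devices offer. It stores the full-adder truth table once, then chains 32 reads of that table to form a 32-bit ripple-carry adder.

```python
def make_lut(func, num_inputs):
    """Build a truth table: one entry per combination of input bits."""
    table = []
    for address in range(2 ** num_inputs):
        # Bit i of the address is the value of logic input i.
        bits = [(address >> i) & 1 for i in range(num_inputs)]
        table.append(func(*bits))
    return table


def full_adder(a, b, carry_in):
    """Reference logic for a 1-bit full adder: returns (sum, carry_out)."""
    total = a + b + carry_in
    return (total & 1, total >> 1)


# Assumption: one 3-input, 2-output table per full adder (8 entries).
FULL_ADDER_LUT = make_lut(full_adder, 3)


def lut_read(table, *bits):
    """Use the logic inputs as the address into the stored truth table."""
    address = sum(bit << i for i, bit in enumerate(bits))
    return table[address]


def add32(x, y):
    """32-bit ripple-carry adder built from 32 full-adder table reads."""
    carry = 0
    result = 0
    for i in range(32):
        a = (x >> i) & 1
        b = (y >> i) & 1
        sum_bit, carry = lut_read(FULL_ADDER_LUT, a, b, carry)
        result |= sum_bit << i
    return result, carry
```

Calling `add32(5, 7)` returns the 32-bit sum together with the final carry-out. The loop here only models how the carry ripples from one full adder to the next; on an FPGA, all 32 lookup tables exist simultaneously as hardware rather than being visited one at a time.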