Abstract:
Typical reconfigurable machines exhibit shortcomings that make them less than ideal for general-purpose computing. The Garp Architecture combines reconfigurable hardware with a standard MIPS processor on the same die to retain the better features of both. Novel aspects of the architecture are presented, as well as a prototype software environment and preliminary performance results. Compared to an UltraSPARC, a Garp of similar technology could achieve speedups ranging from a factor of 2 to as high as a factor of 24 for some useful applications.
Published in: Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186)
Date of Conference: 16-18 April 1997
Date Added to IEEE Xplore: 06 August 2002
Print ISBN: 0-8186-8159-4
Citations are not available for this document.
Cites in Patents (237)
Patent Links Provided by 1790 Analytics
1.
Vinod, Krishna N.; Kaushikkar, Sujoyita; Kakade, Aniket S.; ChoFleming, Kermin; Zou, Ping; Suprun, Alexey; Daya, Bhavya K., "Apparatuses, methods, and systems for memory interface circuit arbitration in a configurable spatial accelerator"
Inventors:
Vinod, Krishna N.; Kaushikkar, Sujoyita; Kakade, Aniket S.; ChoFleming, Kermin; Zou, Ping; Suprun, Alexey; Daya, Bhavya K.
Abstract:
Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.
Assignee:
INTEL CORP
Filing Date:
29 June 2019
Grant Date:
15 June 2021
Patent Classes:
Current International Class:
G06N0030400000, G06F0075300000, G06F0013234000, G06F0092200000
2.
ChoFleming, Jr., Kermin E.; Tithi, Jesmin Jahan; Cranmer, Joshua; Srinivasan, Suresh, "Methods and apparatus to detect and annotate backedges in a dataflow graph"
Inventors:
ChoFleming, Jr., Kermin E.; Tithi, Jesmin Jahan; Cranmer, Joshua; Srinivasan, Suresh
Abstract:
Disclosed examples to detect and annotate backedges in data-flow graphs include: a characteristic detector to store a node characteristic identifier in memory in association with a first node of a dataflow graph; a characteristic comparator to compare the node characteristic identifier with a reference criterion; and a backedge identifier generator to generate a backedge identifier indicative of a backedge between the first node and a second node of the dataflow graph based on the comparison, the memory to store the backedge identifier in association with a connection arc between the first and second nodes.
Assignee:
INTEL CORP
Filing Date:
30 March 2019
Grant Date:
08 June 2021
Patent Classes:
Current International Class:
G06F0083400000, G06F0094480000, G06F0084100000, G06F0158200000
3.
Omtzigt, Erwinus Theodorus Leonardus, "Execution engine for executing single assignment programs with affine dependencies"
Inventors:
Omtzigt, Erwinus Theodorus Leonardus
Abstract:
The execution engine is a new organization for a digital data processing apparatus, suitable for highly parallel execution of structured fine-grain parallel computations. The execution engine includes a memory for storing data and a domain flow program, a controller for requesting the domain flow program from the memory, and further for translating the program into programming information, a processor fabric for processing the domain flow programming information and a crossbar for sending tokens and the programming information to the processor fabric.
Assignee:
STILLWATER SUPERCOMPUTING INC
Filing Date:
01 April 2019
Grant Date:
08 June 2021
Patent Classes:
Current International Class:
G06F0158000000, G06F0158200000, G06F0151730000
4.
ChoFleming, Jr., Kermin E.; Tithi, Jesmin Jahan; Srinivasan, Suresh; Iyer, Mahesh A., "Methods and apparatus to insert buffers in a dataflow graph"
Inventors:
ChoFleming, Jr., Kermin E.; Tithi, Jesmin Jahan; Srinivasan, Suresh; Iyer, Mahesh A.
Abstract:
Disclosed examples to insert buffers in dataflow graphs include: a backedge filter to remove a backedge between a first node and a second node of a dataflow graph, the first node representing a first operation of the dataflow graph, the second node representing a second operation of the dataflow graph; a latency calculator to determine a critical path latency of a critical path of the dataflow graph that includes the first node and the second node, the critical path having a longer latency to completion relative to a second path that terminates at the second node; a latency comparator to compare the critical path latency to a latency sum of a buffer latency and a second path latency, the second path latency corresponding to the second path; and a buffer allocator to insert one or more buffers in the second path based on the comparison performed by the latency comparator.
Assignee:
INTEL CORP
Filing Date:
30 March 2019
Grant Date:
30 March 2021
Patent Classes:
Current International Class:
H04L0122400000, H04L0122600000, H04L0128610000
5.
Ivanov, Vladimir, "Method, device and system for control signalling in a data path module of a data stream processing engine"
Inventors:
Ivanov, Vladimir
Abstract:
Techniques and mechanisms for exchanging control signals in a data path module of a data stream processing engine are described. In an embodiment, the data path module may be configured to form a set of one or more data paths corresponding to an instruction which is to be executed. In another embodiment, data processing units of the data path module may be configured to exchange one or more control signals for elastic execution of the instruction.
Assignee:
INTEL CORP
Filing Date:
14 December 2018
Grant Date:
09 March 2021
Patent Classes:
Current International Class:
G06F0093000000, G06F0093800000, G06F0093200000, G06F0151700000, G06F0134000000
6.
ChoFleming, Kermin; Bai, Yu; Steely, Simon C., "Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator"
Inventors:
ChoFleming, Kermin; Bai, Yu; Steely, Simon C.
Abstract:
Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.
Assignee:
INTEL CORP
Filing Date:
30 March 2019
Grant Date:
09 February 2021
Patent Classes:
Current International Class:
G06F0131600000, G06F0169010000, G06F0120806000
7.
Mathew, Suresh; Diamond, Mitchell; Fleming, Jr., Kermin E., "Apparatus, methods, and systems for low latency communication in a configurable spatial accelerator"
Inventors:
Mathew, Suresh; Diamond, Mitchell; Fleming, Jr., Kermin E.
Abstract:
Systems, methods, and apparatuses relating to low latency communications in a configurable spatial accelerator are described. In one embodiment, a processor includes a spatial array of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, a plurality of request address file circuits coupled to the spatial array of processing elements and a cache memory, each request address file circuit of the plurality of request address file circuits to access data in the cache memory in response to a request for data access from the spatial array of processing elements, a plurality of translation lookaside buffers comprising a translation lookaside buffer in each of the plurality of request address file circuits to provide an output of a physical address for an input of a virtual address, and a function controller to receive an interrupt that includes a first field, that when set to a first value, causes a shootdown message to be broadcast to the plurality of translation lookaside buffers to cause a shootdown in the plurality of translation lookaside buffers.
Assignee:
INTEL CORP
Filing Date:
30 June 2018
Grant Date:
12 January 2021
Patent Classes:
Current International Class:
G06F0121027000, G06F0030600000
8.
Fleming, Jr., Kermin E.; Zou, Ping; Diamond, Mitchell; Keen, Benjamin, "Apparatuses, methods, and systems for conditional operations in a configurable spatial accelerator"
Inventors:
Fleming, Jr., Kermin E.; Zou, Ping; Diamond, Mitchell; Keen, Benjamin
Abstract:
Systems, methods, and apparatuses relating to conditional operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes an output buffer of a first processing element coupled to an input buffer of a second processing element via a first data path that is to send a first dataflow token from the output buffer of the first processing element to the input buffer of the second processing element when the first dataflow token is received in the output buffer of the first processing element; an output buffer of a third processing element coupled to the input buffer of the second processing element via a second data path that is to send a second dataflow token from the output buffer of the third processing element to the input buffer of the second processing element when the second dataflow token is received in the output buffer of the third processing element; a first backpressure path from the input buffer of the second processing element to the first processing element to indicate to the first processing element when storage is not available in the input buffer of the second processing element; a second backpressure path from the input buffer of the second processing element to the third processing element to indicate to the third processing element when storage is not available in the input buffer of the second processing element; and a scheduler of the second processing element to cause storage of the first dataflow token from the first data path into the input buffer of the second processing element when both the first backpressure path indicates storage is available in the input buffer of the second processing element and a conditional token received in a conditional queue of the second processing element from another processing element is a first value.
Assignee:
INTEL CORP
Filing Date:
30 June 2018
Grant Date:
01 December 2020
Patent Classes:
Current International Class:
G06F0150000000, G06F0157600000, G06F0093000000, G06F0095000000, G06F0158200000, G06F0093800000
9.
Ahsan, Bushra; Adler, Michael C.; Crago, Neal C.; Emer, Joel S.; Jaleel, Aamer; Parashar, Angshuman; Pellauer, Michael I., "Executing distributed memory operations using processing elements connected by distributed channels"
Inventors:
Ahsan, Bushra; Adler, Michael C.; Crago, Neal C.; Emer, Joel S.; Jaleel, Aamer; Parashar, Angshuman; Pellauer, Michael I.
Abstract:
A technology for implementing a method for distributed memory operations is described. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.
Assignee:
INTEL CORP
Filing Date:
17 June 2019
Grant Date:
01 December 2020
Patent Classes:
Current International Class:
G06F0131600000, G06F0120600000
10.
Corbal, Jesus; Sharma, Rohan; Steely, Jr., Simon; Ashok, Chinmay; Glossop, Kent D.; Bradford, Dennis; Caprioli, Paul; Huot, Louise; ChoFleming, Kermin; Tannenbaum, Barry, "Apparatuses, methods, and systems for swizzle operations in a configurable spatial accelerator"
Inventors:
Corbal, Jesus; Sharma, Rohan; Steely, Jr., Simon; Ashok, Chinmay; Glossop, Kent D.; Bradford, Dennis; Caprioli, Paul; Huot, Louise; ChoFleming, Kermin; Tannenbaum, Barry
Abstract:
Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA. In one embodiment, a CSA includes a plurality of processing elements, a circuit switched interconnect network between the plurality of processing elements, and a configuration register within each processing element to store a configuration value having a first portion that, when set to a first value that indicates a first mode, causes the processing element to pass an input value to operation circuitry of the processing element without modifying the input value, and, when set to a second value that indicates a second mode, causes the processing element to perform a swizzle operation on the input value to form a swizzled input value before sending the swizzled input value to the operation circuitry of the processing element, and a second portion that causes the processing element to perform an operation indicated by the second portion of the configuration value on the input value in the first mode and the swizzled input value in the second mode with the operation circuitry.
Assignee:
INTEL CORP
Filing Date:
30 March 2019
Grant Date:
27 October 2020
Patent Classes:
Current International Class:
G06F0093000000, G06F0158200000
11.
ChoFleming, Kermin; Steely, Jr., Simon; Glossop, Kent, "Apparatuses, methods, and systems for in-network storage in a configurable spatial accelerator"
Inventors:
ChoFleming, Kermin; Steely, Jr., Simon; Glossop, Kent
Abstract:
Systems, methods, and apparatuses relating to in-network storage for a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a plurality of processing elements; a circuit switched interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the circuit switched interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements; and an in-network storage element of the circuit switched interconnect network comprising a queue coupled to an output queue of a first processing element, and a controller that switches the in-network storage element into a first mode that provides a value stored in the queue of the in-network storage element by the output queue of the first processing element to an input queue of a second processing element when a configuration value is a first value, and into a second mode that bypasses the queue of the in-network storage element and provides a value from the output queue of the first processing element to the input queue of the second processing element when the configuration value is a second value.
Assignee:
INTEL CORP
Filing Date:
29 December 2018
Grant Date:
09 June 2020
Patent Classes:
Current International Class:
G06F0030000000, G06F0132400000, H04L0290800000, H04L0129330000
12.
Vorbach, Martin; Becker, Jurgen; Weinhardt, Markus; Baumgarte, Volker; May, Frank, "Integrated data processing core and array data processor and method for processing algorithms"
Inventors:
Vorbach, Martin; Becker, Jurgen; Weinhardt, Markus; Baumgarte, Volker; May, Frank
Abstract:
An integrated data processing core and a data processor are provided on a single integrated circuit and command sequences are forwarded from the data processing core to be executed on the array data processor wherein the command sequences comprise a group of instructions defining an algorithm.
Assignee:
PACT XPP SCHWEIZ AG
Filing Date:
27 October 2015
Grant Date:
03 March 2020
Patent Classes:
Current International Class:
G06F0120000000, G06F0158000000, G06F0093000000, G06F0120875000, G06F0120862000, G06F0093800000, G06F0157800000, G06F0120840000, G06F0093450000
13.
Fleming, Jr., Kermin Elliott; Steely, Jr., Simon C.; Glossop, Kent D., "Memory ordering in acceleration hardware"
Inventors:
Fleming, Jr., Kermin Elliott; Steely, Jr., Simon C.; Glossop, Kent D.
Abstract:
An integrated circuit includes a memory interface, coupled to a memory to store data corresponding to instructions, and an operations queue to buffer memory operations corresponding to the instructions. The integrated circuit may include acceleration hardware to execute a sub-program corresponding to the instructions. A set of input queues may include an address queue to receive, from the acceleration hardware, an address of the memory associated with a second memory operation of the memory operations, and a dependency queue to receive, from the acceleration hardware, a dependency token associated with the address. The dependency token indicates a dependency on data generated by a first memory operation of the memory operations. A scheduler circuit may schedule issuance of the second memory operation to the memory in response to the dependency queue receiving the dependency token and the address queue receiving the address.
Assignee:
INTEL CORP
Filing Date:
30 December 2016
Grant Date:
25 February 2020
Patent Classes:
Current International Class:
G06F0120000000, G06F0093000000, G06F0030600000, G06F0093800000
14.
Fleming, Jr., Kermin E.; Zou, Ping; Diamond, Mitchell; Keen, Benjamin, "Apparatus, methods, and systems for conditional queues in a configurable spatial accelerator"
Inventors:
Fleming, Jr., Kermin E.; Zou, Ping; Diamond, Mitchell; Keen, Benjamin
Abstract:
Systems, methods, and apparatuses relating to conditional queues in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first output buffer of a first processing element coupled to a first input buffer of a second processing element and a second input buffer of a third processing element via a data path that is to send a dataflow token to the first input buffer of the second processing element and the second input buffer of the third processing element when the dataflow token is received in the first output buffer of the first processing element; a first backpressure path from the first input buffer of the second processing element to the first processing element to indicate to the first processing element when storage is not available in the first input buffer of the second processing element; a second backpressure path from the second input buffer of the third processing element to the first processing element to indicate to the first processing element when storage is not available in the second input buffer of the third processing element; and a scheduler of the second processing element to cause storage of the dataflow token from the data path into the first input buffer of the second processing element when both the first backpressure path indicates storage is available in the first input buffer of the second processing element and a conditional token received in a conditional queue of the second processing element from another processing element is a true conditional token.
Assignee:
INTEL CORP
Filing Date:
03 April 2018
Grant Date:
18 February 2020
Patent Classes:
Current International Class:
G06F0093800000, G06F0134000000, G06F0158000000, G06F0094480000
15.
Fleming, Jr., Kermin E.; Zou, Ping; Diamond, Mitchell, "Apparatus, methods, and systems for multicast in a configurable spatial accelerator"
Inventors:
Fleming, Jr., Kermin E.; Zou, Ping; Diamond, Mitchell
Abstract:
Systems, methods, and apparatuses relating to multicast in a configurable spatial accelerator are described. In one embodiment, an accelerator includes a first output buffer of a first processing element coupled to a first input buffer of a second processing element and a second input buffer of a third processing element; and the first processing element determines that it was able to complete a transmission in a previous cycle when the first processing element observed for both the second processing element and the third processing element that either a speculation value was set to a value to indicate a dataflow token was stored in its input buffer (e.g., as indicated by a reception value (e.g., bit)) or a backpressure value was set to a value to indicate that storage is to be available in its input buffer before dequeuing the dataflow token from the first output buffer.
Assignee:
INTEL CORP
Filing Date:
30 December 2017
Grant Date:
18 February 2020
Patent Classes:
Current International Class:
G06F0131600000
16.
Fleming, Jr., Kermin E.; Glossop, Kent D.; Steely, Jr., Simon C.; Tang, Jinjie; Gara, Alan G., "Processors, methods, and systems with a configurable spatial accelerator"
Inventors:
Fleming, Jr., Kermin E.; Glossop, Kent D.; Steely, Jr., Simon C.; Tang, Jinjie; Gara, Alan G.
Abstract:
Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.
Assignee:
INTEL CORP
Filing Date:
30 December 2016
Grant Date:
11 February 2020
Patent Classes:
Current International Class:
G06F0120800000, G06F0093000000, G06F0093800000, G06F0120862000, G06F0120842000, G06F0120875000
17.
Elmufdi, Beshara, "Carry chain logic in processor based emulation system"
Inventors:
Elmufdi, Beshara
Abstract:
Disclosed herein is an apparatus and method for emulating hardware. The apparatus includes a data array configured to store input data for an emulation cycle and a carry chain, coupled to the data array, that receives one or more inputs from the data array. The carry chain is configured to generate output data in response to performing an arithmetic operation by a set of configurable logic gates using the one or more inputs in a pre-determined number of clock cycles. One or more processors are coupled to the carry chain and the data array, and are configured to emulate a logic gate function using at least the input data from the data array or the output data from the carry chain.
Assignee:
CADENCE DESIGN SYSTEMS INC
Filing Date:
08 March 2016
Grant Date:
14 January 2020
Patent Classes:
Current International Class:
G06F0094550000
18.
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C., "Processors, methods, and systems with a configurable spatial accelerator"
Inventors:
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C.
Abstract:
Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a synchronizer circuit coupled between an interconnect network of a first tile and an interconnect network of a second tile and comprising storage to store data to be sent between the interconnect network of the first tile and the interconnect network of the second tile, the synchronizer circuit to convert the data from the storage between a first voltage or a first frequency of the first tile and a second voltage or a second frequency of the second tile to generate converted data, and send the converted data between the interconnect network of the first tile and the interconnect network of the second tile.
Assignee:
INTEL CORP
Filing Date:
01 July 2017
Grant Date:
24 December 2019
Patent Classes:
Current International Class:
G06F0134200000, G06F0095000000, G06F0158200000
19.
Fleming, Kermin E.; Steely, Simon C.; Glossop, Kent D., "Memory circuits and methods for distributed memory hazard detection and error recovery"
Inventors:
Fleming, Kermin E.; Steely, Simon C.; Glossop, Kent D.
Abstract:
Methods and apparatuses relating to distributed memory hazard detection and error recovery are described. In one embodiment, a memory circuit includes a memory interface circuit to service memory requests from a spatial array of processing elements for data stored in a plurality of cache banks; and a hazard detection circuit in each of the plurality of cache banks, wherein a first hazard detection circuit for a speculative memory load request from the memory interface circuit, that is marked with a potential dynamic data dependency, to an address within a first cache bank of the first hazard detection circuit, is to mark the address for tracking of other memory requests to the address, store data from the address in speculative completion storage, and send the data from the speculative completion storage to the spatial array of processing elements when a memory dependency token is received for the speculative memory load request.
Assignee:
INTEL CORP
Filing Date:
01 July 2017
Grant Date:
24 December 2019
Patent Classes:
Current International Class:
G11C0150000000, G06F0157800000, G06F0093000000, G06F0093800000
20.
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C., "Processors, methods, and systems for a memory fence in a configurable spatial accelerator"
Inventors:
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C.
Abstract:
Systems, methods, and apparatuses relating to a memory fence mechanism in a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a plurality of operations, each by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. The processor also includes a fence manager to manage a memory fence between a first operation and a second operation of the plurality of operations.
Assignee:
INTEL CORP
Filing Date:
28 September 2017
Grant Date:
03 December 2019
Patent Classes:
Current International Class:
G06F0132800000, G06F0120811000, G06F0120813000
21.
Fleming, Jr., Kermin Elliott; Steely, Jr., Simon C.; Glossop, Kent D., "Runtime address disambiguation in acceleration hardware"
Inventors:
Fleming, Jr., Kermin Elliott; Steely, Jr., Simon C.; Glossop, Kent D.
Abstract:
An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.
Assignee:
INTEL CORP
Filing Date:
30 December 2016
Grant Date:
12 November 2019
Patent Classes:
Current International Class:
G06F0030600000, G06F0120000000, G06F0131600000, G06F0093800000, G06F0093000000
22.
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C., "Processors and methods with configurable network-based dataflow operator circuits"
Inventors:
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C.
Abstract:
Systems, methods, and apparatuses relating to configurable network-based dataflow operator circuits are described. In one embodiment, a processor includes a spatial array of processing elements, and a packet switched communications network to route data within the spatial array between processing elements according to a dataflow graph to perform a first dataflow operation of the dataflow graph, wherein the packet switched communications network further comprises a plurality of network dataflow endpoint circuits to perform a second dataflow operation of the dataflow graph.
Assignee:
INTEL CORP
Filing Date:
01 July 2017
Grant Date:
05 November 2019
Patent Classes:
Current International Class:
H04L0127210000, H04L0128010000, H04L0128630000, H04L0129350000, H04L0129370000
23.
Fleming, Jr., Kermin; Steely, Jr., Simon C.; Glossop, Kent D., "Processors and methods for pipelined runtime services in a spatial array"
Inventors:
Fleming, Jr., Kermin; Steely, Jr., Simon C.; Glossop, Kent D.
Abstract:
Methods and apparatuses relating to pipelined runtime services in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; a first configuration controller coupled to a first subset of the processing elements; and a second configuration controller coupled to a second, different subset of the processing elements, the first configuration controller and the second configuration controller are to configure the first subset and the second, different subset according to configuration information for a first context, and, for a context switch, the first configuration controller is to configure the first subset according to configuration information for a second context after pending operations of the first context are completed in the first subset and block second context dataflow into the second, different subset's input from the first subset's output until pending operations of the first context are completed in the second, different subset.
Assignee:
INTEL CORP
Filing Date:
01 July 2017
Grant Date:
05 November 2019
Patent Classes:
Current International Class:
G06F0157800000, G06F0158000000, G06F0158200000
24.
Fleming, Jr., Kermin E.; Diamond, Mitchell; Zou, Ping; Keen, Benjamin, "Apparatuses, methods, and systems for integrated control and data processing in a configurable spatial accelerator"
Inventors:
Fleming, Jr., Kermin E.; Diamond, Mitchell; Zou, Ping; Keen, Benjamin
Abstract:
Systems, methods, and apparatuses relating to integrated control and data processing in a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; a network between the plurality of processing elements to transfer values between the plurality of processing elements; and a first processing element of the plurality of processing elements including a first plurality of input queues having a first width coupled to the network, a second plurality of input queues having a second, larger width coupled to the network, at least one first output queue having the first width coupled to the network, at least one second output queue having the second, larger width coupled to the network, a first operation circuitry coupled to the first plurality of input queues having the first width, a second operation circuitry coupled to the second plurality of input queues having the second, larger width, and a configuration register within the first processing element to store a configuration value that causes the first operation circuitry to perform a second operation on values from the first plurality of input queues to create a first resultant value, and when the first resultant value is a first value, the second operation circuitry is to perform a third operation on values from the second plurality of input queues to create a second resultant value and store the second resultant value in the at least one second output queue.
Assignee:
INTEL CORP
Filing Date:
30 June 2018
Grant Date:
29 October 2019
Patent Classes:
Current International Class:
G06F0090000000, G06F0134000000, G06F0131600000
25.
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C.; Tang, Ping Tak Peter, "Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features"
Inventors:
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C.; Tang, Ping Tak Peter
Abstract:
Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. At least one of the plurality of processing elements includes a plurality of control inputs.
Assignee:
INTEL CORP
Filing Date:
01 July 2017
Grant Date:
15 October 2019
Patent Classes:
Current International Class:
G06F0175000000, G06F0120802000, H03K0191770000, G06F0157800000, G06F0158000000, G11C0081200000
26.
Fleming, Kermin E.; Glossop, Kent D.; Steely, Simon C., "Apparatus, methods, and systems with a configurable spatial accelerator"
Inventors:
Fleming, Kermin E.; Glossop, Kent D.; Steely, Simon C.
Abstract:
Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.
Assignee:
INTEL CORP
Filing Date:
30 December 2017
Grant Date:
15 October 2019
Patent Classes:
Current International Class:
G06F0121045000
27.
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C.; Sury, Samantika S., "Processors, methods, and systems for a configurable spatial accelerator with transactional and replay features"
Inventors:
Fleming, Kermin; Glossop, Kent D.; Steely, Jr., Simon C.; Sury, Samantika S.
Abstract:
Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In an embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an atomic operation when an incoming operand set arrives at the plurality of processing elements.
Assignee:
INTEL CORP
Filing Date:
01 July 2017
Grant Date:
15 October 2019
Patent Classes:
Current International Class:
G06F0120802000, H03K0191770000, G06F0175000000, G11C0071000000, G06F0157800000, G06F0158000000, G11C0081200000
28.
Fleming, Kermin E.; Steely, Simon C.; Glossop, Kent D., "Processors and methods for privileged configuration in a spatial array"
Inventors:
Fleming, Kermin E.; Steely, Simon C.; Glossop, Kent D.
Abstract:
Methods and apparatuses relating to privileged configuration in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; and a configuration controller coupled to a first subset and a second, different subset of the plurality of processing elements, the first subset having an output coupled to an input of the second, different subset, wherein the configuration controller is to configure the interconnect network between the first subset and the second, different subset of the plurality of processing elements to not allow communication on the interconnect network between the first subset and the second, different subset when a privilege bit is set to a first value and to allow communication on the interconnect network between the first subset and the second, different subset of the plurality of processing elements when the privilege bit is set to a second value.
Assignee:
INTEL CORP
Filing Date:
30 September 2017
Grant Date:
15 October 2019
Patent Classes:
Current International Class:
G06F0093000000, G06F0093800000, G06F0095000000, G06F0095400000
29.
Fleming, Kermin E.; Steely, Jr., Simon C.; Glossop, Kent D., "Apparatus, methods, and systems for memory consistency in a configurable spatial accelerator"
Inventors:
Fleming, Kermin E.; Steely, Jr., Simon C.; Glossop, Kent D.
Abstract:
Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until receiving a first token generated by completion of the previous request to the memory by another RAF circuit, and a second RAF circuit is to not issue, into the second network, a second request to the memory marked with a program order dependency on a first request until receiving a second token sent by a first RAF circuit when a predetermined time period has lapsed since the first request was issued by the first RAF circuit into the second network.
Assignee:
INTEL CORP
Filing Date:
30 December 2017
Grant Date:
17 September 2019
Patent Classes:
Current International Class:
G06F0151600000, G06F0151730000, G06F0095400000
30.
Hasenplaugh, William C.; Fleming, Jr., Kermin E.; Fossum, Tryggve; Steely, Jr., Simon C., "Low energy consumption mantissa multiplication for floating point multiply-add operations"
Inventors:
Hasenplaugh, William C.; Fleming, Jr., Kermin E.; Fossum, Tryggve; Steely, Jr., Simon C.
Abstract:
A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.
Assignee:
INTEL CORP
Filing Date:
01 October 2016
Grant Date:
03 September 2019
Patent Classes:
Current International Class:
G06F0074870000, G06F0075440000