# SpikeOnChip: A Custom Embedded Platform for Neuronal Activity Recording and Analysis

Rick Wertenbroek, Yann Thoma, Flavio Maurizio Mor, Sara Grassi, Marc Olivier Heuschkel, Adrien Roux, and Luc Stoppini

Abstract-In this paper we present SpikeOnChip, a custom embedded platform for neuronal activity recording and online analysis. The SpikeOnChip platform was developed in the context of automated drug testing and toxicology assessments on neural tissue made from human induced pluripotent stem cells. The system was developed with the following goals: to be small, autonomous and low power, to handle micro-electrode arrays with up to 256 electrodes, to reduce the amount of data generated from the recording, to be able to do computation during acquisition, and to be customizable. This led to the choice of a Field Programmable Gate Array System-On-Chip platform. This paper focuses on the embedded system for acquisition and processing with key features being the ability to record electrophysiological signals from multiple electrodes, detect biological activity on all channels online for recording, and do frequency domain spectral energy analysis online on all channels during acquisition. Development methodologies are also presented. The platform is finally illustrated in a concrete experiment with bicuculline being administered to grown human neural tissue through microfluidics, resulting in measurable effects in the spike recordings and activity. The presented platform provides a valuable new experimental instrument that can be further extended thanks to the programmable hardware and software.

*Index Terms*—Biomedical device, FPGA, low-power, microelectrode array, neural cell interface, neural spikes, online analysis, system-on-chip bioinstrumentation.

## I. INTRODUCTION

In the last decades, there has been an increased interest in *in vitro* approaches in the field of drug discovery and toxicity testing [1]. This is a key issue for companies that develop new medical drugs. Thanks to the discovery of Induced Pluripotent Stem cells (iPS-cells) [2], to the progress in computer performance, and to the advances in micro-fabrication technologies,

Manuscript received March 30, 2021; revised May 28, 2021 and June 25, 2021; accepted July 1, 2021. Date of publication July 19, 2021; date of current version September 13, 2021. This work was supported by HES-SO through the R&D program Diagnostic Biochips under Grant 73396 SpikeOnChip project. This paper was recommended by Associate Editor Dr. Mehdi Kiani. (*Corresponding authors: Yann Thoma; Adrien Roux.*)

Rick Wertenbroek, Yann Thoma, and Sara Grassi are with the School of Management and Engineering Vaud (HEIG-VD), HES-SO University of Applied Sciences and Arts Western Switzerland, 2800 Delémont, Switzerland (e-mail: rick.wertenbroek@heig-vd.ch; yann.thoma@heig-vd.ch; sara.grassi@heig-vd.ch).

Flavio Maurizio Mor, Marc Olivier Heuschkel, Adrien Roux, and Luc Stoppini are with the Tissue Engineering Laboratory, HEPIA HES-SO University of Applied Sciences and Arts Western Switzerland, 2800 Delémont, Switzerland (email: fmor82@gmail.com; marc.heuschkel@hesge.ch; adrien.roux@hesge.ch; luc.stoppini@hesge.ch).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TBCAS.2021.3097833.

Digital Object Identifier 10.1109/TBCAS.2021.3097833

miniaturized *in vitro* approaches, also called "lab-on-a-chip," make faster, cheaper, and more ethical tests possible. In combination, human iPS cells allow the experiments to be more human specific than *in vitro* or *in vivo* tests on animal models. One very promising example is the use of human neural tissue grown from reprogrammed iPS cells [3]–[5]. Electrophysiological recording allows for on-line functional monitoring of neural tissue state. This approach is complementary to the traditional chemical, cytosolic, and histologic readouts and for some experiments is the only information needed. The main advantage of this type of readout is that the resulting signals are a direct and immediate representation of the neural tissue state and account for a functional, high-level description of the neurons.

The correct understanding of neural tissue activity requires the recording of many neurons per sample, to describe not only the activity of single neurons but also the global neuronal network dynamics. In that context, the neural spikes (referred to as "spikes") are electrical events detected from the action potential of a neuron, and are of great interest. Therefore, the electrophysiological approach requires the recording of a large number of sensors simultaneously. The electrophysiological readouts are recorded from within Micro-Electrode Array (MEA) biochips, smart culture chambers integrating an array of electrodes. The electrodes allow electrical recording and stimulation of neural cells and are controlled by a data acquisition system typically made of a signal amplification stage followed by an analog to digital converter (ADC) accompanied by software for experiment control, data storage and analysis.

Although the use of engineered human neural tissue has many advantages over animal models, there are constraints on the tissue culture method. The long maturation time (typically eight weeks of culture [6], up to several months) makes growing tissue cultures directly within MEA biochips complex to handle and expensive. The best culturing and experimental setup for such 3D tissues is to grow them at an air-liquid interface, where the tissues are not immersed in nutritive medium but remain in direct contact with air to provide sufficient oxygenation while being fed through a porous membrane [6]–[8]. Mature tissues grown on an air-liquid interface can then be placed into adapted porous substrate-based MEA biochips connected to external micro-fluidics for medium supply during experimental recordings/measurements.

One of the main research topics of the Tissue Engineering Laboratory at HEPIA (Geneva, Switzerland) [9] is the study of functional effects of new drugs and potential neurotoxicity

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

of chemicals on human brain tissue models. In order to better understand the modes of action of new chemical entities, the authors are interested not only in experimental endpoints but also in detecting subtle changes induced in the electrical activity profiles of neuronal networks, particularly when applying low compound concentrations to the tissues.

Today, most commercial *in vitro* MEA electrophysiology data acquisition systems, e.g., from MultiChannelSystems MCS (Reutlingen, Germany) [10] or Axion Biosystems (Atlanta, USA) [11], are well suited for use with dissociated cell cultures as well as acute tissue slices of animal origin. These systems provide medium to high throughput through multiwell plate format MEA devices, manage recordings of large numbers of channels in parallel, and provide offline data analysis software for extracting information from the recorded data. However, due to the compact design of the plates, these systems are not suited to live tissue grown at an air-liquid interface.

Available wireless *in vivo* data acquisition systems, e.g., from MultiChannelSystems MCS or Deuteron Technologies (Jerusalem, Israel) [12], are composed of highly miniaturized amplifiers and data acquisition electronics connected to implantable *in vivo* probes made to be worn by, e.g., rodent test animals. Recorded data is sent via Wi-Fi or radio waves to a base station that collects and sends the data to a computer. However, the throughput of the wireless link limits the number of recording channels. Deuteron overcame this issue by recording data onto an SD card at the amplifier stage, which ensures no data loss but requires an additional copy of the data to the computer. We have redesigned and combined these different embedded technologies to be able to plug in our MEA biochips using specific connection interfaces.

The main drawbacks of all available data acquisition systems are the time consuming and cumbersome data processing that needs to be performed on very large recordings in order to extract valuable information. The need to move MEAs in and out of the data acquisition instruments for medium exchange and/or storage induces neuronal activity variability and does not allow long experimental time-courses of several weeks or months in one step. Therefore, the authors proposed and developed a novel data acquisition system well suited for long-term experimentation using air-liquid interface tissue cultures that also integrates real-time data preprocessing in order to reduce data to be saved and/or transferred to a computer.

This paper describes the data acquisition system that has been developed and focuses specifically on its key electronic parts and associated software. The full custom system for the acquisition and analysis of electrophysiological data, called SpikeOnChip, consists of a specifically tailored embedded platform, while the accompanying PC software provides an interface to the user for data visualisation and acquisition as well as control over the embedded platform to program experiments.

The overall key specifications and the resulting system are described in Section II, including functional description and hardware architecture of the SpikeOnChip system. The embedded platform is based on a Xilinx Zynq-7020 SoC [13] featuring an ARM-based microprocessor and a Field Programmable Gate Array (FPGA). The system architecture is explained in Section III, comprising the processing implemented in the FPGA and in the ARM processor, as well as the accompanying PC software. The signal processing executed in the FPGA is explained in Section IV. Section V gives the development methods and tools, and Section VI provides FPGA usage and power consumption results. This section also displays an example experiment illustrating and validating the SpikeOnChip system. Finally, conclusion and future work are given in Section VII.

## II. OVERALL SYSTEM DESCRIPTION AND HARDWARE ARCHITECTURE

Engineered human neural tissues are small human-derived iPS cell neurospheres/minibrains [6]. Even though the cellular brain organization is not reproduced, these tissues incorporate all the key types of cells and present a behaviour similar to human brain tissue, making them a good biological model for the study of brain-related diseases and toxicology screening [6], [14]–[16].

As mentioned previously, these tissues are grown using air-liquid interface culturing techniques prior to experimental usage. The tissues can easily be transferred onto adequate MEA biochips using small porous PTFE membrane patches that are placed onto the working area of the MEA biochips (Fig. 1(B)). However, it has been found that such tissues are very sensitive to environmental changes. This is a major constraint for the conduct of the experiments, as the tissue environment should not change during the experiment. A change in external conditions could lead to biological activity pattern artifacts and confound the effects of the drug or chemical compound under test. In order to solve this issue, the MEA biochips as well as the signal amplification, analog to digital conversion, and SpikeOnChip electronics should be located in a controlled chamber/incubator. Furthermore, the chamber should not be opened for, e.g., culture medium exchange, as a change in gas concentrations already induces variations in biological activity patterns.

It follows that the whole electronics setup around the MEA biochip needs to: 1) be autonomous from a functional and power supply point of view, 2) be accessible through remote control to change experimental parameters, and 3) be able to store the data securely and locally.

From an experimental point of view, the overall data acquisition system also needs to include a continuous perfusion system for the constant feeding needs of the tissues and the possibility to visualize the tissues inside the incubator in real-time using a camera.

To properly record the action potentials or spikes generated by neurons (whose duration is around 1 ms), we need a sampling frequency of at least 20 kHz, whereas 50 kHz would be optimal in order to capture the signal shape very accurately. Spike amplitude, polarity and shape are required for potential offline data-based cell-sorting, to help discriminate the different cellular origins of the signal recorded at each micro-electrode and to evaluate effects of drugs or compounds on cells more specifically.

Recordings are typically achieved using MEA biochips integrating up to 256 electrodes. An electrode array of 8 recording electrodes is typically necessary to provide sufficient biological



Fig. 1. (A) Custom made Micro-Electrode Array (MEA) biochip composed of three parts: at left, a fluidic channel for biological sample perfusion, at center, the recording sites including electrodes and at right, a printed circuit board for connection to signal amplifier, analog to digital conversion and embedded platform. (B) Magnified view of the MEA biochip recording sites that can accept four neural tissues. It is composed of four independent recording sites of 8 recording ($\emptyset$ 30 $\mu$m) and 3 large reference electrodes. (C) View of the SpikeOnChip embedded platform in a 3D printed case. Black arrow: commercial 32-channel signal amplifier and ADC from Intan Technologies. Blue arrow: small interconnection PCB for electrical connection between signal amplifier and MEA biochip. (D) View of setup for pharmacological experiments through microfluidics located in an incubator. It includes a fluidics module (right) for control of the fluidic perfusion system and a custom-made vision module (left) for the optical follow-up of the 3D neural tissues. Black bars show scale.

data to validate tissue electrical activity patterns and their evolution over time. In this work, it has been decided to use MEA biochips integrating 32 electrodes and to have the possibility to record from up to 8 MEA biochips at a time using a single SpikeOnChip embedded system. Individual recordings in an experiment typically last from 15 minutes up to several hours depending on the type of experiment.

The resulting volume of data generated by the electrophysiology recordings is substantial. For example, at a sampling frequency of 30 kHz and a resolution of 16 bits per sample, 166 GBytes would be needed to record 256 channels during 3 hours. In general, the electrophysiology data are acquired and transmitted to a PC to be analyzed offline. Another advantageous solution is to save the data directly in on-board SD cards, to allow stand-alone mode and to increase reliability in the case of connection loss with the PC.
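The storage figure quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `recording_size_bytes` is illustrative and not part of the SpikeOnChip software.

```python
def recording_size_bytes(channels, sample_rate_hz, bytes_per_sample, duration_s):
    """Size in bytes of a continuous raw recording on all channels."""
    return channels * sample_rate_hz * bytes_per_sample * duration_s


# 256 channels, 30 kS/s, 16-bit (2-byte) samples, 3 hours
size = recording_size_bytes(256, 30_000, 2, 3 * 3600)
print(f"{size / 1e9:.1f} GB")  # decimal gigabytes (1 GB = 10^9 bytes)
```

With decimal gigabytes this gives roughly 166 GB, consistent with the value stated in the text.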

Writing and transferring large data sets to the PC, as well as analyzing them offline, is labor intensive, cumbersome, and slow. To overcome this, part of the processing can be done "on-line" and "on-chip," e.g., by cutting out short windows containing the interesting biological signals that can then be transmitted, stored, and analyzed offline. Using cut-out windows of 5 ms and assuming that a population of neurons is firing at a mean frequency of 10 Hz, a selective extraction of spikes from the raw data decreases the number of samples by a factor of approximately 20. This storage reduction of about 95% allows the system to run autonomously for longer durations and speeds up data transfers and offline processing on the PC.
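The reduction factor follows directly from the window length and the firing rate. The sketch below reproduces the estimate under the simplifying assumption that one non-overlapping window is stored per spike; the function name is hypothetical.

```python
def retained_fraction(window_s, mean_rate_hz):
    """Fraction of samples kept when one window is stored per spike.

    Assumes windows do not overlap; the result is capped at 1.0.
    """
    return min(1.0, window_s * mean_rate_hz)


kept = retained_fraction(0.005, 10.0)   # 5 ms windows, 10 Hz mean firing rate
reduction_factor = 1.0 / kept           # about 20
saving_percent = 100.0 * (1.0 - kept)   # about 95 %
```

At higher firing rates the windows start to overlap and the actual saving would be smaller than this estimate suggests.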

One other feature of the acquisition system should be real-time online frequency analysis, providing higher-content information about modifications of biological activity patterns. Indeed, insight into the neuronal network activity, i.e., synchronous electrical activity within the tissue, can be obtained from the local field potential hidden in the raw data. To analyse the different frequency domains that contain physiological information about the neuronal network, a series of band-pass filters is used to separate the physiological frequency bands or rhythms and to evaluate their amplitude and thus their relative contribution to network activity. These frequency bands can be set to the brain oscillation frequency ranges $\delta$ (1-4 Hz), $\theta$ (4-8 Hz), $\alpha$ (8-13 Hz), $\beta$ (13-30 Hz), and $\gamma$ (30-100 Hz), which are commonly observed in human electroencephalography and are believed to play an active role in neural communication [17], [18]. One application of this feature is the analysis of bursting activity periods within the neural tissues.
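To illustrate what a per-band energy readout looks like, the sketch below sums the squared DFT magnitudes that fall in each of the bands listed above. Note that this swaps in a plain discrete Fourier transform for the band-pass filter bank described in the text; it is not the FPGA implementation. The 500 Hz rate, the 10 Hz test tone, and all names are assumptions chosen for the example.

```python
import cmath
import math

# Band edges in Hz from the text (lower edge inclusive, upper edge exclusive).
BANDS = {
    "delta": (1, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 100),
}


def band_energies(samples, fs):
    """Sum |X[k]|^2 over the DFT bins that fall inside each band."""
    n = len(samples)
    energies = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):
        freq = k * fs / n
        band = next((b for b, (lo, hi) in BANDS.items() if lo <= freq < hi), None)
        if band is None:
            continue  # bin lies outside all bands of interest
        xk = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        energies[band] += abs(xk) ** 2
    return energies


fs = 500  # Hz, illustrative rate for a down-sampled signal (assumption)
signal = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]  # 1 s of a 10 Hz tone
energies = band_energies(signal, fs)
dominant = max(energies, key=energies.get)  # band with the largest energy
```

A filter-bank implementation produces a continuous energy trace per band rather than one value per block, which is closer to what an online system needs.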

To best fit the experimental requirements described above, a novel system for data acquisition and analysis needed to be developed. Its key specifications are the following: it should (1) use cost-effective off-the-shelf parts and allow autonomous battery operation; (2) be fully customizable to the needs of the experiments; (3) allow further improvements, such as increasing the number of recording channels and integrating more advanced data processing and analysis; and (4) allow the deployment of each of the different data processing blocks into the part of the system that is most suitable for achieving optimal performance. On-chip spike detection has been explored on FPGA or ASIC platforms [19]-[23], but these solutions rely on custom boards or ASICs and do not integrate on-line network activity analysis, e.g., in the frequency domain. The authors of [24] present a 32-channel solution with off-the-shelf parts, but the platform is limited in FPGA size and processing power, making further developments difficult.

Our experimental setup consists of custom 32-channel MEA biochips, a commercially available signal amplifier ADC, the



Fig. 2. Block diagram of the SpikeOnChip embedded platform, within the complete system, from electrode to analysis on PC.

SpikeOnChip embedded platform itself in a 3D printed case, depicted in Fig. 1(A), (B), (C); a MEA biochip holder including a visualization module and a microfluidics system to host and support the neuronal cell cultures, as shown in Fig. 1(D); and control software running on a traditional PC. Fig. 2 shows the block diagram of the SpikeOnChip system. The main components of the system are the following.

Micro-Electrode Array (MEA): The in-house developed MEA biochip corresponds to the air-liquid culture chamber as well as the electrical sensors allowing the monitoring of the functional activity of the neural tissues. The tissues are placed on a porous polyimide membrane (thickness of 8 $\mu$m and equivalent porosity of 10% of the area generated by $\emptyset$ 7.5 $\mu$m holes on a 20 $\mu$m grid etched through the membrane) incorporating thin-film platinum wires (thickness of 150 nm) and platinum black coated electrodes ($\emptyset$ 30 $\mu$m, 200 $\mu$m grid, impedance below 100 k$\Omega$ at 1 kHz). It includes four recording sites with eight recording and three large reference electrodes each (Fig. 1(B)). The polyimide membrane is mounted onto a printed circuit board using conductive glue allowing the electrodes to be connected to the external signal amplification and data acquisition electronics. Under this membrane, a fluidic channel allowing perfusion of nutrients and test compounds is built in several layers of PMMA plastic (Fig. 1(A), (B)).

*Electrophysiology interface for data acquisition:* We use the RHD2132 Digital Electrophysiology interface chip from Intan Technologies (Los Angeles, USA) [25]. On one side of the Intan chip, we connect 32 electrodes (channels) from the MEA biochip. The Intan chip amplifies and converts the voltages from the electrode array into a digital data stream which is sent to the FPGA of the Zynq-7020 through a four-wire SPI interface with Low-Voltage Differential Signaling (LVDS). This connection is done using the RHD2000 SPI Interface Cable [26]. The RHD2132 chip is set up to sample each of the 32 electrodes at a sampling frequency of 30 kS/s with a resolution of 16 bits. At the core of the RHD2132 chip is an array of low-noise amplifiers with integrated analog filters that can be used to isolate frequencies of interest and minimize aliasing. We have set the amplifier passband from 0.1 Hz to 5.0 kHz (1st-order high-pass, 3rd-order Butterworth low-pass). The RHD2132 chip can also perform digital signal processing (DSP) offset removal from all channels using a first-order high-pass IIR filter, which was not used in our application.

Embedded Platform: Our embedded platform is based on the MicroZed board [27] equipped with a Xilinx Zynq-7020 SoC [13] containing a dual-core ARM Cortex-A9 Processing System (PS) running an embedded Linux kernel, and FPGA Programmable Logic (PL) in a single package. The MicroZed board is placed on a MicroZed Breakout Carrier Card [28] to provide I/O pin accessibility to connect up to 8 32-channel Intan chips for a total of 256 channels. The FPGA is connected to the electrophysiology interface chip via SPI and acts as a master sending commands to the chip for duties such as initiating the Intan self-calibration, reading and writing the Intan configuration registers, and initiating the analog to digital conversions. The FPGA receives the raw digital data stream from the Intan ADCs and performs on-line analysis operations. The ARM side of the SoC manages data transmission and recording and acts as a webserver/streaming source to communicate with a PC through a wired Ethernet connection or wirelessly through Wi-Fi. The embedded platform is equipped with a microSD card, accessible from the ARM processor to store recordings (e.g., when the platform is used autonomously) (Fig. 1(C)).

*Full Setup:* The custom-made perfusion system, named DualHub, allows the MEA biochip to be connected and is used to perform autonomous long- or short-term experiments such as pharmacological experiments. The DualHub has two main modules, one for visual monitoring and one for perfusion. A camera, a lens, and a diffuse LED light source are used to visually monitor the four neural tissues (Fig. 1(D) at left). A custom-made PCB with a microcontroller and motor drivers drives the perfusion experiments, which are controlled through a custom graphical user interface (GUI). Two peristaltic pumps and two solenoid valves allow perfusion of either the control medium or the molecules to be tested (Fig. 1(D) at right).

*Host PC:* A custom application with a GUI was developed for the host PC. This application is used for three main tasks: 1) control and configuration of the embedded system; 2) visualization of on-line (streaming) and off-line (recorded) data and signals; and 3) experiment management, e.g., to program a recording of 15 minutes every hour.

## III. SYSTEM ARCHITECTURE

The system architecture closely matches the block diagram of the system displayed in Fig. 2, and the embedded system processing tasks are shown in the upper diagram. The electrodes are sampled by the Intan chips, and the data are retrieved by the FPGA. Within the FPGA the data go through three different paths, generating three types of data that are timestamped and transmitted to the ARM processor: unprocessed raw data, spike window cut-outs, and Local Field Potential Analysis (LFPA) values. The ARM processor manages these data, saving them to the SD card, streaming them to the host PC, or discarding them, according to the configuration received from the host PC.

In Subsections III-A to III-C, we describe the inner workings of each component of the system, their communication, and the dataflow, starting from the data acquisition up to the host PC.



Fig. 3. Data flow diagram inside the FPGA side of the embedded platform.

## A. FPGA Programmable Logic (PL) Firmware

Fig. 3 shows the basic data flow inside the FPGA. Each Intan chip is managed by a controller (up to 8 chips). The external DDR memory is accessible by the PL and the PS. The data flows from the Intan controllers through an AXI-Stream. The data consists of the 16-bit samples and identification metadata (electrode number and Intan chip number of origin). The aggregate sample rate of all channels, $256 \times 30$ kS/s $= 7.68$ MS/s, is easily handled by the FPGA running at 100 MHz.
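The headroom behind this statement can be made explicit by dividing the clock frequency by the aggregate sample rate. This is a back-of-the-envelope sketch using only the figures given above; the variable names are illustrative.

```python
channels = 256
sample_rate_hz = 30_000          # per channel
fpga_clock_hz = 100_000_000      # 100 MHz

aggregate_rate = channels * sample_rate_hz           # samples per second over all channels
cycles_per_sample = fpga_clock_hz / aggregate_rate   # clock cycles available per incoming sample
```

About 13 clock cycles are available for each incoming sample, so a pipelined datapath that accepts one sample per cycle uses less than a tenth of the available throughput.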

The raw data stream is split into three flows. The first is for recording or visualization of the raw signal, analogous to the operation of an oscilloscope. The second goes through a spike detection block that generates cutouts of 5 ms windows (150 samples) around detected events within the signal. Cutouts are analogous to the resulting image of a single-shot acquisition on an oscilloscope. The third flow goes through the LFPA block, which computes the energy spectral density in frequency bands of interest, analogous to a spectrum analyzer.

The resulting output data streams from all three flows (raw, windows, energy analysis) are timestamped in order to allow coherent temporal interpretation, because window cutouts appear sporadically and the LFPA has a frequency band dependent latency. The resulting data is stored in a circular buffer of $2 \times 4096$ bytes ($2 \times 4$ kB), separate for each data type. For each stream, there is an associated Direct Memory Access (DMA) module that transfers the data to a dedicated zone in DDR memory when a buffer is filled, and once the transfer to DDR is done the DMA interrupts the PS so it can handle the data. All DMA transfers are done as 4 kB bursts on AXI High-Performance ports (HP0-2), which have direct access to the DDR controller, completely bypassing the PS and allowing for the best transfer speeds.
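One way to read the $2 \times 4$ kB buffer is as a ping-pong arrangement: one half fills while the other is handed to the DMA. The sketch below models that behaviour in software under this assumption; the class name and the `on_full` callback (standing in for "DMA burst to DDR, then interrupt the PS") are hypothetical and do not describe the actual firmware.

```python
HALF_SIZE = 4096  # bytes per half, as stated in the text


class PingPongBuffer:
    """Two 4 kB halves: one is filled while the other is handed off."""

    def __init__(self, on_full):
        self.halves = [bytearray(), bytearray()]
        self.active = 0         # index of the half currently being filled
        self.on_full = on_full  # stand-in for the DMA transfer and PS interrupt

    def write(self, data: bytes):
        for byte in data:
            half = self.halves[self.active]
            half.append(byte)
            if len(half) == HALF_SIZE:
                self.on_full(bytes(half))  # hand off the filled half as one 4 kB burst
                half.clear()
                self.active ^= 1           # continue filling the other half


transferred = []
buf = PingPongBuffer(on_full=transferred.append)
buf.write(bytes(10_000))  # two full 4 kB bursts are handed off, 1808 bytes stay pending
```

In hardware the hand-off and the filling of the other half happen concurrently; the sequential model only shows the bookkeeping.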

## B. ARM Processing System (PS) Software

The PS application runs on an embedded Linux system (kernel 4.9.0). At startup a custom driver initializes the embedded platform: it programs the FPGA and configures it with default parameters, it allocates contiguous memory spaces in DDR3 for the DMA transfers, and it registers callback functions that manage the interrupts from the FPGA, one per type of data. Finally, the userspace management program is launched. It initializes the TCP sockets used to stream data to the host PC and sets up the REST interface that allows the PC application to control the embedded platform.

After initialization, the PS is in IDLE mode, waiting for commands from the host PC, or interrupts from the FPGA. Depending on the commands previously received from the host PC, the PS can be in one of four possible states: IDLE, WRITING data to the SD card, STREAMING data to the host PC, or BOTH writing and streaming data. The data to be written or streamed can be any combination of the three possible data flows (raw, spike, LFPA) depending on the configuration received from the host PC.
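The four states described above are fully determined by which sinks are enabled. The following sketch expresses that mapping; the enum and function names are illustrative and not taken from the PS application.

```python
from enum import Enum


class PsState(Enum):
    IDLE = "idle"
    WRITING = "writing"      # saving to the SD card only
    STREAMING = "streaming"  # sending to the host PC only
    BOTH = "both"            # writing and streaming


def ps_state(write_to_sd: bool, stream_to_pc: bool) -> PsState:
    """Derive the PS state from the two sink switches set by host commands."""
    if write_to_sd and stream_to_pc:
        return PsState.BOTH
    if write_to_sd:
        return PsState.WRITING
    if stream_to_pc:
        return PsState.STREAMING
    return PsState.IDLE
```

For example, `ps_state(True, False)` yields `PsState.WRITING`. The choice of data flows (raw, spike, LFPA) is independent of this state and would be held as separate configuration.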

The PS can receive three different interrupts from the FPGA, one per type of data (raw, spike, LFPA), each indicating that a buffer of this type of data is ready in DDR memory and should be dealt with. To avoid data loss due to uneven transfer speeds when writing to the SD card or the TCP sockets, we implemented an elastic buffer of 64 MB in the DDR (1 GB) for each of the six source-sink combinations (three sources: raw, spike, LFPA; two sinks: SD card, socket). These buffers are filled with the data coming from the FPGA, and writing to the SD card or streaming to a socket is handled by separate threads implemented as a standard producer-consumer model in FIFO order. Although the FPGA can handle up to 256 channels, the writing of raw data to the SD card is limited and depends on the SD card speed. We formatted the data partition of the SD card with EXT4 and disabled journaling to maximize speed. This allowed for recording of raw data for at least 32 channels. Other possibilities are to use a more flash-friendly filesystem such as YAFFS2 and to pre-fill the card with zeros when possible. However, raw data only serve for validation, the cutout windows being the main regions of interest.
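The producer-consumer arrangement can be sketched with a bounded FIFO and one consumer thread per sink. This is a generic illustration, not the PS application: the list `written` stands in for an SD-card write or socket send, the loop stands in for the interrupt handler that receives 4 kB blocks, and all names are assumptions.

```python
import queue
import threading

BLOCK_SIZE = 4096
BUFFER_BLOCKS = 64 * 1024 * 1024 // BLOCK_SIZE  # 64 MB elastic buffer of 4 kB blocks


def consumer(fifo, sink):
    """Drain blocks in FIFO order until the None sentinel arrives."""
    while True:
        block = fifo.get()
        if block is None:
            break
        sink.append(block)  # stand-in for an SD-card write or a socket send


fifo = queue.Queue(maxsize=BUFFER_BLOCKS)
written = []
worker = threading.Thread(target=consumer, args=(fifo, written))
worker.start()

# Producer side: here each block is tagged with a sequence number for checking.
for seq in range(100):
    fifo.put(seq.to_bytes(4, "little") + bytes(BLOCK_SIZE - 4))
fifo.put(None)  # signal the end of the stream
worker.join()
```

The bounded queue absorbs short stalls of the sink; if the sink stays slower than the source for long enough, the buffer fills and the producer has to block or drop data, which is the limit the text mentions for raw recordings.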

Besides handling data flow, the PS sets the FPGA parameters and execution flow via an AXI bus, writing information to memory mapped registers of the different controllers in the FPGA: a) The Intan controller manages an Intan chip configuration and the data acquisition, b) the Window recording controller manages the processing and the transfer of the window cut-outs, and c) the LFPA controller manages the LFPA computation and transfer of the resulting data. Thus, the data processing in the FPGA can be controlled from the PS. This allows the host PC to configure the data processing in the FPGA, by sending commands to the PS, via the REST interface.

## C. Host PC Software

The PC software provides a GUI that serves three purposes: 1) to control the acquisition system, 2) to display on-line and off-line data, and 3) to store data for further use. With this GUI the user can:

- Access a specific embedded platform over the network given its IP address.
- Configure the embedded platform,<sup>1</sup> e.g., the parameters of the spike detection (see IV-A) and of the LFPA (see IV-B), for each of the 256 channels (individually or by groups, e.g., by location on the MEA).
- Control the acquisition and the data storage.
- Set the type of data that will be acquired (raw data, spike window cut-outs, LFPA results).
- Store the acquired data to the SD card in the embedded platform. This is controlled with a start/stop button. It is also possible to schedule a given number of recordings of a given duration, separated by a given interval.
- Store the acquired data to an HDF5<sup>2</sup> file in the host PC. This is controlled with a start/stop button. It is also possible to schedule a given number of recordings of a given duration, separated by a given interval.
- Visualize on-line (real-time) data being acquired. The data is streamed from the embedded platform to the host PC over the network. The streaming is controlled with a play/pause and a stop button. The data is visualized in three different tabs, one for each type of data (raw, spike, LFPA). Each of the tabs has 32 small graphs, one for each channel, and one larger graph to zoom on a selected channel. The user can adjust the vertical and horizontal scaling of the graphs. In the visualization tab of the spikes, several spikes are superposed on the graph for each channel, to allow the user to compare the shapes of spikes observed on a specific electrode. The number of superposed spikes is 10 by default but can be changed to any value. Displayed spikes can be cleared if needed for clarity.
- Visualize off-line (recorded) data either from the embedded platform SD card (over the network or by plugging in the SD card) or from an HDF5 file previously recorded by the software, with the same three tabs used for the visualization of on-line data. The offline data can be played at a chosen speed or browsed with a slider.
- Manage recordings on the remote SD card, e.g., delete them or convert them to HDF5 format.

Fig. 4 shows a screenshot of the GUI software. The raw data on-line visualization with detected spike timestamps visualization is displayed on top, the spike visualization is shown in the



Fig. 4. Graphical User Interface (GUI): (A) Raw data from channels with spike events (bars) with one channel maximized (bottom). (B) Extracted spikes shown on channels (bars in A) with one channel maximized (right). (C) Corresponding calculated LFPA traces on channels (raw signal in A) with one channel maximised (bottom).

middle, and the LFPA visualization is shown at the bottom (each view has one channel maximized).

## IV. SIGNAL PROCESSING IN THE FPGA

The traditional approach is to record and store raw data from the ADCs (Intan chips) and process the data off-line on a PC. The volume of data generated in electrophysiology recordings can be colossal, approx. 6.5 GBytes per hour of recording for a 32-electrode system at 30 kS/s with 16-bit precision ($32 \times 30 \times 10^3 \times 2~\text{bytes} \times 3600~\text{s} \approx 6.44 \times 2^{30}~\text{bytes} = 6.44$ GBytes). Although having the raw data allows for any kind of post-processing, the storage costs can become prohibitive for longer experiments, especially with more electrodes.

<sup>1</sup>Parameters can be set and changed at any time.

<sup>2</sup>HDF5 is a file format for embedding structured data that are then easily accessible [29].

Therefore, it was chosen to pre-process the recorded data on-line in the FPGA, making it possible to extract relevant information from the raw data stream without having to record the whole stream. The FPGA provides two extra data streams: window cutouts of 5 ms around neural spike activity and local field potential analysis (LFPA) results. A combination of recording raw signals from a few reference electrodes and only window cutouts and/or LFPA results for all the other electrodes is possible and may be the best approach. Real-time display of raw data is also useful to check on the status of a neural cell culture during experiments, to see if there is any activity without necessarily recording or monitoring all electrodes.

The spike detection and window cutout methods and the LFPA calculation are described below.

### A. Spike Event Detection and Window Cutout

The hardware to detect the biological activity and to cut the windows is time-multiplexed among the 256 channels, with only one channel being active at a time. This is possible through pipelining of operations and because the FPGA logic runs at a much higher clock speed (100 MHz) than the per-channel sample rate (30 kHz).

Spike events are detected by applying a threshold crossing criterion. The threshold can either be a static value set by the user or be based on a real-time standard deviation multiplied by a settable constant C. When the deviation of a sample relative to the mean of the signal is bigger than the threshold, an event has been detected.

Once a spike has been detected, a window of 5 ms (150 samples) is recorded. The window can be positioned anywhere around the event according to a *time offset* parameter. A 150-sample FIFO is used for every channel in order to make this possible. By default, the event is centered in the window, but the *time offset* can be reconfigured through the GUI. This parameter and the above criterion constant can be set individually per channel if needed.

The threshold condition is computed individually per channel and updated every sample. This requires access to the mean and standard deviation of each channel for every sample. The mean and standard deviation are usually computed through a moving average and moving standard deviation, i.e., computed on a window of samples. Updating the mean on a per-sample basis requires adding the new sample (divided by the window size) and subtracting the oldest sample (divided by the window size), as shown in (1). This only requires a single addition and a single subtraction, but it also requires storing as many samples as the window size N for every channel, which becomes a burden with large window sizes. Therefore, it was chosen to approximate the moving average by a leaky integrator (exponential smoothing [30]) implemented as a first order infinite impulse response (IIR) filter $\bar{\mathbf{x}}_t$ in (2) ($\alpha = 1/N$). This has the advantage that only a single value has to be stored per channel instead of a whole window of N past samples.

$$\bar{x}_t = \frac{1}{N} \sum_{n=0}^{N-1} x_{t-n} = \bar{x}_{t-1} + \frac{1}{N} x_t - \frac{1}{N} x_{t-N} \qquad (1)$$

$$\bar{x}_t \approx \bar{\mathbf{x}}_t = \alpha x_t + (1 - \alpha)\bar{\mathbf{x}}_{t-1} = \frac{1}{N}x_t + \left(1 - \frac{1}{N}\right)\bar{\mathbf{x}}_{t-1} \qquad (2)$$

The leaky integrator will induce an overshoot on the computed mean value if a sample deviates too much. However, this is not a problem here because when this scenario occurs it means there was an event, and as such, the event detection will trigger and record the next samples. The detection remains inactive until the end of the recording. During this time the overshoot is reduced through exponential decay (smoothing, leakage), producing an estimated mean with an acceptable value. The time constant can be changed through $\alpha = 1/N$.
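The difference between the exact moving average of (1) and the leaky integrator of (2) can be illustrated with a short floating-point model. This is a behavioral sketch only, not the fixed-point FPGA implementation; the function names and the test signal (a constant level with one outlier) are assumptions made for the example.

```python
from collections import deque


def moving_average(samples, n):
    """Exact moving average, Eq. (1): needs a buffer of n past samples."""
    window = deque([0.0] * n, maxlen=n)  # the n stored samples
    mean, out = 0.0, []
    for x in samples:
        oldest = window[0]
        window.append(x)          # pushes x in, drops the oldest sample
        mean += (x - oldest) / n  # add newest / n, subtract oldest / n
        out.append(mean)
    return out


def leaky_integrator(samples, n):
    """First-order IIR approximation, Eq. (2): a single stored value."""
    alpha = 1.0 / n
    state, out = 0.0, []
    for x in samples:
        state = alpha * x + (1.0 - alpha) * state
        out.append(state)
    return out


# Hypothetical test signal: constant level with one spike-like outlier.
signal = [10.0] * 4000
signal[3000] = 500.0
exact = moving_average(signal, 128)
leaky = leaky_integrator(signal, 128)
for t in (2999, 3000, 3150, 3999):
    print(t, round(exact[t], 3), round(leaky[t], 3))
```

Both estimates jump by the same amount at the outlier. The moving average returns to the baseline abruptly once the outlier leaves the window, whereas the leaky integrator decays exponentially, which is the overshoot behavior described above.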

The number of operations for both methods is the same (one addition, one subtraction); however, windows of any size N now become possible. The IIR filters used to compute the values have to be updated for every sample at 30 kHz. Considering that an update only costs a single clock cycle at 100 MHz, it is possible to reuse the hardware by time multiplexing. A single instance of the IIR logic can therefore be used for more than 3000 channels (100 MHz / 30 kHz). Intermediate values for each channel are stored in local BRAMs with single cycle read/write latency.

In order to reduce the computational costs, the squared standard deviation $s^2$ was used instead of the standard deviation s to avoid a square root. This changes the threshold condition to Deviation<sup>2</sup> > $s^2 \cdot C^2$, which is equivalent but requires computing a square for Deviation<sup>2</sup> instead of a square root for s. The constant $C^2$ is precomputed when C is set in the GUI.

The squared standard deviation  $s^2$  in Eq. 3 is approximated by replacing the sum of squares in the right hand side of Eq. 3 by a leaky integrator as above, but with the input  $x_t^2$ .

$$s_t^2 = \frac{1}{N} \sum_{n=0}^{N-1} (x_{t-n} - \bar{x}_t)^2 = \left(\frac{1}{N} \sum_{n=0}^{N-1} x_{t-n}^2\right) - \bar{x}_t^2 \qquad (3)$$

This results in an efficient implementation of the spike event detection mechanism. In our case N was chosen as a power of two, e.g., 128 or 8192, so that the multiplication by 1/N reduces to a bit shift (wiring). Computations were done with 64-bit fixed-point values with the radix point at the $32^{nd}$ bit.
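The detection scheme described in this section can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the VHDL implementation: Python integers are unbounded (the 64-bit width is not modeled), the `warmup` period that lets the estimates settle is an assumption of this example, and `holdoff` models the inactive period during the 150-sample window recording.

```python
FRAC_BITS = 32  # radix point at the 32nd bit, as stated in the text


def detect_spikes(samples, log2_n=7, c=5.0, holdoff=150, warmup=None):
    """Return sample indices where Deviation^2 > C^2 * s^2.

    samples: integer ADC codes of one channel.
    log2_n:  N = 2**log2_n, so multiplying by 1/N is a right shift.
    """
    c2_fx = int(round(c * c * (1 << FRAC_BITS)))  # C^2, precomputed once
    mean_fx = 0  # leaky integrator of x   (fixed point)
    msq_fx = 0   # leaky integrator of x^2 (fixed point)
    if warmup is None:
        warmup = 4 << log2_n  # assumption: wait 4*N samples before detecting
    inactive_until = warmup
    events = []
    for t, x in enumerate(samples):
        x_fx = x << FRAC_BITS
        # s^2 = mean(x^2) - mean(x)^2, Eq. (3) with leaky integrators
        var_fx = msq_fx - ((mean_fx * mean_fx) >> FRAC_BITS)
        dev_fx = x_fx - mean_fx
        dev2_fx = (dev_fx * dev_fx) >> FRAC_BITS
        if t >= inactive_until and dev2_fx > ((c2_fx * var_fx) >> FRAC_BITS):
            events.append(t)
            inactive_until = t + holdoff  # inactive while the window is cut
        # Leaky integrator updates: state += (input - state) / N
        mean_fx += (x_fx - mean_fx) >> log2_n
        msq_fx += (((x * x) << FRAC_BITS) - msq_fx) >> log2_n
    return events


# Hypothetical test signal: +/-3 background with one large sample.
signal = [3 if t % 2 == 0 else -3 for t in range(4000)]
signal[2000] = 100
print(detect_spikes(signal))
```

The update `state += (input - state) >> log2_n` is algebraically the same as (2) with $\alpha = 1/N$, written so that only a subtraction, a shift and an addition are needed.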

### B. Local Field Potential Analysis (LFPA)

The goal of the LFPA is to follow the evolution of the energy spectral density in several frequency bands. The energy spectral density describes how the energy of a signal is distributed with frequency. Here we want to follow the evolution of energy in frequency bands, for example $\delta$ (1-4 Hz), $\theta$ (4-8 Hz), $\alpha$ (8-13 Hz), $\beta$ (13-30 Hz), and $\gamma$ (30-100 Hz).

On a discrete signal, the definition of the energy spectral density is given by (4).

$$S_{xx}(f) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi j}{N} fn} \right|^2 \qquad (4)$$

Since we are interested in the evolution over time, we can rewrite this equation as (5), where $x_t$ is the $t^{\text{th}}$ sample.

$$S_{xx}(f,t) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x_{n+t} e^{-\frac{2\pi j}{N} fn} \right|^2 \qquad (5)$$

For each band we chose to analyze the sum of the energy spectral density of 8 frequency bins equally distributed over the frequency range of interest. The number of bins was chosen arbitrarily and considered acceptable through simulation (see Section V). More bins would require more computations, which the requirements (256 channels) make difficult. The resulting LFPA signal for a specific band is given by (6).

$$P_8(t) = \sum_{k=0}^{7} S_{xx}(f_k, t), \text{ where the } f_k \text{ are equally distributed in the band} \qquad (6)$$
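As an illustration of (4) and (6), the following sketch evaluates $P_8$ for one window using a direct DFT per bin. The window length (N = 2000 samples at an assumed 2 kHz rate, i.e., 1 Hz per bin) and the bin indices chosen for the $\beta$ band are assumptions made for this example and do not reflect the values used on the platform.

```python
import cmath
import math


def energy_spectral_density(x, k):
    """S_xx for DFT bin k of the window x, Eq. (4): |X_k|^2 / N."""
    n_len = len(x)
    acc = 0j
    for n, xn in enumerate(x):
        acc += xn * cmath.exp(-2j * math.pi * k * n / n_len)
    return abs(acc) ** 2 / n_len


def p8(x, bins):
    """Sum of the energy spectral density over 8 bins, Eq. (6)."""
    if len(bins) != 8:
        raise ValueError("P_8 is defined over exactly 8 bins")
    return sum(energy_spectral_density(x, k) for k in bins)


# Assumed example: 1 s window at 2 kHz, so bin k corresponds to k Hz.
N = 2000
beta_bins = [14, 16, 18, 20, 22, 24, 26, 28]  # equally spaced in 13-30 Hz
in_band = [2.0 * math.cos(2 * math.pi * 20 * n / N) for n in range(N)]
out_of_band = [2.0 * math.cos(2 * math.pi * 60 * n / N) for n in range(N)]
print(p8(in_band, beta_bins), p8(out_of_band, beta_bins))
```

A tone of amplitude A that falls exactly on one of the bins contributes $A^2 N / 4$ to $P_8$, while a tone outside the band contributes essentially nothing, which is the behavior the LFPA relies on to track band energy over time.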

The frequencies $f_k$ are chosen to be equally distributed in the band of interest. In order to be able to do this, N is chosen specifically for each band. Each band is therefore analyzed with a different discrete Fourier transform window. This prevents a direct comparison between bands, but this is not a problem since we are only interested in the evolution of $P_8(t)$ over time and not in the comparison with other bands. Therefore, each band will have a different $N$ ($N_{\delta}, \ldots, N_{\gamma}$) and each band will have its own $P_8(t)$. To limit spectral leakage, the actual window size of the DFT is chosen to be 5 times as big as the minimal required value for the frequencies of interest, so as to have a few cycles of the lowest frequency we try to detect inside the time window of the calculation [31]. This also reduces the impact of not using a windowing function before the DFT, which would require an extra multiplication step (or, equivalently, a convolution in the frequency domain).

Because the frequencies of interest can be very low, N will be very large. Computing the DFT (e.g., via FFT) is therefore very costly, especially if it has to be done on multiple channels for multiple bands. Two measures make it possible to follow the evolution of $P_8(t)$ over time on up to 5 bands for up to 256 channels online (without post-processing). First, we chose to compute $P_8(t)$ on a per-window basis (every $N^{\text{th}}$ sample) instead of a per-sample basis. This limitation does not impact the monitoring of the evolution of the energy spectral density of neural tissue over time because changes in the bands of interest are slow. Second, the channels are decimated from 30 kHz to 2 kHz, because the bands of interest are at low frequencies. The pass band is chosen to be 0-300 Hz (low pass filter with $f_c = 300$ Hz and $f_{\text{stop}} = 1700$ Hz).

The decimator reduces the sampling frequency from 30 kHz to 2 kHz by means of two chained decimators, to reduce the size of the low pass Finite Impulse Response (FIR) filter needed to avoid aliasing. The first decimation stage goes from 30 kHz to 6 kHz, and the second stage goes from 6 kHz to the desired 2 kHz.
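The benefit of chaining two decimators can be estimated with a common rule of thumb for FIR length, $N_{\text{taps}} \approx (f_s / \Delta f) \cdot (A_{\text{dB}} / 22)$, where $\Delta f$ is the transition width. The sketch below applies it to the rates given in the text. The 60 dB stop band attenuation and the stage-one stop band edge (6 kHz minus the 300 Hz pass band, so that only the band removed by the second stage can alias) are assumptions of this example; the actual coefficients were computed with the Matlab DSP System Toolbox.

```python
import math


def fir_taps_estimate(fs_hz, f_pass_hz, f_stop_hz, atten_db=60.0):
    """Rule-of-thumb FIR length: (fs / transition width) * (atten / 22)."""
    return math.ceil(fs_hz / (f_stop_hz - f_pass_hz) * atten_db / 22.0)


# Single stage: 30 kHz -> 2 kHz with the 300 Hz / 1700 Hz filter.
single = fir_taps_estimate(30_000, 300, 1_700)

# Two stages: 30 kHz -> 6 kHz, then 6 kHz -> 2 kHz.
stage1 = fir_taps_estimate(30_000, 300, 6_000 - 300)
stage2 = fir_taps_estimate(6_000, 300, 2_000 - 300)

print("single stage taps:", single)
print("two stage taps:   ", stage1, "+", stage2, "=", stage1 + stage2)
```

Under these assumptions the two chained filters together need roughly half as many coefficients as a single filter running at 30 kHz, because the first stage can use a much wider transition band.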

To compute $P_8(t)$, it was chosen to implement the computation of a DFT frequency bin $X_f$ as a first order Goertzel filter [32]. If the filter register is reset at the start of every window of N samples, the filter output $y_t$ is equal to $X_f$ after the N<sup>th</sup> sample (t = N); the register is then reset for the next computation.

The advantage of using this filter is that it requires a single (complex-valued) register and a single complex multiplier per bin. This multiplier can be implemented as 4 real multipliers, 3 if optimized [33], and can be computed in a single clock cycle on the FPGA.
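The principle can be sketched with the textbook first order Goertzel recurrence $y_t = W y_{t-1} + x_t$ with $W = e^{2\pi j k / N}$. This floating-point model is only an illustration; the sign and indexing conventions of the FPGA implementation may differ, and `dft_bin` is a reference function added here for comparison.

```python
import cmath
import math


def goertzel_bin(x, k):
    """DFT bin k of the window x using one complex register."""
    n_len = len(x)
    w = cmath.exp(2j * math.pi * k / n_len)  # single twiddle for this bin
    y = 0j                                   # register, reset per window
    for xn in x:
        y = w * y + xn                       # one complex multiply-add
    # One last rotation (equivalent to feeding a zero sample) aligns the
    # phase, since w**N == 1 for integer k. It does not change |y|.
    return w * y


def dft_bin(x, k):
    """Reference: direct evaluation of the DFT sum for bin k."""
    n_len = len(x)
    return sum(xn * cmath.exp(-2j * math.pi * k * n / n_len)
               for n, xn in enumerate(x))


window = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, 5.0]
for k in range(len(window)):
    print(k, abs(goertzel_bin(window, k) - dft_bin(window, k)) < 1e-9)
```

Since only $|X_f|^2$ enters the energy spectral density, the final phase rotation could be omitted in a hardware implementation without changing the result.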

For a channel, the 5 analyzed bands, which each have 8 computed bins, therefore require 40 cycles per sample to update the temporary values used to compute the different $X_f$. This allows the values to be computed for up to 1250 channels $(100 \text{ MHz}/(2 \text{ kSps} \times 40) = 1250)$ using a single complex multiplier, making this implementation very resource efficient. The limiting factor is actually the memories (BRAMs) used to store the twiddle factors $e^{-\frac{2\pi j}{N}fn}$ and intermediate values $z^{-1}$ (see Section VI-A).
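The time-multiplexing budgets quoted in this section follow from the ratio between the clock frequency and the number of cycles needed per second and per channel. A short check, with an illustrative function name:

```python
def max_channels(clock_hz, sample_rate_hz, cycles_per_sample):
    """Channels one time-multiplexed unit can serve in real time."""
    return clock_hz // (sample_rate_hz * cycles_per_sample)


# Spike detection IIR: 1 cycle per sample at 30 kHz.
iir_channels = max_channels(100_000_000, 30_000, 1)

# LFPA: 5 bands x 8 bins = 40 cycles per sample at 2 kHz.
lfpa_channels = max_channels(100_000_000, 2_000, 5 * 8)

print(iir_channels, lfpa_channels)
```

Both figures are well above the 256 channels targeted, which is consistent with memory rather than computation being the limiting resource.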

The twiddle factors to compute $P_8(t)$ for a given band on a given channel are precomputed on the host PC since they remain the same unless the band of interest is changed. The twiddle factors are stored as two 32-bit fixed point values using Euler's formula $e^{jx} = \cos(x) + j\sin(x)$. Because the range of sin and cos is [-1, 1], the fixed point representation used has the fractional point right before the first bit, allowing it to cover almost all of [-1, 1]. Because 1 is only used for the extreme frequency bins where n = 0 or n = N and because those bins are not used in any computations of $P_8(t)$, this fractional point placement is the best choice. Results and intermediate values are stored as two 64-bit precision fixed point values (real and imaginary) with the fractional point at the $32^{nd}$ bit.
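The following sketch shows how twiddle factors could be quantized on the host and how a complex product can be formed with three real multiplications. It assumes a signed 32-bit format with 31 fractional bits (so that +1.0 saturates to the largest code), which is one reading of the format described above. The function names are hypothetical, and the three-multiplication scheme shown is one known variant, not necessarily the one used in the FPGA.

```python
import math

FRAC = 31  # assumed: sign bit followed by 31 fractional bits


def to_q31(value):
    """Quantize a value in [-1, 1] to a signed 32-bit fixed point code."""
    code = int(round(value * (1 << FRAC)))
    # Saturate: +1.0 itself is not representable in this format.
    return max(-(1 << FRAC), min((1 << FRAC) - 1, code))


def twiddle_q31(f, n, n_len):
    """Real and imaginary parts of exp(-2*pi*j*f*n/N) as fixed point."""
    angle = -2.0 * math.pi * f * n / n_len
    return to_q31(math.cos(angle)), to_q31(math.sin(angle))


def cmul3(a_re, a_im, b_re, b_im):
    """(a_re + j*a_im) * (b_re + j*b_im) using 3 real multiplications."""
    k1 = b_re * (a_re + a_im)
    k2 = a_re * (b_im - b_re)
    k3 = a_im * (b_re + b_im)
    return k1 - k3, k1 + k2  # (real part, imaginary part)


print(to_q31(1.0), to_q31(-1.0), to_q31(0.5))
print(cmul3(3, 4, 5, -6))  # (3 + 4j) * (5 - 6j)
```

The three-multiplication form trades one multiplier for three extra additions, which is usually favorable on FPGAs where DSP slices are scarcer than adders.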

Our implementation allows the twiddle factors to be set independently for each of the 256 channels to compute $P_8(t)$ on 5 individually definable bands within 0-300 Hz. If this feature is not needed, adding the restriction of using the same 5 frequency bands on all channels would reduce the required memory for the twiddle factors by a factor of 256. This would make it possible to target smaller FPGAs (see Section VI-A).

Each time a value of  $P_8(t)$  for a given band is generated (at different intervals since the different bands use different window sizes) it is locally stored with the corresponding meta data (channel ID, band ID, timestamp) in a 4 kB buffer awaiting transfer to the DDR as described in Section III-A.

## V. DEVELOPMENT METHODS AND TOOLS

The FPGA hardware design has been written in VHDL-2008. Synthesis and FPGA development were done using Xilinx Vivado (2016.4 - 2020.1) [34]. The following Xilinx IPs were used: FIR compiler, FIFO, DMA, and AXI interconnects. The rest of the implementation is written in VHDL and is vendor agnostic. We used QuestaSim [35] for simulation and validation of the design through testbenches. QuestaFormal has been used to formally validate part of the design. Signal processing modeling was done in Matlab and Python (Numpy, Scipy libraries). The Matlab DSP System Toolbox was used to calculate the coefficients of the decimator lowpass FIR filters. This made it possible to develop and test the different DSP algorithms before porting them to the FPGA (see Section IV).

The input test signals were a combination of synthetically generated signals (sinusoids, sine sweeps, triangular, white noise, etc.), neural spike recordings provided by the Tissue Engineering Laboratory at HEPIA [9], and signals recorded from an electrophysiological signal generator [36]. We used the same input test signals to test the design on the FPGA and compared the Matlab/Python simulations, the hardware simulations in QuestaSim and the real board results. This made it possible to validate the design against a known reference. For the replay of arbitrary signals from inside the FPGA, the raw data stream coming from the ADCs was replaced by a signal generator that could replay signals from memory (BRAM). This also helped to debug time multiplexing issues, e.g., by generating samples whose values correspond to the channel number, which makes data flow issues easy to identify.

The embedded system PS firmware was developed using a host-target co-design approach, in which the source code can be compiled both for the embedded target (ARM processor) and for the host computer (x86 Intel processor). This allowed us to test the REST server and the streaming of data without the need for the embedded platform. This was made possible through the use of a hardware abstraction layer (HAL), emulating data coming from the FPGA by reading data from files when running on the host development PC. It made it easy to validate the TCP/IP data streams and the REST interface.

The toolchains used for compiling the embedded software, the Linux kernel (4.9.0) and the bootloader (U-Boot) were generated with the buildroot tool. The embedded software was written in C and compiled with GCC. The host GUI software was written in C++ with the Qt library for OS interoperability (Windows, Linux, Mac).

As a more general approach during each stage of development, we created a simulator that could replace the preceding part of the chain: (1) the neural tissue was emulated by an electrophysiology stimulator [36], (2) in the FPGA an arbitrary signal generator was used in place of the ADCs, (3) the FPGA was abstracted from the software by replacing the data generation by threads reading files, and (4) the embedded platform was emulated by software running on the host. Therefore, each stage could replay pre-recorded or pre-generated data to emulate the previous stage. This made it possible to develop each stage in isolation and in parallel, and also to validate each stage through comparisons to a model with known data.

## VI. RESULTS

### A. FPGA Resource Usage

The final FPGA design resource usage is listed in Table I. Resource usage of the FPGA is low ($\approx 30\%$); the most used resources are BRAMs (57.5%), notably to store all the intermediate values and twiddle factors for the LFPA.

TABLE I RESOURCE UTILIZATION ON THE XILINX ZYNQ7020 FPGA (PL)

| Resource                    | Utilization                |
|-----------------------------|----------------------------|
| 6-input Look-Up Table (LUT) | 18,245 / 53,200 (34.30 %)  |
| Flip-Flop (FF)              | 25,486 / 106,400 (23.95 %) |
| Block RAM (BRAM, 36 Kb)     | 80.5 / 140 (57.50 %)       |
| DSP slice (DSP48)           | 27 / 220 (12.27 %)         |

TABLE II POWER CONSUMPTION OF THE MICROZED BOARD

| Configuration                           | Quantity | 4.51 V | 4.75 V | 5.00 V |
|-----------------------------------------|----------|--------|--------|--------|
| CPU running, FPGA all functions enabled | Current  | 176 mA | 331 mA | 484 mA |
| CPU running, FPGA all functions enabled | Power    | 0.79 W | 1.57 W | 2.42 W |
| CPU running, FPGA disabled              | Current  | 62 mA  | 148 mA | 295 mA |
| CPU running, FPGA disabled              | Power    | 0.28 W | 0.70 W | 1.48 W |

All values are DC; voltages $\pm$ 0.01 V, currents $\pm$ 2 mA.

TABLE III CAPABILITIES OF THE SPIKEONCHIP SYSTEM

|                             | Raw Data                             | Spike Event Detection                                                                                                                                          | LFPA                                            |
|-----------------------------|--------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|
| Processing                  | No processing done                   | Up to 256 ch. detection with individual thresholds per ch.                                                                                                     | Up to 256 ch., 5 bands (0-300 Hz) per ch.       |
| Max write rate (to SD card) | Continuous 32 ch., 16-bit at 30 kSps | Continuous max rate is 6,400 events/s (150 samples/event). The 64 MB DDR buffer can sustain the max event rate of 256 ch. (51,200 events/s) for 4 s            | Not limited. Can be fully recorded continuously |

The low resource usage is beneficial for two reasons. First, it would make it possible to target a smaller (cheaper) FPGA device, e.g., Zynq-7010. Second, it leaves space to implement future features and extra online processing (e.g., filtering) or analysis functions, which was one of the reasons for selecting this target device.

### B. Power Consumption

Power consumption of the MicroZed board was measured at different supply voltages. The board can be powered through the USB connector or wired directly. The on-board DC-DC converters for the power supply rails (3.3 V, 1.8 V, 1.5 V, 1.0 V, and 0.75 V) can be operated with an input voltage as low as 4.5 V through USB [27]. Power was measured with two Fluke 179 TRMS multimeters and is given in Table II. The CPU-only results (running Linux and network, FPGA disabled) allow the FPGA consumption to be estimated by subtraction. Lower input voltages result in lower power because the DC-DC converters drop less voltage. Running the board at a lower voltage significantly increases the battery run time. Further power consumption reduction could be achieved by bypassing the reverse polarity protection diode. Each 32-channel RHD2132 amplifier and ADC requires less than 50 mW to operate [25].
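The power values of Table II follow from the measured voltages and currents, and the FPGA contribution can be estimated as the difference between the two configurations. The sketch below only redoes this arithmetic on the published measurements; the variable and function names are illustrative.

```python
# Supply voltage (V) -> (current with FPGA enabled, current with FPGA
# disabled), both in mA, taken from Table II.
measurements = {
    4.51: (176, 62),
    4.75: (331, 148),
    5.00: (484, 295),
}


def power_w(voltage_v, current_ma):
    """DC power in watts from a voltage in V and a current in mA."""
    return voltage_v * current_ma / 1000.0


for voltage, (ma_enabled, ma_disabled) in measurements.items():
    p_enabled = power_w(voltage, ma_enabled)
    p_disabled = power_w(voltage, ma_disabled)
    print(f"{voltage:.2f} V: total {p_enabled:.2f} W, "
          f"CPU only {p_disabled:.2f} W, "
          f"FPGA estimate {p_enabled - p_disabled:.2f} W")
```

The difference is only an estimate of the FPGA share, since converter losses are included in both measurements.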

### C. Capability Summary

Table III summarizes the system capabilities. The processing capabilities can handle 256 channels for all tasks. However, not all data can be continuously recorded due to the limited write speed of the SD card. The system can nevertheless absorb bursts of events thanks to the internal buffering in RAM, making it possible to record transient periods of high activity.

Fig. 5. Effect of bicuculline on engineered 3D neural tissues derived from human pluripotent stem cells in a control situation (A) and after 10 $\mu$M bicuculline injection (B). (A) and (B): typical raw data (top left), corresponding typical network activity events (top right), corresponding timestamps of events (bottom left), and corresponding calculated LFPA (bottom right; dark green = theta band, light green = alpha band, blue = beta band). Delta and gamma bands are not affected by the bicuculline (\*electrovalve actuation artifact, \*\*bicuculline injection). (C) Representation of the normalized mean firing frequency in the control, exposure and recovery periods; a clear increase induced by the bicuculline is observed. (D) Corresponding representation of the normalized number of active electrodes during the experiment. (E) Representation of the normalized mean frequency of bursts during the experiment.
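The event rates of Table III can be cross-checked with simple arithmetic. The sketch below assumes that the sustainable SD card write rate equals the rate of 32 raw channels, that an event costs 150 samples of 2 bytes (per-event metadata is ignored), and that the 64 MB buffer is not drained during the burst. These assumptions are inferred from the table and are not stated explicitly in the text.

```python
SAMPLE_BYTES = 2        # 16-bit samples
WINDOW_SAMPLES = 150    # samples per spike window (5 ms at 30 kS/s)
SAMPLE_RATE = 30_000    # samples per second and per channel

# Assumed sustainable SD write rate: 32 raw channels recorded continuously.
sd_write_rate = 32 * SAMPLE_RATE * SAMPLE_BYTES          # bytes per second
event_bytes = WINDOW_SAMPLES * SAMPLE_BYTES              # bytes per event

sustained_events_per_s = sd_write_rate // event_bytes

# Back-to-back, non-overlapping windows on every one of the 256 channels.
max_events_per_channel = SAMPLE_RATE // WINDOW_SAMPLES
peak_events_per_s = 256 * max_events_per_channel

# Time a 64 MB buffer can absorb the peak rate (conservative: no draining).
buffer_bytes = 64 * 2 ** 20
burst_seconds = buffer_bytes / (peak_events_per_s * event_bytes)

print(sustained_events_per_s, peak_events_per_s, round(burst_seconds, 2))
```

Under these assumptions the sustained and peak rates match the table, and the buffer holds slightly more than the 4 s quoted.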

### D. Validation in the Field

In order to illustrate and validate the functionality of the SpikeOnChip acquisition system, experiments with a molecule referred to as bicuculline were performed to mimic epileptic activity on neural tissues and thus increase bursting activity [37]. Five-month-old engineered 3D human neural tissues on pre-cut patches of membrane were placed on MEAs and exposed first to culture medium as a control (Fig. 5(A)) and second to bicuculline at 10 $\mu$M concentration for several minutes (Fig. 5(B)), with their electrical activity being recorded using the SpikeOnChip acquisition system during the whole duration of the pharmacological experiment.

At the beginning of an experiment, the fluidic MEA channel is filled with circulating control culture medium. After several hours of monitoring (Fig. 5(A)), the electrical activity becomes stable (control baseline) and the pharmacological experiment can start. Typical results obtained are presented in Fig. 5(B). During the 10 minutes of recording prior to molecule exposure, the perfusion is off and the last 3 minutes of data are used to illustrate the control period. Spontaneous activity, i.e., single action potentials as well as burst activity, is present in the neural tissues. The bicuculline is then injected for 2 minutes at a flow rate as low as 300 $\mu$l/min, followed by 5 minutes without flow, which are defined as the exposure period. To reduce the noise induced by the perfusion, only data from the last 3 minutes of the 5-minute exposure are used to illustrate and analyze the effect of bicuculline exposure as presented in Fig. 5(B) to (E).

Due to the real-time analysis of the raw data achieved by the SpikeOnChip system, the detected spike event windows and the LFPA values are directly available for further analysis, and cumbersome analysis of the raw data does not have to be performed offline after the experiment. From these data, it is easy and fast to compute several biological activity parameters such as the normalized mean firing frequency, the normalized number of active electrodes and the normalized mean frequency of bursts, respectively shown in Fig. 5(C) to (E). The mean firing frequency represents the number of biological spike events detected per unit of time. As expected, the presence of bicuculline increases the mean firing frequency with respect to the control period (Fig. 5(C)). The number of active electrodes (Fig. 5(D)) corresponds to the number of electrodes that recorded at least 5 spike events during 60 s ($\approx$0.08 Hz). It remains stable during the whole experiment, which indicates that the detected neuronal activity persisted throughout the experiment. The detection threshold was set to static mode for all the experiments; it was set at the beginning of the experiment and manually adapted in case of signal drift. It is interesting to notice the difference in waveform shapes between the typical waveforms of spikes shown in Fig. 4(B) and the typical waveforms of global network activity visible in the background and shown in Fig. 5(A) and (B) (top right). This activity differs in shape and amplitude compared to background noise. To make sure that noise, e.g., white noise, was not interpreted as activity, we analyzed the correlation of the signal between the 8 electrodes from a single neural tissue against all electrodes. The mean frequency of bursts is defined as the synchronized number of biological spike events detected on each electrode in contact with the neural tissue per unit of time (see Fig. 5(A) and (B)). As expected, an increase of the bursting rate is also observed between the control and bicuculline exposure periods (Fig. 5(E)).
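The first two activity parameters can be derived from spike timestamps with a few lines of code. The sketch below is an illustration only: the function names, the data layout (one list of timestamps in seconds per electrode), the example values, and the normalization by the control period value are assumptions and do not describe the analysis scripts actually used.

```python
def events_in_window(timestamps, start_s, window_s):
    """Number of spike events with start_s <= t < start_s + window_s."""
    return sum(1 for t in timestamps if start_s <= t < start_s + window_s)


def mean_firing_frequency(spikes, start_s, window_s):
    """Spike events per second over all electrodes in the window."""
    total = sum(events_in_window(ts, start_s, window_s)
                for ts in spikes.values())
    return total / window_s


def active_electrodes(spikes, start_s, window_s=60.0, min_events=5):
    """Electrodes with at least min_events spike events in the window."""
    return sum(1 for ts in spikes.values()
               if events_in_window(ts, start_s, window_s) >= min_events)


def normalize(value, control_value):
    """Express a parameter relative to its control period value."""
    return value / control_value


# Hypothetical timestamps (seconds) for three electrodes.
spikes = {
    "e1": [1.0, 5.0, 12.0, 30.0, 59.0],
    "e2": [2.0, 3.0],
    "e3": [0.5, 10.0, 20.0, 30.0, 40.0, 50.0, 61.0],
}
print(active_electrodes(spikes, 0.0))
print(round(mean_firing_frequency(spikes, 0.0, 60.0), 3))
```

In this made-up example, electrodes e1 and e3 reach the threshold of 5 events in 60 s, while e2 does not.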

Recorded LFPA values (Fig. 5(A) and (B), bottom right) provide insight into the neural tissue state. The LFPA results in (A) are stable. Transient high values can sometimes appear as an artifact due to the absence of filtering. In this particular case the noise originated from electrovalve actuation. The higher LFPA values in the theta, alpha and beta bands in (B) correspond to the increase of oscillations that appeared within the tissues in the presence of bicuculline, where periodic high amplitude spikes have appeared within the bursts, as can be seen in the raw data snippet in (B). This change in LFPA values reflects the higher neuronal activity in the tissue, similar to an epileptic episode.

At the end of the 5-minute exposure period, the fluidic channel is perfused for 4 minutes with control culture medium to wash out the remaining pharmacological molecules in the fluidics and tubes. The data analyzed for the recovery period correspond to the last 3 minutes of a 10-minute recovery period recording directly following the washout (data not shown). As shown in Fig. 5(C) to (E), the 5-minute exposure to bicuculline did not suppress the electrical activity within the tissue. However, the electrical activity did not come back to the same level as prior to the experiment (control period), indicating that the effect of a 5-minute exposure to 10 $\mu$M bicuculline is not completely reversible over the short recovery period chosen.

## VII. CONCLUSION AND FUTURE WORK

This paper presented the development of a custom reconfigurable system for the acquisition and analysis of electrophysiological data, with emphasis on the description of the hardware and software architecture, as well as the algorithms implemented in the FPGA for online analysis.

The SpikeOnChip embedded platform achieves its goals of being cost-effective, lightweight and battery operated. By doing part of the processing "online and on-chip" close to the data acquisition, the volume of generated data for the electrophysiology recordings can be reduced to 5% of its original size by recording only window cutouts and LFPA results. These features make it possible to run longer experiments with more electrodes, autonomously and reliably, and also reduce the data storage needs and offline post-processing.

The system is operational and has been used to carry out different experiments in pharmacology, toxicology, and drug screening. It is currently used with 32-channel MEA biochips in an experimental setup with two biochips (64 electrodes, two Intan chips connected to the SpikeOnChip system). Upon acquisition of more biochips the system will be used with the full 256 channels it can support. Experiments requiring more channels can use multiple SpikeOnChip embedded platforms in parallel; multiple platforms can be independently controlled by a single instance of the GUI software running on the PC.

The SpikeOnChip system is fully customizable, in hardware thanks to the FPGA, and in software thanks to the software being developed in-house. The platform can therefore evolve and be tailored to the needs of future experiments. Extra processing steps can be added and sped up by deploying each step into the part of the system that is best suited for performance.

Future developments on the embedded platform include the addition of stimulation electrodes that can generate electrical signals and the inclusion of better noise rejection techniques e.g., for pump and electrovalve actuation noise. On the host side we are currently studying different machine learning methods for spike sorting and classification, currently done offline on PC. Work is also under way on accelerating these algorithms, possibly achieving real-time processing capability, by doing part of the processing in the FPGA, and part of the processing on a GPU in the host PC.

## ACKNOWLEDGMENT

The authors would like to thank all the people who have contributed to this work: Loris Gomez Baisac, Emmanuel Eggermann, Daniel Gachet, Arnaud Giner, Yannis Jeannotat, Alexandre Malki, Mike Meury, Leila Ouederni, Roland Scherwey, and David Truan.

## REFERENCES

- [1] J. Dunlop, M. Bowlby, R. Peri, D. Vasilyev, and R. Arias, "High-throughput electrophysiology: An emerging paradigm for ion-channel screening and physiology," *Nature Rev. Drug Discov.*, vol. 7, no. 4, pp. 358–368, 2008.
- [2] K. Takahashi *et al.*, "Induction of pluripotent stem cells from adult human fibroblasts by defined factors," *Cell*, vol. 131, no. 5, pp. 861–872, 2007.
- [3] Y. Zhang *et al.*, "Rapid single-step induction of functional neurons from human pluripotent stem cells," *Neuron*, vol. 78, no. 5, pp. 785–798, 2013.
- [4] A. M. Paşca *et al.*, "Functional cortical neurons and astrocytes from human pluripotent stem cells in 3D culture," *Nature Methods*, vol. 12, no. 7, pp. 671–678, 2015.
- [5] A. Cota-Coronado, J. C. Durnall, N. F. Díaz, L. H. Thompson, and N. E. Díaz-Martínez, "Unprecedented potential for neural drug discovery based on self-organizing hiPSC platforms," *Molecules*, vol. 25, no. 5, 2020, Art. no. 1150.
- [6] S. Govindan, L. Batti, S. F. Osterop, L. Stoppini, and A. Roux, "Mass generation, neuron labeling, and 3D imaging of Minibrains," *Front. Bioeng. Biotechnol.*, vol. 8, 2021, Art. no. 1436.
- [7] L. Stoppini, P.-A. Buchs, and D. Muller, "A simple method for organotypic cultures of nervous tissue," *J. Neurosci. Methods*, vol. 37, no. 2, pp. 173–182, 1991.
- [8] B. Gähwiler, M. Capogna, D. Debanne, R. McKinney, and S. Thompson, "Organotypic slice cultures: A technique has come of age," *Trends Neurosci.*, vol. 20, no. 10, pp. 471–477, 1997.
- [9] Tissue Engineering Group HEPIA, Accessed: Dec. 3, 2020. [Online]. Available: https://www.hesge.ch/hepia/en/group/tissue-engineering
- [10] Multi Channel Systems GmBH, Accessed: Dec. 3, 2020. [Online]. Available: https://www.multichannelsystems.com
- [11] Axion BioSystems, Inc, Accessed: Dec. 3, 2020. [Online]. Available: https://www.axionbiosystems.com
- [12] Deuteron Technologies Ltd, Accessed: Dec. 3, 2020. [Online]. Available: https://deuterontech.com

- [13] Zynq-7000 SoC Series. Xilinx, Accessed: Dec. 3, 2020. [Online]. Available: https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html
- [14] M. S. Yap *et al.*, "Neural differentiation of human pluripotent stem cells for nontherapeutic applications: Toxicology, pharmacology, and in vitro disease modeling," *Stem Cells Int.*, vol. 2015, 2015, Art. no. 105172.
- [15] D. Pamies *et al.*, "A human brain microphysiological system derived from induced pluripotent stem cells to study neurological diseases and toxicity," *Altex*, vol. 34, no. 3, pp. 362–376, 2017.
- [16] Á. Apáti, N. Varga, T. Berecz, Z. Erdei, L. Homolya, and B. Sarkadi, "Application of human pluripotent stem cells and pluripotent stem cell-derived cellular models for assessing drug toxicity," *Expert Opin. Drug Metab. Toxicol.*, vol. 15, no. 1, pp. 61–75, 2019.
- [17] O. Jensen, E. Spaak, and J. M. Zumer, "Human brain oscillations: From physiological mechanisms to analysis and cognition," *Magnetoencephalography: From Signals Dynamic Cortical Networks*, Cham/Zug, Switzerland: Springer, 2019, pp. 471–517.
- [18] S. R. Cole and B. Voytek, "Brain oscillations and the importance of waveform shape," *Trends Cogn. Sci.*, vol. 21, no. 2, pp. 137–149, 2017.
- [19] T. Wu et al., "A 16-channel nonparametric spike detection ASIC based on EC-PC decomposition," *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 1, pp. 3–17, Feb. 2016.
- [20] J. Park, G. Kim, and S.-D. Jung, "A 128-channel FPGA-based real-time spike-sorting bidirectional closed-loop neural interface system," *IEEE Trans. Neural Syst. Rehabil. Eng.*, vol. 25, no. 12, pp. 2227–2238, Dec. 2017.
- [21] Y. Liu, S. Luan, I. Williams, A. Rapeaux, and T. G. Constandinou, "A 64-channel versatile neural recording SoC with activity-dependent data throughput," *IEEE Trans. Biomed. Circuits Syst.*, vol. 11, no. 6, pp. 1344–1355, Dec. 2017.
- [22] E. A. Vallicelli *et al.*, "Neural spike digital detector on FPGA," *Electronics*, vol. 7, no. 12, pp. 392–407, 2018.
- [23] M. Tambaro, M. Bisio, M. Maschietto, A. Leparulo, and S. Vassanelli, "FPGA design integration of a 32-Microelectrodes low-latency spike detector in a commercial system for Intracortical recordings," *Digital*, vol. 1, no. 1, pp. 34–53, 2021.
- [24] I. Williams, S. Luan, A. Jackson, and T. G. Constandinou, "Live demonstration: A scalable 32-channel neural recording and real-time FPGA based spike sorting system," in *Proc. IEEE Biomed. Circuits Syst. Conf.*, 2015, pp. 1–5.
- [25] Digital Electrophysiology Interface Chips, Intan Technologies, LLC, 2013. [Online]. Available: http://intantech.com/files/Intan\_RHD2000\_series\_datasheet.pdf
- [26] SPI Interface Cable/Connector Specification, Intan Technologies, LLC, 2019. [Online]. Available: http://intantech.com/files/Intan\_RHD2000\_SPI\_cable.pdf
- [27] MicroZed. Avnet, Inc. Accessed: Dec. 3, 2020. [Online]. Available: http://zedboard.org/product/microzed
- [28] MicroZed Breakout Carrier Card Zynq System-on-Module Hardware User Guide V1.2, Avnet, Inc., 2016. [Online]. Available: http://zedboard.org/sites/default/files/documentations/5271-UG-MBCC-BKO-V1.2.pdf
- [29] M. Folk, G. Heber, Q. Koziol, E. Pourmal, and D. Robinson, "An overview of the HDF5 technology suite and its applications," in *Proc. EDBT/ICDT Workshop Array Databases*, 2011, pp. 36–47.
- [30] C. Croarkin *et al.*, "NIST/SEMATECH E-Handbook of statistical methods," NIST/SEMATECH, Jul. 2006. [Online]. Available: https://www.itl.nist.gov/div898/handbook/
- [31] J. G. Proakis and D. G. Manolakis, *Digital Signal Processing*, 4th ed., Hoboken, NJ, USA: Pearson, 2006.
- [32] R. G. Lyons, *Understanding Digital Signal Processing*, 3rd ed., Hoboken, NJ, USA: Pearson, 2010.
- [33] E. W. Weisstein, Complex Multiplication, Accessed: Dec. 3, 2020. [Online]. Available: https://mathworld.wolfram.com/ComplexMultiplication.html
- [34] Vivado Design Suite. Xilinx, Accessed: Dec. 3, 2020. [Online]. Available: https://www.xilinx.com/products/design-tools/vivado.html
- [35] Questa Advanced Simulator. Mentor, a Siemens Business, Accessed: Dec. 3, 2020. [Online]. Available: https://www.mentor.com/products/fv/questa/
- [36] "Signal generator for the ME-Systems and the basic and advanced Wireless-Systems, multi channel systems GmBH," 2019. [Online]. Available: https://www.multichannelsystems.com/sites/multichannelsystems.com/files/documents/data\_sheets/ME-W-SG\_Datasheet.pdf

- [37] G. A. Johnston, "Advantages of an antagonist: Bicuculline and other GABA antagonists," *Brit. J. Pharmacol.*, vol. 169, no. 2, pp. 328–336, 2013.



**Rick Wertenbroek** was born in Lausanne, Switzerland, in 1988. He received the Master of Science degree in information and communication technologies from the University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland, in 2020. He is currently working toward the Ph.D. degree in computational biology through a joint program between the School of Management and Engineering Vaud, Yverdon-les-Bains, Switzerland and the University of Lausanne, Lausanne, Switzerland. His current research focuses on hardware acceleration and data compression for genomics. He joined the Reconfigurable & Embedded Digital Systems Institute in 2016 as a Research and Development Engineer and still works there part time. His research interests include NVMe computational storage, FPGA and GPU acceleration, embedded systems, and full system hardware-software cosimulation.



**Yann Thoma** was born in Geneva, Switzerland, in 1977. He received the M.Sc. degree in computer science and the Ph.D. degree in science from the Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, in 2001 and 2005, respectively. After some years with the University of Geneva, Geneva, Switzerland, Group of Applied Physics, working on quantum cryptography, in parallel with a Postdoc with HEIG-VD, he became a Professor with the University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland, with the REDS institute (HEIG-VD), in 2009. Since then, he has been teaching in various fields, from concurrent programming to FPGA systems design, with a strong emphasis on verification of digital systems.

His current interests include biomedical embedded applications and software, hardware acceleration based on heterogeneous systems, and reliability of embedded systems.



**Flavio Maurizio Mor** was born on June 23, 1982, in Geneva, Switzerland. He received the Diploma of Engineer (with Hons.) in applied physics in 2004, and the B.Sc. and M.Sc. degrees in physics from the Swiss Federal Institute of Technology Lausanne, Lausanne, Switzerland. From 2009 to 2013, he was a Doctoral Student and worked on designing and building experiments, taking data, and analyzing them mathematically, pursuing fundamental science and its applications. He has four years of experience as a Scientific Assistant in electrophysiology, which allowed him to further develop his skills in data science and his capability to manage different projects in neuroscience with a strong focus on intelligent real-time data analysis while collaborating with different scientists, engineers, neuro-biologists or biologists, and private companies. He is currently a Data Scientist in biocomputing with FinalSpark, Geneva, Switzerland, a company active in the field of artificial intelligence. He was the recipient of the Best Ph.D. Thesis Work Award and a Special Distinction for his research on bionanophotonics.



**Sara Grassi** was born on March 9, 1966, in Maracay, Venezuela. She received a degree in electronics engineering from Simón Bolívar University, Caracas, Venezuela, the M.Sc. degree in communications and signal processing from Imperial College, London, U.K., and the Ph.D. degree in electronics and signal processing from the Institute of Microengineering, University of Neuchâtel, Neuchâtel, Switzerland. She has done more than 20 years of applied research, developing and implementing signal processing algorithms in very low power devices, mostly in collaboration with the watchmaking industry. She is currently a Lecturer with two institutions of the University of Applied Sciences Western Switzerland, Delémont, Switzerland.



**Marc Olivier Heuschkel** was born in Lausanne, Switzerland, in 1970. He received the Ph.D. degree in microengineering from the Swiss Federal Institute of Technology Lausanne, Lausanne, Switzerland, in 2001. From 2001 to 2011, he founded and held the position of VP Engineering and Operations with Ayanda Biosystems SA, Vaud, Switzerland, where he carried out the research, development, and manufacture of Micro Electrode Array (MEA) devices for in vitro monitoring of neural cell/tissue cultures. In 2011, he founded the spin-off Qwane Biosciences SA, Switzerland, where he held the position of Chief Executive Officer from 2011 to 2015. In 2016, he joined the Tissue Engineering Laboratory, University of Applied Sciences Western Switzerland, Geneva, Switzerland, where he completed the development of porous polyimide membranes incorporating a MEA. His current interests include the completion and optimisation of an in vitro MEA-based data acquisition system adapted to neural tissues derived from human iPS cells for toxicology and drug screening applications, and the revival of commercial activity around MEA technology tools.



**Luc Stoppini** was born in 1956 in Annecy, France. He received the Ph.D. degree in neurobiology. Since 2008, he has been a Professor of bio-engineering with the University of Applied Sciences Western Switzerland Geneva, Geneva, Switzerland. Previously, he was the R&D Director of Capsant Neurotechnologies Ltd., U.K., and the Founder and President of BioCell-Interface S.A, Switzerland. He has held academic positions with the Universities of Lyon, Lyon, France, Geneva and Virginia (UVA, Charlottesville). He has been involved in commercialising technologies for the last ten years. Prior to joining Capsant Ltd, he developed in vitro models of neurodegenerative diseases (multiple sclerosis) within Serono's research centre based in Geneva. He set up in vitro models to study physiological and patho-physiological processes occurring in the central nervous system. He was the pioneer in developing the technique of brain organotypic slice cultures on membranes, which is currently used in numerous public institutions as well as private companies. In parallel to his activities on biological tissues, he was involved in the development and the validation of different types of bio-electronic interfaces.



**Adrien Roux** was born on August 23, 1980, in Paris, France. He received the Bachelor of Science degree in biology from the University of Technology of Paris, Paris, France, in 2001 and the Master of Science degree in the field of science and technology, and health biology from the University of Montpellier, Montpellier, France, in 2008, and the Doctor of Philosophy degree in biology from the University of Geneva, Geneva, Switzerland.

In 2001, he started his career as a Lab Technician. He has worked in various academic labs in Paris such as CNRS, the Faculty of Pharmacy or the Curie Institute. In 2003, he moved to an industrial environment with Thermo-Fisher Scientific in the research of new blood markers and development of assays before working for Merck Serono from 2008 to 2012 in brain drug delivery. In 2014, he joined HEPIA, the University of Applied Science, Geneva, Switzerland (part of HES-SO network) where he was nominated as an Associate Professor in 2019 to take over the Tissue Engineering Laboratory.

His current research includes the development of in vitro cellular assays and the integration of engineering techniques to achieve complex readouts. This includes work on the neurovascular unit based on iPS-derived cells, including the monitoring of blood-brain barrier integrity by bio-impedance and the functional monitoring of the electrical activity of neurons in 3D in vitro mini-brains. His research interests are in the area of organ-on-chip and in vitro cellular models, with a focus on neurology and technology.