WIP: Major content update

This commit is contained in:
2022-06-13 17:39:51 +02:00
parent 3ca7604967
commit 01d0a2c796
352 changed files with 84802 additions and 58 deletions

@ -0,0 +1,119 @@
---
title: "A 0.45 V continuous time-domain filter using asynchronous oscillator structures"
date: 2016-12-11T15:26:46+01:00
draft: false
toc: true
type: posts
math: true
tags:
- publication
- instrumentation
- CMOS
- time-domain
- asynchronous
---
Lieuwe B. Leene, Timothy G. Constandinou
Department of Electrical and Electronic Engineering, Imperial College London, SW7 2BT, UK
Centre for Bio-Inspired Technology, Institute of Biomedical Engineering, Imperial College London, SW7 2AZ, UK
# 1 Abstract
This paper presents a novel oscillator based filter structure for processing time-domain signals with linear dynamics that extensively uses digital logic by construction. Such a mixed signal topology is a key component for allowing efficient processing of asynchronous time encoded signals that does not necessitate external clocking. A miniaturized primitive is introduced as analogue time-domain memory that can be modelled, synthesized, and incorporated in closed loop mixed signal accelerators to realize more complex linear or non-linear computational systems. This is contextualized by demonstrating a compact low power filter operating at 0.45 V in 65 nm CMOS. Simulation results are presented showing an excess of 50 dB dynamic range with a FOM of 7 fJ/pole which promises an order of magnitude improvement on state-of-the-art filters in nanometre CMOS.
# 2 Introduction
The challenges for advancing digital devices and energy constrained computation no longer exhibit the coherent virtues dictated by Moore's Law[^1]. Instead current research is driven to find new solutions inspired by the natural world for solving problems that are dissonant with today's computational paradigm. This has led to the re-emergence of processing in the analogue domain as an accelerator to the digital framework [^2], motivated by the fact that when tailored to a specific computational problem analogue efficiency can be vastly superior to its digital equivalent [^3][^4]. However there remain many challenges that prevent a clear advantage for such architectures in practice. The current state-of-the-art demands fully integrated SOCs in nanometre CMOS for a cost effective solution. This substantially degrades analogue performance in addition to the difficulty in miniaturizing analogue elements. More importantly analogue circuits tend to drastically lose fidelity at near threshold supply voltages, which are an essential aspect of ultra low power digital systems [^5]. To address such challenges oscillator based topologies have been proposed in association with a new computational paradigm [^6][^7]. There are two critical advantages that such an approach can leverage. The first is that the signals pertaining to these systems are digital in nature, where the information is encoded in the timing between logical events equivalent to clock edges. This implies that a single binary bit stream can represent multiple bits of information, effectively increasing the density of CMOS interconnect. Moreover such signals allow themselves to be manipulated by standard logic gates and asynchronous digital controllers for very rich yet highly efficient signal control[^8]. The second aspect is that voltage controlled oscillators suffer very little performance degradation from aggressive technology scaling or poor transistor characteristics.
In fact the perpetual improvement in f<sub>T</sub> increases the maximum temporal resolution achievable using time-domain quantizers for unparalleled dynamic range.
{{< figure src="/images/icecs2016/td_system.svg" title="Figure 1: Oscillator based computing to realize linear and non-linear dynamics that utilize a phase domain state as memory." width="500" >}}
In an effort to explore the potential of such a modality this work considers the use of oscillator structures for processing neural activity in extension to a previously developed oscillator based instrumentation system in [^9]. Implantable brain machine interfaces present one of the most demanding applications for realizing power efficient structures that acquire and classify neural activity to treat neurophysiological disorders. The ring oscillator concept is shown in Fig. 1. This oscillator plays the role of analogue memory by retaining a state in the phase domain. A transconductive element adjusts the phase subject to the digital control signals. The digital logic dictates the overall response of the structure by using single or multiple phases of the oscillator. This presents negative feedback that stabilizes the operation of the system by rejecting frequency offsets and noisy aggressors external to the circuit. As will be demonstrated this closed loop dynamic has true analogue anti-aliasing properties due to the nature of VCO based integration. This implies that any logical approximations that induce errors or distortion at high frequency can be rejected. This paper is organized as follows. The basic aspects of the filter architecture are introduced in Sec. 3. This is followed by the circuit level implementation that is detailed in Sec. 4. Sec. 5 presents preliminary simulation results and conclusions are drawn in Sec. 7.
# 3 Ring Oscillator based Filter
{{< figure src="/images/icecs2016/TD_L1.svg" title="Figure 2: Proposed single pole ROF structure where V<sub>R</sub> represents the only analogue node in the system. " width="500" >}}
A first order realization of this Ring Oscillator based Filter (ROF) is presented in Fig. 2. This simpler structure allows the discussion to give insight into the elementary operation. Here the digital signals D and Q are pulse width modulated (PWM) time domain signals and are typically not modulated using the same carrier frequency. In essence a digital adder injects current into the oscillator such that the phase recedes or advances with respect to another local oscillator by comparing the two pulse width components of D & Q. By using a differential structure the phase can be encoded as self referenced timing events that do not require global frequency synchronization to decode phase information. It is also important to note that this structure differs characteristically from classic literature examples [^10], although these remain very useful for analysis. This discrepancy arises from sub-threshold and current starved operation of the oscillator which implies that the conduction phases of the NMOS and PMOS transistors for each inverter are non-overlapping. In a general sense however the output voltage V<sub>out</sub> of the structure is often modelled as:
$$ V_{out} (t) = A(t) \cdot f\left[ \omega_0 t + \phi(t) \right] $$
$$ \phi (t) = \int_{-\infty}^{\infty} h_{\phi}(t,\tau) i(\tau) \: d\tau = \int_{-\infty}^{t} \Gamma(\omega_0 , \tau) i(\tau) \: d\tau $$
In Eq. 1 A and φ represent the amplitude and phase state variables of the system as a function of time. \\(f\\) describes the limit cycle of the oscillation that captures the non-linearities of V<sub>out</sub> as a function of phase. Our primary interest lies with Eq. 2 which captures the dependency of phase with respect to the impulse response h<sub>φ</sub> and the cyclostationary impulse sensitivity function (ISF) Γ. Γ is evaluated with respect to a specific small signal source. This dependency is what gives rise to the inherent integral behaviour of oscillators where parasitics diminish the integration constant but will not affect the ideal loop gain of the circuit. The transconductance Gm is introduced to translate the digital output to currents injected into the oscillator represented by I<sub>Δ</sub>. The resulting behaviour can be summarized in the s-domain by assuming Γ<sub>Gm</sub> is approximately independent of the phase[^11]:
$$ H_{\phi}(s)=\frac{1}{s} \frac{Gm}{2\pi q_{max}} \quad \text{where} \quad q_{max} = N V_R C_{T} $$
$$ V_R = V_{th} + \eta U_T \ln \left( \frac{2 I_B}{2\eta U_T^2 \mu C_{ox}} \frac{L}{W} \right) $$
In Eq. 3 N, V<sub>R</sub>, and C<sub>T</sub> represent the number of inverter stages, the voltage across the oscillator, and the total capacitance seen as load by each gate in the oscillator. Note q<sub>max</sub> physically represents the total charge that is dissipated each period which implies a frequency of oscillation of f<sub>osc</sub>=I<sub>B</sub>/q<sub>max</sub>. For simplicity the carrier mobility μ and V<sub>th</sub> for both PMOS and NMOS are taken as equivalent such that their conductivity is equal. In practice W must be adjusted to compensate for this difference, which will typically also lead to improved supply noise rejection. Fig. 3 shows the phase dependency of Γ with respect to I<sub>B</sub> and the constituent NMOS & PMOS devices of all gates together for a 5 stage ring oscillator. Although H<sub>φ</sub> due to I<sub>B</sub> exhibits some dependency with respect to phase it is well estimated by Eq. 3. The phase information is extracted using an XOR gate which has a gain of 1/π.
{{< figure src="/images/icecs2016/ISF.svg" title="Figure 3: Impulse sensitivity function for a 5-stage ring oscillator with respect to the bias current and NMOS/PMOS contributions from all stages together. " width="500" >}}
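
The relations around Eq. 3 can be evaluated numerically to get a feel for the orders of magnitude involved. The following is a minimal sketch only: the function name and every numeric value are hypothetical placeholders chosen to exercise the formulas, not parameters of the presented design.

```python
import math

def ring_oscillator_model(n_stages, v_r, c_t, i_b, gm):
    """Evaluate the relations stated around Eq. 3 for a single ring oscillator."""
    q_max = n_stages * v_r * c_t        # total charge dissipated per period [C]
    f_osc = i_b / q_max                 # implied oscillation frequency [Hz]
    k_phi = gm / (2 * math.pi * q_max)  # integration constant, H_phi(s) = k_phi / s
    return q_max, f_osc, k_phi

# Hypothetical operating point (assumed values, not taken from the paper).
q_max, f_osc, k_phi = ring_oscillator_model(
    n_stages=5, v_r=0.2, c_t=10e-15, i_b=10e-9, gm=1e-9
)
print(f"q_max = {q_max:.3e} C, f_osc = {f_osc:.3e} Hz, k_phi = {k_phi:.3e}")
```

Because both the oscillation frequency and the integration constant scale with 1/q<sub>max</sub>, changing N, V<sub>R</sub>, or C<sub>T</sub> in this sketch moves them together.
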
Although the first order structure has very low complexity the drawback is that the bandwidth is directly related to the frequency of oscillation when H<sub>φ</sub> is put into feedback. This coupling is undesirable for two reasons. The first is from a noise perspective: for a fixed frequency, decreasing I<sub>Δ</sub> increases the input-referred noise power (e²<sub>n</sub>) of this circuit approximately as (U<sub>T</sub>\\(\cdot\\)I<sub>B</sub>/I<sub>Δ</sub>)² which may become very pronounced. This forces the structure to dissipate excessive amounts of power to maintain adequate dynamic range. The second aspect is that the capability to control the oscillator frequency independent of loop bandwidth is useful for adjusting digital power dissipation and its interaction with other system blocks.
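
The coupling between bandwidth and oscillation frequency can be made explicit with a short calculation. As a sketch, assume the phase detection and feedback path is lumped into a single constant gain \\(\beta\\) (a symbol introduced here purely for illustration), so that the loop gain is \\(\beta H_{\phi}(s)\\). Combining Eq. 3 with the oscillation frequency \\(f_{osc} = I_B / q_{max}\\) then gives:

$$ \frac{\beta H_{\phi}(s)}{1+\beta H_{\phi}(s)} = \frac{1}{1+s/\omega_{3dB}} \quad \text{with} \quad \omega_{3dB} = \frac{\beta \, Gm}{2\pi q_{max}} = \frac{\beta \, Gm}{2\pi I_B} f_{osc} $$

Under this assumption the closed loop corner frequency is proportional to \\(f_{osc}\\) for a fixed ratio of \\(Gm\\) to \\(I_B\\), so the bandwidth can only be lowered independently of the oscillator frequency by reducing the injected current relative to the bias current.
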
{{< figure src="/images/icecs2016/TD_L2.svg" title="Figure 4: Proposed two-pole ROF structure with the digital output Q represented by K PWM phases for reduced analogue distortion." width="500" >}}
For this reason the second order structure is introduced in Fig. 4. This has equivalent characteristics to that of a Miller compensated amplifier where the switched current loads into a capacitor across a high gain stage which is realized by the first order structure. As a result noise/bandwidth and oscillator characteristics are decoupled by being represented through two different capacitors C<sub>L</sub> and C<sub>T</sub>. The additional consideration required here is that the digital feedback Q over C<sub>L</sub> can cause large signal swings on the gate of M<sub>B</sub> degrading transconductive linearity. Generally if M<sub>B</sub> is also in sub threshold operation its input range is limited to 2U<sub>T</sub> before excessive distortion is introduced. However by capacitively coupling M phases of Q in parallel the quantization levels are reduced to V<sub>DD</sub>/M. Each phase is simply represented by taking more taps from the ring oscillator in parallel. Moreover if M is chosen proportional to V<sub>DD</sub>/2U<sub>T</sub> this structure actually reduces in complexity and improves efficiency as the supply voltage decreases. Note that for high frequency operation the switched current DAC exhibits poor switching dynamics due to the reduced supply voltage. In such a case it is sufficient to replace this block with parallel resistors equivalent to active RC integrators. As such it may be expected that this configuration implies a 3 dB frequency of f<sub>3dB</sub>=I<sub>Δ</sub>/C<sub>L</sub> where C<sub>L</sub> is approximated as U<sub>T</sub> kT/e²<sub>n</sub> to match the required noise levels. In extension the oscillator spurs can be set to match this noise floor by considering the filter response and quantizer level dependency such that f<sub>osc</sub> &gt; f<sub>3dB</sub> SNR/N for a first order system.
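
To illustrate how the number of feedback phases relates to the supply voltage, the sketch below picks the smallest M for which the quantization step V<sub>DD</sub>/M stays within the 2U<sub>T</sub> input range. Taking the proportionality constant as one, rounding up, and assuming a temperature of 300 K are choices made here for illustration; the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage U_T = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELEMENTARY_CHARGE

def feedback_phases(v_dd, temp_kelvin=300.0):
    """Smallest phase count M such that the step V_DD / M does not exceed 2 U_T."""
    u_t = thermal_voltage(temp_kelvin)
    m = math.ceil(v_dd / (2 * u_t))
    step = v_dd / m                  # resulting quantization step [V]
    return m, step, u_t

for v_dd in (1.2, 0.45):
    m, step, u_t = feedback_phases(v_dd)
    print(f"V_DD = {v_dd} V -> M = {m}, step = {step * 1e3:.1f} mV, 2 U_T = {2 * u_t * 1e3:.1f} mV")
```

Under these assumptions the required phase count falls as the supply is lowered, which is the sense in which the structure reduces in complexity at low voltage.
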
# 4 Circuit Implementation
The presented implementation realizes a 0.45 V second order ROF using commercially available TSMC 65 nm LP MS RF technology (1P9M\_6X1Z1U\_RDL). Fig. 5 shows the transistor level implementation of the transconductor and oscillator structure that retains the phase state of the system. Here a bias current is simply switched differentially into the capacitive load while M<sub>1-2</sub> provide common mode feedback. The transistors M<sub>3-4</sub> realize a current mirror that biases the ring oscillators proportionally to I<sub>B</sub>. The control switches S<sub>A</sub>/S<sub>C</sub>/S<sub>B</sub> correspond to +1/0/-1 transconductive gains that realize a 1.5 bit current DAC. The oscillators are floating in the middle of the supplies due to M<sub>5</sub> which has its body connected to source. This improves the switching behaviour of the subsequent XOR gate by providing good high/low voltage levels while also reducing the noise coupling from ground/substrate if the oscillator is allowed to use isolated P/N-wells. The capacitor C<sub>L</sub> is split into 11 MIM fringe capacitors for a total of 100 fF load on each terminal.
{{< figure src="/images/icecs2016/TD_sch1.svg" width="500" >}}
{{< figure src="/images/icecs2016/TD_sch2.svg" title="Figure 5: The proposed transistor level implementation of the second order ROF" width="500" >}}
The digital logic used to realize unity gain feedback is presented in Fig. 6. Three out of the 11 phases are used in the feedback logic for demonstration. Typically this number of phases is directly related to the frequencies of D & Q or their intermodulation products that will introduce spurs outside of the filter bandwidth. Increasing the number of phases used reduces distortion components while increasing the effective carrier frequencies. This can and should be reconfigurable in addition to tuning I<sub>B</sub> to accommodate the typical process variance for transistor characteristics. While other types of phase detectors besides the XOR gate can be used it is important to realize their impact on distortion due to the finite bandwidth of digital gates. The XOR realization guarantees that for near zero input Q will exhibit the smallest bandwidth requirement due to its 50 % duty cycle, with the requirement gradually increasing as the phase difference approaches 0 or π.
{{< figure src="/images/icecs2016/TD_logic.svg" title="Figure 6: Boolean operator used that allows a unity gain configuration of the ROF." width="500" >}}
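
The XOR phase detector behaviour referred to above can be checked with a small numerical experiment. This is an idealized sketch that assumes two square waves of identical frequency and exactly 50 % duty cycle, which the real oscillators only approximate; the function name and sample count are illustrative.

```python
import math

def xor_duty_cycle(phase_diff, samples=20_000):
    """Average XOR output of two ideal 50 % square waves offset by phase_diff radians."""
    high = 0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        a = (theta % (2 * math.pi)) < math.pi                  # reference oscillator
        b = ((theta - phase_diff) % (2 * math.pi)) < math.pi   # delayed oscillator
        high += a != b                                         # XOR of the two outputs
    return high / samples

for phase in (math.pi / 4, math.pi / 2, 3 * math.pi / 4):
    print(f"phase = {phase:.3f} rad -> duty = {xor_duty_cycle(phase):.3f}")
```

For phase differences between 0 and π the averaged output grows linearly as phase/π, consistent with a detector gain of 1/π, and a phase difference of π/2 corresponds to the 50 % duty cycle point.
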
# 5 Simulation Results
In practice the primary difficulty with time domain structures is their associated simulation effort because the bandwidth of operation is many orders of magnitude larger than the signals of interest. For this reason the analytic model is also presented to perform behavioural simulations and guide the design effort. The results presented here are based on transient noise simulations using industry provided PSP models for completeness. Fig. 7 shows these simulation results where a 1 kHz PWM encoded input signal is driving the system at 95 % of the full input range. V<sub>R</sub> shows the oscillator providing capacitive feedback on the Miller integration node while the phase difference of the two ring oscillators tracks the pulse width of the input. Fig. 8 shows the oscillator phase difference as a function of time and Fig. 9 presents the frequency content when three of the phases are summed together. The 56 dB THD shown is critically related to the current DAC characteristics near the cut-off frequency as it is not adequately shaped by the integration loop, which is challenging to enhance with limited voltage headroom. Table 1 compares the performance presented here using the figure of merit FOM = Power/(N<sub>poles</sub> BW SINAD<sub>MAX</sub>), where SINAD<sub>MAX</sub> is the maximum signal to noise and distortion ratio.
{{< figure src="/images/icecs2016/visual.svg" title="Figure 7: Digital input (D) & output (Q) components together with the integration node V<sub>R</sub> and oscillator outputs internal to the system. " width="500" >}}
{{< figure src="/images/icecs2016/Delta.svg" title="Figure 8: Phase difference of the oscillator structure measured as time delay." width="500" >}}
{{< figure src="/images/icecs2016/Spectrum.svg" title="Figure 9: Spectral power densities of the multi-phase PWM signal Q" width="500" >}}
Table 1: Performance summary and comparison with state of the art
| Specification | This Work | [^6] | [^12] | [^13] |
|----|----|----|----|----|
| Modality | Time | Time | Volt. | Volt. |
| Order | 1 | 4 | 3 | 3 |
| Technology | 65 nm | 90 nm | 0.5 μm | 0.5 μm |
| Supply voltage [V] | 0.45 | 0.55 | 3.3 | 1.8 |
| Supply current [A] | 35n | 5.3m | 1.4m | 2m |
| Bandwidth [Hz] | 6k | 7M | 1.5M | 500k |
| SINAD [dB] | 52 | 61 | 60 | 65 |
| FoM [fJ] | 7.4 | 93 | 1026 | 1350 |
| Area [mm²] | 0.001 | 0.29 | 2.2 | 0.68 |
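
The figure of merit can be recomputed from the table entries as a sanity check. The sketch below assumes that SINAD enters the formula as a linear amplitude ratio, 10^(dB/20); this interpretation is an assumption, adopted because it reproduces the three comparison columns to within rounding. The dictionary keys and function name are illustrative.

```python
def figure_of_merit(supply_v, supply_a, n_poles, bandwidth_hz, sinad_db):
    """FOM = Power / (N_poles * BW * SINAD), returned in joules."""
    power = supply_v * supply_a
    sinad_linear = 10 ** (sinad_db / 20)  # assumption: SINAD as an amplitude ratio
    return power / (n_poles * bandwidth_hz * sinad_linear)

# (supply [V], supply [A], order, bandwidth [Hz], SINAD [dB]) copied from Table 1
designs = {
    "This work": (0.45, 35e-9, 1, 6e3, 52),
    "ref 6": (0.55, 5.3e-3, 4, 7e6, 61),
    "ref 12": (3.3, 1.4e-3, 3, 1.5e6, 60),
    "ref 13": (1.8, 2e-3, 3, 500e3, 65),
}
for name, spec in designs.items():
    print(f"{name}: {figure_of_merit(*spec) * 1e15:.1f} fJ")
```

With the rounded table entries this gives roughly 6.6 fJ for this work rather than the tabulated 7.4 fJ, a difference that may stem from rounding of the listed supply current, bandwidth, or SINAD.
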
# 6 Acknowledgement
This work was supported by EPSRC grants EP/K015060/1 and EP/M020975/1.
# 7 Conclusion
The model and implementation of an oscillator based filter have been demonstrated to complement that of FIR structures [^14] for asynchronous time domain structures. High linearity is demonstrated at full input dynamic range while operating with a 0.45 V supply voltage. The extensive use of digital logic in its construction allows highly synthesizable oscillator based computing for future ultra low power systems in nanometre CMOS. Preliminary simulation results indicate a FOM of 7.4 fJ/pole for the 6 kHz bandwidth which is a substantial improvement over previous time-domain implementations. While it remains to be seen if the efficiency can be maintained in more complex systems the proposed topology shows much promise for ultra low power systems. Moreover we expect that both the first & second order primitives proposed here will find many other applications like \\(\Delta\Sigma\\) ADCs due to their simplicity and flexibility towards process parameters for low voltage operation.
# References:
[^1]: I. L. Markov, "Limits on fundamental limits to computation," Nature, vol. 512, pp. 147--154, August 2014.
[^2]: N. Guo et al., "Energy-efficient hybrid analog/digital approximate computation in continuous time," IEEE J. Solid-State Circuits, vol. 51, no. 7, pp. 1514--1524, July 2016.
[^3]: M. Verhelst and A. Bahai, "Where analog meets digital: Analog-to-information conversion and beyond," IEEE Solid-State Circuits Mag., vol. 7, no. 3, pp. 67--80, September 2015.
[^4]: R. Sarpeshkar, "Analog versus digital: Extrapolating from electronics to neurobiology," Neural Computation, vol. 10, no. 7, pp. 1601--1638, October 1998.
[^5]: M. Alioto, "Understanding DC behavior of subthreshold CMOS logic through closed-form analysis," IEEE Trans. Circuits Syst. I, vol. 57, no. 7, pp. 1597--1607, July 2010.
[^6]: B. Drost, M. Talegaonkar, and P. K. Hanumolu, "Analog filter design using ring oscillator integrators," IEEE J. Solid-State Circuits, vol. 47, no. 12, pp. 3120--3129, December 2012.
[^7]: W. Y. Tsai et al., "Enabling new computation paradigms with HyperFET - an emerging device," IEEE Trans. Multi-Scale Comput. Syst., vol. 2, no. 1, pp. 30--48, January 2016.
[^8]: T. S. Lande et al., "Running cross-correlation using bitstream processing," Electronics Letters, vol. 43, no. 22, October 2007.
[^9]: M. Elia, L. B. Leene, and T. G. Constandinou, "Continuous-time micropower interface for neural recording applications," in IEEE Proc. ISCAS, May 2016.
[^10]: A. Hajimiri and T. Lee, "A general theory of phase noise in electrical oscillators," IEEE J. Solid-State Circuits, vol. 33, no. 2, pp. 179--194, February 1998.
[^11]: A. Hajimiri, S. Limotyrakis, and T. Lee, "Phase noise in multi-gigahertz CMOS ring oscillators," in IEEE Proc. CICC, May 1998, pp. 49--52.
[^12]: C. Garcia-Alberdi et al., "Tunable class AB CMOS Gm-C filter based on quasi-floating gate techniques," IEEE Trans. Circuits Syst. I, vol. 60, no. 5, pp. 1300--1309, May 2013.
[^13]: J. Galan et al., "A very linear low-pass filter with automatic frequency tuning," IEEE Trans. VLSI Syst., vol. 21, no. 1, pp. 182--187, January 2013.
[^14]: M. Kurchuk et al., "Event-driven GHz-range continuous-time digital signal processor with activity-dependent power dissipation," IEEE J. Solid-State Circuits, vol. 47, no. 9, pp. 2164--2173, September 2012.

@ -0,0 +1,119 @@
---
title: "A 2.7 μW/MIPS, 0.88 GOPS/mm² distributed processor for implantable brain machine interfaces"
date: 2016-10-17T15:26:46+01:00
draft: false
toc: true
type: posts
math: true
tags:
- publication
- processor
- CMOS
- biomedical
---
Lieuwe B. Leene, Timothy G. Constandinou
Department of Electrical and Electronic Engineering, Imperial College London, SW7 2BT, UK
Centre for Bio-Inspired Technology, Institute of Biomedical Engineering, Imperial College London, SW7 2AZ, UK
# 1 Abstract
This paper presents a scalable architecture in 0.18 μm CMOS for implantable brain machine interfaces (BMI) that enables microcontroller flexibility for data analysis at the sensor interface. By introducing more generic computational capabilities the system is capable of high level adaptive function to potentially improve the long term efficacy of invasive implants. This topology features a compact ultra low power distributed processor that supports a 64-channel neural recording system on chip (SOC) with a computational efficiency of 2.7 \\( \mu\\)W/MIPS and a total chip area of 6.2 mm². This configuration executes 1024 instructions on each core at 20 MHz to consolidate full spectrum high precision recordings from 4 analogue channels for filtering, spike detection, and feature extraction in the digital domain.
# 2 Introduction
A key challenge for state-of-the-art neuroscience is real-time data analysis at a massive scale for the diagnosis, treatment and recovery of incapacitating neurological conditions[^1]. While this field has advanced substantially in the realization of signal acquisition and methods for decoding activity, current systems show a disconnect between implantable devices and the development of algorithms. The initiatives for next generation BMIs focus on scaling recording capabilities and do not consider a strategy for providing highly efficient processing, which is imperative for implantable SOCs. Moreover it is rare to see methods actively utilize the reconfigurability of modern sensor systems while maximizing the integrity of decoding spike train activity. Numerous aspects with regard to the signal integrity cannot be anticipated and thus assuming a specific method or signal modality will lead to conservative design because an excessively noisy environment remains a possibility. This reveals that chronic instrumentation has yet to take advantage of more generic real-time processing to improve the efficacy of these invasive devices. Implantable systems predominantly struggle in finding compact and power efficient architectures for signal decomposition. Moving towards fully packaged millimetre scale devices that can support wireless spike train analysis of hundreds of neurons is a highly contested target for many research groups[^2][^3]. As a result high level reconfigurability is yet to be adopted in the current state-of-the-art.
The approach to specialized DSP in the literature reflects two problems pertaining to neural recording systems. The first is signal extraction from recordings, which consists of spike detection to extract compressed spike train data. The other is associated with accelerating adaptive filters that map these spike trains to estimate cognitive dynamics or invoked limb movement. Typical examples for acquiring neural activity are fully synthesized cores [^4] that have been successful in realizing implantable solutions. In contrast high level decoding is predominantly performed by FPGAs as integration makes less sense at the system level [^6]. However highly reconfigurable instrumentation has been suggested to leverage both adaptive noise shaping and artefact removal [^7] [^8].
In line with such work this paper presents a distributed processor architecture. Sec. 3 motivates the direction taken here and models the principal constraints for processing at the sensor interface. The proposed system is introduced in Sec. 4 and contextualized by a software development driven platform. The execution unit implementation is detailed in Sec. 5 and accompanied with performance results in Sec. 6.
# 3 On-Node Processing
In order to anticipate how future processing methods can be accommodated in SOCs it is essential to capture high level trends with respect to the processing capacity of neural implants. Here digital resource requirements are normalized in terms of state variables for evaluating technology dependency. The number of state variables in a dynamic process is a good indicator of complexity whether it is a digital classifier or an analogue filter. Here the focus is exclusively on processing such that the signal of interest is idealized with respect to amplitude and representation. Consider \\(L\\) as a normalized feature size that allows the evaluation of parameters for a particular technology and their extrapolation based on constant field scaling factors. This remains adequate considering BMIs are fabricated using a wide range of 65 nm to 1 μm CMOS technologies.
$$ R_{D} = \underbrace{ \alpha f_s C_{g} V_{dd}^2 L^2 \log_2(SNR) }_{power} \cdot \underbrace{ \alpha \log_2(SNR) A_{g} L^2}_{Area} $$
Eq. 1 represents the power area product for a digital state variable. \\(C_{g}\\), \\(A_{g}\\), \\(\alpha\\) parametrise typical gate capacitance, area, and overhead for each register respectively [^9]. Similarly \\(f_s\\), \\(V_{dd}\\) reflect the sampling frequency and supply voltage. Generally the scaling of the \\(R_D\\) constituents is well known and guides maximizing system efficiency in an abstract sense [^10]. For the sake of this discussion we assert that analogue instrumentation is limited to a large extent to having an area power product \\(R_A\\) larger than \\(10^{-15}\\) W m² when considering an SNR of 60 dB for a 1.2 V system. The derivation comes from the fact that neural signal levels require a specific current dissipation associated with the thermal noise levels, while filtering & sampling imply a certain capacitor size according to the supply voltage. The latter two terms trade off power with area, which can be improved by optimization of the instrumentation topology but will be bounded by signal dynamic range and minimum capacitor sizes.
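
Eq. 1 translates directly into a small helper for exploring how many digital state variables fit within the analogue power area budget. This is a sketch under stated assumptions: the SNR is converted as an amplitude ratio before taking the logarithm, the function names are hypothetical, and all numeric parameter values below are placeholders rather than figures from this work.

```python
import math

def digital_state_resource(f_s, c_g, a_g, alpha, v_dd, snr_db, feature=1.0):
    """Power area product R_D of one digital state variable, following Eq. 1."""
    bits = math.log2(10 ** (snr_db / 20))  # assumption: SNR as an amplitude ratio
    power = alpha * f_s * c_g * v_dd**2 * feature**2 * bits
    area = alpha * bits * a_g * feature**2
    return power * area

def available_state_variables(r_a, r_d):
    """Number of digital state variables matching an analogue budget R_A."""
    return r_a / r_d

# Placeholder parameters (assumed for illustration only).
r_d = digital_state_resource(
    f_s=20e3, c_g=10e-15, a_g=50e-12, alpha=2.0, v_dd=1.2, snr_db=60
)
print(f"R_D = {r_d:.2e} W m^2, state variables = {available_state_variables(1e-15, r_d):.0f}")
```

Since the normalized feature size enters both the power and the area term quadratically, \\(R_D\\) in this sketch scales with its fourth power when all other parameters are held fixed.
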
{{< figure src="/images/biocas2016/Operations.svg" title="Figure 1: Analytic number of digital operations available with respect to different technologies (red) with references to the normalized performance of image processors (blue)." width="500" >}}
With this understanding Fig. 1 illustrates the expected number of digital state variables that aggregate to an equivalent power area product to that of the instrumentation circuit. This shows standard logic in 0.18 μm CMOS allows 100 state variables or equivalently 100 operations per sample taken. As a reference, specialized image processors that similarly rely heavily on data intensive operations are normalized in Fig. 1 to illustrate how technology scaling exhibits the predicted characteristics. As a result it is expected that even for ultra low power BMIs digital processing capacity will be an abundant resource in future systems.
# 4 Neuron-Processor Interface
The high level objective for this system is illustrated in Fig. 2. A generic IoT platform is used to support an unconstrained software stack for networking, data analysis, or system interrogation that is best described by high-level languages. This simplifies the development with non-hardware specific software abstractions and eases the incorporation of other modules. The proposed Neuron-Processor Interface (NPI) device may directly be integrated with the sensor as an ASIC and receive configuration commands from this platform to adjust its operation. In extension it follows that peripherals for regulating power and distributing clocks must be integrated on chip. This reduces the interface to simply providing power and bi-directional data in terms of an SPI protocol.
{{< figure src="/images/biocas2016/Sys_iP.svg" title="Figure 2: Proposed development platform for highly reconfigurable neural recording systems." width="500" >}}
The proposed architecture introduces a large number of on-chip DRAM macros to support the retention of 1024x32-bit instructions. This represents the program that is pipelined to each execution unit to instruct filtering coefficients or feature extraction in a manner that can be extended to an arbitrary number of processing units. Local to each unit is another 1 kbit macro that enables memory intensive methods such as template matching to take place. Four analogue instrumentation channels with a 12 bit ADC are multiplexed to the 8 b processing unit with 68 dB SNDR maximal precision. This is an extension to prior work in [^15] which details the analogue recording implementation.
{{< figure src="/images/biocas2016/NPI_TLT.svg" title="Figure 3: Implemented Neuron-Processor Interface (NPI) system architecture for realizing high performance reconfigurable processing." width="500" >}}
The system is illustrated in Fig. 3. There are multiple layers from the system peripherals to the internal units where average data rates progressively increase. We adopt an in-data processing methodology such that the signal is maximally reduced to its principal components at the sensor interface with high-performance digital methods. This mitigates any redundant energy dissipation for data telecommunication. The primary mechanism of operation is the program memory that continuously feeds the stored instructions into the array of processors that operate locally on the recorded data. The execution of these instructions is handled with what is essentially an instruction decoder, a memory module and an arithmetic operator. Inherently this implementation will sacrifice the availability of more intricate functionality found in DSPs since the data is not funnelled into one processing unit that can be very elaborate in complexity. The distributed structure is rationalized by the fact that typical methods such as clustering operate at a much lower speed due to the sporadic spiking activity which makes statistical convergence slow. Furthermore these adaptations need to be performed on the order of minutes, so such functions may also be implemented through the redundancy of elementary operations. It is important to mention that multiplexing loses its effectiveness in memory intensive applications such as neural decoding. This is because it does not mitigate the power & area scaling associated with memory allocation and in fact becomes less efficient.
# 5 Execution Unit
It is clear that although all recording channels should execute the same algorithm they will typically not share the same state of operation. This state dependency is exemplified with respect to intermittent processing during bursting neural activity and idling during quiet periods. This is an inherent limitation to sharing the program memory as the dynamic execution of the code where each core has its own program counter or a top level scheduler is not feasible for an arbitrary number of channels. The quasi-out-of-order execution makes it challenging for us to adopt scalable tile structures found in image processing [^11].
Instead branch control or conditional execution is mediated by skipping a section of the incoming instructions if a condition is not met. In this context individual cores may need to execute any section of code and branching will only be limited by the dissipation related to the registers pipelining the instructions across the chip. As a result any resources associated with cycling through the program have a diminishing contribution to system requirements as the number of channels is increased. This implies that as more sensors are integrated the complexity of the algorithm can also increase proportionally, which is not characteristic of conventional implementations that do not pipeline high-level control signals.
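
The skip-based conditional execution can be illustrated with a toy behavioural model in which every core receives the same instruction stream and decides locally whether to ignore upcoming instructions. This is only a sketch: the opcodes `ADD` and `IF_GE`, the `Core` class, and the example program are hypothetical and do not reflect the instruction set of the fabricated device.

```python
from dataclasses import dataclass

@dataclass
class Core:
    acc: int = 0       # local accumulator state
    skip: int = 0      # number of upcoming broadcast instructions to ignore
    executed: int = 0  # count of instructions actually executed

    def step(self, op, arg):
        if self.skip > 0:   # the instruction still arrives but is ignored
            self.skip -= 1
            return
        self.executed += 1
        if op == "ADD":
            self.acc += arg
        elif op == "IF_GE":  # run the next `count` instructions only if acc >= threshold
            threshold, count = arg
            if self.acc < threshold:
                self.skip = count

def broadcast(program, cores):
    """Feed one shared instruction stream to every core; there is no per-core program counter."""
    for op, arg in program:
        for core in cores:
            core.step(op, arg)

program = [("IF_GE", (10, 2)), ("ADD", 100), ("ADD", 1000), ("ADD", 1)]
cores = [Core(acc=5), Core(acc=20)]
broadcast(program, cores)
print([(c.acc, c.executed) for c in cores])
```

Both cores see the identical stream, yet the first skips the guarded section while the second executes it, so the per-core cost of branching in this model is limited to a small skip counter.
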
{{< figure src="/images/biocas2016/Sys_uC.svg" title="Figure 4: Functional connectivity of the embedded execution unit and sub-blocks" width="500" >}}
The individual components of the execution unit are shown in Fig. 4, which also details the main data buses used for exchanging data. The majority of operations revolve around manipulating data in the registers R1-R16 as the A operand in association with any other data source that can be used as the B operand. In terms of instructions there are always two components where the first is simply the operation executed by the ALU in addition to the two memory sources. The second component optionally extends this simple functionality by writing these intermediate values to multiple other locations or arbitrary branching operations that will take the unit out of sleep.
# 6 Results & Discussion
{{< figure src="/images/biocas2016/Lay_sH.png" title="Figure 5: Fabricated NPI SOC using a 6-metal $0.18 \mu m$ CMOS process showing the system block annotation and top metal routing. " width="500" >}}
This system has been fabricated using a commercially available 6-metal $0.18 \mu m$ CMOS technology (AMS/IBM C18A6/7SF) for validation. The chip micrograph is shown in Fig. 5 measuring 6.2mm² including test circuits and pad ring. While the architecture is capable of achieving very dense configurations at the system level we emphasize that the sensor interface plays a crucial role for noise isolation and chip area overhead.
{{< figure src="/images/biocas2016/TPhw.svg" title="Figure 6: Realization of the development platform used for characterizing system functionality." width="500" >}}
The testing platform is photographed in Fig. 6 which interfaces the NPI system with a Raspberry Pi module. This set-up supports an embedded Linux operating system with low level device control to meet a diverse set of needs. By monitoring the internal data-bus of one core the specialized processing structure has been exhaustively validated at the design point for operating frequencies of 5 MHz to 20 MHz with varying sampling rates on the ADC. Currently the synthesis of instructions remains tied to the hardware specific compiler because the low level control is crucial for active ADC and amplifier control.
{{< figure src="/images/biocas2016/uC_PS.svg" title="Figure 7: Measured power dissipation with respect to specific operations for the same operand A=113 & B=114 in randomized order." width="500" >}}
The results in Fig. 7 show the dependency of power dissipation with respect to different operators for the same operands A and B. It should be expected that there is a strong operand dependency with respect to power consumption but these results follow post layout simulations closely. When the unit is in a sleep or branching state the power dissipation is mainly associated with the instruction pipeline. As this 32-bit pipeline traverses the entire execution unit it represents a considerable baseline power contribution. Typical current consumption for full activity lies around 45\\( \mu\\)A at 20 MHz. The reduced complexity local to each channel allows this configuration to achieve \\(2.7 pJ/Cycle\\) or $2.7 \mu W/MIPS$. The specifications given in Table 1 summarize the main features associated with this system on chip for processing neural data at the sensor interface.
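The energy-per-cycle figure follows directly from the quoted current, the 1.2 V supply listed in Table 1, and the clock frequency. The short check below reproduces the arithmetic; treating one cycle as one instruction (so that pJ/cycle equals µW/MIPS) is an assumption made here for the conversion.

```python
supply_v = 1.2      # V, supply voltage from Table 1
current_a = 45e-6   # A, typical full-activity current quoted above
clock_hz = 20e6     # Hz, operating frequency

power_w = supply_v * current_a               # dissipated power of one unit
energy_per_cycle_j = power_w / clock_hz      # energy spent per clock cycle

# Assuming one instruction per cycle, uW/MIPS is numerically equal to pJ/cycle.
uw_per_mips = (power_w * 1e6) / (clock_hz / 1e6)

print(f"power  = {power_w * 1e6:.1f} uW")                  # 54.0 uW
print(f"energy = {energy_per_cycle_j * 1e12:.2f} pJ/cycle")  # 2.70 pJ/cycle
print(f"merit  = {uw_per_mips:.2f} uW/MIPS")               # 2.70 uW/MIPS
```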
Table 1: Comparison of performance specifications for the NPI system.
| Specification | This Work | 2011 [^11] | 2011 [^4] |
|----|----|----|----|
| Scaling | Fine | Fine | Coarse |
| Tech. [nm] | 180 | 65 | 65 |
| Supply [V] | 1.2 | 1.2 | 0.27|
| Units | 64 | 2048 | 16|
| Freq. [MHz] | 20 | 300 | 0.48 |
| Sys. Power [mA] | 1.42 | 300 | 0.28 |
| Sys. Memory [kb] | 32 | - | 50 |
| Tile Memory [kb] | 1 | 1 | - |
| Processor Area [mm$^2$] | 1.37 | 5.10 | 2.09 |
| P-Merit [GOPS/mW] | 1.52 | 0.31 | - |
| A-Merit [GOPS/mm$^2$] | 0.88 | 36.1 | - |
# 7 Acknowledgement
This work was supported by EPSRC grants EP/K015060/1 and EP/M020975/1.
# 8 Conclusion
A scalable processing architecture is proposed in an effort to realize compact and efficient neural recording arrays. The topology reflects the nature of processing neural data in the context of extracting signal components and we expect the application of this architecture to be relevant to many high channel count neural SOCs. This discussion details both low-level and system level considerations that look towards better software integration. The proposed system power consumption is on the order of \\(1.5 mW\\) with a power density of \\(26 mW/cm^2\\). However this figure is subject to the physical & software reconfiguration that allows extensive optimization for different neural recording applications using the same fabricated device. This work aims to realize a long term solution for neural recording implants directed at validating neural decoding methods in in-vivo settings. Importantly, standardizing off-chip interfacing protocols with self-sustained operation should guarantee the ease of integrating existing wireless solutions as an extension to this system.
# References:
[^1]: I. H. Stevenson and K. P. Kording, "How advances in neural recording affect data analysis," Nature Neuroscience, vol. 14, no. 2, pp. 139--142, February 2011.
[^2]: A. Khalifa et al., "A compact, low-power, fully analog implantable microstimulator," in IEEE Proc. ISCAS, May 2016, pp. 2435--2438.
[^3]: J. S. Ho et al., "Midfield wireless powering for implantable systems," Proc. IEEE, vol. 101, no. 6, pp. 1369--1378, June 2013.
[^4]: V. Karkare et al., "A 75-$\mu$W, 16-channel neural spike-sorting processor with unsupervised clustering," IEEE J. Solid-State Circuits, vol. 48, no. 9, pp. 2230--2238, September 2013.
[^5]: A. M. Sodagar et al., "A fully integrated mixed-signal neural processor for implantable multichannel cortical recording," IEEE Trans. Biomed. Eng., vol. 54, no. 6, pp. 1075--1088, June 2007.
[^6]: Y. Xin et al., "An FPGA based scalable architecture of a stochastic state point process filter (SSPPF) to track the nonlinear dynamics underlying neural spiking," Microelectronics Journal, vol. 45, no. 6, pp. 690--701, June 2014.
[^7]: C. Qian et al., "A low-power configurable neural recording system for epileptic seizure detection," IEEE Trans. Biomed. Circuits Syst., vol. 7, no. 4, pp. 499--512, August 2013.
[^8]: Y. Xin et al., "An application specific instruction set processor (ASIP) for adaptive filters in neural prosthetics," IEEE/ACM Trans. Comput. Biol. Bioinformatics, vol. 12, no. 5, pp. 1034--1047, September 2015.
[^9]: T. N. Theis and P. M. Solomon, "In quest of the "next switch": prospects for greatly reduced power dissipation in a successor to the silicon field-effect transistor," Proc. IEEE, vol. 98, no. 12, pp. 2005--2014, December 2010.
[^10]: M. Verhelst and A. Bahai, "Where analog meets digital: Analog-to-information conversion and beyond," IEEE Solid-State Circuits Mag., vol. 7, no. 3, pp. 67--80, September 2015.
[^11]: T. Kurafuji et al., "A scalable massively parallel processor for real-time image processing," IEEE J. Solid-State Circuits, vol. 46, no. 10, pp. 2363--2373, October 2011.
[^12]: H. Noda et al., "The design and implementation of the massively parallel processor based on the matrix architecture," IEEE J. Solid-State Circuits, vol. 42, no. 1, pp. 183--192, January 2007.
[^13]: J. Y. Kim et al., "A 201.4 GOPS 496 mW real-time multi-object recognition processor with bio-inspired neural perception engine," IEEE J. Solid-State Circuits, vol. 45, no. 1, pp. 32--45, January 2010.
[^14]: C. C. Cheng et al., "iVisual: An intelligent visual sensor SoC with 2790 fps CMOS image sensor and 205 GOPS/W vision processor," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 127--135, January 2009.
[^15]: L. B. Leene et al., "A compact recording array for neural interfaces," in IEEE Proc. BIOCAS, October 2013, pp. 97--100.

View File

@ -0,0 +1,663 @@
---
title: "Brain machine interfaces: Neural Recording Front End Design"
date: 2016-08-08T15:26:46+01:00
draft: false
toc: true
math: true
type: posts
tags:
- chapter
- thesis
- CMOS
- biomedical
---
Lieuwe B. Leene, Yan Liu, Timothy G. Constandinou
Department of Electrical and Electronic Engineering, Imperial College London, SW7 2BT, UK
Centre for Bio-Inspired Technology, Institute of Biomedical Engineering, Imperial College London, SW7 2AZ, UK
This chapter focuses on the multitude of questions associated with the mixed signal design for multi channel integrated neural recording systems. As a result, a significant section will be directed at developing an abstract understanding of how design parameters influence the various design challenges. This discussion will clarify the key limitations for these systems and propose how they can be mitigated or efficiently designed for. In the scope of integrating a large number of recording channels together, clearly understanding how each resource trades for another is crucial for optimizing a complex system. Optimization methods found in the literature typically assume a certain configuration which limits to what extent improvements can be made [^109]. However here we specifically identify abstractions that allow us to consider the impact of different topologies and filter structures simultaneously. This should enable a much broader sense of optimization that will reflect in the improved performance characteristics demonstrated here.
We will focus on elaborately evaluating the dominant resource requirements with respect to noise, mismatch, quantization, and functional aspects for signal conditioning in a manner that is mostly implementation independent. In addition we propose several circuit implementations based on this analysis that present highly efficient and compact instrumentation. The corresponding abstractions that we use attempt to achieve clarity with respect to underlying dependencies. This should allow better analytic models that make the limiting factors appear obvious and reveal means to circumvent specific constraints with alternative techniques. For example we may be interested to know when it is worthwhile to put certain functions in the digital domain in terms of the CMOS technology parameters. Approaching the ideal instrumentation structure in such a scenario remains highly desirable for constrained applications. Thus conforming to the technology parameters could reveal that conventional methods do not deliver the most effective solution.
The chapter is organized as follows: Section 17 describes the general problem statement related to the analogue front end, which is followed by the associated amplifier design considerations in Section 19. The method for improving the analogue to digital conversion is outlined in Section \ref{ch:T1_converter}. These results are then collected in Section \ref{ch:T1_model} to evaluate the impact of system level parameters as a function of resource requirements.
# 17 Architecture for Neural instrumentation
The analogue dimension of a neural recording system can be broken down into two objectives for signal conditioning that will maximize the performance of the subsequent digital signal processing. The first is related to getting adequate signal quantization by amplifying the signals to the full input range of the data converter without corrupting the signal of interest. The second objective is performing some kind of filtering that removes noisy or irrelevant components in the recording and only captures the relevant signals of interest.
{{< figure src="technical_1/T1_SIG_Spectrum.pdf" title="Figure 20: Illustration of the spectral power density characteristic for a typical neural recording with the associated frequency bins. " width="500" >}}
As depicted in Figure 20, the input spectrum of a typical *in-vivo* electrode recording can be classified using a few frequency bands. The energy from extracellular spiking activity is primarily concentrated around \\(300 Hz\\) to \\(6 kHz\\) and is characteristically intermittent resulting in a distinct difference between the average and instantaneous spectral power [^110]. This characteristic is also present in the LFP band to a lesser extent. From an electrical standpoint the design constraints are derived from the tolerated noise levels in each frequency band to maintain a proper signal to noise ratio. As a consequence it is important to specify the signal to noise ratio in terms of noise density as opposed to integrated noise figures, as digital processing accuracy is not limited by the latter term. Here we should also note that the electrode spectral noise power $N^2_{electrode} = 4 kT R_{en} \Delta f$ depends on the resistive component of the electrode impedance and is expressed in terms of the electrode resistance \\(R_{en}\\), Boltzmann energy \\(kT\\), and the frequencies of interest $\Delta f$. This is typically matched by the amplifier noise characteristic \\(N_{amp}\\) so that no excess power is wasted.
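To give a feel for the magnitude of the electrode noise term, the sketch below evaluates $N^2_{electrode} = 4 kT R_{en} \Delta f$ over the spiking band. The electrode resistance of 100 kΩ and the body temperature of 310 K are assumed values chosen purely for illustration and are not taken from the text.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def electrode_noise_vrms(r_en_ohm, f_lo_hz, f_hi_hz, temp_k=310.0):
    """RMS thermal noise of the electrode resistance integrated over [f_lo, f_hi]."""
    delta_f = f_hi_hz - f_lo_hz
    return math.sqrt(4.0 * K_BOLTZMANN * temp_k * r_en_ohm * delta_f)


# Assumed 100 kOhm resistive component, evaluated over the 300 Hz to 6 kHz band.
noise = electrode_noise_vrms(r_en_ohm=100e3, f_lo_hz=300.0, f_hi_hz=6e3)
print(f"electrode noise = {noise * 1e6:.2f} uVrms")
```

With these assumed values the electrode contributes roughly 3 µVrms, which indicates the order of magnitude an amplifier noise floor would need to match.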
## 18 Instrumentation Requirements
This kind of electrical sensing can be broken down into a number of system blocks, each of which performs an essential operation in this process. These are shown in Figure 21 and consist of an amplifier, a filter, a sampler, and a quantizer. Occasionally one circuit can combine several of these operations depending on the construction. Table 3 presents the overall performance requirements that should be demonstrated when these components are integrated together. These are also the specifications that we will target as the design is being considered in the following discussion. The reasoning behind these specific requirements is mainly related to conventional signal acquisition given the bandwidth and noise requirements. Moreover these seem to be sufficient for most decoding/characterization methods hence similar figures can be found in most BMI publications.
{{< figure src="technical_1/ISYS.pdf" title="Figure 21: " width="500" >}}
Table 3: Summary of the target specifications for the analogue instrumentation system.
| Parameter | Symbol | Specification |
|----|----|----|
| Integrated Channels | | 64 |
| Supply Voltage | $V_{DD}$ | $<$1.8 V |
| Power Dissipation | $P_{SYS}$ | $<$5 $\mu$W |
| Diff. Signal | | 5 $\mu$Vpp - 5 mVpp |
| Common Signal | | 50 mVpp |
| CMRR/PSRR | | $>$80 dB |
| Input Referred Noise | $e^2_{in}$ | $<$5 $\mu$Vrms |
| Total Gain | $A_T$ | $>$40 dB |
| THD at max input | | $>$40 dB |
| 3dB Bandwidth | $f_{3dB}$ | 6 kHz |
| High pass frequency | $f_{hp}$ | $<$1 Hz |
| Sampling rate | $f_{smp}$ | 25 kS/s |
| Input Impedance | $R_{IN}$ | $>$50 M$\Omega$ @ 1 kHz |
| ADC Resolution | $ENOB$ | 12 bits |
| Active Area | | 0.01 mm$^2$ |
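
A few system-level quantities follow directly from the entries of Table 3. The snippet below is a small sanity check rather than part of the specification: the raw data rate assumes that every channel is sampled at the full rate and full resolution, and the signal to quantization noise ratio uses the textbook ideal-quantizer relation.

```python
channels = 64           # integrated channels (Table 3)
sample_rate_hz = 25e3   # sampling rate per channel (Table 3)
adc_bits = 12           # ADC resolution in ENOB (Table 3)
f_3db_hz = 6e3          # signal bandwidth (Table 3)

# Raw output data rate, assuming no compression or channel gating.
raw_rate_bps = channels * sample_rate_hz * adc_bits

# Ideal signal to quantization noise ratio for a full-scale sine input.
ideal_sqnr_db = 6.02 * adc_bits + 1.76

# Oversampling relative to the Nyquist rate of the 6 kHz signal band.
oversampling = sample_rate_hz / (2.0 * f_3db_hz)

print(f"raw data rate = {raw_rate_bps / 1e6:.1f} Mb/s")
print(f"ideal SQNR    = {ideal_sqnr_db:.1f} dB")
print(f"oversampling  = {oversampling:.2f}x")
```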
# 19 Amplifier Principles for Miniaturization
{{< figure src="technical_1/Harrison.pdf" title="Figure 22: " width="500" >}}
The principal design considerations for neural instrumentation have been well established particularly with regard to the Harrison topology [^111] that has been widely adopted in many systems and is shown in Figure 22. Objectively the optimization techniques have become both more specialized and specific for maximizing the average signal to noise ratio in the LFP or EAP bandwidth with the absolute minimum power budget. Interestingly due to the use of more advanced CMOS technologies there is a persistent trend towards sub-threshold operation. This is motivated by trading in the excess transistor bandwidth for improved current efficiency that is measured in terms of the achieved transconductance per dissipated ampere of current. In fact this is purely a result of maximizing the individual transistor performance with respect to the speed efficiency product [^112]. This is expressed in Eq 4 using \\(f_T\\), \\(U_T\\), \\(v_{sat}\\), \\(\mu\\) as the transition frequency, thermal voltage, saturation velocity, and carrier mobility respectively.
$$ \max\limits_{IC} \left\lbrace f_{T} \frac{gm}{I_{DS}} \right\rbrace = \frac{v_{sat}^2}{4\pi \mu \eta U_{T}^2} \: \frac{1}{IC_{max}} \approx \frac{22}{IC_{max}} \left[ \frac{THz}{V} \right] \quad \text{where} \quad IC_{max} = \left( \frac{L_{sat}}{L_{tech}} \right)^2 $$
Here \\(L_{sat}\\) is a technology independent BSIM6 parameter that reflects the impact of ballistic carrier transport during velocity saturation and normalizes the minimum feature length \\(L_{tech}\\) for a specific technology as an effective length. The implication of Equation 4 is that the transistors for optimized low frequency instrumentation amplifiers are exclusively in the sub-threshold regime because \\(f_T\\) is always in excess with respect to the signals of interest. The subthreshold operation results in each transistor's transconductance being defined as $gm = \frac{I_{DS}}{\eta U_{T}}$ which only depends on drain current. Instead of noise optimization through the overdrive voltage, \\(V_{ov}\\), the topology can only reduce noise by removing non-amplifying transistors or biasing them with reduced drain current when compared to the input transistor(s). This reflects the need for a different design methodology as the input referred contribution is dominated by how the total amplifier current is distributed to all the transistors. At least in the small signal sense the key requirement is that the amplifying transistors dissipate all the current while biasing/non-amplifying transistors dissipate relatively very little.
In principle due to the under-determined nature of transistor level design the optimization methodology is initially constrained by one of the most important objective characteristics. This could be low noise, wide bandwidth, good linearity, etc. Hence this discussion will digress by distinguishing the design considerations for noise or bandwidth limited amplifiers as separate cases. This should reveal some key relations with respect to how power efficiency is achieved. For each case we evaluate the implications with respect to different resource requirements.
## 20 Noise limited Amplifiers
This discussion is guided by the leading challenge for instrumentation systems which is maximizing efficiency while maintaining good linearity. For this reason a noise efficiency factor (NEF) was first introduced in [^113] and is expressed in Equation 5.
$$ NEF^2 = e^2_{in} \frac{I_{tot}}{ U_T kT \omega_{3db}} $$
This figure represents a normalized efficiency or in other words it evaluates how much extra current is dissipated by a particular circuit when compared to an ideal bipolar junction transistor for the same noise performance. Here \\(e^2_{in}\\), \\(I_{tot}\\) and \\(\omega_{3db}\\) represent the input referred noise power, the total current dissipation and the -3dB bandwidth in radians per second respectively. NEF reflects how well a particular topology achieves efficient amplification for a particular noise floor and thus it inherently trades off with a multitude of other parameters. Here we shall use it as a design parameter that reflects the chosen transistor level topology. With this in mind, we propose the following reformulation from Equation 5:
$$ e^2_{in} = \frac{kT}{C} \frac{NEF^2}{\eta A_{cl}} \frac{I_{in}}{I_{tot}} \quad \text{where} \quad C = \frac{gm}{\omega_{3db} A_{cl}} \quad \text{and} \quad gm=\frac{I_{in}}{\eta U_T} $$
This result leads to:
$$ gm = \omega_{3db} \frac{kT}{e^2_{in}} \frac{\zeta}{\eta} \quad \text{Equivalently} \quad I_{in} = \omega_{3db} \frac{q U_T^2}{e^2_{in}} \zeta \quad \text{where} \quad \zeta = NEF^2 \frac{I_{in}}{I_{tot}} $$
Note that this relation is exclusive to noise limited characteristics and implies nothing with regard to the output load or linearity conditions. Moreover there is a fundamental requirement for transconductance with respect to noise and an implementation related factor \\(\zeta\\). This factor represents the noise efficiency of the topology, while the slope factor \\(\eta\\) tells us about the transistor performance as a fundamental process parameter. Numerous techniques for improving NEF can be found in the literature. As a generalization these can be put into two categories. The first is reducing the transconductance of non-signal amplifying transistors using degeneration such that their input referred noise is minimized [^114]. The second approach is AC coupling the amplifier's input signal to biasing transistors such that the total transconductance is increased and the current efficiency is improved. Interestingly because this factor relates to current efficiency the NEF can be smaller than 1, or exceed the efficiency of a BJT, using a stacked mixer structure that reuses the same biasing current for multiple amplifiers [^115]. This hints at the fact that NEF should be normalized to the voltage supply but in some sense these structures trade off dynamic range for power efficiency. Theoretical NEF figures for some of the primitive low noise topologies are listed in Table 4 assuming biasing transistors have negligible contribution and taking \\(V_{th}\\) as the NMOS & PMOS threshold voltage.
Table 4: Theoretical figures for NEF for various amplifier topologies. \\(^\star\\) N is the number of stages sharing the structure.
| Topology | NEF | Minimum $V_{dd}$ | Reference |
|----|----|----|----|
| Single Transistor | $\eta$ | $V_{th}$ | - |
| Differential Pair | $\eta \sqrt{2}$ | $V_{ds} + V_{th}$ | [^116] |
| Complementary Pair | $\eta$ | $2 V_{th}+2 V_{ds}$ | [^117] |
| Common Reference$^\star$ | $\eta \sqrt{\frac{1+N}{N}}$ | $V_{ds} + V_{th}$ | [^118] |
| Common Bias$^\star$ | $\eta \sqrt{\frac{2}{N}}$ | $(1+N) V_{ds} + V_{th}$ | [^115] |
These relations highlight the fact that NEF is primarily dependent on the chosen topology and is less sensitive to the actual transistor design after optimization. Choosing a topology for the instrumentation amplifier with respect to its ideal NEF performance is significantly more effective than starting with a particular structure and introducing resistive degeneration on transistors that should not contribute noise.
Also notice that the expression for noise in Equation 6 only has one degree of freedom and that is the ratio between the closed loop gain and capacitive load of the amplifier. This implies the 3dB bandwidth of the amplifier is fixed but its unity gain frequency is arbitrary. In fact by satisfying the relation for Equation 7 it is automatically the case that the equivalent noise density requirement is also satisfied. This is significant because we could allow the first stage to provide wide band gain and rely on a second stage to perform filtering. The second stage will have a capacitor gain product that is \\(A_1^2\\) times smaller than if the first stage had to perform filtering. This can have a large impact on analogue circuit area that is typically dominated by capacitors used for filtering and setting closed loop gain.
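The entries of Table 4 and the current requirement of Equation 7 can be explored numerically with the sketch below. The slope factor \\(\eta = 1.3\\), the number of shared stages \\(N = 4\\), the temperature of 300 K and the factor \\(\zeta = 1\\) are assumed values used only for illustration; the noise and bandwidth targets are taken from Table 3.

```python
import math

Q_ELECTRON = 1.602176634e-19  # C
K_BOLTZMANN = 1.380649e-23    # J/K


def thermal_voltage(temp_k=300.0):
    """Thermal voltage U_T = kT/q."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def nef_table(eta, n_stages):
    """Theoretical NEF per topology, transcribed from Table 4."""
    return {
        "single transistor": eta,
        "differential pair": eta * math.sqrt(2.0),
        "complementary pair": eta,
        "common reference": eta * math.sqrt((1.0 + n_stages) / n_stages),
        "common bias": eta * math.sqrt(2.0 / n_stages),
    }


def required_input_current(e_in_vrms, f_3db_hz, zeta, temp_k=300.0):
    """Equation 7: I_in = w_3dB * q * U_T^2 * zeta / e_in^2."""
    omega_3db = 2.0 * math.pi * f_3db_hz
    u_t = thermal_voltage(temp_k)
    return omega_3db * Q_ELECTRON * u_t ** 2 * zeta / e_in_vrms ** 2


nef = nef_table(eta=1.3, n_stages=4)          # eta and N are assumed values
for name, value in nef.items():
    print(f"{name:20s} NEF = {value:.2f}")

# 5 uVrms and 6 kHz come from Table 3; zeta = 1 is a hypothetical topology factor.
i_in = required_input_current(e_in_vrms=5e-6, f_3db_hz=6e3, zeta=1.0)
print(f"required input current = {i_in * 1e9:.0f} nA")
```

Under these assumptions the common bias structure is the only entry with an NEF below 1, and the input branch current needed for the target noise floor is on the order of a few hundred nanoamperes, scaling linearly with \\(\zeta\\).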
{{< figure src="technical_1/flickker.png" title="Figure 23: " width="500" >}}
So far we have only considered the implications of thermal noise requirements on the design. We must also address the flicker noise sources because neural signals have a lot of low frequency content. Moreover because flicker noise sources concentrate the noise power at the lower frequencies, the total noise profile inside the LFP frequency band can be dominated by this type of noise. The nature of flicker noise from transistor physics can be due to a number of phenomena: mobility fluctuation $\Delta \mu$, carrier density fluctuation $\Delta N$, and changes in access resistances $\Delta R$. Each of these phenomena will exhibit a \\(1/f\\) frequency dependence when computing the input referred power spectrum. Typically for a given inversion coefficient IC only one of these phenomena will dominate the overall noise characteristic of a transistor. This is illustrated in Figure 23 which shows that $\Delta N$ is typically the leading cause for flicker noise generated additively to the drain current. IC is a factor that indicates to what extent a transistor is operating in the subthreshold region by using the definition $IC = I_D / (2 \mu C_{ox} (W/L) U^2_T)$. This uses the more general parameters \\(q\\), \\(W\\), \\(L\\), \\(C_{ox}\\) that represent electron charge, transistor width, transistor length, and gate oxide sheet capacitance respectively. The region of interest for biomedical circuits is typically when \\(IC<1\\) which exhibits good current efficiency and subthreshold operation. The phenomenological model corresponding to the carrier density fluctuation $\Delta N$ component is expressed in Equation 8 after being referred to the transistor gate as an equivalent voltage noise density [^119].
$$ e^2_{fl} \Delta f = \frac{q^2 kT \lambda N_T}{W L C^2_{ox} f} \cdot K_{G} \quad \text{where} \quad K_{G} \approx \left(1 + \frac{\alpha \mu}{2}\right)^2 \quad \text{for} \quad IC < 1 $$
Here \\(\alpha\\) and \\(\lambda\\) represent the coulomb scattering coefficient and tunnelling attenuation distance respectively. Notice that this expression has a relatively weak biasing dependency in weak inversion in contrast to the strong inversion region as shown in Figure 23. This trend follows very closely to the \\((gm/I_D)^2\\) characteristic which implies a fixed SNR for varying IC. The parameter \\(N_T\\) reflects the density of trapped charges at the oxide interface inside the transistor's conducting channel. Whether this parameter is consistent across various technology nodes is naturally put into question [^120] but similarly there is evidence supporting that indeed this factor is process independent [^121]. Now we should keep in mind that increasing the input transistor size will accommodate lower flicker noise but also result in increased input referred noise. This is because of the signal loss when coupling \\(C_{in}\\) to \\(C_{fb}\\) that is loaded by the parasitic input capacitance of the amplifier \\(C_{g}\\) (see Figure 22). Keeping the ratio \\(C_{fb}/C_{g}\\) fixed as \\(\delta\\), we can express the required input capacitance in Equation 9 in terms of general amplifier requirements using \\(A_{cl}\\), \\(K_F\\), \\(f_{cor}\\) as the closed loop gain, flicker charge density, and corner frequency respectively.
$$ C_{in} = \delta A_{cl} C_{g} = \frac{3}{4} \delta A_{cl} C_{ox} W L = \frac{3}{4} \frac{K_F A_{cl} \delta }{C_{ox} e^2_{in} f_{cor}} \quad \text{where} \quad K_F = q^2 kT \lambda N_T K_G $$
This expression indicates that attempting to achieve all desirable characteristics (small \\(e^2_{in}\\), small \\(f_{cor}\\), large \\(A_{cl}\\)) simultaneously in a single amplifier structure comes at the cost of a very large input capacitance, since each of these improvements scales \\(C_{in}\\) up proportionally. This representation suggests the Harrison topology has limited flexibility for improving input capacitance as the only solution appears to be minimising \\(\frac{K_F}{C_{ox}}\\) through CMOS process selection. Moreover \\(\delta\\) cannot be made arbitrarily small as it will more typically be bounded by the minimum feedback capacitor \\(C_{fb}\\). This needs to be large enough to set the high pass pole location at a sufficiently small frequency to prevent the resistor \\(R_{fb}\\) from introducing noise inside the signal band, which has an integrated power of \\(\frac{kT}{C_{fb}}\\) [^111]. Not to mention that the resulting size of the input transistors can be very large for this particular topology.
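As a quick illustration of the inversion coefficient defined above, the following sketch evaluates IC for one bias point. The process transconductance parameter \\(\mu C_{ox}\\) of 300 µA/V², the drain current of 100 nA and the aspect ratio of 10 are hypothetical values chosen for illustration and do not describe a specific technology.

```python
def inversion_coefficient(i_d, mu_cox, w_over_l, u_t=0.02585):
    """IC = I_D / (2 * mu * C_ox * (W/L) * U_T^2), as defined in the text."""
    return i_d / (2.0 * mu_cox * w_over_l * u_t ** 2)


# Hypothetical bias point: 100 nA through a device with W/L = 10.
ic = inversion_coefficient(i_d=100e-9, mu_cox=300e-6, w_over_l=10.0)
region = "weak inversion" if ic < 1.0 else "moderate or strong inversion"
print(f"IC = {ic:.3f} ({region})")
```

A device biased this way sits well inside the \\(IC<1\\) region; raising the current or shrinking the aspect ratio moves it towards strong inversion.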
## 21 Chopper Stabilized Amplifiers
Alternatively we can apply chopping techniques to deal with these noise requirements, an approach used extensively in bio-signal instrumentation systems [^69]. By up-modulating the signal to a higher frequency before amplification, the flicker noise is added to the usual near-DC band which no longer coincides with our input signal. The output is then demodulated to recover the input. The difference is that the flicker components now lie at the chopper frequency \\(f_{chp}\\) which is typically out of band and can be rejected easily. This eliminates the requirements from Eq 9 on the input capacitance and shifts the focus to rejecting up-modulated aggressors at higher frequencies. We suggest keeping the sampling and chopper frequency coherent because it allows a low order FIR filter to reject all up-modulated harmonics. For instance chopping at half the sampling frequency (\\(f_s/2\\)) or odd multiples of it (i.e. \\([2n+1] f_s/2\\) $|$ $n \in \mathbb{Z}$) will fold chopper harmonics onto \\(f_s/2\\). The resulting filter requirements are quite relaxed because of the large fractional bandwidth in the transition band that separates our signal bandwidth \\(f_{3dB}\\) from \\(f_{chp}\\). In this particular case we employ a sampling frequency of \\(25 kS/s\\) and use a chopping frequency at \\(37.5 kHz\\) to achieve this functionality [^122]. Conveniently any common mode signals from the sensor or analogue supplies are also rejected using this configuration because they will appear at the chopper frequency.
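The folding of chopper harmonics onto \\(f_s/2\\) can be checked with a few lines of arithmetic. The sketch below uses the sampling and chopping frequencies quoted above; the two-tap averaging filter is an assumed minimal example of a low order FIR filter with a zero at \\(f_s/2\\) and is not necessarily the filter used in the implementation.

```python
import cmath
import math


def alias_frequency(f_hz, fs_hz):
    """Frequency in [0, fs/2] at which a tone at f_hz appears after sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


def two_tap_gain(f_hz, fs_hz):
    """Magnitude response of y[n] = (x[n] + x[n-1]) / 2 at frequency f_hz."""
    w = 2.0 * math.pi * f_hz / fs_hz
    return abs(0.5 * (1.0 + cmath.exp(-1j * w)))


fs = 25_000.0      # sampling frequency from the text
f_chp = 37_500.0   # chopper frequency from the text, equal to 3 * fs / 2

# A square-wave chopper produces odd harmonics of f_chp.
harmonics = [k * f_chp for k in (1, 3, 5, 7)]
aliases = [alias_frequency(f, fs) for f in harmonics]
print(aliases)  # every harmonic folds onto fs/2 = 12500.0 Hz

print(two_tap_gain(fs / 2.0, fs))   # numerically zero at fs/2
print(two_tap_gain(6_000.0, fs))    # gain at the 6 kHz band edge
```

Because all harmonics land on a single frequency, one filter zero is enough to suppress them, at the price of some droop towards the band edge in this simple two-tap case.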
In addition to basic chopping functionality, the performance can be further improved by providing closed loop feedback to actively cancel aggressors on top of filtering the resulting up modulated aggressors. This can be achieved in multiple ways and in some cases could improve linearity. One possible technique is using a DC-servo loop and another is performing ripple rejection both of which remove different components [^123]. Here we will consider the implementation of three such techniques that improve chopping performance that specifically have negligible power and area requirements. The considerations made here will be similar to that of [^124] [^125] but with explicit focus on area reduction.
{{< figure src="technical_1/T1_CAMP.pdf" width="500" >}}
{{< figure src="technical_1/T1_CAMP_T.pdf" title="Figure 24: Proposed compact chopper stabilized neural amplifier topology. " width="500" >}}
Figure 24 shows the proposed configuration that promises a significant reduction in input capacitance and the required silicon area. This configuration has two gain stages where the first stage A1 is a wideband low noise stage and the second provides A2 low pass filtering as motivated by Section 20. This enables the rejection of flicker noise from the first stage completely and effectively shifts the corner frequency of the second stage by gain of first stage squared. Moreover this the configuration does not require auxiliary integrators provide feedback on the capacitive feedback network around A1 that would lead to increased complexity.
The pseudo resistor across A1 in this configuration provides closed loop rejection of low frequency noise below the high pass pole with the time constant $\tau_{HP} = C_{F1} R_{HP}$. The noise components in the band from \\(f_{HP}\\) to \\(f_{CHP}\\) will be a mixture of flicker and thermal noise that are up-modulated by the chopper proceeding A1. This is because A1's corner frequency will lie inside of this band after sizing the input transistors such that \\(C_G\\) is about 5% of \\(C_{IN}\\) which usually leads to a target area of about 100 $\mu m^2$.
It is important that the gains of A1 and A2 are carefully selected because this configuration only provides a first order role off in terms of analogue filtering. It could be that \\(f_{CHP}\\) is not sufficiently outside of the \\(6kHz\\) filter bandwidth resulting in some aggressors to appear on the output of A2. For this reason we require an aggressive high pass pole location to minimize this total up modulated power. More specifically we can say that the total noise contribution below the \\(f_{HP}\\) is mainly from \\(R_{HP}\\) which has a noise power of $\frac{kT}{A1^2 C_{F1}}$ when referred to the input of A1. The main concern here is that we have to sacrifice a small amount of dynamic range on A2 to prevent distortion. Although this power is quite limited, we need make sure the FIR filter can reject up-modulated noise components effectively.
In addition we take advantage of the reduced output referred sampling noise a the input of the second stage that scales by \\(f_{3dB}/f_{CHP}\\). This is because most of the sampling noise will lie outside of the filter bandwidth. The size of \\(C_{I2}\\) can be reduced to alleviate the slewing errors due to the band limited behaviour of A1. In addition the parasitics at the output of A1 will pre-charge before \\(C_{I2}\\) is connected reducing the settling error due to the active nodes switching at the input and output.
A common concern for chopper stabilized circuits is the resistive element of each chopper which in this case is seen at the input of the amplifier. This resistance is due to the switching capacitor \\(C_{I1}\\) that is continuously dissipating dynamic current. This can partially be compensated for by performing positive feedback from the output to assist in cancelling the dynamic current through \\(C_{PF}\\) [^123]. However this will rely on the matching of the capacitor ratios $\frac{C_{I1}}{C_{F1}} \frac{C_{I2}}{C_{F2}}$ to be equal to \\(\frac{C_{I1}}{C_{PF}}\\). This can be quite challenging if small configurations are desired that do not need exhaustive calibration. The use of a high precision ADC makes this somewhat easier because the total gain A1\\(\cdot\\)A2 does not need to be as large, which implies smaller ratios and better matching. Evaluating this resistance in terms of the switching capacitance results in the expression in Equation 10.
$$ R_{in} = \frac{1}{2 f_{CHP} \cdot (C_{I1} + C_{par} - \frac{C_{I1}}{C_{F1}} \frac{C_{I2}}{C_{F2}} C_{PF})} $$
This dependency should indicate that if the dynamic switching current cannot be well matched due to parasitics or variability the next objective would be to reduce the total switching capacitance. From our discussion however it appears that reducing the input capacitance is limited by the pseudo resistive noise that induces aggressors at the chopper frequency. This constraint can be mitigated using a distributed amplifier structure that splits A1 into two identical sections. This should be configured such that the second stage has its high pass pole and corner frequency proportionally larger than the first stage but scaled by the gain of the prior stage. However such a topology is more constrained by parasitics that worsen the settling errors in $0.18 \mu m$ CMOS. In addition the poor control of pseudo resistive characteristics does not allow this to be a convincing solution [^126]. The feasibility may be more favourable in more advanced technology nodes. Considering a value of \\(1 pF\\) for \\(C_{I1}\\) we expect slightly over \\(20 M\Omega\\) without positive feedback and approximately \\(200 M\Omega\\) with 10% matching. This may be acceptable in either case depending on which type of electrode is used but generally anything above \\(100 M\Omega\\) is satisfactory for most scenarios.
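The input resistance figures quoted above can be checked with a minimal sketch of Equation 10. The chopper frequency of 25 kHz is an assumption picked so that \\(C_{I1} = 1 pF\\) lands near the quoted \\(20 M\Omega\\); the function name, the neglected \\(C_{par}\\) and the way the 10% mismatch is modelled are likewise illustrative.

```python
def chopper_input_resistance(f_chp, c_i1, c_par=0.0, gain_ratio=0.0, c_pf=0.0):
    """Equation 10: R_in = 1 / (2 f_CHP (C_I1 + C_par - G * C_PF)),
    where G = (C_I1 / C_F1) * (C_I2 / C_F2) is the capacitive gain ratio."""
    c_effective = c_i1 + c_par - gain_ratio * c_pf
    return 1.0 / (2.0 * f_chp * c_effective)


F_CHP = 25e3   # assumed chopper frequency [Hz]
C_I1 = 1e-12   # input capacitor [F], value used in the text

# Without positive feedback the full C_I1 is switched at the input.
r_plain = chopper_input_resistance(F_CHP, C_I1)

# With positive feedback and 10% mismatch: G * C_PF cancels 90% of C_I1.
# G = 100 and C_PF = 9 fF are hypothetical values that give this cancellation.
r_feedback = chopper_input_resistance(F_CHP, C_I1, gain_ratio=100.0, c_pf=0.9e-14)

print(f"no positive feedback: {r_plain / 1e6:.0f} MOhm")
print(f"10% matching:         {r_feedback / 1e6:.0f} MOhm")
```

The improvement is simply the inverse of the residual mismatch, so the benefit of positive feedback is bounded by how well the capacitor ratios can be matched.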
## 22 Bandwidth Limited Amplifiers
Biomedical instrumentation has the advantage that the slowly varying signals prevent most implementations from facing problems due to limited bandwidth. The exception however lies with the stage that drives the input capacitance of the ADC, where the settling time during sampling can be quite challenging[^127], particularly when multiple channels are multiplexed to the same data converter. In some sense there is a similarity when we look at NEF and bandwidth efficiency because they are both strongly dependent on maximizing transconductance efficiency.
$$ FOM \left[ \frac{MHz \: pF}{mA} \right] = \frac{f_{UGF} \cdot C_{L}}{I_{tot}} \quad \text{for diff. pair} \quad FOM = \frac{10^{3}}{4 \pi \eta U_{T}} $$
{{< figure src="technical_1/MCAmp.pdf" title="Figure 25: " width="500" >}}
Strictly stated in Equation 11, a bandwidth constrained circuit should minimize the total current consumption \\(I_{tot}\\) for a given unity gain bandwidth \\(f_{UGF}\\) and capacitive load \\(C_L\\). It is typical to find dedicated structures outside of the signal processing chain that drive the ADC input capacitance and focus specifically on maximizing the FOM by employing current recycling, adaptive biasing, and positive feedback techniques. The challenge here is efficiently introducing these techniques while also preserving the capability for full output swing, stability and particularly low distortion. The latter is likely the most challenging and demands high loop gain that is generally not found in adaptive single pole structures if full output swing is also required. With that said, two stage Miller compensated topologies can provide an excellent solution to this problem because high gain in the second stage will suppress a number of nonlinearities excited by the input stage. Furthermore the capacitive coupling of the output to the input of the second stage implies the settling speed is limited by the bandwidth of the second stage. This allows the configuration to simultaneously provide filtering and settling while sharing many of the biasing and feedback elements. Using the model shown in Figure 25, we can show that sampling induced kick back from the ADC at \\(V_{out}\\) has negligible impact on the internal integration node as it is inversely proportional to the product $A_{cl}\cdot \frac{gm2}{gm1}$ where \\(gm1\\) and \\(gm2\\) are the transconductances of the first and second stage. This is derived from evaluating a step response due to discharging the output load \\(C_{L}\\) which has the Laplace domain response given in Equation 12.
$$ H_{step}(s) = \frac{s^2}{s^2 + (\omega_2 - \frac{\omega_1 C_{M}}{A_{cl} C_{L}} ) s + \frac{\omega_{1} \omega_{2}}{A_{cl}}} \quad \text{where} \quad \omega_{1} = \frac{gm_1}{C_{M}} \: \: , \: \: \omega_{2} = \frac{gm_2}{C_{L}} $$
{{< figure src="technical_1/T1_T2AMP.pdf" title="Figure 26: " width="500" >}}
Figure 26 shows the proposed circuit implementation of the two-stage amplifier used inside the second instrumentation stage in Figure 24. This structure has the advantage of providing very high loop gain across the Miller capacitor and allows full output swing due to the positive feedback structures in the current mirrors. The PMOS mirror provides high gain by cancelling the \\(1/gm\\) transresistance of the diode connected pair, leaving the high impedance node, and the NMOS mirror provides positive feedback to speed up the transient behaviour on the PMOS side. When this structure provides closed loop gain larger than 20 dB it is sufficient to rely on the NMOS current mirror for stability. In fact this is equivalent to a feed-forward stabilization technique that bypasses the high frequency signal lag induced by the pole at the PMOS side. However when good phase margin is required at the unity gain frequency the stability requirement becomes more stringent. In this case we suggest introducing an additional capacitor across \\(V_n\\) & \\(V_p\\) to realize a zero that cancels the pole in the PMOS branch [^128]. The zero will in fact boost the effective \\(\omega_{2}\\) from $N M \frac{gm_{M5}}{C_{L}}$ to $\frac{N M + M }{2-N} \frac{gm_{M5}}{C_{L}}$. The factor M in this structure has a rather interesting implication with respect to NEF. If M is large enough this topology will have a NEF equivalent to the complementary structure. However in effect the biasing current of the intermediate branch is reduced when M is large which can move the parasitic poles inside the amplifier bandwidth. The apparent trade off between stability and NEF is unique to this structure but it is not challenging to have M=8 for low power applications.
$$ FOM = \frac{10^3}{4\pi \eta U_{T}} \cdot \frac{2 C_L }{ C_M (1+M/K+1/K)} $$
The components that improve bandwidth efficiency are detailed in Equation 13. Referring this back to Equation 6 however implies the noise is dominated by the capacitor that introduces the dominant pole of the system. The observation made here is that unlike the single stage topologies, the two stage configuration can trade off input referred noise for a better speed FOM by adjusting the \\(\frac{C_M}{C_L}\\) ratio. The high level methodology applied here is replacing the \\(C_L\\) with a smaller capacitor that requires less power with the hope that stability can still be maintained by boosting transconductance power with positive feedback and current recycling.
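To give a feel for the numbers behind Equations 11 and 13, the following sketch evaluates the differential pair bound and the two-stage scaling factor. The subthreshold slope factor \\(\eta = 1.3\\), the temperature, the capacitor values and the value of K are assumptions for illustration only (K is used exactly as it appears in Equation 13); M = 8 is the value mentioned in the text.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant [J/K]
Q_E = 1.602176634e-19       # elementary charge [C]


def thermal_voltage(temperature=300.0):
    """U_T = kT/q in volts."""
    return K_B * temperature / Q_E


def fom_diff_pair(eta, temperature=300.0):
    """Equation 11 bound for a differential pair in MHz*pF/mA."""
    return 1e3 / (4.0 * math.pi * eta * thermal_voltage(temperature))


def fom_two_stage(eta, c_l, c_m, m, k, temperature=300.0):
    """Equation 13: differential pair bound scaled by
    2 C_L / (C_M (1 + M/K + 1/K))."""
    scale = 2.0 * c_l / (c_m * (1.0 + m / k + 1.0 / k))
    return fom_diff_pair(eta, temperature) * scale


ETA = 1.3  # assumed subthreshold slope factor
base = fom_diff_pair(ETA)
# Hypothetical sizing: C_L = 2 pF, C_M = 0.5 pF, M = 8, K = 8.
boosted = fom_two_stage(ETA, c_l=2e-12, c_m=0.5e-12, m=8, k=8)
print(f"differential pair bound: {base:.0f} MHz pF/mA")
print(f"two-stage estimate:      {boosted:.0f} MHz pF/mA")
```

The scaling factor makes the trade-off explicit: a smaller \\(C_M\\) relative to \\(C_L\\) improves the speed FOM, at the cost of the higher input referred noise discussed above.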
## 23 Circuit Implementation
{{< figure src="technical_1/T1_T1AMP.pdf" width="500" >}}
{{< figure src="technical_1/AMP_Feedback.pdf" title="Figure 27: Schematic showing circuit implementation of the proposed compact neural amplifier. " width="500" >}}
Figure 27 shows the transistor level implementation of the topology used in Figure 24. The first gain stage is a highly compact complementary structure that exhibits exceptional noise performance. The second stage transistor implementation is the high gain two-stage topology discussed in Section 22. The variable gain configuration is facilitated by the digitally controlled low leakage switches that connect a selected set of capacitors in feedback. This particular configuration provides more generic instrumentation of the 1 Hz to 6 kHz bandwidth. It is well known that analogue filters introduce frequency dependent group delay near the pole locations which has been shown to degrade the processing capabilities of spike sorting techniques [^129]. By placing the high pass pole well inside of the LFP band the spike wave-forms exhibit less distortion due to analogue filtering and are instead filtered using linear phase filters in the digital domain that do not suffer from such drawbacks.
The reset mechanism on instrumentation amplifiers using pseudo-resistive elements is essential. Whether during stimulation, start-up, or amplifier saturation, the charge across the feedback capacitor must be neutralized before correct operation can begin. This mechanism allows the rejection of various distortion components that would otherwise corrupt the latent signal integrity or digital signal processing. However there is an inherent problem with these reset switches due to the parasitic charge injection induced on the intermediate semi-floating nodes. Moreover if these elements are cascaded to increase resistance or dynamic range the number of these sensitive floating nodes also increases, thereby building up more residue charge. A significant amount of charge can introduce a permanent artefact after reset as this charge redistributes internally inside the resistor. The proposed solution to this problem is to minimize the floating nodes and guard the floating N-Well from injected noise. This should allow a very large pseudo resistance for a sub-Hz high pass cut off frequency while maintaining exceptional reset characteristics. We minimize the resulting charge residue by absorbing the leaky diode currents and residues into the guarding amplifier. There will be some instantaneous off-set as the reset signal injects charge directly onto the feedback capacitor but this can be quite small when using small switches. The drawback here is that there may exist a very slow drift on the order of \\(V/sec\\) from the guarding amplifiers due to $V_{os} R_{diode}$. However simple digital assistance will suffice in eliminating this concern by periodically resetting the structure and cancelling the residue off-set. This re-introduces the high pass pole at a well defined location depending on the periodicity of the reset signal, and the signal can then be reconstructed in the digital domain [^130].
{{< figure src="technical_1/AMP_Label.pdf" width="500" >}}
{{< figure src="technical_1/AMP_Chip.pdf" title="Figure 28: Physical implementation of amplifier using a 6-metal $0.18 \mu m$ CMOS process measuring $75 \times 82 \mu m^2$ in size. " width="500" >}}
The floor plan for this implementation is annotated in Figure 28. The typical focus for analogue layout is achieving good matching for the input transistors and capacitors to minimise off-set or undesirable signal coupling. In this case the chopper introduces a lot of switching that is difficult to isolate from the signal so instead we focused on minimising parasitics of the clocked nets. The common mode feedback on the second stage uses a switched capacitor and wide band amplifier to ensure accurate common mode settling without deteriorating linearity. This is important because the ADC can be quite sensitive to the sampled common mode, resulting in reduced precision if there is an unexpected offset on the sampled output. Simulated performance of the implemented topology is shown in Figure 29. This compact configuration can achieve an input referred noise of $5.6 \mu V_{rms}$ over the specified bandwidth with a noise corner frequency of 20 Hz. The performance is detailed in Table 5, where a clear reduction in size can be observed when compared to other chopper systems. The total gain is \\(421 V/V\\) for this particular configuration which can be adjusted using the digital calibration bits integrated into the structure allowing different gain and power settings. The maximum available gain setting is shown in Figure 30.
{{< figure src="technical_1/Noise_PLO.pdf" width="500" >}}
{{< figure src="technical_1/Amp_Thd.pdf" title="Figure 29: Post layout simulated results of the proposed instrumentation circuit. " width="500" >}}
{{< figure src="technical_1/sim_gain.pdf" title="Figure 30: Post layout simulated results using periodic steady state analysis to evaluate the closed loop gain of the instrumentation circuit. " width="500" >}}
Table 5: Summary of performance specifications of the proposed instrumentation topology and other bio-signal chopper stabilized amplifiers found in the literature.
| Parameter | Units | This Work | Markovic [^125] | Makinwa [^123] |
|----|----|----|----|----|
| Technology | [nm] | 180 | 40 | 65 |
| Supply Voltage | [V] | 1.2 | 1.2 | 1 |
| Total Current | [\\(\mu\\)A] | 1.05 | 1.67 | 2.1 |
| Bandwidth | [Hz] | <1 - 6 k | 1 - 5 k | 0.5 - 1 k |
| Filter Order / Roll-off | [dB/Dec] | 20 | 20 | 20 |
| Noise Floor | [$nV / \sqrt{Hz}$] | 55 | 101 | 60 |
| Noise Corner | [Hz] | 20 | 100 | <1 |
| Dynamic Range | [dB] | 58 | 69 | 64 |
| Area | [\\(\mu\\)m\\(^2\\)] | $6.2\cdot 10^3$ | $7.2\cdot 10^4$ | $2\cdot 10^5$ |
| Area-Power-Product | [\\(\mu\\)W \\(\mu\\)m\\(^2\\)] | $7.3\cdot 10^3$ | $141\cdot 10^3$ | $420\cdot 10^3$ |
| NEF | | 1.08 | 2.5 | 1.66 |
Overall the proposed implementation performs well for supply voltages larger than \\(1.1 V\\) where the limiting factor is due to the current biased complementary input stage. This configuration necessitates a voltage overhead requirement of \\(2V_{TH}+2V_{ov}\\). However both of the gain stages are class-A and exhibit relatively well behaved current transients on the supplies. Class-AB alternatives do not share this feature and are more prone to disturb neighbouring recording circuits. Minimizing the dynamic current dissipation should lead to better LDO performance and lower supply induced sensor noise when many channels are integrated together. This also motivates another aspect for using a wide-band amplifier configuration for the first amplification stage because it usually implies that the common mode will also have wide band regulation. This leads to better common mode rejection in the signal band due to additional loop gain.
# 24 Analogue Signal Conversion
Analogue to digital conversion remains a crucial component of instrumentation, particularly for full signal characterization. Even when considering the demanding constraints for integrated neural sensors, full spectrum signal characterization is ubiquitous in the literature. This is motivated by the efficiency and reliability of various digital processing methods that require very efficient signal conversion to discrete samples instead of processing recordings in the analogue domain. Typically the most valued performance criterion for such a system is the ADC power consumption. A Successive Approximation Register (SAR) ADC is commonly used for quantizing biomedical signals because it only dissipates switching energy that can be very small for slow sampling rates. The SAR topology is depicted in Figure 31 and can be found extensively in BMI recording publications.
{{< figure src="technical_1/Split_Cap_Schmtc.pdf" title="Figure 31: Schematic of a conventional N bit SAR ADC with the split capacitor at position M." width="500" >}}
## 25 Capacitive array miniaturization
This discussion pays special attention to acquiring neural recordings that include LFPs while minimizing the required silicon area per sensor. This is motivated firstly by wanting to integrate many sensors on chip for large arrays and secondly by reducing any capacitive switching noise that can be quite difficult to reject in fully integrated systems. Recording LFPs and EAPs simultaneously will require increased ADC resolution so that the instrumentation dynamic range exceeds 60dB. Equivalently this means 10 to 14 bit precision is needed depending on the nonlinearity tolerances of the subsequent processing methods. This can be quite difficult in terms of the SAR specifications because the capacitor mismatch and sampling noise can prevent aggressive sizing of the unit capacitor. For a given ADC precision N, the SAR capacitor array will require $M \cdot 2^{N/M}+M$ unit capacitors \\(C_{unit}\\) where M is the number of equally split sections. By splitting the array into sections it should be obvious that the total capacitor requirement \\(C_{Total}\\) can be reduced to some extent. The quantization errors resulting from capacitor mismatch on the other hand are also closely related to these parameters. For a given standard deviation \\(\sigma\\) and confidence interval CI we can use Equation 14 to make a simple estimate for the expected quantization error \\(E_Q\\) [^131].
$$ E_Q = V_{ref} \frac{\sum \Delta C_i}{2^N C_{unit} - \Delta C_{Total}} = V_{ref} \cdot \alpha(N) \frac{\sqrt{2^N}-1}{\sqrt{2}-1} $$
The above expression assumes no split configuration is used where \\(\alpha\\) represents a scaling factor that is dependent on the number of control bits for each sub-DAC, $\alpha(x)= \frac{CI \sigma}{2^x - CI \sigma \sqrt{2^x}}$. \\(V_{ref}\\) is the reference voltage by which the sampled input is normalized to arrive at the binary encoded result. Now extending this formulation to include the dependency of M and bounding $E_Q < LSB/2$ in accordance with the required ADC precision leads to the expression in Equation 15.
$$ \frac{1}{2^{N+1}} \geq \alpha(N/M) \sum_{k=0}^{M} \left[ \sum_{i=0}^{M} \sqrt{2^{i}} \right] \cdot \left( \frac{\alpha(N/M)}{2^{N/M}} \right)^k $$
There are several higher order terms with respect to \\(\sigma CI\\) not shown here because they have vanishing contribution as N is increased and require a numerical solution to the problem. Otherwise for M=2 and arbitrarily placing the split capacitor at position K in the array we can similarly reconstruct the equality from Equation 14 as Equation 16.
$$ \frac{1}{2^{N+1}} \geq \frac{CI\sigma(\sqrt{2^{N-K}}-1)}{(2^{N-K} - CI \sigma 2^{\frac{N-K}{2}} )(\sqrt{2}-1)} + \frac{(\sqrt{2^{K}}-1) CI \sigma} {(\sqrt{2}-1) 2^K (2^K -CI \sigma 2^{\frac{K}{2}})} $$
The standard deviation \\(\sigma\\) is closely related to the exact requirements for the whole capacitive DAC in terms of the total area and unit capacitor size. The dependency of \\(E_Q\\) is mainly subject to the variance due to the MSB capacitors and for each less significant bit (from MSB to LSB) the expected variance increases by \\(\sqrt{2}\\) while its capacitive coupling decreases by 2. This is because $\sigma \propto 1/\sqrt{A_C}$ where \\(A_C\\) is the area of the capacitor. Clearly there is a process related figure of merit here that relates to the quality of capacitors since small capacitors with excellent matching will result in ADCs with the best characteristics such that we minimize the % deviation per \\(\mu m^2\\).
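The unit capacitor count and the unsplit mismatch estimate of Equation 14 are straightforward to evaluate numerically. The sketch below is a minimal illustration; the function names and the example values of \\(\sigma\\) and CI are hypothetical and not taken from the process documentation.

```python
import math


def unit_capacitor_count(n_bits, sections=1):
    """Number of unit capacitors M * 2^(N/M) + M for an N bit array
    split into M equal sections."""
    if n_bits % sections:
        raise ValueError("N must divide evenly into M sections")
    return sections * 2 ** (n_bits // sections) + sections


def alpha(x, ci, sigma):
    """Scaling factor alpha(x) = CI*sigma / (2^x - CI*sigma*sqrt(2^x))."""
    return ci * sigma / (2 ** x - ci * sigma * math.sqrt(2 ** x))


def mismatch_error_unsplit(n_bits, ci, sigma):
    """E_Q / V_ref from Equation 14 for an array without a split capacitor."""
    return alpha(n_bits, ci, sigma) * (math.sqrt(2 ** n_bits) - 1) / (math.sqrt(2) - 1)


def meets_half_lsb(n_bits, ci, sigma):
    """True when the estimated mismatch error stays below LSB/2."""
    return mismatch_error_unsplit(n_bits, ci, sigma) < 1.0 / 2 ** (n_bits + 1)


# A 10 bit array: unsplit versus two equal sections.
print(unit_capacitor_count(10, 1), unit_capacitor_count(10, 2))

# Hypothetical unit capacitor mismatch of 1% and 0.1% with CI = 3.
for sigma in (0.01, 0.001):
    print(sigma, meets_half_lsb(10, ci=3.0, sigma=sigma))
```

With these assumed numbers a 1% unit mismatch fails the LSB/2 bound at 10 bits while 0.1% passes, which illustrates how strongly the matching quality sets the minimum unit capacitor area.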
{{< figure src="technical_1/Split_CAP.pdf" title="Figure 32: Numerical solution to Equation 16 relating the capacitive DAC area requirement with the DAC resolution (N) and the position of the split capacitor before capacitor K. " width="500" >}}
Figure 32 shows a numerical solution to the equality in Equation 16. This allows us to consider the effect of split capacitor positioning with respect to the optimal area allocation for the capacitor array. The visible plateau for small N represents the case when the design is bounded by the minimal unit capacitance. This is determined using the process documentation for the target 0.18 \\(\mu m\\) CMOS technology that gives its mismatch specifications and minimum sizing. Generally split capacitor configurations are more sensitive to parasitics and can lead to more pronounced nonlinearities. However in some cases the unit capacitor size limits the array size such that splitting the array is an effective solution for improving power dissipation. We reiterate that this also indicates that the binary weighted configuration without splitting maximizes area efficiency if we are not limited by sampling noise or minimal capacitor sizing. In addition a fully differential DAC counter intuitively reduces the minimum size if the switching method first detects polarity before applying successive feedback [^132]. This is because the first quantization cycle does not depend on the capacitive division. This in turn means that the array can tolerate twice the mismatch error implying a 4 times smaller unit capacitance while only doubling the number of capacitors in the array.
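The area argument for the differential DAC can be written out in one line. This is a sketch that assumes the matching relation $\sigma \propto 1/\sqrt{A_C}$ holds down to the reduced capacitor size and that the array area is dominated by the unit capacitors; the primed symbols are introduced here only for illustration.

$$ \sigma' = 2\sigma \;\Rightarrow\; A_C' = \frac{A_C}{4} , \qquad A_{Total}' = 2 \cdot 2^{N} \cdot \frac{A_C}{4} = \frac{1}{2} \cdot 2^{N} A_C $$

Under these assumptions the differential array occupies roughly half the area of the single ended one despite containing twice as many capacitors.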
## 26 Model based topology selection
From here there are multiple directions we can take in order to ensure efficient operation and simultaneously achieve a compact configuration. A common approach is to multiplex the SAR ADC to a large number of channels but this will also require the analogue stage driving the ADC to dissipate proportionally more power due to settling requirements on the sampling capacitance. From a high level perspective, distributing the quantization effort into a large array of ADCs with staggered operation should lead to much more systematic power dissipation due to their uncorrelated operation, as opposed to using a single high speed ADC that requires a much higher clock frequency with stronger tones in the generated supply noise. Another SAR based alternative is to use calibration for the capacitive array so that it can be designed specifically with the smallest possible unit capacitors. We could then correct any nonlinearity or quantization errors that arise from capacitor mismatch if the array is characterized precisely enough. This does require either foreground or background calibration modules to extract the individual capacitor weights. Because we aim to perform a number of processing techniques in the digital domain for characterizing neural recordings, it makes sense for us to consider effective means to perform calibration.
{{< figure src="technical_1/SDADC.pdf" title="Figure 33: Schematic of the proposed $\Sigma \Delta$ assisted SAR ADC topology for achieving a more compact configuration." width="500" >}}
The structure illustrated in Figure 33 represents a hybrid topology based on SAR and sigma delta structures. The motivation is driven by the efficiency of SAR quantization for large signals and the compactness of high resolution quantization from sigma delta loops. The digital control will perform fully differential bottom plate sampling of the input which is then rapidly quantized to \\(2^N\\) levels using the typical binary search. After the SAR operation the resulting residue left on the capacitive array is quantized using a sigma delta control loop that feeds back on the nodes \\(V\Sigma\Delta\pm\\).
There is a strict advantage over conventional sigma delta loops which is that the residue error that needs to be quantized is reduced to \\(\frac{V_{ref}}{2^{N}}\\) which can easily be designed to lie within the linear range of a differential pair. This negates having to use passive or active feedback to deal with transconductance nonlinearity and significantly improves the power efficiency by retaining a relatively simple control loop topology. Moreover as the feedback loop is typically responsible for a small dynamic range of 30dB the requirements on clock jitter and decimation filtering are relaxed.
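The two-step conversion described above can be sketched behaviourally as follows. This is an idealized, single ended model for illustration only: it swaps the second order feed forward loop filter for a simple first order accumulator, ignores noise and mismatch, and uses a hypothetical reference voltage and input. The 7 bit SAR and oversampling ratio of 32 are the values quoted later for the implemented circuit.

```python
def sar_quantize(vin, vref, n_bits):
    """Ideal binary search. Returns the N bit code and the residue
    vin - code * LSB, which lies in [0, vref / 2^N)."""
    lsb = vref / 2 ** n_bits
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if vin >= trial * lsb:
            code = trial
    return code, vin - code * lsb


def first_order_sigma_delta(residue, lsb, osr):
    """First order 1-bit modulator with a rectangular (averaging)
    decimation window. Returns the estimate of the residue in volts."""
    acc = 0.0
    ones = 0
    for _ in range(osr):
        acc += residue / lsb      # integrate the normalized residue
        if acc >= 1.0:            # comparator decision and 1-bit feedback
            acc -= 1.0
            ones += 1
    return ones / osr * lsb


VREF = 1.2      # hypothetical reference voltage [V]
N_SAR = 7       # SAR bits
OSR = 32        # oversampling ratio
vin = 0.6137    # hypothetical input sample [V]

lsb = VREF / 2 ** N_SAR
code, residue = sar_quantize(vin, VREF, N_SAR)
estimate = code * lsb + first_order_sigma_delta(residue, lsb, OSR)
print(code, residue, vin - estimate)
```

In this idealized model the remaining error is bounded by the SAR step divided by the oversampling ratio, which illustrates why the modulator only needs to handle a signal range of \\(\frac{V_{ref}}{2^{N}}\\).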
The more desirable advantage over a high resolution SAR is that the capacitive DAC may be designed in a highly optimal configuration with as few bits as possible. This allows sizing that primarily focuses on suppressing parasitic effects with minimal sampling capacitance. As will be demonstrated this topology does not require an auxiliary calibration DAC or a pseudo random dithering source for performing mismatch correction. This is because the internal sigma delta structure is in the same signal loop as the SAR operation and can trade-off bandwidth for increased noise rejection simply by adjusting the sampling frequency \\(f_s\\). Naturally because this topology inherently needs a pre-amp stage for SAR conversion we should not expect the FOM to do better than low resolution SARs.
Intuitively one can think that when combining the two topologies the individual sources for power dissipation now scale with \\(2^{\frac{N}{2}}\\). More specifically these sources come from the capacitive DAC and decimation filter. The components that do not have reduced scaling are related to the sampling noise and the thermal noise floor of the oversampling modulator. To demonstrate this quantitatively we will build an analytic model for the SAR and \\(\Delta\Sigma\\) SAR topologies to demonstrate some of the inherent characteristics. This will also reveal the techniques for optimizing the proposed structure.
$$ FOM_{ADC} = \frac{P_{sys}}{2^{N} f_s} $$
Minimizing the figure of merit from Equation 17, that is the energy spent per conversion level, will represent our objective function, which reflects the efficiency by which each sample is converted into a digital code. Through the simplicity of this relation, any comparison primarily requires an accurate expectation for the power budget in terms of the required resolution or precision requirements.
$$ P_{Ideal} = \underbrace{ E_{search} \cdot f_s C_{unit} V_{ref}^2 (2^{N-2}+2^2)}_{\text{Capacitor Array}} + \underbrace{2N (N+2) f_s E_{gate} }_{\text{Register Logic}} $$
Equation 18 considers the primitive structure with an ideal comparator where \\(E_{search}\\) represents the average dissipation for the binary search switching method and \\(E_{gate}\\) is the average gate dissipation per clock cycle. Both these parameters adjust to different core libraries or various switching methods that typically trade off efficiency for parasitic tolerance [^133]. This ideal structure is extended by the requirements of either a dynamic latch comparator or an analogue pre-amplifier that allows negligible comparator requirements at the expense of consuming a static current. The classic pre-amplifier approach also tends to deal with mitigating kick back noise but in general the straightforward application of classic \\(kT/C\\) relations conveniently gives:
$$ P_{amp} = 32 \pi \ln(2) \cdot \underbrace{\frac{(U_T N 2^N)^2}{V_{ref} \cdot \eta q}}_{\text{Noise}} \cdot \underbrace{A_{ol} f_s NEF^2}_{\text{Bandwidth}} $$
Here \\(A_{ol}\\) represents the gain provided by the pre-amplifier. Notice the very typical inverse relationship with respect to \\(V_{ref}\\) which motivates the use of the more efficient dynamic comparator structure. However evaluating the equivalent input referred noise of a dynamic structure accurately requires a piecewise evaluation for the different phases of operation and the respective stochastic integrals [^134]. The contribution can be associated with two dominant sources, that of sampled noise;
$$ \sigma_{S} = \frac{4 kT}{3 C_x F} + \frac{ kT}{3 C_x F^2 H} + \frac{ kT}{12 C_x F^2 H^2} $$
And noise contributed from transconductive elements;
$$ \sigma_{M} = \frac{kT}{C_x F^2} + \frac{kT}{2 C_x F^2 H} + \frac{kT}{8 C_x F^2 H^2} $$
$$ F = \frac{2 \rho V_{th}}{V_{ref} - V_{th} } \quad \text{and} \quad H = \frac{V_{ref}}{2 V_{ov}} \cdot \frac{2 \rho }{1 + \rho} $$
As before, this must be bounded by the acceptable quantization noise, $ V_{ref} \cdot {2^{-N-2}} = \sqrt{\sigma_M + \sigma_S } $, which gives the value for \\(C_x\\). Strictly there is a strong dependence on the input signal in order to evaluate the dissipated power but on average it is reasonable to approximate this to the capacitive switching energy of $P_{Latch} \approx f_{s} C_x V^2$.
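Because both noise terms scale with \\(kT/C_x\\), the bound above can be solved for \\(C_x\\) directly. The sketch below does this under stated assumptions: \\(H\\) is read as the product of \\(\frac{V_{ref}}{2 V_{ov}}\\) and the ratio \\(\frac{2\rho}{1+\rho}\\), \\(\sigma_S\\) and \\(\sigma_M\\) are treated as noise powers, and the device parameters (\\(V_{th}\\), \\(V_{ov}\\), \\(\rho\\), \\(V_{ref}\\)) are hypothetical values chosen for illustration.

```python
K_B = 1.380649e-23  # Boltzmann constant [J/K]


def latch_noise_factors(v_ref, v_th, v_ov, rho):
    """Factors F and H used by the sampled and transconductive noise terms."""
    f = 2.0 * rho * v_th / (v_ref - v_th)
    h = v_ref / (2.0 * v_ov) * 2.0 * rho / (1.0 + rho)
    return f, h


def required_cx(n_bits, v_ref, v_th, v_ov, rho, temperature=300.0):
    """Smallest C_x [F] for which sqrt(sigma_S + sigma_M) equals the
    allowed noise V_ref * 2^(-N-2)."""
    f, h = latch_noise_factors(v_ref, v_th, v_ov, rho)
    # Coefficients of kT/C_x in sigma_S and sigma_M respectively.
    sampled = 4.0 / (3.0 * f) + 1.0 / (3.0 * f**2 * h) + 1.0 / (12.0 * f**2 * h**2)
    transconductive = 1.0 / f**2 + 1.0 / (2.0 * f**2 * h) + 1.0 / (8.0 * f**2 * h**2)
    noise_budget = (v_ref * 2.0 ** (-n_bits - 2)) ** 2
    return K_B * temperature * (sampled + transconductive) / noise_budget


def latch_power(f_s, c_x, v_supply):
    """Average dissipation approximated as f_s * C_x * V^2."""
    return f_s * c_x * v_supply**2


# Hypothetical operating point (assumptions, not design values).
params = dict(v_ref=1.2, v_th=0.45, v_ov=0.1, rho=1.0)
for n in (8, 10, 12):
    c_x = required_cx(n, **params)
    print(n, f"{c_x * 1e15:.1f} fF", f"{latch_power(100e3, c_x, 1.2) * 1e9:.3f} nW")
```

Each additional bit quadruples the required \\(C_x\\), and with it the latch power, which is the usual thermal noise limited scaling of \\(2^{2N}\\).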
Now consider the components of the \\(\Delta\Sigma\\) structure. Clearly it will closely follow the pre-amplifier based relations with the exception of the primitive components from Equation 18, which will instead scale with \\(N-K\\) where \\(K\\) is the number of bits resolved by the sigma delta loop. Here two additional components will be accounted for: the first is the integrator and the second is the digital FIR that decimates the modulated residue quantization. A second order feed forward integration topology is chosen for \\(H(s)\\) based on its efficacy when applied to the configuration shown in Figure 33 and primarily to minimize the number of summing operators and coefficients prone to mismatch. For the sake of discussion we make the assertion that decimation noise rejection is bounded $K \leq (FIR)^{-1/2}$ in the case of a rectangular window for analytic clarity [^135]. Furthermore, note that as we increase the SAR quantization the first stage will proportionally see a reduction of the input signal that needs to be accounted for to achieve the correct integration constant.
$$ P_{int} = 32 \pi \cdot \underbrace{\frac{(U_T 2^{N} NEF)^2 }{ q V_{ref}}}_{\text{Noise}} \cdot \underbrace{FIR f_s (1+2^{N-K}) }_{\text{Bandwidth}} $$
And similarly the digital decimation filter will scale in the form of;
$$ P_{fir} = \underbrace{2^{K}}_{\text{OSR}} \underbrace{( K + \log_{2}(K))}_{\text{Quantization}} f_s E_{gate} $$
Collecting these terms for each topology will equate to expressions that typically have scalar dependencies on technology or implementation which we must make a set of reasonable assumptions for. The literature will indicate numerous means by which each component can be reduced through specialized logic cells, adaptive comparator power allocation, or power saving switching methods. Our particular interest lies with the dependency on N that will imply the effectiveness of a certain topology for a given dynamic range requirement. In addition this familiarizes us with specific factors fundamental to power dissipation with respect to resolution.
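As a minimal sketch of how such terms can be collected, the snippet below implements the ideal SAR power from Equation 18, the decimation filter power and the figure of merit from Equation 17. The technology dependent scalars (\\(E_{search}\\), \\(E_{gate}\\), \\(C_{unit}\\), \\(V_{ref}\\)) are placeholder assumptions, and the comparator, pre-amplifier and integrator terms are deliberately left out.

```python
import math


def p_ideal(n_bits, f_s, c_unit, v_ref, e_search, e_gate):
    """Equation 18: capacitor array switching plus register logic power."""
    array = e_search * f_s * c_unit * v_ref**2 * (2 ** (n_bits - 2) + 2**2)
    logic = 2 * n_bits * (n_bits + 2) * f_s * e_gate
    return array + logic


def p_fir(k_bits, f_s, e_gate):
    """Decimation filter power: 2^K * (K + log2(K)) * f_s * E_gate."""
    return 2**k_bits * (k_bits + math.log2(k_bits)) * f_s * e_gate


def fom_adc(p_sys, n_bits, f_s):
    """Equation 17: energy per conversion level, P_sys / (2^N * f_s)."""
    return p_sys / (2**n_bits * f_s)


# Placeholder technology scalars (assumptions for illustration only).
TECH = dict(c_unit=1e-15, v_ref=1.2, e_search=0.5, e_gate=1e-15)
F_S = 100e3

for n in (6, 8, 10, 12):
    p = p_ideal(n, F_S, **TECH)
    print(n, f"{p * 1e9:.2f} nW", f"{fom_adc(p, n, F_S) * 1e15:.3f} fJ/level")
```

Since every term is proportional to \\(f_s\\), the figure of merit for these components depends only on N and the technology scalars, which is what makes the dependency on N the quantity of interest.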
{{< figure src="technical_1/P_TOP_N.pdf" width="500" >}}
{{< figure src="technical_1/P_TOP_A.pdf" title="Figure 34: Summary of the FOM (\\(P_{sys}/2^{N} f_s\\)) for each topology with respect to different resolution requirements. " width="500" >}}
Figure 34 presents the expected merit for each topology as the target resolution is varied. Without consideration for area, there is a clear power advantage for the dynamic SAR structure mediated primarily by the fact that the comparator does not have settling associated tolerance. This is the main reason why the pre-amp topology requires a proportionally increased bandwidth/power as resolution is increased. What stands out is that the \\(\Delta\Sigma\\) structure has a power dependency $\propto 2^{3N}$ for achieving the required input referred noise in contrast to the more conventional dependency of \\(2^{2N}\\). The mechanism behind this is the SAR quantization that reduces the signal input range, which needs to be recovered to achieve the correct integration factors. Moreover the over sampling ratio increases simultaneously which has an overall multiplicative effect. Clearly the SAR quantizer should only perform a few conversions that put the residue in the linear range of the loop filter and let the modulator perform most of the quantization effort. When all topologies are using the same unit capacitor, this result demonstrates that for \\(N < 5\\) & \\(N > 14\\) the \\(\Delta\Sigma\\) topology becomes strictly unfavourable in terms of power but performs comparably with respect to power efficiency for \\(N \approx 10\\). Taking the FOM area product by considering the capacitors in terms of \\(\Box\\) units the advantage of the \\(\Delta\Sigma\\) topology becomes more obvious. For the precision significant to neural recording, \\(8<N<12\\), the hybrid structure consistently guarantees a more compact configuration by a factor of 10.
{{< figure src="technical_1/FOM_Space.pdf" title="Figure 35: Figure of merit dependency of the proposed \\(\Delta\Sigma\\)SAR topology with respect to design parameters K1 & K2. " width="500" >}}
Considering the design space of the \\(\Delta\Sigma\\)SAR structure in more detail exposes a more optimal strategy for increasing FOM. Figure 35 exemplifies how the FOM behaves as either the SAR or the sigma delta accuracy is increased. Beyond the optimal basin at N = 9 & K = 4, the best strategy for improving ADC resolution is to increase the SAR quantization at half the rate of the sigma delta increase in resolution. For reference, a conventional $\Delta \Sigma$ modulator [^136] is designed with the same target specifications and using the same design method to configure the OPAMP integrators and resistive input network. Such a configuration achieves 167 dB FOM<sub>s</sub> irrespective of target resolution when we consider just the analogue power dissipation. In fact this figure is commonly achieved by the state of the art [^137]. As shown in Figure 36, the \\(\Delta\Sigma\\)SAR configuration can theoretically achieve more than 4X better performance than conventional \\(\Delta\Sigma\\) modulators for resolutions above 12 bits, even when operating at lower supply voltages. This is because of the improved noise efficiency. Please refer to Section 58 for additional details regarding derivations and topology comparisons that are omitted here for clarity.
{{< figure src="technical_1/AMD.pdf" title="Figure 36: Estimation on the expected figure of merit for a target resolution and varying SAR precision. The red star and blue circle indicate the target and measured performance respectively. " width="500" >}}
## 27 Circuit Implementation
Extending the conventional SAR structure to perform sigma delta modulation is achieved with relatively few changes to the overall topology. The main difference is that during the last phase of SAR conversion a register must be toggled that switches the integrators in ahead of the comparator. Simultaneously the \\(V\Sigma\Delta\pm\\) capacitors are directly connected to the comparator bipolar output instead of the common mode voltage \\(VCM\\) for differential feedback. This configuration is integrated on chip and performs 7 bits of differential SAR quantization with another 5 bits resolved by the noise shaping modulator with an oversampling ratio of 32. At the system level, 4 analogue recording channels are multiplexed to the input of the ADC, which implies that a sampling rate of \\(100 kS/s\\) is required to sample each output at \\(25 kS/s\\).
{{< figure src="technical_1/SAR_Arch.pdf" width="500" >}}
{{< figure src="technical_1/SAR_Logic.pdf" title="Figure 37: Schematic configuration of the top level control for the \\(\Delta\Sigma\\)SAR data converter." width="500" >}}
Figure 37 shows the top level configuration of this data converter. By using a specialized register logic slice a small reduction in complexity is achieved, in addition to mitigating the timing issues typical of the conventional self clocking register configuration. This topology uses a bottom plate sampling strategy to neutralize the effect of parasitics and common mode comparator nonlinearities while operating at 1.2 V with a 10 MHz clock frequency. Although there are only \\(N-K+OSR\\) active phases, settling the output of the recording amplifiers onto the capacitor array requires several cycles because of the band limited behaviour of the driving stage.
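As a rough illustration of the timing budget implied by these numbers, the sketch below counts clock cycles per conversion. It assumes that \\(N-K\\) corresponds to the 7 SAR bits and that every active phase takes exactly one clock cycle; neither assumption is stated explicitly in the text.

```python
channels = 4
rate_per_channel = 25e3      # S/s per recording channel (from the text)
clock_hz = 10e6              # converter clock frequency (from the text)
sar_bits = 7                 # assumed to equal N - K
osr = 32                     # oversampling ratio (from the text)

adc_rate = channels * rate_per_channel          # aggregate sampling rate
cycles_per_sample = round(clock_hz / adc_rate)  # clock cycles available per conversion
active_phases = sar_bits + osr                  # assumes one clock cycle per phase
settling_cycles = cycles_per_sample - active_phases
print(adc_rate, cycles_per_sample, active_phases, settling_cycles)
```

Under these assumptions a little over half of each conversion period would remain available for the input to settle onto the capacitor array.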
The implementation of the capacitive DAC and second order feed-forward integrator are shown in Figure 38. This configuration also opts to scale the voltage reference for the LSB in order to reduce the total number of capacitors required. As the capacitor array is implemented using CMIM devices, the 7 bit differential structure with a split capacitor for \\(M=3\\) will guarantee 10.1 bits for a confidence interval of \\(3\sigma\\) using Equation 16 and process documentation parameters that show an $8\times8 \mu m$ device has a mismatch induced standard deviation of \\(0.23\\)%. The reasoning for this configuration is that we are guaranteed \\(>9.5 bits\\) without calibration and it will allow \\(>12 bits\\) with calibration. In either case the accuracy is sufficient for recording LFP and EAP signals simultaneously. This result was also confirmed with Monte Carlo analysis using foundry supplied PSP models.
{{< figure src="technical_1/T1_SDSAR_CDAC.pdf" width="500" >}}
{{< figure src="technical_1/T1_SDSAR_INT.pdf" title="Figure 38: Schematic implementation of the \\(\Delta\Sigma\\)SAR structure. " width="500" >}}
The integrator topology primarily deals with the contrasting bandwidth requirements of the SAR operation and the sigma delta integration for the first stage. In particular, when taking the SAR decisions at the oversampled clock the first stage can only provide wideband gain if the capacitor is switched out and a resistive element is used instead. The circuit complexity can be dramatically reduced by using triode region transistors that regulate the PMOS biasing current for a well defined common mode. Because these transistors can be large in area they could limit the maximum SAR speed. To avoid this the CMFB circuit is semi open loop during the SAR quantization, which increases the bandwidth by using the common mode voltage that is preserved on the integration capacitor. Also, by switching the biasing current of the analogue summing stage a constant common mode can be presented to the comparator input, thereby reducing any offset disparity between the two operation phases.
{{< figure src="technical_1/ADC_Label.pdf" width="500" >}}
{{< figure src="technical_1/ADC_Chip.pdf" title="Figure 39: Physical implementation of ADC using a 6-metal $0.18 \mu m$ CMOS process measuring $93 \times 147 \mu m^2$ in size." width="500" >}}
Figure 39 shows the fabricated structure of the ADC. Since the capacitors are placed on top of the active circuits, this floor plan distances the integrators from the MSB capacitors to physically isolate the digital switching noise sources. A number of shielding structures are employed to improve post layout performance. These include various guard rings and isolating N-wells, but due to the proximity of the digital switching the most effective strategy is to orient fully differential structures appropriately in order to equalize the coupling components. Here metal layers 1-3 are used for transistor interconnect, layers 5-6 for the capacitive DAC, and layer 4 is interposed in order to shield the two sections while connected to the common mode voltage. This is because the transient fluctuations on \\(V_{cm}\\) are only due to mismatch and, with its large capacitive loading, it should be the quietest reference in the system.
In order to take advantage of this structure we reveal two distinguishing characteristics that cannot be found in either conventional topologies or other hybrid topologies. When the capacitive DAC is considered as a set of weights that need to be determined, we realize that the derivative of slowly varying signals is predominantly quantized by the sigma delta loop. Except when the SAR bits switch, the quantization is independent of the mismatch in these weights. As a result all the mismatch coefficients can be accounted for with respect to the $\Sigma \Delta C$ capacitor.
{{< figure src="technical_1/adc_cloop.pdf" title="Figure 40: Control loop used to perform calibration with a slow test signal at the ADC input." width="500" >}}
The calibration technique discussed is abstractly represented by Figure 40, where there are two IIR control loops with the coefficients \\(a_1\\) and \\(b_1\\). In part this loop performs normal operation by evaluating the signal quantization \\(Q_{sig}\\). This is done by adding the SAR quantization with calibrated weights and decimating the oversampled residue with a \\(32^{nd}\\) order FIR window quantized with 8 bit coefficients for each sample. Here \\(a_{1}\\) simply has to be small enough to track the signal and reject noisy components to determine $\Delta Q$. $\Delta Q$ represents DNL nonlinearities that are used to adjust the coefficients \\(K_{DAC}\\). The multiplication operator is in fact a bitwise evaluation that indicates whether a coefficient needs to be adjusted due to a correlation between $\Delta Q$ and a change in that bit. Hence \\(b_{1}\\) needs to be small enough to prevent level dependent tuning and \\(V_{test}\\) should be a full range, slowly varying signal.
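The following is a minimal sketch of one iteration of such a two loop update, written only to illustrate the data flow. The function name, the sign convention of the weight update, and the use of the bit transition \\(b[n]-b[n-1]\\) as the correlation term are assumptions made for this example and are not taken from the implemented hardware.

```python
from typing import List, Tuple


def calibration_step(
    weights: List[float],
    tracked: float,
    bits: List[int],
    prev_bits: List[int],
    residue: float,
    a1: float = 0.25,
    b1: float = 2.0 ** -8,
) -> Tuple[List[float], float, float]:
    """One update of both loops: returns (new_weights, new_tracked, delta_q)."""
    # Signal estimate: calibrated SAR weights plus the decimated residue.
    q_sig = sum(w * b for w, b in zip(weights, bits)) + residue
    # First loop (a1): first-order IIR that tracks the slow test signal.
    new_tracked = tracked + a1 * (q_sig - tracked)
    # Deviation from the tracked value, used here as the DNL error estimate.
    delta_q = q_sig - new_tracked
    # Second loop (b1): adjust only the weights whose bit just changed,
    # with the direction set by the sign of the bit transition (assumed).
    new_weights = [
        w - b1 * delta_q * (b - pb)
        for w, b, pb in zip(weights, bits, prev_bits)
    ]
    return new_weights, new_tracked, delta_q
```

In this form a weight is reduced when switching its bit on coincides with a positive jump in the estimate, and bits that did not change are left untouched, which is one way to realize the bitwise correlation described above.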
{{< figure src="technical_1/adc_UC.pdf" width="500" >}}
{{< figure src="technical_1/adc_CC.pdf" title="Figure 41: INL plots illustrating the mismatch artefact reduction due to calibration." width="500" >}}
The improvement in INL is evident in Figure 41 due to the calibration mechanism with \\(a_1=1/4\\) and \\(b_1=2^{-8}\\). The close interaction between INL & DNL errors over the full dynamic range for a capacitive array in addition to the sigma delta loop's capability of quantizing $\pm 2 LSB$ of the array allows this method to converge accurately. Here it is observed that the calibration improves the quantization accuracy by two additional bits.
{{< figure src="technical_1/adc_thdsnr.pdf" title="Figure 42: Measured THD and SNR of the fabricated data converter." width="500" >}}
{{< figure src="technical_1/ADC_TEST.jpg" width="500" >}}
{{< figure src="technical_1/adc_TI.pdf" title="Figure 43: Testing setup used for characterizing the ADC." width="500" >}}
Figure 43 shows the test bench used during device characterization. The Saleae Logic device is a digital probe that offers 100 MS/s digital signal acquisition for measurements of up to 10 seconds. Here the Raspberry Pi module simply provides real time interaction with the device configuration using automated SPI control and a graphical user interface that indicates ADC precision based on the selected operation. This allows us to tweak the operating conditions and find which noise sources are disturbing the configuration. The analogue bias \\(I_{BIAS}\\) is generated by a Keithley 2602A system source meter and fed in using a guarded triax cable. The differential input signals are generated using an Agilent 33522A arbitrary waveform generator and fed to the ADC input using BNC cables.
Table 6 outlines the characteristics of the implemented ADC configuration while comparing it to recent oversampling/noise shaping data converter publications. Figure 42 demonstrates the spectral characteristics of the quantization output for an input signal at half the full input range after calibration. In comparison to the analogue instrumentation, the resource related specifications are significantly larger. However note that these requirements are distributed over a number of channels as a result of multiplexing this structure.
Table 6: Summary of performance specifications for the \\(\Delta\Sigma\\)SAR data converter and other oversampling/noise shaping data converter structures found in the literature.
| Parameter | Units | This Work | Lo [^138] | Roermund [^139] |
|----|----|----|----|----|
| Technology | [nm] | 180 | 65 | 65 |
| Supply Voltage | [V] | 1.2 | 1.2 | 0.8 |
| Total Current | [$\mu$A] | 12 | 13 | 1.7 |
| Sampling Frequency | [kS/s] | 200 | 8 | 16 |
| ENOB | [bits] | 11.3 | 17.5 | 14.5 |
| SFDR | [dB] | 86 | 105 | 87 |
| Area | [$\mu m^2$] | $93 \times 147$ | $400 \times 180$ | $600 \times 300$ |
| Power-Area-Product | [$\mu W \mu m^2$] | $1.9 \cdot 10^5$ | $1.1 \cdot 10^6$ | $2.4 \cdot 10^5$ |
| $P/(f_s \cdot 2^N)$ | [fJ/conv] | 10 | 29 | 6.6 |
| $SNDR+10\log(BW/P)$ | [dB] | 166 | 180 | 177 |
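As a quick consistency check, the power-area product row can be recomputed from the supply voltage, total current, and area rows of Table 6. The sketch below assumes that total power is simply the product of supply voltage and total current; the dictionary keys and function name are chosen only for this example.

```python
# name: (supply V, current uA, width um, height um), copied from Table 6
converters = {
    "This Work": (1.2, 12.0, 93.0, 147.0),
    "Lo": (1.2, 13.0, 400.0, 180.0),
    "Roermund": (0.8, 1.7, 600.0, 300.0),
}


def power_area_product(vdd: float, current_ua: float, w_um: float, h_um: float) -> float:
    """Power-area product in uW * um^2, assuming P = Vdd * I."""
    return vdd * current_ua * w_um * h_um


for name, spec in converters.items():
    print(f"{name:10s} {power_area_product(*spec):.2e} uW*um^2")
```

The recomputed values land close to the tabulated entries, with small differences attributable to rounding of the listed figures.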
The trade off with respect to residue oversampling in Figure 44 demonstrates that there is some flexibility with respect to sampling rate and SINAD performance. In addition this also clarifies that post-fabrication adjustments do not exhibit significant resolution improvements beyond the design point. This is related to the sampling noise of the capacitor array and the noise floor of the analogue integrators, which would need to be programmable for different oversampling ratios. At that point the decimation also has more strenuous requirements that may result in an inefficient resource overhead. Strictly stated, it is significantly more efficient to reject noise by digitally bandpass filtering selected frequency components than by having the ADC resolve the signal beyond the target precision.
{{< figure src="technical_1/adc_fom.pdf" title="Figure 44: Measured Figure of Merit as a function of oversampling ratio." width="500" >}}
In the context of miniaturization, the topology presented here closely follows the improvement expected from the model for high resolution signal acquisition. We achieve nearly 12 bits of quantization with a 6 bit equivalent capacitive DAC, which is reflected in the compact design footprint. When compared to similar compact ADC implementations found in recent publications we observe a competitive power budget with, again, a significantly smaller area requirement. Some additional digital processing is required, as opposed to the simplicity of SAR converters, to take full advantage of the topology. However such hardware is typically readily available in systems that also perform spike sorting and neural signal classification.
# 28 System Level Abstraction
Numerous specifications such as ADC resolution and input referred noise of the instrumentation amplifiers relate directly to signal specific parameters. Moreover a particular processing algorithm would favour certain filter configurations over others in terms of signal conditioning. In multi stage systems, however, there is a significant amount of flexibility in choosing the gain of individual stages or their filter parameters that does not affect the resulting transfer function. Here we consider such a primitive \\(N\\) stage analogue processing chain and discuss the allocation of resources to gain insight into some of the high level optimization involved in selecting a specific configuration. Such a configuration is shown in Figure 45.
{{< figure src="technical_1/ACS.pdf" title="Figure 45: Multistage amplifier configuration using the series G to adjust the allocation of power and area. " width="500" >}}
$$ G[n] = A_{g} \left( \beta + \alpha^{n} \right) \quad \text{where} \quad A_{g} = \sqrt[N]{\frac{G_{T}}{ \prod_{i=1}^{N} (\beta + \alpha^{i} )}} $$
Consider a geometric series for the gain of each stage as expressed in Equation 25. Here \\(G_{T}\\), \\(\alpha\\), and \\(\beta\\) represent the total gain required, the resource distribution factor, and a minimal contribution factor respectively. The formulation is motivated by the fact that if \\(\alpha\\) is one, resources are allocated equally. This means every stage has equal gain, but it also implies that the sum of all gain factors is minimal, leading to a minimum amount of area due to the feedback capacitors. More typically designs will choose a smaller \\(\alpha\\) such that most of the gain is situated at the first few stages. This allows some reduction in power in the subsequent stages because of the reduced noise requirement. \\(\beta\\) simply allows us to specify that a fraction of the total gain is uniformly distributed, but it is typically kept small in order to maximize the benefit from resource redistribution. This allows us to express the noise power requirement for a given set of parameters in Equation 26.
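The gain series can be evaluated numerically with a few lines of Python. This is a small sketch in which the function name `stage_gains` and the choice of three stages are assumptions made for illustration; the values of \\(G_T\\), \\(\alpha\\), and \\(\beta\\) are those quoted for Figure 46.

```python
import math
from typing import List


def stage_gains(g_total: float, alpha: float, beta: float, n_stages: int) -> List[float]:
    """Per-stage gains G[n] = A_g * (beta + alpha**n) for n = 1..N."""
    shape = [beta + alpha ** n for n in range(1, n_stages + 1)]
    # A_g normalizes the series so that the product of all gains equals G_T.
    a_g = (g_total / math.prod(shape)) ** (1.0 / n_stages)
    return [a_g * s for s in shape]


# N = 3 is an assumed stage count; the other parameters follow Figure 46.
gains = stage_gains(g_total=500.0, alpha=0.3, beta=0.05, n_stages=3)
print([round(g, 2) for g in gains], round(math.prod(gains), 6))
```

With \\(\alpha < 1\\) the first stage carries the largest gain, and setting \\(\alpha = 1\\) recovers equal gains of \\(\sqrt[N]{G_T}\\) per stage.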
$$ P_{Amplifiers} = P_{unit} \left( 1 + \sum_{k=1}^{N-1}\left[\prod_{i=1}^{k} \frac{1}{A_{g} \beta + A_{g} \alpha^{i} } \right] \right)^2 $$
$$ A_{Gain}= A_{unit}\left( \sum_{k=1}^{N} \left[1 + A_{g} \beta + A_{g} \alpha^{k} \right] \right) $$
Here \\(P_{unit}\\) is simply evaluated from Equation 6 and leads to an area requirement that is simply expressed using Equation 27. Now taking some typical parameters we can evaluate a possible configuration of gains and thereby the associated allocation of resources. This is shown in Figure 46.
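Equations 26 and 27 can be evaluated directly once the per stage gains \\(G[i] = A_g \beta + A_g \alpha^i\\) are known. The sketch below takes the gains as a plain list and normalizes \\(P_{unit}\\) and \\(A_{unit}\\) to one; the function names and the example gain values are hypothetical.

```python
from typing import List


def amplifier_power(gains: List[float], p_unit: float = 1.0) -> float:
    """Equation 26: P_unit * (1 + sum_{k=1}^{N-1} prod_{i=1}^{k} 1/G[i])**2."""
    total, accumulated = 1.0, 1.0
    for g in gains[:-1]:
        accumulated /= g      # running product of 1/G[i] over the preceding stages
        total += accumulated
    return p_unit * total ** 2


def gain_area(gains: List[float], a_unit: float = 1.0) -> float:
    """Equation 27: A_unit * sum_{k=1}^{N} (1 + G[k])."""
    return a_unit * sum(1.0 + g for g in gains)


# Two illustrative three-stage chains with the same total gain of 8.
for label, chain in (("equal", [2.0, 2.0, 2.0]), ("front-loaded", [4.0, 2.0, 1.0])):
    print(label, amplifier_power(chain), gain_area(chain))
```

In this toy comparison the front loaded chain needs less normalized power but more normalized area than the equal gain chain, which is the trade off that \\(\alpha\\) controls.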
{{< figure src="technical_1/RDBG.pdf" title="Figure 46: Resource allocation for analogue power and area using the parameters \\(G_T=500\\), \\(\alpha=0.3\\), and \\(\beta=0.05\\). " width="500" >}}
Let us take \\(A_{unit}\\) as some unit capacitance size that allows the deviation of gain due to mismatch to fall inside the confidence interval. In order to realize Equation 26, each stage has its power and input referred noise requirement reduced by the accumulated gain of the preceding stages. This result presents us with the trend illustrated in Figure 47, where it appears that in systems with many stages it is relatively beneficial to redistribute the resources to the front-end for a reduction in overall power. However when the number of stages is three or less we observe that the increase in area can diminish this improvement for high gain system requirements.
{{< figure src="technical_1/NM_NP.pdf" width="500" >}}
{{< figure src="technical_1/NM_PAP.pdf" title="Figure 47: Normalized resource improvements for \\(\alpha\\) with respect to the case when \\(\alpha=1\\) for each configuration. " width="500" >}}
So far we have neglected some aspects of the design consideration. The first is the multiplicative increase in standard deviation as \\(N\\) increases, together with the sensitivity to variance being inversely proportional to closed loop gain. Here we can account for the increased variance by proportionally increasing \\(A_{unit}\\) in order to neutralize this increase according to Equation 28.
$$ \Delta \sigma^2 =\frac{A_{\mu+\sigma}}{A_{Gain}} \approx \prod_{i=1}^{k} \left( 1 + \sigma CI \left[ 2 - \frac{2}{\sqrt{ A_{g} \beta + A_{g} \alpha^i }} \right] \right) $$
Again \\(\sigma\\) represents the deviation for a chosen unit capacitance and \\(CI\\) is our confidence interval. For completeness in estimating area we will also introduce the capacitance required for performing filtering on the last \\(K\\) stages. Rearranging Equation 6 in terms of output referred noise gives Equation 29.
$$ e^2_{out} = \frac{kT}{C} \frac{NEF^2}{\eta} $$
Combining these terms lets us define a more accurate area requirement that is reformulated in Equation 30.
$$ A_{filt} = A_{unit} \cdot \frac{kT}{C_{unit}} \frac{NEF^2 SNR^2}{Vdd^2 \eta} \cdot \left( 1 + \sum_{k=1}^{K-1} \prod_{i=1}^{k} \left[A_{g} \beta + A_{g} \alpha^{N-i} \right] \right) $$
It is important to point out that SNR here refers to the SNR of the data converter, as we have fixed the input referred noise of the system for a systematic comparison and we adjust \\(G_T\\) to fill this dynamic range. Extending this result with the requirements for signal conversion, we can estimate the system level power \\(P_{Total}\\) and area \\(A_{Total}\\) requirements as a sum of individual components according to Equation 31.
$$ A_{Total} = A_{filt} + A_{Gain} + A_{ADC} \quad \text{and} \quad P_{Total} = P_{Amplifiers} + P_{ADC} $$
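The filter area of Equation 30 can be sketched in the same manner. The function below is an illustrative transcription in which the argument names, the default temperature of 300 K, and the convention of passing the gains \\(G[N-1], G[N-2], \dots\\) as a list are assumptions made for this example.

```python
from typing import List

BOLTZMANN = 1.380649e-23  # J/K


def filter_area(
    preceding_gains: List[float],
    a_unit: float,
    c_unit: float,
    nef: float,
    snr: float,
    vdd: float,
    eta: float,
    temperature: float = 300.0,
) -> float:
    """Equation 30. preceding_gains holds G[N-1], G[N-2], ..., G[N-K+1];
    an empty list corresponds to K = 1 (filtering on the last stage only)."""
    scale = (BOLTZMANN * temperature / c_unit) * (nef ** 2 * snr ** 2) / (vdd ** 2 * eta)
    total, accumulated = 1.0, 1.0
    for g in preceding_gains:   # k runs from 1 to K-1
        accumulated *= g        # running product of G[N-i] for i = 1..k
        total += accumulated
    return a_unit * scale * total
```

Because \\(Vdd\\) enters as an inverse square, halving the supply in this expression quadruples the filter area when all other parameters are held fixed.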
Taking an appropriate set of parameter values, the system of relations is exemplified in Figure 48 with respect to the dependency on the supply voltage, \\(Vdd\\). As illustrated there are two domains when considering the area requirement. For small \\(Vdd\\) the sampling & filtering noise requirements overwhelm the design, particularly in this case if \\(\alpha\\) is not taken small enough and a second order roll off is needed. When there is more voltage overhead available we observe that reliably matching the input dynamic range of the ADC is the dominating factor.
{{< figure src="technical_1/NM_TSNA.pdf" width="500" >}}
{{< figure src="technical_1/NM_TSNAP.pdf" title="Figure 48: Analogue resource relations with respect to different supply voltages. " width="500" >}}
The area power product also tells an interesting story. When \\(Vdd\\) is larger than 1 V a clear proportional dependency on power is apparent that is mostly related to the total gain & noise requirements of the system because the ADC is not the limiting factor. However for small supply voltages the power dissipation requirement is more closely related to the lower noise quantization requirements presented by the SD-SAR topology. We should be careful because certain circuit topologies are simply not viable below specific supply voltages and as a result it would not be possible to achieve a NEF smaller than 2. Figure 48 also indicates when particular topologies are viable specific to the $0.18 \mu m$ CMOS process where $V_{th} \approx 350 mV$. That said it is likely a system can be designed with a \\(0.6 V\\) supply in order to achieve significant power and area savings. The main challenge will be achieving acceptable total harmonic distortion as the supply will not easily allow cascoding transistors. In particular, sub-threshold transistors suffer from \\(Gm\\) nonlinearity as a function of \\(e^{\frac{-V_{DS}}{U_T}}\\) that can only be compensated by increased loop gain and multi-stage topologies. Since it is implementation dependent, it is difficult to quantify the increase in area and power overhead that this will result in. We can assert that \\(60 dB\\) precision with instrumentation has very significant diminishing returns when the conventional design approaches a \\(2 V_{th}\\) supply. The reader can find more details in regard to these comparisons in Section 60.
The approach taken here can be exhaustively extended towards including more detail in the system level design in order to leverage the capability of numerical methods. Higher order Gm-C filtering structures can be accounted for as a single stage by introducing new parameters that reflect the increase in \\(NEF\\) and filtering capacitors. Transistor area per amplifier can arguably be assumed static if chopping techniques are employed, or alternatively this can be accounted for by considering the flicker noise relations for the input transistors. However these contributions have a negligible effect on the optimal resource distribution, which will be more influenced by strategic positioning of poles to reject certain noise components. The most critical parameters at the system level are the supply voltage and the requirement for channel to channel gain matching, as the power area product has an inverse square dependency as either \\(V_{DD}\\) or gain variance tends to zero. There are only a select number of scenarios where gain matching is of significance, primarily distributed LFP recording and multi electrode (i.e. tetrode) recordings where exact coupling of neural circuitry is in question. The supply voltage has significance with respect to the expected power dissipation of the on chip digital processing, and it is understandably advantageous to aggressively dissipate more power on the analogue side if the power saving in the digital domain indicates an overall improvement.
We note that another aspect of technology selection, in addition to enabling voltage scaling, is the increase in functional capacitor density. In fact we have shown that the dominant factor for area requirement in chopper stabilized structures is capacitance, through the strong dependency on gain and filtering elements. More advanced processes have an increased number of metal layers and higher transistor gate capacitance. This ultimately leads to an increased capacitor density per square millimetre. In certain scenarios this should allow us to marginally shrink amplifier configurations while keeping the same filter characteristics. The main concern would be associated with capacitor nonlinearity, which requires extra consideration or correction circuits.
# 29 Conclusion
This chapter has demonstrated the capacity for conventional analogue instrumentation with state-of-the-art circuit techniques. This presents capacity for achieving very compact performance that is sufficient for the full characterization of neural recordings. The fabricated system uses a 0.03 mm\\(^2\\) silicon footprint for 4 recording channels that can characterize 5 mVpp neural signals with over 11 bits of precision. In addition the proposed $\Delta\Sigma SAR$ ADC topology demonstrates how oversampling converters can achieve 10 fJ/conversion efficiency with minimal circuit complexity. The techniques applied here suggest that chopping and sigma-delta modulation are key components for achieving better performance, particularly for size constrained systems. In association we suggest immediate digitization & coherent mixed signal processing to leverage a number of advantages. Moreover we expect modern systems will allow more processing capabilities in the digital baseband for BMI systems, and these need to be used effectively.
The significance of minimizing the noise efficiency factor has been revealed in terms of its profound influence on power dissipation and area. In extension we have presented a number of topologies that achieve excellent power and area efficiency in the case of single stage, two stage, and ADC structures. However we are left with little surprise when methodical optimization of various configurations is limited by the fundamental bounds in terms of noise and dynamic range. In fact various idealized configurations show little benefit with respect to one another if they have been optimized and exploited appropriately with the understanding presented. It is characteristic that improving resource efficiency for full bandwidth signal quantization is difficult because we simultaneously attempt to achieve lower supply voltages.
Although digitization is crucial to most neural recording systems for extracting the signal characteristics used to train and improve signal postprocessing, it is clear that improvements at the system level will lie very much in the domain of specialized instrumentation and analogue to information converters. This notion is motivated by the desire for the system to be limited by the law of equipartition and less so by the quantization process of the data converter. The direct classification of recordings in the analogue domain has significant implications for the responsibilities of the accompanying on chip DSP and the reduction of the associated processing bandwidth.
# References:
[^1]: R.Q. Quiroga, Z.Nadasdy, and Y.Ben-Shaul, ''Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering,'' Neural Computation, vol.16, pp. 1661--1687, April 2004. [Online]: http://dx.doi.org/10.1162/089976604774201631
[^2]: R.A. Normann, ''Technology insight: future neuroprosthetic therapies for disorders of the nervous system,'' Nature Clinical Practice Neurology, vol.3, pp. 444--452, August 2007. [Online]: http://dx.doi.org/10.1038/ncpneuro0556
[^3]: K.Birmingham, V.Gradinaru, P.Anikeeva, W.M. Grill, V.Pikov, B.McLaughlin, P.Pasricha, D.Weber, K.Ludwig, and K.Famm, ''Bioelectronic medicines: a research roadmap,'' Nature Reviews Drug Discovery, vol.13, pp. 399--400, May 2014. [Online]: http://dx.doi.org/10.1038/nrd4351
[^4]: ''Bridging the bio-electronic divide,'' Defense Advanced Research Projects Agency, Arlington, Texas, January 2016. [Online]: http://www.darpa.mil/news-events/2015-01-19
[^5]: G.Fritsch and E.Hitzig, ''Über die elektrische erregbarkeit des grosshirns,'' Archiv für Anatomie, Physiologie und Wissenschaftliche Medicin., vol.37, pp. 300--332, 1870.
[^6]: G.E. Loeb, ''Cochlear prosthetics,'' Annual Review of Neuroscience, vol.13, no.1, pp. 357--371, 1990, pMID: 2183680. [Online]: http://dx.doi.org/10.1146/annurev.ne.13.030190.002041
[^7]: ''Annual update bcig uk cochlear implant provision,'' British Cochlear Implant Group, London WC1X 8EE, UK, pp. 1--2, March 2015. [Online]: http://www.bcig.org.uk/wp-content/uploads/2015/12/CI-activity-2015.pdf
[^8]: M.Alexander, ''Neuro-numbers,'' Association of British Neurologists (ABN), London SW9 6WY, UK, pp. 1--12, April 2003. [Online]: http://www.neural.org.uk/store/assets/files/20/original/NeuroNumbers.pdf
[^9]: A.Jackson and J.B. Zimmermann, ''Neural interfaces for the brain and spinal cord — restoring motor function,'' Nature Reviews Neurology, vol.8, pp. 690--699, December 2012. [Online]: http://dx.doi.org/10.1038/nrneurol.2012.219
[^10]: M.Gilliaux, A.Renders, D.Dispa, D.Holvoet, J.Sapin, B.Dehez, C.Detrembleur, T.M. Lejeune, and G.Stoquart, ''Upper limb robot-assisted therapy in cerebral palsy: A single-blind randomized controlled trial,'' Neurorehabilitation and Neural Repair, vol.29, no.2, pp. 183--192, February 2015. [Online]: http://nnr.sagepub.com/content/29/2/183.abstract
[^11]: P.Osten and T.W. Margrie, ''Mapping brain circuitry with a light microscope,'' Nature Methods, vol.10, pp. 515--523, June 2013. [Online]: http://dx.doi.org/10.1038/nmeth.2477
[^12]: S.M. Gomez-Amaya, M.F. Barbe, W.C. deGroat, J.M. Brown, G.F. Tuite, J.Corcos, S.B. Fecho, A.S. Braverman, and M.R. RuggieriSr, ''Neural reconstruction methods of restoring bladder function,'' Nature Reviews Urology, vol.12, pp. 100--118, February 2015. [Online]: http://dx.doi.org/10.1038/nrurol.2015.4
[^13]: H.Yu, W.Xiong, H.Zhang, W.Wang, and Z.Li, ''A parylene self-locking cuff electrode for peripheral nerve stimulation and recording,'' IEEE/ASME Journal of Microelectromechanical Systems, vol.23, no.5, pp. 1025--1035, Oct 2014. [Online]: http://dx.doi.org/10.1109/JMEMS.2014.2333733
[^14]: J.S. Ho, S.Kim, and A.S.Y. Poon, ''Midfield wireless powering for implantable systems,'' Proceedings of the IEEE, vol. 101, no.6, pp. 1369--1378, June 2013. [Online]: http://dx.doi.org/10.1109/JPROC.2013.2251851
[^15]: R.D. Keynes, ''Excitable membranes,'' Nature, vol. 239, pp. 29--32, September 1972. [Online]: http://dx.doi.org/10.1038/239029a0
[^16]: A.D. Grosmark and G.Buzsáki, ''Diversity in neural firing dynamics supports both rigid and learned hippocampal sequences,'' Science, vol. 351, no. 6280, pp. 1440--1443, March 2016. [Online]: http://science.sciencemag.org/content/351/6280/1440
[^17]: B.Sakmann and E.Neher, ''Patch clamp techniques for studying ionic channels in excitable membranes,'' Annual Review of Physiology, vol.46, no.1, pp. 455--472, October 1984, pMID: 6143532. [Online]: http://dx.doi.org/10.1146/annurev.ph.46.030184.002323
[^18]: M.P. Ward, P.Rajdev, C.Ellison, and P.P. Irazoqui, ''Toward a comparison of microelectrodes for acute and chronic recordings,'' Brain Research, vol. 1282, pp. 183 -- 200, July 2009. [Online]: http://www.sciencedirect.com/science/article/pii/S0006899309010841
[^19]: J.E.B. Randles, ''Kinetics of rapid electrode reactions,'' Discuss. Faraday Soc., vol.1, pp. 11--19, 1947. [Online]: http://dx.doi.org/10.1039/DF9470100011
[^20]: M.E. Spira and A.Hai, ''Multi-electrode array technologies for neuroscience and cardiology,'' Nature Nanotechnology, vol.8, pp. 83 -- 94, February 2013. [Online]: http://dx.doi.org/10.1038/nnano.2012.265
[^21]: G.E. Moore, ''Cramming more components onto integrated circuits,'' Proceedings of the IEEE, vol.86, no.1, pp. 82--85, January 1998. [Online]: http://dx.doi.org/10.1109/JPROC.1998.658762
[^22]: I.Ferain, C.A. Colinge, and J.-P. Colinge, ''Multigate transistors as the future of classical metal-oxide-semiconductor field-effect transistors,'' Nature, vol. 479, pp. 310--316, November 2011. [Online]: http://dx.doi.org/10.1038/nature10676
[^23]: I.H. Stevenson and K.P. Kording, ''How advances in neural recording affect data analysis,'' Nature neuroscience, vol.14, no.2, pp. 139--142, February 2011. [Online]: http://dx.doi.org/10.1038/nn.2731
[^24]: C.Thomas, P.Springer, G.Loeb, Y.Berwald-Netter, and L.Okun, ''A miniature microelectrode array to monitor the bioelectric activity of cultured cells,'' Experimental cell research, vol.74, no.1, pp. 61--66, September 1972. [Online]: http://dx.doi.org/10.1016/0014-4827(72)90481-8
[^25]: R.A. Andersen, E.J. Hwang, and G.H. Mulliken, ''Cognitive neural prosthetics,'' Annual review of Psychology, vol.61, pp. 169--190, December 2010, pMID: 19575625. [Online]: http://dx.doi.org/10.1146/annurev.psych.093008.100503
[^26]: L.A. Jorgenson, W.T. Newsome, D.J. Anderson, C.I. Bargmann, E.N. Brown, K.Deisseroth, J.P. Donoghue, K.L. Hudson, G.S. Ling, P.R. MacLeish etal., ''The brain initiative: developing technology to catalyse neuroscience discovery,'' Philosophical Transactions of the Royal Society of London B: Biological Sciences, vol. 370, no. 1668, p. 20140164, 2015.
[^27]: E.DAngelo, G.Danese, G.Florimbi, F.Leporati, A.Majani, S.Masoli, S.Solinas, and E.Torti, ''The human brain project: High performance computing for brain cells hw/sw simulation and understanding,'' in Proceedings of the Digital System Design Conference, August 2015, pp. 740--747. [Online]: http://dx.doi.org/10.1109/DSD.2015.80
[^28]: K.Famm, B.Litt, K.J. Tracey, E.S. Boyden, and M.Slaoui, ''Drug discovery: a jump-start for electroceuticals,'' Nature, vol. 496, no. 7444, pp. 159--161, April 2013. [Online]: http://dx.doi.org/10.1038/496159a
[^29]: K.Deisseroth, ''Optogenetics,'' Nature methods, vol.8, no.1, pp. 26--29, January 2011. [Online]: http://dx.doi.org/10.1038/nmeth.f.324
[^30]: M.Velliste, S.Perel, M.C. Spalding, A.S. Whitford, and A.B. Schwartz, ''Cortical control of a prosthetic arm for self-feeding,'' Nature, vol. 453, no. 7198, pp. 1098--1101, June 2008. [Online]: http://dx.doi.org/10.1038/nature06996
[^31]: T.N. Theis and P.M. Solomon, ''In quest of the "next switch": prospects for greatly reduced power dissipation in a successor to the silicon field-effect transistor,'' Proceedings of the IEEE, vol.98, no.12, pp. 2005--2014, December 2010. [Online]: http://dx.doi.org/10.1109/JPROC.2010.2066531
[^32]: G.M. Amdahl, ''Validity of the single processor approach to achieving large scale computing capabilities, reprinted from the afips conference proceedings, vol. 30 (atlantic city, n.j., apr. 18-20), afips press, reston, va., 1967, pp. 483-485, when dr. amdahl was at international business machines corporation, sunnyvale, california,'' in AFIPS Conference Proceedings, Vol. 30 (Atlantic City, N.J., Apr. 18-20), vol.12, no.3. IEEE, Summer 2007, pp. 19--20. [Online]: http://dx.doi.org/10.1109/N-SSC.2007.4785615
[^33]: J.G. Koller and W.C. Athas, ''Adiabatic switching, low energy computing, and the physics of storing and erasing information,'' in IEEE Proceedings of the Workshop on Physics and Computation. IEEE, October 1992, pp. 267--270. [Online]: http://dx.doi.org/10.1109/PHYCMP.1992.615554
[^34]: E.P. DeBenedictis, J.E. Cook, M.F. Hoemmen, and T.S. Metodi, ''Optimal adiabatic scaling and the processor-in-memory-and-storage architecture (oas :pims),'' in IEEE Proceedings of the International Symposium on Nanoscale Architectures. IEEE, July 2015, pp. 69--74. [Online]: http://dx.doi.org/10.1109/NANOARCH.2015.7180589
[^35]: S.Houri, G.Billiot, M.Belleville, A.Valentian, and H.Fanet, ''Limits of cmos technology and interest of nems relays for adiabatic logic applications,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.62, no.6, pp. 1546--1554, June 2015. [Online]: http://dx.doi.org/10.1109/TCSI.2015.2415177
[^36]: S.K. Arfin and R.Sarpeshkar, ''An energy-efficient, adiabatic electrode stimulator with inductive energy recycling and feedback current regulation,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.1, pp. 1--14, February 2012. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6036003&isnumber=6138606
[^37]: P.R. Kinget, ''Scaling analog circuits into deep nanoscale cmos: Obstacles and ways to overcome them,'' in IEEE Proceedings of the Custom Integrated Circuits Conference. IEEE, September 2015, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2015.7338394
[^38]: K.Bernstein, D.J. Frank, A.E. Gattiker, W.Haensch, B.L. Ji, S.R. Nassif, E.J. Nowak, D.J. Pearson, and N.J. Rohrer, ''High-performance cmos variability in the 65-nm regime and beyond,'' IBM Journal of Research and Development, vol.50, no. 4.5, pp. 433--449, July 2006. [Online]: http://dx.doi.org/10.1147/rd.504.0433
[^39]: L.L. Lewyn, T.Ytterdal, C.Wulff, and K.Martin, ''Analog circuit design in nanoscale cmos technologies,'' Proceedings of the IEEE, vol.97, no.10, pp. 1687--1714, October 2009. [Online]: http://dx.doi.org/10.1109/JPROC.2009.2024663
[^40]: Y.Xin, W.X.Y. Li, Z.Zhang, R.C.C. Cheung, D.Song, and T.W. Berger, ''An application specific instruction set processor (asip) for adaptive filters in neural prosthetics,'' IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.12, no.5, pp. 1034--1047, September 2015. [Online]: http://dx.doi.org/10.1109/TCBB.2015.2440248
[^41]: G.Schalk, P.Brunner, L.A. Gerhardt, H.Bischof, and J.R. Wolpaw, ''Brain-computer interfaces (bcis): detection instead of classification,'' Journal of neuroscience methods, vol. 167, no.1, pp. 51--62, 2008, brain-Computer Interfaces (BCIs). [Online]: http://www.sciencedirect.com/science/article/pii/S0165027007004116
[^42]: Z.Li, J.E. O'Doherty, T.L. Hanson, M.A. Lebedev, C.S. Henriquez, and M.A. Nicolelis, ''Unscented kalman filter for brain-machine interfaces,'' PloS one, vol.4, no.7, pp. 1--18, 2009. [Online]: http://dx.doi.org/10.1371/journal.pone.0006243
[^43]: A.L. Orsborn, H.G. Moorman, S.A. Overduin, M.M. Shanechi, D.F. Dimitrov, and J.M. Carmena, ''Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control,'' Neuron, vol.82, pp. 1380 -- 1393, June 2014. [Online]: http://dx.doi.org/10.1016/j.neuron.2014.04.048
[^44]: Y.Yan, X.Qin, Y.Wu, N.Zhang, J.Fan, and L.Wang, ''A restricted boltzmann machine based two-lead electrocardiography classification,'' in IEEE Proceedings of the International Conference on Wearable and Implantable Body Sensor Networks. IEEE, June 2015, pp. 1--9. [Online]: http://dx.doi.org/10.1109/BSN.2015.7299399
[^45]: B.M. Yu and J.P. Cunningham, ''Dimensionality reduction for large-scale neural recordings,'' Nature Neuroscience, vol.17, pp. 1500 -- 1509, November 2014. [Online]: http://dx.doi.org/10.1038/nn.3776
[^46]: S.Makeig, C.Kothe, T.Mullen, N.Bigdely-Shamlo, Z.Zhang, and K.Kreutz-Delgado, ''Evolving signal processing for brain-computer interfaces,'' Proceedings of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1567--1584, May 2012. [Online]: http://dx.doi.org/10.1109/JPROC.2012.2185009
[^47]: G.Indiveri and S.C. Liu, ''Memory and information processing in neuromorphic systems,'' Proceedings of the IEEE, vol. 103, no.8, pp. 1379--1397, August 2015. [Online]: http://dx.doi.org/10.1109/JPROC.2015.2444094
[^48]: Y.Chen, E.Yao, and A.Basu, ''A 128-channel extreme learning machine-based neural decoder for brain machine interfaces,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 679--692, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2483618
[^49]: V.Karkare, S.Gibson, and D.Marković, ''A 75- $\mu$w, 16-channel neural spike-sorting processor with unsupervised clustering,'' IEEE Journal of Solid-State Circuits, vol.48, no.9, pp. 2230--2238, September 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2264616
[^50]: T.C. Chen, W.Liu, and L.G. Chen, ''128-channel spike sorting processor with a parallel-folding structure in 90nm process,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2009, pp. 1253--1256. [Online]: http://dx.doi.org/10.1109/ISCAS.2009.5117990
[^51]: G.Baranauskas, ''What limits the performance of current invasive brain machine interfaces?'' Frontiers in Systems Neuroscience, vol.8, no.68, April 2014. [Online]: http://www.frontiersin.org/systems_neuroscience/10.3389/fnsys.2014.00068
[^52]: E.F. Chang, ''Towards large-scale, human-based, mesoscopic neurotechnologies,'' Neuron, vol.86, pp. 68--78, April 2015. [Online]: http://dx.doi.org/10.1016/j.neuron.2015.03.037
[^53]: M.A.L. Nicolelis and M.A. Lebedev, ''Principles of neural ensemble physiology underlying the operation of brain-machine interfaces,'' Nature Reviews Neuroscience, vol.10, pp. 530--540, July 2009. [Online]: http://dx.doi.org/10.1038/nrn2653
[^54]: Z.Fekete, ''Recent advances in silicon-based neural microelectrodes and microsystems: a review,'' Sensors and Actuators B: Chemical, vol. 215, pp. 300 -- 315, 2015. [Online]: http://www.sciencedirect.com/science/article/pii/S092540051500386X
[^55]: N.Saeidi, M.Schuettler, A.Demosthenous, and N.Donaldson, ''Technology for integrated circuit micropackages for neural interfaces, based on gold-silicon wafer bonding,'' Journal of Micromechanics and Microengineering, vol.23, no.7, p. 075021, June 2013. [Online]: http://stacks.iop.org/0960-1317/23/i=7/a=075021
[^56]: K.Seidl, S.Herwik, T.Torfs, H.P. Neves, O.Paul, and P.Ruther, ''Cmos-based high-density silicon microprobe arrays for electronic depth control in intracortical neural recording,'' IEEE Journal of Microelectromechanical Systems, vol.20, no.6, pp. 1439--1448, December 2011. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6033040&isnumber=6075219
[^57]: T.D.Y. Kozai, N.B. Langhals, P.R. Patel, X.Deng, H.Zhang, K.L. Smith, J.Lahann, N.A. Kotov, and D.R. Kipke, ''Ultrasmall implantable composite microelectrodes with bioactive surfaces for chronic neural interfaces,'' Nature Materials, vol.11, pp. 1065--1073, December 2012. [Online]: http://dx.doi.org/10.1038/nmat3468
[^58]: D.A. Schwarz, M.A. Lebedev, T.L. Hanson, D.F. Dimitrov, G.Lehew, J.Meloy, S.Rajangam, V.Subramanian, P.J. Ifft, Z.Li, A.Ramakrishnan, A.Tate, K.Z. Zhuang, and M.A.L. Nicolelis, ''Chronic, wireless recordings of large-scale brain activity in freely moving rhesus monkeys,'' Nature Methods, vol.11, pp. 670--676, April 2014. [Online]: http://dx.doi.org/10.1038/nmeth.2936
[^59]: P.Ruther, S.Herwik, S.Kisban, K.Seidl, and O.Paul, ''Recent progress in neural probes using silicon mems technology,'' IEEJ Transactions on Electrical and Electronic Engineering, vol.5, no.5, pp. 505--515, 2010. [Online]: http://dx.doi.org/10.1002/tee.20566
[^60]: H.-W. Kang, S.J. Lee, I.K. Ko, C.Kengla, J.J. Yoo, and A.Atala, ''A 3d bioprinting system to produce human-scale tissue constructs with structural integrity,'' Nature Biotechnology, vol.34, pp. 312--319, March 2016. [Online]: http://dx.doi.org/10.1038/nbt.3413
[^61]: C.Xie, J.Liu, T.-M. Fu, X.Dai, W.Zhou, and C.M. Lieber, ''Three-dimensional macroporous nanoelectronic networks as minimally invasive brain probes,'' Nature Materials, vol.14, pp. 1286--1292, May 2015. [Online]: http://dx.doi.org/10.1038/nmat4427
[^62]: R.R. Harrison, P.T. Watkins, R.J. Kier, R.O. Lovejoy, D.J. Black, B.Greger, and F.Solzbacher, ''A low-power integrated circuit for a wireless 100-electrode neural recording system,'' IEEE Journal of Solid-State Circuits, vol.42, no.1, pp. 123--133, Jan 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2006.886567
[^63]: J.Guo, W.Ng, J.Yuan, S.Li, and M.Chan, ''A 200-channel area-power-efficient chemical and electrical dual-mode acquisition ic for the study of neurodegenerative diseases,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 567--578, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2468052
[^64]: W.Biederman, D.J. Yeager, N.Narevsky, J.Leverett, R.Neely, J.M. Carmena, E.Alon, and J.M. Rabaey, ''A 4.78 mm$^2$ fully-integrated neuromodulation soc combining 64 acquisition channels with digital compression and simultaneous dual stimulation,'' IEEE Journal of Solid-State Circuits, vol.50, no.4, pp. 1038--1047, April 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2384736
[^65]: R.Muller, S.Gambini, and J.M. Rabaey, ''A 0.013 mm$^2$, 5 $\mu$w, dc-coupled neural signal acquisition ic with 0.5 v supply,'' IEEE Journal of Solid-State Circuits, vol.47, no.1, pp. 232--243, Jan 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2163552
[^66]: H.Kassiri, A.Bagheri, N.Soltani, K.Abdelhalim, H.M. Jafari, M.T. Salam, J.L.P. Velazquez, and R.Genov, ''Battery-less tri-band-radio neuro-monitor and responsive neurostimulator for diagnostics and treatment of neurological disorders,'' IEEE Journal of Solid-State Circuits, vol.51, no.5, pp. 1274--1289, May 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2528999
[^67]: M.Ballini, J.Müller, P.Livi, Y.Chen, U.Frey, A.Stettler, A.Shadmani, V.Viswam, I.L. Jones, D.Jäckel, M.Radivojevic, M.K. Lewandowska, W.Gong, M.Fiscella, D.J. Bakkum, F.Heer, and A.Hierlemann, ''A 1024-channel cmos microelectrode array with 26,400 electrodes for recording and stimulation of electrogenic cells in vitro,'' IEEE Journal of Solid-State Circuits, vol.49, no.11, pp. 2705--2719, Nov 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2359219
[^68]: P.D. Wolf, Thermal considerations for the design of an implanted cortical brain--machine interface (BMI). CRC Press, Boca Raton, FL, 2008, pMID: 21204402. [Online]: http://www.ncbi.nlm.nih.gov/books/NBK3932
[^69]: T.Denison, K.Consoer, W.Santa, A.T. Avestruz, J.Cooley, and A.Kelly, ''A 2 $\mu$w 100 nv/rthz chopper-stabilized instrumentation amplifier for chronic measurement of neural field potentials,'' IEEE Journal of Solid-State Circuits, vol.42, no.12, pp. 2934--2945, December 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2007.908664
[^70]: B.Johnson, S.T. Peace, A.Wang, T.A. Cleland, and A.Molnar, ''A 768-channel cmos microelectrode array with angle sensitive pixels for neuronal recording,'' IEEE Sensors Journal, vol.13, no.9, pp. 3211--3218, Sept 2013. [Online]: http://dx.doi.org/10.1109/JSEN.2013.2266894
[^71]: C.M. Lopez, A.Andrei, S.Mitra, M.Welkenhuysen, W.Eberle, C.Bartic, R.Puers, R.F. Yazicioglu, and G.G.E. Gielen, ''An implantable 455-active-electrode 52-channel cmos neural probe,'' IEEE Journal of Solid-State Circuits, vol.49, no.1, pp. 248--261, January 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2284347
[^72]: J.Scholvin, J.P. Kinney, J.G. Bernstein, C.Moore-Kochlacs, N.Kopell, C.G. Fonstad, and E.S. Boyden, ''Close-packed silicon microelectrodes for scalable spatially oversampled neural recording,'' IEEE Transactions on Biomedical Engineering, vol.63, no.1, pp. 120--130, Jan 2016. [Online]: http://dx.doi.org/10.1109/TBME.2015.2406113
[^73]: M.Han, B.Kim, Y.A. Chen, H.Lee, S.H. Park, E.Cheong, J.Hong, G.Han, and Y.Chae, ''Bulk switching instrumentation amplifier for a high-impedance source in neural signal recording,'' IEEE Transactions on Circuits and Systems---Part II: Express Briefs, vol.62, no.2, pp. 194--198, Feb 2015. [Online]: http://dx.doi.org/10.1109/TCSII.2014.2368615
[^74]: R.Muller, S.Gambini, and J.M. Rabaey, ''A 0.013 mm$^2$, 5 $\mu$w, dc-coupled neural signal acquisition ic with 0.5 v supply,'' IEEE Journal of Solid-State Circuits, vol.47, no.1, pp. 232--243, Jan 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2163552
[^75]: ''Rhd2164 digital electrophysiology interface chip - data sheet,'' Intan Technologies, Los Angeles, California, December 2013. [Online]: http://www.intantech.com/files/Intan_RHD2164_datasheet.pdf
[^76]: K.M. Al-Ashmouny, S.I. Chang, and E.Yoon, ''A 4 $\mu$w/ch analog front-end module with moderate inversion and power-scalable sampling operation for 3-d neural microsystems,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.5, pp. 403--413, October 2012. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2218105
[^77]: D.Han, Y.Zheng, R.Rajkumar, G.S. Dawe, and M.Je, ''A 0.45 v 100-channel neural-recording ic with sub-$\mu$w/channel consumption in 0.18$\mu$m cmos,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.6, pp. 735--746, December 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2298860
[^78]: S.B. Lee, H.M. Lee, M.Kiani, U.M. Jow, and M.Ghovanloo, ''An inductively powered scalable 32-channel wireless neural recording system-on-a-chip for neuroscience applications,'' IEEE Transactions on Biomedical Circuits and Systems, vol.4, no.6, pp. 360--371, Dec 2010. [Online]: http://dx.doi.org/10.1109/TBCAS.2010.2078814
[^79]: J.Yoo, L.Yan, D.El-Damak, M.A.B. Altaf, A.H. Shoeb, and A.P. Chandrakasan, ''An 8-channel scalable eeg acquisition soc with patient-specific seizure classification and recording processor,'' IEEE Journal of Solid-State Circuits, vol.48, no.1, pp. 214--228, Jan 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2221220
[^80]: M.A.B. Altaf and J.Yoo, ''A 1.83 $\mu$j/classification, 8-channel, patient-specific epileptic seizure classification soc using a non-linear support vector machine,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.1, pp. 49--60, Feb 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2386891
[^81]: K.Abdelhalim, H.M. Jafari, L.Kokarovtseva, J.L.P. Velazquez, and R.Genov, ''64-channel uwb wireless neural vector analyzer soc with a closed-loop phase synchrony-triggered neurostimulator,'' IEEE Journal of Solid-State Circuits, vol.48, no.10, pp. 2494--2510, Oct 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2272952
[^82]: A.Bagheri, S.R.I. Gabran, M.T. Salam, J.L.P. Velazquez, R.R. Mansour, M.M.A. Salama, and R.Genov, ''Massively-parallel neuromonitoring and neurostimulation rodent headset with nanotextured flexible microelectrodes,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 601--609, Oct 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2281772
[^83]: H.G. Rhew, J.Jeong, J.A. Fredenburg, S.Dodani, P.G. Patil, and M.P. Flynn, ''A fully self-contained logarithmic closed-loop deep brain stimulation soc with wireless telemetry and wireless power management,'' IEEE Journal of Solid-State Circuits, vol.49, no.10, pp. 2213--2227, Oct 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2346779
[^84]: W.Biederman, D.J. Yeager, N.Narevsky, J.Leverett, R.Neely, J.M. Carmena, E.Alon, and J.M. Rabaey, ''A 4.78 mm$^2$ fully-integrated neuromodulation soc combining 64 acquisition channels with digital compression and simultaneous dual stimulation,'' IEEE Journal of Solid-State Circuits, vol.50, no.4, pp. 1038--1047, April 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2384736
[^85]: A.Mendez, A.Belghith, and M.Sawan, ''A dsp for sensing the bladder volume through afferent neural pathways,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 552--564, Aug 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2282087
[^86]: T.T. Liu and J.M. Rabaey, ''A 0.25 v 460 nw asynchronous neural signal processor with inherent leakage suppression,'' IEEE Journal of Solid-State Circuits, vol.48, no.4, pp. 897--906, April 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2239096
[^87]: D.Han, Y.Zheng, R.Rajkumar, G.S. Dawe, and M.Je, ''A 0.45 v 100-channel neural-recording ic with sub-$\mu$w/channel consumption in 0.18 $\mu$m cmos,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.6, pp. 735--746, Dec 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2298860
[^88]: R.Muller, H.P. Le, W.Li, P.Ledochowitsch, S.Gambini, T.Bjorninen, A.Koralek, J.M. Carmena, M.M. Maharbiz, E.Alon, and J.M. Rabaey, ''A minimally invasive 64-channel wireless $\mu$ecog implant,'' IEEE Journal of Solid-State Circuits, vol.50, no.1, pp. 344--359, Jan 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2364824
[^89]: B.Vigraham, J.Kuppambatti, and P.R. Kinget, ''Switched-mode operational amplifiers and their application to continuous-time filters in nanoscale cmos,'' IEEE Journal of Solid-State Circuits, vol.49, no.12, pp. 2758--2772, December 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2354641
[^90]: V.Karkare, H.Chandrakumar, D.Rozgić, and D.Marković, ''Robust, reconfigurable, and power-efficient biosignal recording systems,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, Sept 2014, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2014.6946018
[^91]: L.B. Leene and T.G. Constandinou, ''A 0.45v continuous time-domain filter using asynchronous oscillator structures,'' in IEEE Proceedings of the International Conference on Electronics, Circuits and Systems, December 2016.
[^92]: R.Mohan, L.Yan, G.Gielen, C.V. Hoof, and R.F. Yazicioglu, ''0.35 v time-domain-based instrumentation amplifier,'' Electronics Letters, vol.50, no.21, pp. 1513--1514, October 2014. [Online]: http://dx.doi.org/10.1049/el.2014.2471
[^93]: X.Zhang, Z.Zhang, Y.Li, C.Liu, Y.X. Guo, and Y.Lian, ''A 2.89 $\mu$w dry-electrode enabled clockless wireless ecg soc for wearable applications,'' IEEE Journal of Solid-State Circuits, vol.51, no.10, pp. 2287--2298, Oct 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2582863
[^94]: M.Elia, L.B. Leene, and T.G. Constandinou, ''Continuous-time micropower interface for neural recording applications,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016, pp. 534--537. [Online]: http://dx.doi.org/10.1109/ISCAS.2016.7527295
[^95]: N.Guo, Y.Huang, T.Mai, S.Patil, C.Cao, M.Seok, S.Sethumadhavan, and Y.Tsividis, ''Energy-efficient hybrid analog/digital approximate computation in continuous time,'' IEEE Journal of Solid-State Circuits, vol.51, no.7, pp. 1514--1524, July 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2543729
[^96]: B.Bozorgzadeh, D.R. Schuweiler, M.J. Bobak, P.A. Garris, and P.Mohseni, ''Neurochemostat: A neural interface soc with integrated chemometrics for closed-loop regulation of brain dopamine,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 654--667, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2453791
[^97]: E.B. Myers and M.L. Roukes, ''Comparative advantages of mechanical biosensors,'' Nature nanotechnology, vol.6, no.4, pp. 1748--3387, April 2011. [Online]: http://dx.doi.org/10.1038/nnano.2011.44
[^98]: R.Machado, N.Soltani, S.Dufour, M.T. Salam, P.L. Carlen, R.Genov, and M.Thompson, ''Biofouling-resistant impedimetric sensor for array high-resolution extracellular potassium monitoring in the brain,'' Biosensors, vol.6, no.4, p.53, October 2016. [Online]: http://dx.doi.org/10.3390/bios6040053
[^99]: J.Guo, W.Ng, J.Yuan, S.Li, and M.Chan, ''A 200-channel area-power-efficient chemical and electrical dual-mode acquisition ic for the study of neurodegenerative diseases,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 567--578, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2468052
[^100]: D.A. Dombeck, A.N. Khabbaz, F.Collman, T.L. Adelman, and D.W. Tank, ''Imaging large-scale neural activity with cellular resolution in awake, mobile mice.'' Neuron, vol.56, no.1, pp. 43--57, October 2007. [Online]: http://dx.doi.org/10.1016/j.neuron.2007.08.003
[^101]: T.York, S.B. Powell, S.Gao, L.Kahan, T.Charanya, D.Saha, N.W. Roberts, T.W. Cronin, J.Marshall, S.Achilefu, S.P. Lake, B.Raman, and V.Gruev, ''Bioinspired polarization imaging sensors: From circuits and optics to signal processing algorithms and biomedical applications,'' Proceedings of the IEEE, vol. 102, no.10, pp. 1450--1469, Oct 2014. [Online]: http://dx.doi.org/10.1109/JPROC.2014.2342537
[^102]: K.Paralikar, P.Cong, O.Yizhar, L.E. Fenno, W.Santa, C.Nielsen, D.Dinsmoor, B.Hocken, G.O. Munns, J.Giftakis, K.Deisseroth, and T.Denison, ''An implantable optical stimulation delivery system for actuating an excitable biosubstrate,'' IEEE Journal of Solid-State Circuits, vol.46, no.1, pp. 321--332, Jan 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2010.2074110
[^103]: N.Ji and S.L. Smith, ''Technologies for imaging neural activity in large volumes,'' Nature Neuroscience, vol.19, pp. 1154--1164, September 2016. [Online]: http://dx.doi.org/10.1038/nn.4358
[^104]: S.Song, K.D. Miller, and L.F. Abbott, ''Competitive hebbian learning through spike-timing-dependent synaptic plasticity,'' Nature Neuroscience, vol.3, pp. 919--926, September 2000. [Online]: http://dx.doi.org/10.1038/78829
[^105]: T.Kurafuji, M.Haraguchi, M.Nakajima, T.Nishijima, T.Tanizaki, H.Yamasaki, T.Sugimura, Y.Imai, M.Ishizaki, T.Kumaki, K.Murata, K.Yoshida, E.Shimomura, H.Noda, Y.Okuno, S.Kamijo, T.Koide, H.J. Mattausch, and K.Arimoto, ''A scalable massively parallel processor for real-time image processing,'' IEEE Journal of Solid-State Circuits, vol.46, no.10, pp. 2363--2373, October 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2159528
[^106]: J.Y. Kim, M.Kim, S.Lee, J.Oh, K.Kim, and H.J. Yoo, ''A 201.4 gops 496 mw real-time multi-object recognition processor with bio-inspired neural perception engine,'' IEEE Journal of Solid-State Circuits, vol.45, no.1, pp. 32--45, Jan 2010. [Online]: http://dx.doi.org/10.1109/JSSC.2009.2031768
[^107]: C.C. Cheng, C.H. Lin, C.T. Li, and L.G. Chen, ''ivisual: An intelligent visual sensor soc with 2790 fps cmos image sensor and 205 gops/w vision processor,'' IEEE Journal of Solid-State Circuits, vol.44, no.1, pp. 127--135, Jan 2009. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2007158
[^108]: H.Noda, M.Nakajima, K.Dosaka, K.Nakata, M.Higashida, O.Yamamoto, K.Mizumoto, T.Tanizaki, T.Gyohten, Y.Okuno, H.Kondo, Y.Shimazu, K.Arimoto, K.Saito, and T.Shimizu, ''The design and implementation of the massively parallel processor based on the matrix architecture,'' IEEE Journal of Solid-State Circuits, vol.42, no.1, pp. 183--192, Jan 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2006.886545
[^109]: M.S. Chae, W.Liu, and M.Sivaprakasam, ''Design optimization for integrated neural recording systems,'' IEEE Journal of Solid-State Circuits, vol.43, no.9, pp. 1931--1939, September 2008. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2001877
[^110]: K.J. Miller, L.B. Sorensen, J.G. Ojemann, and M.den Nijs, ''Power-law scaling in the brain surface electric potential,'' PLoS Comput Biol, vol.5, no.12, pp. 1--10, December 2009. [Online]: http://dx.doi.org/10.1371/journal.pcbi.1000609
[^111]: R.Harrison and C.Charles, ''A low-power low-noise cmos amplifier for neural recording applications,'' IEEE Journal of Solid-State Circuits, vol.38, no.6, pp. 958--965, June 2003. [Online]: http://dx.doi.org/10.1109/JSSC.2003.811979
[^112]: W.Sansen, ''1.3 analog cmos from 5 micrometer to 5 nanometer,'' in IEEE Proceedings of the International Solid-State Circuits Conference. IEEE, February 2015, pp. 1--6. [Online]: http://dx.doi.org/10.1109/ISSCC.2015.7062848
[^113]: M.S.J. Steyaert and W.M.C. Sansen, ''A micropower low-noise monolithic instrumentation amplifier for medical purposes,'' IEEE Journal of Solid-State Circuits, vol.22, no.6, pp. 1163--1168, December 1987. [Online]: http://dx.doi.org/10.1109/JSSC.1987.1052869
[^114]: W.Wattanapanitch, M.Fee, and R.Sarpeshkar, ''An energy-efficient micropower neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.1, no.2, pp. 136--147, June 2007. [Online]: http://dx.doi.org/10.1109/TBCAS.2007.907868
[^115]: B.Johnson and A.Molnar, ''An orthogonal current-reuse amplifier for multi-channel sensing,'' IEEE Journal of Solid-State Circuits, vol.48, no.6, pp. 1487--1496, June 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2257478
[^116]: C.Qian, J.Parramon, and E.Sanchez-Sinencio, ''A micropower low-noise neural recording front-end circuit for epileptic seizure detection,'' IEEE Journal of Solid-State Circuits, vol.46, no.6, pp. 1392--1405, June 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2126370
[^117]: X.Zou, L.Liu, J.H. Cheong, L.Yao, P.Li, M.-Y. Cheng, W.L. Goh, R.Rajkumar, G.Dawe, K.-W. Cheng, and M.Je, ''A 100-channel 1-mw implantable neural recording ic,'' IEEE Transactions on Circuits and Systems---Part I: Regular Papers, vol.60, no.10, pp. 2584--2596, October 2013. [Online]: http://dx.doi.org/10.1109/TCSI.2013.2249175
[^118]: V.Majidzadeh, A.Schmid, and Y.Leblebici, ''Energy efficient low-noise neural recording amplifier with enhanced noise efficiency factor,'' IEEE Transactions on Biomedical Circuits and Systems, vol.5, no.3, pp. 262--271, June 2011. [Online]: http://dx.doi.org/10.1109/TBCAS.2010.2078815
[^119]: C.C. Enz and E.A. Vittoz, Charge-based MOS transistor modeling: the EKV model for low-power and RF IC design. John Wiley & Sons, August 2006. [Online]: http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470855452.html
[^120]: Y.Yasuda, T.-J.K. Liu, and C.Hu, ''Flicker-noise impact on scaling of mixed-signal cmos with hfsion,'' IEEE Transactions on Electron Devices, vol.55, no.1, pp. 417--422, January 2008. [Online]: http://dx.doi.org/10.1109/TED.2007.910759
[^121]: S.-Y. Wu, C.Lin, M.Chiang, J.Liaw, J.Cheng, S.Yang, M.Liang, T.Miyashita, C.Tsai, B.Hsu, H.Chen, T.Yamamoto, S.Chang, V.Chang, C.Chang, J.Chen, H.Chen, K.Ting, Y.Wu, K.Pan, R.Tsui, C.Yao, P.Chang, H.Lien, T.Lee, H.Lee, W.Chang, T.Chang, R.Chen, M.Yeh, C.Chen, Y.Chiu, Y.Chen, H.Huang, Y.Lu, C.Chang, M.Tsai, C.Liu, K.Chen, C.Kuo, H.Lin, S.Jang, and Y.Ku, ''A 16nm finfet cmos technology for mobile soc and computing applications,'' in IEEE Proceedings of the International Electron Devices Meeting, December 2013, pp. 9.1.1--9.1.4. [Online]: http://dx.doi.org/10.1109/IEDM.2013.6724591
[^122]: L.B. Leene, Y.Liu, and T.G. Constandinou, ''A compact recording array for neural interfaces,'' in IEEE Proceedings of the Biomedical Circuits and Systems Conference, October 2013, pp. 97--100. [Online]: http://dx.doi.org/10.1109/BioCAS.2013.6679648
[^123]: Q.Fan, F.Sebastiano, J.Huijsing, and K.Makinwa, ''A 1.8 $\mu$w 60 nv/$\sqrt{\mathrm{Hz}}$ capacitively-coupled chopper instrumentation amplifier in 65 nm cmos for wireless sensor nodes,'' IEEE Journal of Solid-State Circuits, vol.46, no.7, pp. 1534--1543, July 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2143610
[^124]: H.Chandrakumar and D.Markovic, ''A simple area-efficient ripple-rejection technique for chopped biosignal amplifiers,'' IEEE Transactions on Circuits and Systems---Part II: Express Briefs, vol.62, no.2, pp. 189--193, February 2015. [Online]: http://dx.doi.org/10.1109/TCSII.2014.2387686
[^125]: H.Chandrakumar and D.Markovic, ''A 2$\mu$w 40mvpp linear-input-range chopper-stabilized bio-signal amplifier with boosted input impedance of 300mohm and electrode-offset filtering,'' in IEEE Proceedings of the International Solid-State Circuits Conference. IEEE, January 2016, pp. 96--97. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7417924
[^126]: H.Rezaee-Dehsorkh, N.Ravanshad, R.Lotfi, K.Mafinezhad, and A.M. Sodagar, ''Analysis and design of tunable amplifiers for implantable neural recording applications,'' IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, vol.1, no.4, pp. 546--556, December 2011. [Online]: http://dx.doi.org/10.1109/JETCAS.2011.2174492
[^127]: X.Zou, X.Xu, L.Yao, and Y.Lian, ''A 1-v 450-nw fully integrated programmable biomedical sensor interface chip,'' IEEE Journal of Solid-State Circuits, vol.44, no.4, pp. 1067--1077, April 2009. [Online]: http://dx.doi.org/10.1109/JSSC.2009.2014707
[^128]: L.Leene and T.Constandinou, ''Ultra-low power design strategy for two-stage amplifier topologies,'' Electronics Letters, vol.50, no.8, pp. 583--585, April 2014. [Online]: http://dx.doi.org/10.1049/el.2013.4196
[^129]: H.G. Rey, C.Pedreira, and R.Q. Quiroga, ''Past, present and future of spike sorting techniques,'' Brain Research Bulletin, vol. 119, Part B, pp. 106--117, October 2015, advances in electrophysiological data analysis. [Online]: http://www.sciencedirect.com/science/article/pii/S0361923015000684
[^130]: Y.Chen, A.Basu, L.Liu, X.Zou, R.Rajkumar, G.S. Dawe, and M.Je, ''A digitally assisted, signal folding neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 528--542, August 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2288680
[^131]: X.Yue, ''Determining the reliable minimum unit capacitance for the dac capacitor array of sar adcs,'' Microelectronics Journal, vol.44, no.6, pp. 473 -- 478, 2013. [Online]: http://www.sciencedirect.com/science/article/pii/S0026269213000815
[^132]: Y.Zhu, C.-H. Chan, U.-F. Chio, S.-W. Sin, S.-P. U, R.Martins, and F.Maloberti, ''Split-sar adcs: Improved linearity with power and speed optimization,'' IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.22, no.2, pp. 372--383, February 2014. [Online]: http://dx.doi.org/10.1109/TVLSI.2013.2242501
[^133]: L.Xie, G.Wen, J.Liu, and Y.Wang, ''Energy-efficient hybrid capacitor switching scheme for sar adc,'' Electronics Letters, vol.50, no.1, pp. 22--23, January 2014. [Online]: http://dx.doi.org/10.1049/el.2013.2794
[^134]: P.Nuzzo, F.DeBernardinis, P.Terreni, and G.Vander Plas, ''Noise analysis of regenerative comparators for reconfigurable adc architectures,'' IEEE Transactions on Circuits and Systems---Part I: Regular Papers, vol.55, no.6, pp. 1441--1454, July 2008. [Online]: http://dx.doi.org/10.1109/TCSI.2008.917991
[^135]: G.Heinzel, A.Rüdiger, and R.Schilling, ''Spectrum and spectral density estimation by the discrete fourier transform (dft), including a comprehensive list of window functions and some new flat-top windows,'' pp. 25--27, February 2002. [Online]: http://hdl.handle.net/11858/00-001M-0000-0013-557A-5
[^136]: F.Gerfers, M.Ortmanns, and Y.Manoli, ''A 1.5-v 12-bit power-efficient continuous-time third-order $\Sigma\Delta$ modulator,'' IEEE Journal of Solid-State Circuits, vol.38, no.8, pp. 1343--1352, Aug 2003. [Online]: http://dx.doi.org/10.1109/JSSC.2003.814432
[^137]: Y.Chae, K.Souri, and K.A.A. Makinwa, ''A 6.3 $\mu$w 20 bit incremental zoom-adc with 6 ppm inl and 1 $\mu$v offset,'' IEEE Journal of Solid-State Circuits, vol.48, no.12, pp. 3019--3027, Dec 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2278737
[^138]: Y.S. Shu, L.T. Kuo, and T.Y. Lo, ''An oversampling sar adc with dac mismatch error shaping achieving 105db sfdr and 101db sndr over 1khz bw in 55nm cmos,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 458--459. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7418105
[^139]: P.Harpe, E.Cantatore, and A.van Roermund, ''An oversampled 12/14b sar adc with noise reduction and linearity enhancements achieving up to 79.1db sndr,'' in IEEE Proceedings of the International Solid-State Circuits Conference, February 2014, pp. 194--195. [Online]: http://dx.doi.org/10.1109/ISSCC.2014.6757396
[^140]: M.Braverman, J.Schneider, and C.Rojas, ''Space-bounded church-turing thesis and computational tractability of closed systems,'' Physical Review Letters, vol. 115, August 2015. [Online]: http://link.aps.org/doi/10.1103/PhysRevLett.115.098701
[^141]: M.Verhelst and A.Bahai, ''Where analog meets digital: Analog-to-information conversion and beyond,'' IEEE Solid-State Circuits Magazine, vol.7, no.3, pp. 67--80, September 2015. [Online]: http://dx.doi.org/10.1109/MSSC.2015.2442394
[^142]: H.A. Marblestone, M.B. Zamft, G.Y. Maguire, G.M. Shapiro, R.T. Cybulski, I.J. Glaser, D.Amodei, P.B. Stranges, R.Kalhor, A.D. Dalrymple, D.Seo, E.Alon, M.M. Maharbiz, M.J. Carmena, M.J. Rabaey, S.E. Boyden, M.G. Church, and P.K. Kording, ''Physical principles for scalable neural recording,'' Frontiers in Computational Neuroscience, vol.7, no. 137, 2013. [Online]: http://www.frontiersin.org/computational_neuroscience/10.3389/fncom.2013.00137
[^143]: L.Traver, C.Tarin, P.Marti, and N.Cardona, ''Adaptive-threshold neural spike by noise-envelope tracking,'' Electronics Letters, vol.43, no.24, pp. 1333--1335, November 2007. [Online]: http://dx.doi.org/10.1049/el:20071631
[^144]: I.Obeid and P.Wolf, ''Evaluation of spike-detection algorithms for a brain-machine interface application,'' IEEE Transactions on Biomedical Engineering, vol.51, no.6, pp. 905--911, June 2004. [Online]: http://dx.doi.org/10.1109/TBME.2004.826683
[^145]: P.Watkins, G.Santhanam, K.Shenoy, and R.Harrison, ''Validation of adaptive threshold spike detector for neural recording,'' in IEEE Proceedings of the International Conference on Engineering in Medicine and Biology Society, vol.2, September 2004, pp. 4079--4082. [Online]: http://dx.doi.org/10.1109/IEMBS.2004.1404138
[^146]: T.Takekawa, Y.Isomura, and T.Fukai, ''Accurate spike sorting for multi-unit recordings,'' European Journal of Neuroscience, vol.31, no.2, pp. 263--272, 2010. [Online]: http://dx.doi.org/10.1111/j.1460-9568.2009.07068.x
[^147]: A.Zviagintsev, Y.Perelman, and R.Ginosar, ''Low-power architectures for spike sorting,'' in IEEE Proceedings of the International Conference on Neural Engineering, March 2005, pp. 162--165. [Online]: http://dx.doi.org/10.1109/CNE.2005.1419579
[^148]: A.Rodriguez-Perez, J.Ruiz-Amaya, M.Delgado-Restituto, and A.Rodriguez-Vazquez, ''A low-power programmable neural spike detection channel with embedded calibration and data compression,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.2, pp. 87--100, April 2012. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2187352
[^149]: U.Rutishauser, E.M. Schuman, and A.N. Mamelak, ''Online detection and sorting of extracellularly recorded action potentials in human medial temporal lobe recordings, in vivo,'' Journal of Neuroscience Methods, vol. 154, no. 12, pp. 204 -- 224, 2006. [Online]: http://www.sciencedirect.com/science/article/pii/S0165027006000033
[^150]: F.Franke, M.Natora, C.Boucsein, M.Munk, and K.Obermayer, ''An online spike detection and spike classification algorithm capable of instantaneous resolution of overlapping spikes,'' Journal of Computational Neuroscience, vol.29, no. 1-2, pp. 127--148, 2010. [Online]: http://dx.doi.org/10.1007/s10827-009-0163-5
[^151]: M.S. Chae, Z.Yang, M.Yuce, L.Hoang, and W.Liu, ''A 128-channel 6 mw wireless neural recording ic with spike feature extraction and uwb transmitter,'' IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol.17, no.4, pp. 312--321, August 2009. [Online]: http://dx.doi.org/10.1109/TNSRE.2009.2021607
[^152]: P.H. Thakur, H.Lu, S.S. Hsiao, and K.O. Johnson, ''Automated optimal detection and classification of neural action potentials in extra-cellular recordings,'' Journal of Neuroscience Methods, vol. 162, no. 12, pp. 364 -- 376, 2007. [Online]: http://www.sciencedirect.com/science/article/pii/S0165027007000477
[^153]: J.Zhang, Y.Suo, S.Mitra, S.Chin, S.Hsiao, R.Yazicioglu, T.Tran, and R.Etienne-Cummings, ''An efficient and compact compressed sensing microsystem for implantable neural recordings,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 485--496, August 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2284254
[^154]: Y.Suo, J.Zhang, T.Xiong, P.S. Chin, R.Etienne-Cummings, and T.D. Tran, ''Energy-efficient multi-mode compressed sensing system for implantable neural recordings,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.5, pp. 648--659, October 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2359180
[^155]: B.Yu, T.Mak, X.Li, F.Xia, A.Yakovlev, Y.Sun, and C.S. Poon, ''Real-time fpga-based multichannel spike sorting using hebbian eigenfilters,'' IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, vol.1, no.4, pp. 502--515, December 2011. [Online]: http://dx.doi.org/10.1109/JETCAS.2012.2183430
[^156]: V.Ventura, ''Automatic spike sorting using tuning information,'' Neural computation, vol.21, no.9, pp. 2466--2501, September 2009. [Online]: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167425/
[^157]: D.Y. Barsakcioglu, A.Eftekhar, and T.G. Constandinou, ''Design optimisation of front-end neural interfaces for spike sorting systems,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2013, pp. 2501--2504. [Online]: http://dx.doi.org/10.1109/ISCAS.2013.6572387
[^158]: A.M. Sodagar, K.D. Wise, and K.Najafi, ''A fully integrated mixed-signal neural processor for implantable multichannel cortical recording,'' IEEE Transactions on Biomedical Engineering, vol.54, no.6, pp. 1075--1088, June 2007. [Online]: http://dx.doi.org/10.1109/TBME.2007.894986
[^159]: Y.Xin, W.X. Li, R.C. Cheung, R.H. Chan, H.Yan, D.Song, and T.W. Berger, ''An fpga based scalable architecture of a stochastic state point process filter (ssppf) to track the nonlinear dynamics underlying neural spiking,'' Microelectronics Journal, vol.45, no.6, pp. 690 -- 701, June 2014. [Online]: http://www.sciencedirect.com/science/article/pii/S0026269214000913
[^160]: C.Qian, J.Shi, J.Parramon, and E.Sánchez-Sinencio, ''A low-power configurable neural recording system for epileptic seizure detection,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.4, pp. 499--512, August 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2228857
[^161]: K.C. Chun, P.Jain, J.H. Lee, and C.H. Kim, ''A 3t gain cell embedded dram utilizing preferential boosting for high density and low power on-die caches,'' IEEE Journal of Solid-State Circuits, vol.46, no.6, pp. 1495--1505, June 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2128150
[^162]: R.E. Matick and S.E. Schuster, ''Logic-based edram: Origins and rationale for use,'' IBM Journal of Research AND Development, vol.49, no.1, pp. 145--165, January 2005. [Online]: http://dx.doi.org/10.1147/rd.491.0145
[^163]: R.Nair, ''Evolution of memory architecture,'' Proceedings of the IEEE, vol. 103, no.8, pp. 1331--1345, August 2015. [Online]: http://dx.doi.org/10.1109/JPROC.2015.2435018
[^164]: C.E. Molnar and I.W. Jones, ''Simple circuits that work for complicated reasons,'' in IEEE Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2000, pp. 138--149. [Online]: http://dx.doi.org/10.1109/ASYNC.2000.836995
[^165]: H.Schorr, ''Computer-aided digital system design and analysis using a register transfer language,'' IEEE Transactions on Electronic Computers, vol. EC-13, no.6, pp. 730--737, December 1964. [Online]: http://dx.doi.org/10.1109/PGEC.1964.263907
[^166]: D.Wang, A.Rajendiran, S.Ananthanarayanan, H.Patel, M.Tripunitara, and S.Garg, ''Reliable computing with ultra-reduced instruction set coprocessors,'' IEEE Micro, vol.34, no.6, pp. 86--94, November 2014. [Online]: http://dx.doi.org/10.1109/MM.2013.130
[^167]: ''Msp430g2x53 mixed signal microcontroller - data sheet,'' Texas Instruments Incorporated, Dallas, Texas, pp. 403--413, May 2013. [Online]: http://www.ti.com/lit/ds/symlink/msp430g2553.pdf
[^168]: F.L. Yuan, C.C. Wang, T.H. Yu, and D.Marković, ''A multi-granularity fpga with hierarchical interconnects for efficient and flexible mobile computing,'' IEEE Journal of Solid-State Circuits, vol.50, no.1, pp. 137--149, January 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2372034
[^169]: B.Vigraham, J.Kuppambatti, and P.R. Kinget, ''Switched-mode operational amplifiers and their application to continuous-time filters in nanoscale cmos,'' IEEE Journal of Solid-State Circuits, vol.49, no.12, pp. 2758--2772, December 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2354641
[^170]: Y.Tsividis, ''Event-driven data acquisition and continuous-time digital signal processing,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, September 2010, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2010.5617618
[^171]: I.Lee, D.Sylvester, and D.Blaauw, ''A constant energy-per-cycle ring oscillator over a wide frequency range for wireless sensor nodes,'' IEEE Journal of Solid-State Circuits, vol.51, no.3, pp. 697--711, March 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2517133
[^172]: B.Drost, M.Talegaonkar, and P.K. Hanumolu, ''Analog filter design using ring oscillator integrators,'' IEEE Journal of Solid-State Circuits, vol.47, no.12, pp. 3120--3129, December 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2225738
[^173]: V.Unnikrishnan and M.Vesterbacka, ''Time-mode analog-to-digital conversion using standard cells,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.61, no.12, pp. 3348--3357, December 2014. [Online]: http://dx.doi.org/10.1109/TCSI.2014.2340551
[^174]: K.Yang, D.Blaauw, and D.Sylvester, ''An all-digital edge racing true random number generator robust against pvt variations,'' IEEE Journal of Solid-State Circuits, vol.51, no.4, pp. 1022--1031, April 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2519383
[^175]: S.Chatterjee, Y.Tsividis, and P.Kinget, ''0.5-v analog circuit techniques and their application in ota and filter design,'' IEEE Journal of Solid-State Circuits, vol.40, no.12, pp. 2373--2387, December 2005. [Online]: http://dx.doi.org/10.1109/JSSC.2005.856280
[^176]: M.Alioto, ''Understanding dc behavior of subthreshold cmos logic through closed-form analysis,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.57, no.7, pp. 1597--1607, July 2010. [Online]: http://dx.doi.org/10.1109/TCSI.2009.2034233
[^177]: A.Hajimiri and T.Lee, ''A general theory of phase noise in electrical oscillators,'' IEEE Journal of Solid-State Circuits, vol.33, no.2, pp. 179--194, February 1998. [Online]: http://dx.doi.org/10.1109/4.658619
[^178]: A.Demir, A.Mehrotra, and J.Roychowdhury, ''Phase noise in oscillators: a unifying theory and numerical methods for characterization,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.47, no.5, pp. 655--674, May 2000. [Online]: http://dx.doi.org/10.1109/81.847872
[^179]: A.Hajimiri, S.Limotyrakis, and T.Lee, ''Phase noise in multi-gigahertz cmos ring oscillators,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, May 1998, pp. 49--52. [Online]: http://dx.doi.org/10.1109/CICC.1998.694905
[^180]: W.Jiang, V.Hokhikyan, H.Chandrakumar, V.Karkare, and D.Markovic, ''A ±50mv linear-input-range vco-based neural-recording front-end with digital nonlinearity correction,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 484--485. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7418118
[^181]: C.Weltin-Wu and Y.Tsividis, ''An event-driven clockless level-crossing adc with signal-dependent adaptive resolution,'' IEEE Journal of Solid-State Circuits, vol.48, no.9, pp. 2180--2190, September 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2262738
[^182]: H.Y. Yang and R.Sarpeshkar, ''A bio-inspired ultra-energy-efficient analog-to-digital converter for biomedical applications,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.53, no.11, pp. 2349--2356, November 2006. [Online]: http://dx.doi.org/10.1109/TCSI.2006.884463
[^183]: F.Corradi and G.Indiveri, ''A neuromorphic event-based neural recording system for smart brain-machine-interfaces,'' IEEE Transactions on Biomedical Circuits and Systems, vol.9, no.5, pp. 699--709, October 2015. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2479256
[^184]: K.A. Ng and Y.P. Xu, ''A compact, low input capacitance neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 610--620, October 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2280066
[^185]: J.Agustin and M.Lopez-Vallejo, ''An in-depth analysis of ring oscillators: Exploiting their configurable duty-cycle,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.62, no.10, pp. 2485--2494, October 2015. [Online]: http://dx.doi.org/10.1109/TCSI.2015.2476300
[^186]: K.Ng and Y.P. Xu, ''A compact, low input capacitance neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 610--620, October 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2280066
[^187]: M.Elia, L.B. Leene, and T.G. Constandinou, ''Continuous-time micropower interface for neural recording applications,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016.
[^188]: Y.W. Li, K.L. Shepard, and Y.P. Tsividis, ''A continuous-time programmable digital fir filter,'' IEEE Journal of Solid-State Circuits, vol.41, no.11, pp. 2512--2520, November 2006. [Online]: http://dx.doi.org/10.1109/JSSC.2006.883314
[^189]: B.Schell and Y.Tsividis, ''A continuous-time adc/dsp/dac system with no clock and with activity-dependent power dissipation,'' IEEE Journal of Solid-State Circuits, vol.43, no.11, pp. 2472--2481, November 2008. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2005456
[^190]: S.Aouini, K.Chuai, and G.W. Roberts, ''Anti-imaging time-mode filter design using a pll structure with transfer function dft,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.59, no.1, pp. 66--79, January 2012. [Online]: http://dx.doi.org/10.1109/TCSI.2011.2161411
[^191]: X.Xing and G.G.E. Gielen, ''A 42 fj/step-fom two-step vco-based delta-sigma adc in 40 nm cmos,'' IEEE Journal of Solid-State Circuits, vol.50, no.3, pp. 714--723, March 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2015.2393814
[^192]: K.Reddy, S.Rao, R.Inti, B.Young, A.Elshazly, M.Talegaonkar, and P.K. Hanumolu, ''A 16-mw 78-db sndr 10-mhz bw ct $\delta \sigma$ adc using residue-cancelling vco-based quantizer,'' IEEE Journal of Solid-State Circuits, vol.47, no.12, pp. 2916--2927, December 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2218062
[^193]: J.Daniels, W.Dehaene, M.S.J. Steyaert, and A.Wiesbauer, ''A/d conversion using asynchronous delta-sigma modulation and time-to-digital conversion,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.57, no.9, pp. 2404--2412, September 2010. [Online]: http://dx.doi.org/10.1109/TCSI.2010.2043169
[^194]: F.M. Yaul and A.P. Chandrakasan, ''A sub-$\mu$w 36nv/$√Hz$ chopper amplifier for sensors using a noise-efficient inverter-based 0.2v-supply input stage,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 94--95. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7417923
[^195]: S.Patil, A.Ratiu, D.Morche, and Y.Tsividis, ''A 3-10 fj/conv-step error-shaping alias-free continuous-time adc,'' IEEE Journal of Solid-State Circuits, vol.51, no.4, pp. 908--918, April 2016. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7433385&isnumber=7446371
[^196]: J.M. Duarte-Carvajalino and G.Sapiro, ''Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,'' IEEE Transactions on Image Processing, vol.18, no.7, pp. 1395--1408, July 2009. [Online]: http://dx.doi.org/10.1109/TIP.2009.2022459

View File

@ -0,0 +1,651 @@
---
title: "Brain machine interfaces: Neuron Processor Interface"
date: 2016-08-08T15:26:46+01:00
draft: false
toc: true
math: true
type: posts
tags:
- chapter
- thesis
- CMOS
- biomedical
---
Lieuwe B. Leene, Yan Liu, Timothy G. Constandinou
Department of Electrical and Electronic Engineering, Imperial College London, SW7 2BT, UK
Centre for Bio-Inspired Technology, Institute of Biomedical Engineering, Imperial College London, SW7 2AZ, UK
A core aspect of emerging neuroscience is performing real-time data analysis at a massive scale. However, when we observe its manifestation in state-of-the-art neural interfaces, we find that the hardware is limited to specific methods that can be short-sighted. This chapter aims to direct our attention to a different point of view with respect to how these sensor systems can be structured. In particular we are guided by the concept of an implant that is capable of performing software-defined instrumentation. The focus lies on enabling real-time, in-vivo testing of a more diverse set of signal characterization methods. More importantly we will demonstrate that this can be made feasible for large-scale distributed systems.
This particular approach is motivated by a number of factors that aim to increase performance and enable research opportunities. The first is that many aspects with regard to the signal quality of an implant cannot be predicted beforehand. As a result, implementing a specific algorithm for specific signal characteristics may lead to failure or an overly conservative design because the environment can potentially be excessively noisy. By introducing the capacity to dynamically execute different processing methods on neural data, the implanted system can use either LFP or EAP activity in real time. This may be a significant element in improving the success of chronic BMI implants. Moreover, the prolific development in characterization methods used for decoding neural data inhibits a general consensus for DSP techniques. This prevents a single method and corresponding architecture from being applicable in most scenarios. The second factor is that this approach conceptually enables the development of real-time resource-constrained algorithms, which are virtues often neglected when working with data sets. Currently most BMI development platforms have limited capabilities to allow algorithms to use external or multi-modal features to inform local operation and simultaneously provide recordings from hundreds of electrodes. This construction may be a key factor in allowing high-level algorithms to directly manipulate machine learning parameters local to each implant. This hierarchical structure should improve the efficiency of distributed BMIs for decoding information. In contrast we question the feasibility of scaling the current supervised methods that require fine measurements of each electrode's recording to approach an optimal decoding strategy. Typically this approach remains computationally exhaustive when reconfiguring sensor parameters because it uses a centralized unit that recalibrates all recording channels in an elaborate fashion.
This chapter is organized as follows. Section 31 motivates localized processing for increased efficiency and estimates to what extent we can perform on-chip processing. This is followed by Section 32, where typical methods used for neural signal analysis are introduced, and the respective hardware complexity is demonstrated in Section 33. This leads into the proposed distributed processing architecture in Section 38, where the design is discussed with respect to the implementation. Section 41 demonstrates the realization of this platform. Finally Section 42 draws conclusions with respect to the digital approach to neural instrumentation.
# 31 Processing at the interface
Ideally a neural interface device is tasked with recording from a large ensemble of electrodes and transmitting information with the lowest bandwidth because the harvested power is a scarce resource for implants inside the body. However, over the course of an implant's lifetime most signal characteristics change dynamically, which implies that there should be an involved learning process that similarly adapts to these changes. This can also mean that the output bandwidth is constrained by the total amount of mutual information that can be retained within the device. Such a device will predict the expected recording from one time interval to the next and differentiate any new information that needs to be transmitted. Hence we should be convinced that the processing capacity or complexity for a closed or memory-limited system should be reflected in its fundamental ability to store information [^140].
In order to capture some high-level trends with respect to processing requirements, let us normalise memory capacity in terms of state variables that are independent of modality. This is particularly useful because the number of state variables in a dynamic process is a good indicator of complexity, whether it is a digital classifier or an analogue filter. Here we will exclusively focus on processing by assuming the signal being operated on is idealized with respect to amplitude and its representation. This extends our analysis in Section \ref{ch:T1_model} by elaborating specifically on comparing digital and analogue resource allocation associated with processing.
$$ R_{A} = \underbrace{\frac{2\pi BW kT SNR^2 U_T}{V_{DD} L}}_{power} \cdot \underbrace{\left( \frac{A_{min}}{L} +\frac{kT SNR^2}{L V_{DD}^2 C_{dens}} \right)}_{Area} $$
If we represent the resource required as the power area product for a state variable then in the analogue domain it would be represented by Equation 32. Here \\(BW\\), \\(V_{DD}\\), \\(C_{dens}\\), and \\(A_{min}\\) reflect the signal bandwidth, supply voltage, capacitor density, and typical transconductance area overhead for a particular technology respectively. \\(L\\) is a normalized feature size that allows us to evaluate parameters for a particular technology and extrapolate them based on constant field scaling factors.
$$ R_{D} = \underbrace{ 2BW \alpha \log_2(SNR) C_{gate} V_{DD}^2 L^2 }_{power} \cdot\underbrace{ \alpha \log_2(SNR) A_{gate} L^2}_{Area} $$
Similarly Equation 33 represents the power area product for a digital state variable. \\(C_{gate}\\), \\(A_{gate}\\), and \\(\alpha\\) parametrise typical gate capacitance, area, and overhead for each register respectively. Generally the dependencies of both \\(R_A\\) and \\(R_D\\) are well understood and guide maximizing system efficiency in an abstract sense [^141].
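The two power area products above can be evaluated numerically to compare trends. The following is a minimal sketch that transcribes both expressions literally; the function names, the default temperature and thermal voltage, and every parameter value used in the tests are hypothetical placeholders rather than figures from this chapter, and \\(SNR\\) is taken as an amplitude ratio.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def analogue_resource(bw, snr, v_dd, l_norm, a_min, c_dens,
                      temp=300.0, u_t=0.0259):
    """Power area product R_A of one analogue state variable.

    Literal transcription of the analogue expression. temp (K) and
    u_t (V) are assumed room-temperature defaults, not source values.
    """
    kt = K_BOLTZMANN * temp
    power = 2 * math.pi * bw * kt * snr**2 * u_t / (v_dd * l_norm)
    area = a_min / l_norm + kt * snr**2 / (l_norm * v_dd**2 * c_dens)
    return power * area


def digital_resource(bw, snr, v_dd, l_norm, alpha, c_gate, a_gate):
    """Power area product R_D of one digital state variable.

    Literal transcription of the digital expression; log2(snr) is the
    number of bits needed to represent the state variable.
    """
    bits = math.log2(snr)
    power = 2 * bw * alpha * bits * c_gate * v_dd**2 * l_norm**2
    area = alpha * bits * a_gate * l_norm**2
    return power * area
```

Written this way the scaling behaviour is easy to check: \\(R_D\\) shrinks with the fourth power of the normalized feature size, while \\(R_A\\) is linear in bandwidth.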
{{< figure src="technical_2/impact.png" title="Figure 49: Impact of technology on \\(R_A\\) analogue (green) and \\(R_D\\) digital (blue) processing resource requirements extrapolated from a \\(180 nm\\) CMOS technology under constant field scaling." width="500" >}}
For neural instrumentation however both power and area requirements must be highly constrained in order to realize a device that can accommodate a large number of recording channels and remain implantable. Figure 49 shows the resource requirements for processing in the analogue and digital domain with respect to signal fidelity and CMOS technology. Either approach can present an advantage over the other under specific conditions. Digital systems appear favourable beyond \\(65 nm\\) CMOS, whereas analogue will do better at lower SNR conditions given a technology with a larger feature size. The discussion in Section \ref{ch:T1_model} suggested that the analogue preconditioning requires a resource allocation of $10^{-15} Wm^2$ with a weak dependence on technology. Moreover if quantization is not considered then power can be entirely determined by the noise specification and the area requirements are dependent on the gain configuration. Comparing this figure with the estimate on \\(R_D\\) indicates that we should be able to integrate a considerable amount of processing capabilities before the DSP uses a comparable amount of resources. This is important because improving on-chip processing capacity ideally results in requiring less supervision and a lower wireless communication bandwidth.
While we may expect other sources of overhead and extra power dissipation components, we should instead take a moment to consider the implications of this result. This is particularly relevant when considering the claim that electronic sensing of brain activity on larger scales is not viable due to the excessive data rates derived from the principle entropy relations and the associated communication bandwidth [^142]. Clearly any degree of on-chip processing undermines this limitation because it enables us to achieve data rates far lower than that of the electrical signals by finding a more appropriate basis for encoding information. On the other hand it does raise an important point regarding the relationship between the generated output data rate and the recorded signal to noise ratio. In some sense we are simply faced with the challenge of best consolidating the recorded information towards high-level indicators for specific objective functions. This allows us to approach data rates extracted from current BMI studies, which are negligible in comparison to the Nyquist rates. In this light we argue that electronic methods for recording activity are the closest to realizing a viable neuroprosthetic solution in the near future when compared with optical, magnetic and less invasive BMI architectures.
Now we can make the assertion that there should be two approaches to solving the system-level challenge of integrating wireless neural instrumentation systems. The first approach would be a mixed signal topology that extensively uses analogue processing such that technology has a weak impact on improving efficiency. Instead the critical component lies with the effectiveness of analogue dimensionality reduction. In such a case we need to adopt a well-established algorithm that can accommodate analogue variability and still deliver exceptional signal characterization. The second approach is to rely on digital methods that deliver robust and reconfigurable compute resources that scale well with technology. This should guarantee the capacity for a variety of fully adaptive algorithms capable of extracting multi-modal characteristics from recordings. This could be much more valuable for experimental neuroscience at this point in time. Unfortunately not all forms of algorithms can use the low power characteristics of processing in the analogue domain. Moreover they are typically limited by underlying assumptions regarding noisy perturbations. When we introduce different contexts of operation, reconfiguring the analogue circuits cannot be done as arbitrarily as for a digital structure. For this reason we will adopt the digital approach in order to leverage robust reconfigurable capabilities and reconsider the analogue approach in Section 48.
The significance here is that these trends allow us to roughly estimate the complexity of algorithms for different technologies if their resource requirements are made to be equivalent to that of the instrumentation circuits. We show in Figure 50 that using a $0.18 \mu m$ CMOS process should give way to approximately 100 state variables or equivalently perform about 100 operations per sample taken. In fact, looking at image processors that similarly rely extensively on data-intensive post processing, we can see an identical dependency on technology scaling as we have predicted for various levels of digital performance at different technology nodes. It is important to note that the normalized efficiency evaluated here is in fact independent of signal bandwidth and only depends on signal to noise ratio and its relation to the supply voltage.
{{< figure src="./technical_2/Operations.pdf" title="Figure 50: Analytic number of digital operations available with respect to different technologies (red) with references to the normalized performance of image processors (blue)." width="500" >}}
$$ P_{system} = \underbrace{ N_{channel} \cdot \left( P_{Algo} + P_{Transmit} \right) }_{In \: Channel} + \underbrace{ P_{Control} + P_{Comms} }_{System \: Level} $$
System level design of an embedded processing system for BMIs should be guided entirely by the optimization of compute power efficiency. As shown by Equation 34 we expect two primary components in the system power breakdown. The overall objective should lie with minimizing the channel level power dissipation of the algorithm \\(P_{Algo}\\) by increasing that of the system level control \\(P_{Control}\\) such that the component that scales with channel count is reduced. Secondly we should keep in mind that reducing the processed output data alleviates the dissipation of the on-chip communication \\(P_{Transmit}\\) and the external telemetry power consumption \\(P_{Comms}\\).
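The trade-off described by the system power breakdown can be made concrete with a small calculation. This is an illustrative sketch only: the function name is hypothetical and the channel count and power figures used alongside it are invented placeholders, not measurements from this work.

```python
def system_power(n_channel, p_algo, p_transmit, p_control, p_comms):
    """Total power following the in-channel / system-level breakdown.

    All powers are in watts. Only the first term scales with the
    number of recording channels.
    """
    in_channel = n_channel * (p_algo + p_transmit)
    system_level = p_control + p_comms
    return in_channel + system_level


# Hypothetical example: move work from every channel into shared control.
baseline = system_power(n_channel=64, p_algo=2e-6, p_transmit=0.5e-6,
                        p_control=10e-6, p_comms=50e-6)
shifted = system_power(n_channel=64, p_algo=1e-6, p_transmit=0.5e-6,
                       p_control=30e-6, p_comms=50e-6)
```

With these placeholder numbers, tripling the shared control power still lowers the total because the saving in the per-channel algorithm power is multiplied by the channel count.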
## 32 Methods for Neural Signals
A key component to developing this platform is a discussion on the diverse set of signal processing methods performed on neural data and their computational requirements in order to determine our system's specifications. More importantly we want to judge what is the expected complexity for some of these operators and how many variables are allocated during each process. There are in principle four different categories for the methods that are applied to process neural signals which are listed below. In practice, a single integrated system will utilize a multitude of different techniques to achieve denoising and feature extraction.
**Pre-processing** is the filtering and conditioning of each ADC output sample using FIR, IIR, or non-linear filters. Here the objective is simply to de-noise the features or signal components. The resulting signal allows for better detection or more precise evaluation of the signal characteristics and is often closely related to the characteristics of the instrumentation circuits. This output may be considered as the raw signal recording that is used to benchmark any post processing methods or other instrumentation systems.
**Detection** is associated particularly with capturing the intermittent spike events. There is no quantitative evaluation made with regard to the nature of the detected spike but any detection may trigger the process that records contiguous samples around the detection event that are then subsequently characterized. These events are commonly triggered upon simple threshold crossings of the signal or its integral power over several samples. In some systems the interest lies only with the accurate detection of spike events which is sufficient to perform closed loop therapeutic treatment or control actuators external to the body.
**Data reduction** can be interpreted either as representing a spike waveform in terms of defining features or with a reduced basis to allow approximate reconstruction by using spike amplitude or wavelet decomposition. Representing waveforms in terms of their quintessential components allows for more efficient post processing and reduces data rates in the case of wireless telecommunication. In its primitive sense this is simply dimensionality reduction of the recorded data and is usually followed by supervised or unsupervised signal classification.
**Classification** is the predominant objective for BMIs and is the main difficulty to realize inside an embedded recording system. Such a task in spike based systems primarily performs a generalization of the detected spike shape in terms of the previously detected neurons. This reflects the fact that in most cases multiple neurons can be detected by a single electrode and by distinguishing these events the integrity of information can be preserved. The objective here lies with having an equivalent spike event output as simple detection to perform actuation but with better fidelity.
It should also be clear that the above operations primarily focus on reducing the recorded signal to its primitive components in terms of spike events at the rate of \\(100 b/s\\) instead of the \\(256 Kb/s\\) data stream typically generated by the ADC. The processing layer on top of this elementary function will either aim to evaluate neural connectivity or use collections of spike rates to perform inference of high-level dynamics. This application specific processing will not be considered here, primarily because the nature of such a problem is very different from the more generic information extraction from recordings. As a result such a system architecture should revolve specifically around multichannel trained dimensionality reduction. Even when such a task can adjoin to what is presented here it will be outside the scope of this discussion, which targets more generic signal instrumentation.
{{< figure src="technical_2/Survay_I.pdf" title="Figure 51: Estimated resource requirements for different classes of algorithms used for processing neural recordings found in literature." width="500" >}}
When we survey the various algorithms found in recent literature and estimate the expected memory/computational requirements we might observe a distribution like that shown in Figure 51. For fair comparison we have adapted these methods to operate on a window size of 32 samples with three types of detected spike waveforms when applicable and only accounted for memory allocation that cannot be shared across channels. This should give a good normalized indication of the limiting components for each method which we could then further optimize in a more specialized manner. Notice that there is a strong correlation between the memory usage and the required number of operations for most embedded systems.
Some of the most efficient spike detection and feature extraction is associated with using temporal characteristics of the spiking waveforms. Common examples include using the time interval between minimum and maximum peaks or the duration of threshold crossing for detected spikes. The defining characteristic here is that alignment buffers are not needed which leads to using very few operations per sample. Inherently the drawback is the increased sensitivity to noise which, unless more filtering is performed, implies a very limited capacity to distinguish and classify different spike shapes. There are other methods that also operate with reduced signal buffers, for instance compressed sensing where signals are continuously re-projected with a sensing matrix. This requires a few accumulators for each coefficient being extracted but this is strictly data compression. Ultimately this may not help directly with classification or signal characterization, which must now be performed off-chip before any benefit can be realized.
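The temporal features mentioned above can be sketched as a streaming routine that keeps only a few running registers. This is a minimal illustration rather than a method taken from the surveyed literature: the function name `temporal_features`, the use of a symmetric amplitude threshold, and the choice to count samples as the crossing duration are all assumptions.

```python
from typing import Iterable, Tuple


def temporal_features(samples: Iterable[float], threshold: float) -> Tuple[int, int]:
    """Return (peak-to-peak interval, threshold-crossing duration) in samples.

    Only running registers are kept, so no alignment buffer is required.
    """
    v_min = float("inf")
    v_max = float("-inf")
    i_min = i_max = 0
    width = 0
    for i, x in enumerate(samples):
        if x < v_min:               # track the most negative peak and its position
            v_min, i_min = x, i
        if x > v_max:               # track the most positive peak and its position
            v_max, i_max = x, i
        if abs(x) > threshold:      # assumed definition of "crossing duration"
            width += 1
    return abs(i_max - i_min), width


spike = [0, 1, 4, 9, 3, -2, -6, -3, 0]
print(temporal_features(spike, threshold=2.5))  # (3, 5)
```

Each sample costs a few comparisons and no multiplications, which is consistent with the low operation counts attributed to this class of features.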
Many other classes of algorithms operate on a windowed basis that exploit learned mean spike shapes that are expected in the recording. Here a convolution or distance operator will indicate which class of spike is detected. For terminology let each convolution of the signal result in a feature that is used for detection and/or classification. The adaptive component of these methods leverages a significant amount of noise shaping and separation depending on what the objective function entails when the basis for convolution is being determined. The most prevalent approach is convolving with principal components which simply maximizes the signal variance in the projected space. In contrast to temporal features that struggle with sample limited denoising, the windowed operators should be more robust towards noise. Instead there is some difficulty in systematic alignment of the window with the spike. This aspect often motivates increased sampling rates or interpolation in order to perform accurate alignment.
The objective function for determining the convolution kernels can be oriented towards maximising sparsity[^154], signal to noise[^155], cluster separation[^49], or expectation maximization[^156] each reflecting different signal modalities and analysis methods. Although the complexity for training can be varied extensively, the local operations for classification after adaptation are almost equivalent. This operation is the \\(F\\) linear projections using \\(W\\) samples onto the feature space where \\(F\\) and \\(W\\) are the number of features used and the number of samples in the window respectively. There may be some deviation from this operation if the different confidence intervals for each class are also taken into account. This can be done by evaluating centroid distances in terms of the variance for each respective centroid. Conversely to such training, generalized templates averaged over a number of recording channels may also be used for feature extraction in order to share the memory requirements at the loss of not achieving maximally separated clusters for each individual channel.
Naturally it is challenging to objectively judge feature extraction and classification methods. We could always reduce the dimensionality of the search space or simplify the convergence strategies to reduce memory and computational requirements. For this reason the details in Figure 51 should be considered in relative terms due to the generalizations made with regard to system specifications. As many methods in the literature are not performed at the sensor interface they typically will not take advantage of processing on a sample to sample basis and opt for a batched or sequenced processing methodology.
We claim however that the segregation between methods primarily lies with whether the classification features are based on sample space characteristics or alternatively use windowed convolution operators. The former is less rigorously justified as the features relating to amplitude and spike width have weak physiological significance but are aggressively more efficient than other methods. The characteristic requirement of the latter approach is typically related to the window size that is indirectly associated with the sampling speed in order to fit the relevant spike shape into the window. One important objective of current decoding research is related to introducing adaptive techniques that iteratively improve classification without supervision, particularly without excessive memory requirements in order to keep track of long term statistics. To demonstrate why this can be particularly challenging, consider a simplified example of using K-means to directly cluster the sample space adaptively where we are fortunate to know the number of classes is three.
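As a rough sketch of the shared classification step, the following computes \\(F\\) inner products over a \\(W\\) sample window and then picks the centroid with the smallest variance-normalized distance. The function names and the use of a per-dimension variance (a diagonal approximation) are assumptions made for illustration only.

```python
from typing import List, Sequence


def project(window: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[float]:
    """F inner products of a W-sample window with F kernels (F*W multiply-accumulates)."""
    return [sum(w * k for w, k in zip(window, kernel)) for kernel in kernels]


def classify(features: Sequence[float],
             centroids: Sequence[Sequence[float]],
             variances: Sequence[Sequence[float]]) -> int:
    """Index of the centroid with the smallest variance-normalized squared distance."""
    def distance(centroid, variance):
        return sum((f - c) ** 2 / v for f, c, v in zip(features, centroid, variance))

    scores = [distance(c, v) for c, v in zip(centroids, variances)]
    return scores.index(min(scores))


features = project([1, 2, 3, 4], [[1, 0, 0, 0], [0, 0, 0, 1]])   # [1, 4]
# A broad cluster can claim a point that a plain Euclidean metric would reject.
print(classify([1, 1], [[0, 0], [4, 4]], [[1, 1], [1, 1]]))      # 0
print(classify([1, 1], [[0, 0], [4, 4]], [[1, 1], [25, 25]]))    # 1
```

The second call illustrates why accounting for per-class confidence changes the outcome: the same point is reassigned once the distant centroid is given a larger variance.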
Clearly for each detected spike we would need to evaluate the distance between our data and the centroids, having a memory and complexity requirement of \\((1+F) W\\) variables and \\(2 F W\\) operations respectively. Then once the class is determined an additional \\(2W\\) operations are needed to adjust the centroid with the new data. This may include keeping track of the additional \\(F W\\) truncation residues that allow using a small enough adjustment weight for convergence when our quantization is limited by an 8 bit system. Now for completeness, assume our window is 32 samples and we come to the conclusion that for each recording channel we need to actively allocate nearly \\(2 K bits\\).
Typically this primitive aspect of a generic classification algorithm is intensive enough to warrant not performing it locally, in contrast to the complexity of the aforementioned temporal features. It also raises the challenge of trained dimensionality reduction without supervision which exclusively relies on evaluating the covariance matrix in order to minimize the correlation of non-signal components with our new basis. The above algorithm may be representative in terms of complexity with the exception that basis pursuit relies heavily on inner products. There is some relief from the fact that these optimal bases change very slowly and actively adapting is only needed once every few hours as the electrode recording changes slowly with respect to the neural activity. As will be demonstrated the feasibility of these more involved methods relies very much on the careful construction and memory allocation of the algorithm with respect to the processing architecture. Operations like k-means and PCA decomposition can be performed to a certain extent if the operations are broken down into incremental procedures.
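The budget above can be checked with a few lines of arithmetic. In this sketch \\(F\\) is taken to be the three classes of the example and each truncation residue is assumed to occupy one 8 bit word; both are assumptions, as is the helper name `kmeans_budget`.

```python
def kmeans_budget(window: int, classes: int, bits: int = 8) -> dict:
    """Per-channel cost of naive adaptive K-means on the raw sample space."""
    state_vars = (1 + classes) * window    # current spike plus one centroid per class
    distance_ops = 2 * classes * window    # subtract and accumulate for every centroid
    update_ops = 2 * window                # adjust the winning centroid
    residue_vars = classes * window        # truncation residues (assumed one word each)
    return {
        "state_vars": state_vars,
        "distance_ops": distance_ops,
        "update_ops": update_ops,
        "residue_vars": residue_vars,
        "total_bits": (state_vars + residue_vars) * bits,
    }


print(kmeans_budget(window=32, classes=3))
# {'state_vars': 128, 'distance_ops': 192, 'update_ops': 64,
#  'residue_vars': 96, 'total_bits': 1792}
```

Under these assumptions the allocation comes to 1792 bits per channel, which agrees with the figure of nearly 2 Kbits.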
# 33 Resource Constrained Classification
This section primarily aims to substantiate our expectations for performing processing at the sensor interface and to evaluate where the system level requirements should lie. Ultimately such a system needs to encompass a significant variety of different application requirements. We will consider the implementation of two well known methods that process neural recordings and generate classified events. This will be applied to the equivalent scenario where the proposed instrumentation front-end from Section 17 is used such that the digitized signal will have considerable dynamic range but lacks analogue filtering. In particular this implies the recorded signal is filtered by a first order Butterworth low pass filter in addition to near-DC rejection while being sampled at \\(25 KS/s\\). We will address the rejection of low frequency aggressors in addition to the computational requirement of typical processing algorithms.
Many of the filter and processing considerations are guided by evaluating accuracy empirically and justified through constrained parametric optimization [^157]. The particular algorithms implemented here are structured in such a way that they specifically take the underlying hardware into consideration. On numerous occasions we will employ single bit accumulators as approximations to the IIR equivalent feedback structures in order to improve our effective register depth through feedback. This is a primary advantage of in-channel processing where we may exhaustively make use of recorded data without having concern for the communication of these components.
Empirical validation is demonstrated by using synthetic data sets that are publicly available online. This data is based on characterized extracellular recording where both background activity and spike morphologies are extracted from a human neocortex and basal ganglia. The synthesized recording was originally used to evaluate the performance of super-paramagnetic clustering with wavelet decomposition at different background noise levels in [^1]. Synthetic data specifically allows the inference of the ground truth resulting in unbiased performance indicators. For fair comparison of analogue and digital techniques we additionally include low frequency content from \\(1-300 Hz\\) at \\(10\times\\) of the largest peak to peak amplitude of the extracellular action potentials found in the recordings.
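To illustrate what is meant by a single bit accumulator, the sketch below compares a conventional first order IIR tracker with a tracker that only uses the sign of the error and a fixed step. Reading "single bit" as a sign-driven update is our interpretation, chosen because it matches the `sign(...)` updates in the listings that follow; the function names and step sizes are assumptions.

```python
from typing import Iterable


def iir_track(samples: Iterable[float], k1: float, state: float = 0.0) -> float:
    """Exponential average: the update needs a multiply by the full-precision error."""
    for x in samples:
        state += k1 * (x - state)
    return state


def sign_track(samples: Iterable[float], step: float, state: float = 0.0) -> float:
    """Single-bit update: only the sign of the error moves the state by a fixed step."""
    for x in samples:
        if x > state:
            state += step
        elif x < state:
            state -= step
    return state


constant = [1.0] * 200
print(abs(iir_track(constant, k1=1 / 16) - 1.0) < 1e-4)     # True
print(abs(sign_track(constant, step=1 / 16) - 1.0) <= 1 / 16)  # True
```

The sign-driven variant replaces the multiplication with a comparison and an increment, at the cost of a slew-limited response and a residual error bounded by the step size.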
## 34 Spike Detection & Filtering
Arguably the most influential aspect to neural detection and classification algorithms is the signal preconditioning for systematic and accurate detection of spike events. The importance lies with the fact that detection behaviour has a significant influence on how the feature space appears when the spike is characterized with various methods. Although amplitude noise can usually be accounted for in terms of filtering, any misalignment in the time domain due to noisy aggressors in the detection operator can up modulate low frequency components. The tendency to perform detection in the digital domain is entirely related to the instantaneous characteristic of discrete time processing which is superior to the group delay inherent to analogue implementations. Minimizing this factor will minimize additional memory for capturing any signal before the detection event.
The method proposed here tracks both the mean spike amplitude and background noise levels in order to assert the detection level of spike events. Motivated by using physiological characteristics to specify the underlying operation, the parameter \\(k_3\\) is introduced to represent the relative amplitude of background activity to that of the maximum spike waveform that is intended to be detected. In other words, if we are only interested in the closest neurons to the electrode \\(k_3\\) should be close to 1, otherwise if we also want to detect background activity with an amplitude at 25% of the largest spiking events \\(k_3\\) should approach \\(4\\). In actuality this term should also be related to how well our classification can separate noisy detection or actual spike events.
\begin{algorithm}
\DontPrintSemicolon
\KwData{Sample from ADC \\(X[n]\\)}
\KwResult{Detection events & spike window \\(W_1\\)}
\Begin{
\ShowLn
Update \\(V_{LFP}\\) with $k_1 \cdot (X[n]-V_{LFP})$ \tcp*{ Track low frequency content}
Set \\(S[n]\\) with $X[n] - V_{LFP} + k_2 \cdot S[n-1]$ \tcp*{ IIR bandpass filter}
Set \\(G[n]\\) as $\sum S[n]\cdot FIR(2*R) $ \tcp*{ FIR bandpass filter }
Set \\(ES[n]\\) as $S[n] \cdot G[n-R]$ \tcp*{ Energy Estimate from IIR & FIR product }
Update \\(V_{noise}\\) with $k_1 \cdot (|ES[n]|-V_{noise})$ \tcp*{ Estimate variance on energy estimate }
\uIf{$ES[n] > V_{th}$ **and** $ES[n] > max_{local}$ \tcp*{ Find threshold crossing or new local max } }{
Update \\(V_{th}\\) with $k_1 (ES[n-1]+2V_{noise} - k_3 \cdot V_{th})$ \tcp*{ Adapt peaks and variance}
Initiate Spike Alignment \;
Set \\(max_{local}\\) to \\(ES[n]\\) \tcp*{Set local maximum}
Set \\(index\\) to \\(0\\) \tcp*{Initiate data pointer}
}
\uElseIf{Currently aligning spike (\\(index<16\\)) }{
Set \\(W_1[index]\\) to \\(G[n-10]\\) \tcp*{Store spike waveform with delayed samples}
Set \\(index\\) to \\(index+1\\) \tcp*{Increment data pointer}
}
\uElseIf{Idle state (\\(index>31\\))}{
Set \\(max_{local}\\) to \\(0\\) \tcp*{Finish classification & find next local maximum}
}
\uElse{
Accumulate \\(index\\) with \\(1\\) \tcp*{Increment data pointer}
%Perform Classification on \\(W_1[index]\\)\;
}
}
\BlankLine
\caption{Spike Detection and Alignment}
\label{aglo:T2_Detection}
\end{algorithm}
The specifics of this operation are reflected in Alg. \ref{aglo:T2_Detection}. Here the terms Update, Set, and Accumulate represent recurrence, instantaneous, and integrated relations respectively. The state variable \\(V_{LFP}\\) primarily removes low frequency drift that is not associated with individual spiking events, and as a result \\(S[n]\\) is a bandpass equivalent of our sampled signal \\(X[n]\\). The signal's instantaneous energy is represented by \\(ES[n]\\) which is a product of \\(S[n]\\) and the delayed derivative computed by the FIR of even order \\(2R\\) with the coefficients $a_n= -a_{2R-n} = 1-2/R \cdot(n-1)$ for \\(n\\) from \\(1\\) to \\(R\\). The factor \\(R\\) is associated with the ratio of sampling interval to spike polarization interval, equivalently $R=f_{nyquist} / 5KHz$. At the maximum of \\(ES[n]\\) the operation on line 5 essentially measures the product of the maximum spike intensity with the maximum derivative that precedes it by \\(R\\) samples. This method primarily depends on the fact that spike detection looks for highly correlated narrow band energy which rejects a substantial amount of white noise. Moreover the operator compresses uncorrelated components in amplitude as it exhibits a square dependency in terms of $ES[n] \propto S[n]^2$, making detection less sensitive to variation in the threshold. The fact that the operator is narrow band limits the detection of slower spike waveforms that do not contain large derivative components but on the other hand this guarantees more systematic alignment. In this case alignment is done simply with respect to where the peak value of \\(ES[n]\\) is detected.
{{< figure src="technical_2/freq_pfd.pdf" width="500" >}}
{{< figure src="technical_2/phase_pfd.pdf" title="Figure 52: Extracted frequency characteristics of digital filter used in Algorithm \ref{aglo:T2_Detection}." width="500" >}}
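For readers who prefer executable code, the detection listing can be transcribed into a per-sample state machine as sketched below. This is an illustrative reading rather than a reference implementation. The class name `SpikeDetector`, the default parameter values, the zero initial thresholds, the idle start state, and in particular the antisymmetric ramp used for the FIR taps are assumptions; the ramp is one kernel that behaves as a delayed derivative and does not claim to reproduce the exact coefficients given in the text.

```python
from collections import deque
from typing import List, Optional


def ramp_kernel(r: int) -> List[float]:
    """Antisymmetric ramp with 2R+1 taps (assumed shape of the derivative FIR)."""
    return [(r - j) / r for j in range(2 * r + 1)]


class SpikeDetector:
    def __init__(self, k1=1 / 16, k2=0.5, k3=3.0, r=3, delay=10, length=16):
        self.k1, self.k2, self.k3, self.r = k1, k2, k3, r
        self.delay, self.length = delay, length
        self.fir = ramp_kernel(r)
        self.s_hist = deque([0.0] * len(self.fir), maxlen=len(self.fir))
        depth = max(r, delay) + 1
        self.g_hist = deque([0.0] * depth, maxlen=depth)
        self.v_lfp = self.v_noise = self.v_th = 0.0
        self.es_prev = self.max_local = 0.0
        self.index = 2 * length               # start in the idle state
        self.window = [0.0] * length

    def step(self, x: float) -> Optional[List[float]]:
        """Process one ADC sample; return a completed spike window or None."""
        self.v_lfp += self.k1 * (x - self.v_lfp)            # track low frequency content
        s = x - self.v_lfp + self.k2 * self.s_hist[-1]      # IIR stage
        self.s_hist.append(s)
        # FIR stage: newest sample meets tap 0, oldest meets tap 2R.
        g = sum(c * v for c, v in zip(self.fir, reversed(self.s_hist)))
        self.g_hist.append(g)
        es = s * self.g_hist[-1 - self.r]                   # S[n] * G[n-R]
        self.v_noise += self.k1 * (abs(es) - self.v_noise)
        done = None
        if es > self.v_th and es > self.max_local:          # crossing or new local maximum
            self.v_th += self.k1 * (self.es_prev + 2 * self.v_noise - self.k3 * self.v_th)
            self.max_local = es
            self.index = 0                                  # restart alignment
        elif self.index < self.length:                      # store delayed samples
            self.window[self.index] = self.g_hist[-1 - self.delay]
            self.index += 1
            if self.index == self.length:
                done = list(self.window)
        elif self.index >= 2 * self.length:                 # idle: look for next maximum
            self.max_local = 0.0
        else:                                               # slot reserved for classification
            self.index += 1
        self.es_prev = es
        return done
```

A caller would feed samples one at a time, for example `window = detector.step(sample)`, and hand any returned window to the classifier. Because a larger energy estimate restarts the alignment, the stored window is referenced to the last local maximum rather than to the first threshold crossing.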
The overall filtering characteristic of \\(S[n]\\) and \\(G[n]\\) is shown in Figure 52. The IIR bandpass is a result of \\(k_2\\) being \\(0.5\\) such that both filters suppress components around the Nyquist frequency. The group delay should be equivalent to a single high pass pole at $250 Hz$ but the FIR assists in further suppressing high and low frequency components. We should not expect significant contribution from group delay induced distortion as the features of interest will predominantly have $1 KHz - 5 KHz$ components. Besides \\(R\\) and \\(k_1\\) can always be adjusted to reposition the high-pass poles closer to DC. Note that \\(V_{LFP}\\) will represent the \\(DC-250 Hz\\) signal components that can be used to infer characteristics about the background activity.
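The response of the recursive part of the filter can be evaluated directly from the two recurrences for \\(V_{LFP}\\) and \\(S[n]\\). The sketch below does this on the unit circle; the FIR stage is left out. The value \\(k_1 = 1/16\\) is an assumption, chosen because a first order tracker with this gain has its corner at roughly \\(k_1 f_s / 2\pi \approx 249 Hz\\) when sampled at \\(25 KS/s\\), and the function name is hypothetical.

```python
import cmath
import math


def detection_filter_gain(f_hz: float, fs_hz: float = 25_000.0,
                          k1: float = 1 / 16, k2: float = 0.5) -> float:
    """Magnitude of X -> S for V += k1 (X - V) and S = X - V + k2 S[n-1]."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)   # z^-1 evaluated on the unit circle
    lfp = k1 / (1 - (1 - k1) * z1)                 # X -> V_LFP, a low-pass tracker
    return abs((1 - lfp) / (1 - k2 * z1))          # high-pass residue through the k2 pole


for f in (0.0, 50.0, 250.0, 2000.0, 12500.0):
    print(f, round(detection_filter_gain(f), 3))
```

Under these assumptions the gain is zero at DC, rises through the corner near 250 Hz, and falls again towards the Nyquist frequency because the \\(k_2\\) pole favours lower frequencies.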
{{< figure src="technical_2/C05.pdf" width="500" >}}
{{< figure src="technical_2/C01.pdf" width="500" >}}
{{< figure src="technical_2/C02.pdf" title="Figure 53: False alarm rates normalized by true positives for data sets with different background activity." width="500" >}}
The overall performance shown in Figure 53 reflects how spike detection is systematically accurate until the noise level approaches 50% of the signal intensity irrespective of the data set with the default case where \\(k_3=3\\). Note that the white noise is additive to the background activity implying \\(-14dB\\) of white noise and \\(-14dB\\) of background activity should evaluate to around \\(-8dB\\) accumulated SNR. When the noise level exceeds the anticipated background activity for \\(k_3\\) we observe a strong increase in the number of detected false positives. The rate of false negatives presents a more gradual increase but at this point classification is much more challenging. As expected, background activity has a considerably bigger impact on false alarm rate because its spectral content and signal structure are equivalent to that of the foreground activity.
We can observe that the main component for computational complexity in this detection operator arises from the FIR & IIR high-pass filters where the order is closely related to the sampling frequency. In fact if we ignore the buffer used to capture features before the alignment event then this filter accounts for 70% of the memory utilization and 53% of the elementary operations while the rest is used for evaluating the instantaneous energy and performing overhead control. Note that the classification operator should be introduced in line 19 on a per-sample basis using the index as a reference pointer. This implies that we will be classifying while repolarization occurs at the electrode and our detection trigger is blanked out during this interval. As a consequence we lose the capacity to detect any overlapping spikes. Such events have limited occurrences and missing them can be acceptable because proper classification will likely fail as well.
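The accumulated figure quoted above can be reproduced with a short calculation. The sketch shows two possible conventions: summing the two contributions as amplitudes, which yields about \\(-8dB\\), and summing them as uncorrelated powers, which yields about \\(-11dB\\). Which convention is intended is an assumption on our part; the helper names are hypothetical.

```python
import math


def combine_amplitudes_db(*levels_db: float) -> float:
    """Sum contributions as amplitudes (20 log10 convention)."""
    total = sum(10 ** (level / 20) for level in levels_db)
    return 20 * math.log10(total)


def combine_powers_db(*levels_db: float) -> float:
    """Sum contributions as uncorrelated powers (10 log10 convention)."""
    total = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total)


print(round(combine_amplitudes_db(-14, -14), 2))  # -7.98
print(round(combine_powers_db(-14, -14), 2))      # -10.99
```

The amplitude sum is the conservative reading since it treats the white noise and the background activity as if they could add coherently.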
## 35 Recursive Variance Decomposition
Another commonly used technique is that of principal component analysis (PCA) which extracts the largest loading vectors \\(\nu_n\\) of the covariance matrix. This predominantly negates the systematic components of the captured signal and reduces the dimensionality of the spike window to a sub-set of maximally varying features by linear transformation. These components are particularly useful as indication for spiking activity in the signal due to structure in \\(\nu_n\\) but typically also suffice for providing a basis for classification in low noise conditions and reducing complexity once these vectors are found. The challenge specifically lies with the fact that determining this basis requires both the computation of the covariance matrix that evolves over time as well as finding the transformation that diagonalizes that matrix. The implication is that in order to extract the first two principal components we need to track a total of \\(W(W+3)\\) state variables where \\(W\\) is the number of samples in the spike window.
The iterative method employed here is referred to as recursive variance decomposition (RVD) and is an approximation to standard PCA that recursively tracks the largest two loading vectors, reducing the minimum number of state variables to $3W + 3$. Similarly to PCA estimators like Hebbian eigenfilters [^155], every iteration incrementally updates the learned basis without requiring extensive computation. The methodology is based on recursive extraction of the largest loading vector $|\nu_1|$ that is normalized by \\(g_1\\) by checking the condition $(x - (x \cdot \nu_1) \nu_1)\cdot \nu_1 = 0$. This condition checks if there is any residue in the direction of \\(\nu_1\\) after removing its component to see if it is appropriately scaled. Moreover due to the strong correlation between the mean and first principal component we approximate that $sign(\mu) \approx sign(\nu_1)$, completing the extraction of \\(\nu_1\\). In fact these two statements allow a significant reduction in complexity as normalization is achieved through feedback. The noise shaping and orthogonality properties associated with PCA are preserved using this extraction, which is the most important aspect.
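Two of the statements above lend themselves to a quick numerical check. The sketch first compares the state variable counts for a 32 sample window and then evaluates the normalization condition, read here as the residue of \\(x\\) along \\(\nu\\) after removing the projected component. That reading is our interpretation of the condition; under it the residue equals \\((x \cdot \nu)(1 - |\nu|^2)\\), so its sign indicates whether \\(\nu\\) is too short or too long. The function names are hypothetical.

```python
from typing import Sequence


def pca_state_variables(w: int) -> int:
    return w * (w + 3)          # count quoted for two components of standard PCA


def rvd_state_variables(w: int) -> int:
    return 3 * w + 3            # count quoted for RVD


def residue_along(x: Sequence[float], nu: Sequence[float]) -> float:
    """(x - (x . nu) nu) . nu, the residue left in the direction of nu."""
    dot = sum(a * b for a, b in zip(x, nu))
    return sum((a - dot * b) * b for a, b in zip(x, nu))


print(pca_state_variables(32), rvd_state_variables(32))   # 1120 99
x = [3.0, 4.0]
print(round(residue_along(x, [0.6, 0.8]), 6))   # 0.0    (unit length)
print(round(residue_along(x, [0.3, 0.4]), 6))   # 1.875  (too short, residue positive)
print(round(residue_along(x, [1.2, 1.6]), 6))   # -30.0  (too long, residue negative)
```

Only the sign of this residue is needed to steer a gain term, which is what allows normalization to be handled by feedback rather than by an explicit division.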
\begin{algorithm}
\DontPrintSemicolon
\KwData{Spike window \\(W_1\\)}
\KwResult{First two aggregate loading vectors \\(\nu_1\\) & \\(\nu_2\\)}
\Begin{
\ForEach{Sample **n** in window \tcp*{ Projection phase } }{
$D_1[n] = W_1[n] - \mu[n]$ \tcp*{ Get distance from mean spike }
Accumulate \\(p_1\\) with $D_1[n] \cdot \nu_1 \cdot sign(\mu[n])$ \tcp*{ Project spike with \\(\nu_1\\) }
Accumulate \\(p_2\\) with $D_1[n] \cdot \nu_2 $ \tcp*{ Project spike with \\(\nu_2\\) }
}
\;
\ForEach{Sample **n** in window \tcp*{ Training phase } }{
Update \\(\mu[n]\\) with $ k_1 \cdot sign(W_1[n] - \mu[n])$ \tcp*{ Track mean spike }
Accumulate \\(\nu_{1}[n]\\) with $k_1 \cdot sign(| D_1[n]\cdot g_1 | - \nu_{1}[n])$ \tcp*{ Move \\(\nu_1\\) towards \\(D_1[n]\\) }
Accumulate \\(\nu_{2}[n]\\) with $k_1 \cdot sign( (D_1[n] - \nu_{1}[n] \cdot p_1)\cdot g_2 - \nu_{2}[n])$ \; \tcp*{ Move \\(\nu_2\\) towards \\(D_1[n]-p_1\cdot\nu_1\\) }
Accumulate \\(p_3\\) with $(|D_1[n]| - \nu_{1}[n]\cdot p_1) \cdot \nu_{1}[n]$ \tcp*{ Get gain error }
Accumulate \\(p_4\\) with $(|D_1[n]| - \nu_{2}[n]\cdot p_2) \cdot \nu_{2}[n]$ \tcp*{ Get gain error }
}
Accumulate \\(g_1\\) with $k_1 \cdot sign(p_3)$ \tcp*{ Adjust gain on \\(\nu_1\\) }
Accumulate \\(g_2\\) with $k_1 \cdot sign(p_4)$ \tcp*{ Adjust gain on \\(\nu_2\\) }
}
\BlankLine
\caption{Recursive variance decomposition}
\label{aglo:T2_PC_l1min}
\end{algorithm}
Algorithm \ref{aglo:T2_PC_l1min} shows the operation for estimating the first two PCA components. Here \\(D_1[n]\\) is the new data point offset by the mean spike waveform $\mu [n]$ which allows the long term estimation of aggregate variance. Similarly parameter \\(k_1\\) specifies how the state variables are exponentially averaged over the preceding data points. Because the projection of the first loading vector must be evaluated before the second vector these operations must be sequenced in time or with memory buffers. The evaluation of \\(p_4\\) is strictly for illustrating the iterative method by which further components are evaluated, while \\(g_2\\) can also be adjusted to normalize the values of \\(p_2\\) to prevent overflow without needing \\(p_4\\).
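A line by line transcription of the listing into Python is sketched below to make the sequencing of the two phases explicit. It is not validated against a reference PCA and makes no claim about convergence. The class name `RVD`, the zero initial vectors, unit initial gains, the default \\(k_1\\), and the choice to reset the projections for every window are assumptions.

```python
from typing import List, Sequence, Tuple


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


class RVD:
    def __init__(self, width: int, k1: float = 1 / 64):
        self.k1 = k1
        self.mu: List[float] = [0.0] * width     # mean spike
        self.nu1: List[float] = [0.0] * width    # first loading vector
        self.nu2: List[float] = [0.0] * width    # second loading vector
        self.g1 = self.g2 = 1.0                  # gain terms (assumed start value)

    def step(self, w1: Sequence[float]) -> Tuple[float, float]:
        """Process one aligned spike window and return the projections (p1, p2)."""
        k1 = self.k1
        # Projection phase: uses the state as it was before this spike.
        d1 = [w - m for w, m in zip(w1, self.mu)]
        p1 = sum(d * v * sign(m) for d, v, m in zip(d1, self.nu1, self.mu))
        p2 = sum(d * v for d, v in zip(d1, self.nu2))
        # Training phase: sign-driven updates with a fixed step of k1.
        p3 = p4 = 0.0
        for n, d in enumerate(d1):
            self.mu[n] += k1 * sign(d)
            self.nu1[n] += k1 * sign(abs(d * self.g1) - self.nu1[n])
            self.nu2[n] += k1 * sign((d - self.nu1[n] * p1) * self.g2 - self.nu2[n])
            p3 += (abs(d) - self.nu1[n] * p1) * self.nu1[n]   # gain error on nu1
            p4 += (abs(d) - self.nu2[n] * p2) * self.nu2[n]   # gain error on nu2
        self.g1 += k1 * sign(p3)
        self.g2 += k1 * sign(p4)
        return p1, p2
```

The projection phase is completed before any state is modified, which reflects the requirement that \\(p_1\\) must be known before \\(\nu_2\\) can be trained. In hardware this corresponds to two passes over the stored window.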
## 36 Template Matching using K-means
Finally we consider the implementation of template matching in channel. This can be seen as simply a K-means clustering method without dimensionality reduction on the input vector. The implication is that it is characteristically more memory intensive but requires less computationally intensive operators.
\begin{algorithm}
\DontPrintSemicolon
\KwData{Spike window \\(W_1\\)}
\KwResult{Classification with respect to aggregate clusters}
\Begin{
Accumulate $Spike \: Count$ with \\(1\\) \tcp*{track accumulated statistics}
\ForEach{Sample **n** in window \tcp*{Projection Phase} } {
\ForEach{Template **k** in memory} {
Accumulate \\(p_k\\) with $W_1[n] - T_k[n] $ \tcp*{Get \\(l_1\\) distance for each spike class}
}
}
Find $p_{min}=min{[|p_1|, \: |p_2|, \: |p_3|, \: |p_4|]}$ and Set \\(c\\) to index \tcp*{Find most similar}
\ForEach{Sample **n** in window \tcp*{Training phase} }{
Accumulate \\(T_c[n]\\) with $k_1 \cdot sign(W_1[n] - T_c[n])$ \tcp*{ Adjust most similar class}
\If{ Not all templates generated **and** $Spike \: Count > k_2$ } {
Duplicate existing templates \;
Set $Spike \: Count$ to 0 \;
}
}
}
\BlankLine
\caption{Incremental K-Means classification}
\label{algo:T2_Kmean}
\end{algorithm}
The implementation considered in Algorithm \ref{algo:T2_Kmean} is relatively straightforward where one section evaluates the generation of new templates and the other adjusts existing templates with new data. The template approach in general has good noise performance due to the redundancy in correlated features that average out white noise. There is usually some concern with respect to the convergence of k-means centroids, typically due to the fact that noisy sample points may be initialized as new clusters, thereby wasting memory. The method used here is iteratively duplicating centroids after convergence. This minimizes the impact of noisy data in the feature space. As illustrated in Figure 54, during each iteration the centroids converge to mean positions. Due to the morphology that these centroids may be in we generally need more centroids than there are clusters but this approach works well when there are few spike classes. The assumption here is that we are clustering features that are characteristically Gaussian mixtures.
{{< figure src="technical_2/Cdup.pdf" title="Figure 54: Illustration of centroid evolution over several iterations." width="500" >}}
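The duplication strategy can be sketched in a few lines. The version below uses the sum of absolute differences as the \\(l_1\\) distance named in the listing's comment and checks for duplication once per spike instead of inside the sample loop; both choices, along with the class name `TemplateMatcher` and the default parameters, are assumptions made for illustration.

```python
from typing import List, Sequence


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


class TemplateMatcher:
    def __init__(self, width: int, max_templates: int = 4,
                 k1: float = 1 / 16, k2: int = 100):
        self.templates: List[List[float]] = [[0.0] * width]   # start from one template
        self.max_templates = max_templates
        self.k1, self.k2 = k1, k2
        self.spike_count = 0

    def step(self, w1: Sequence[float]) -> int:
        """Classify one spike window, adapt the winner, and return its index."""
        self.spike_count += 1
        # Projection phase: l1 distance to every stored template.
        dists = [sum(abs(w - t) for w, t in zip(w1, tmpl)) for tmpl in self.templates]
        c = dists.index(min(dists))
        # Training phase: move the most similar template by a fixed step.
        winner = self.templates[c]
        for n, w in enumerate(w1):
            winner[n] += self.k1 * sign(w - winner[n])
        # Duplicate all templates once enough spikes have been seen.
        if len(self.templates) < self.max_templates and self.spike_count > self.k2:
            self.templates += [list(t) for t in self.templates]
            self.spike_count = 0
        return c


matcher = TemplateMatcher(width=4, max_templates=2, k2=20)
a, b = [1.0] * 4, [-1.0] * 4
for i in range(200):
    matcher.step(a if i % 2 == 0 else b)
print(matcher.templates)   # [[-1.0, -1.0, -1.0, -1.0], [1.0, 1.0, 1.0, 1.0]]
```

Identical copies separate on their own because only the winning copy is adjusted: after the first update one copy sits closer to each cluster and the two drift apart. No random perturbation is needed in this toy case, although noisy data may behave differently.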
## 37 Complexity Evaluation
Generally the application of these methods should reflect a system level objective. For the configuration used here the memory and algorithmic operations are estimated in Table 7. Multiplications are equivalent to eight elementary operations and the memory calls are not considered as a computation but as load/store cycles. The impression made here is that template matching is strictly very efficient if the memory allows large allocation of active spike waveforms. Similarly RVD could show a considerable reduction in operations if a dedicated multiplier is introduced but that depends on how much we value compactness over execution speed. The disparity in memory requirement will dramatically worsen when the number of centroids is increased which is not the case for the computational complexity in RVD.
Table 7: Estimation on memory and computational resource requirements for each algorithm.
| **Algorithm** | **Memory** | **Operations** | **Cycles per sample** |
|----|----|----|----|
| NEO Peak Detection | 20 Elements | 30 | 56 |
| RVD / training | 63 Elements | 29 / 88 | 57 / 116 |
| Template / training | 85 Elements | 9 / 16 | 27 / 34 |
{{< figure src="technical_2/P05.pdf" width="500" >}}
{{< figure src="technical_2/D05.pdf" title="Figure 55: RVD and template based classification for data sets with \\(-26 dB\\) background activity." width="500" >}}
{{< figure src="technical_2/P01.pdf" width="500" >}}
{{< figure src="technical_2/D01.pdf" title="Figure 56: RVD and template based classification for data sets with \\(-20 dB\\) background activity." width="500" >}}
{{< figure src="technical_2/P02.pdf" width="500" >}}
{{< figure src="technical_2/D02.pdf" title="Figure 57: RVD and template based classification for data sets with \\(-16 dB\\) background activity." width="500" >}}
The empirical results in Figure 56 generally show that in moderate noise conditions our classification accuracy is typically better than 85%, which is calculated in terms of the aggregate probability of correct classification multiplied with the probability of missing a spike event. Unsurprisingly RVD is not very effective in noisy conditions where the variance accentuates irrelevant components. The classification accuracy from template matching is also shown in Figure 57. These results should primarily show an improved noise rejection characteristic but more generally this approach is more resilient at dealing with false positives. In principle a new cluster will be assigned to a zero mean template representing the false positives while maintaining the other templates intact. Strictly the detection circuit should be readjusted to favour increased detection of false positives as long as the rate of false negatives remains low, but instead exactly the same parameters are used for every test.
We should be careful to judge the effectiveness of these implementations particularly with respect to efficiency. While we can generally increase performance by allocating more memory or introducing additional computation we need to quantitatively evaluate the objective. We suggest normalizing the resource allocation with respect to the increased information extracted from the signal by classifying. That is, how much more processing are we allocating for classification relative to the proportional increase in the signal to noise ratio of our output. In the optimistic case when the three classes of neurons being detected are uncorrelated our base-line accuracy would be 33% while needing \\(56\\) cycles of operation for spike detection. In fact this leads us to believe both algorithms in this respect decrease resource efficiency by a factor around \\(2\\), accounting for increased memory and processing requirements compensated with increased accuracy. While this claim is very sceptical with respect to the motivation for classifying spike events it also motivates the aggressive reduction in algorithmic complexity through the approximations presented here. There is genuine benefit in classification that assists the convergence of further processing algorithms. In addition we simply argue that excessive dedication of resources that exceed that needed for signal conditioning may not be worthwhile. The key point demonstrated here is that these methods appear very much attainable in terms of on-chip processing capacity. Here we considered the case without supervision specifically in consideration for scalability. It is likely that further reductions or optimizations can be made with regard to the structure of these methods to improve accuracy and noise tolerance.
# 38 System Architecture
The conceptual architecture of the system proposed here is foremost based on the opportunity for software defined real-time instrumentation that has not yet been exploited in chronic implantable systems at this scale. Currently it is commonplace to see synthesized logic that performs all processing and data handling procedures in such a way that they have very limited high-level reconfigurability. This is strictly in order to save power and reduce complexity at the system level. It is important to note that for any recording device there are a multitude of phases during its operation where this flexibility can be highly advantageous once sensor characteristics are learned. As discussed in Section 32, many classification algorithms benefit from training or characterizing the recording conditions.
The approach to specialized DSP in the literature reflects two problems in this field. The first is signal extraction from recordings, which relates to what we have discussed in terms of spike detection to extract compressed spike train data. The other is associated with accelerating adaptive filters that map these spike trains onto estimates for cognitive dynamics or invoked limb movement. Typical examples for spike sorting are fully synthesized cores [^49] [^158] that can be integrated and are capable of achieving respectable processing capacity for specific algorithms. In contrast, spike train decoding is predominantly performed by FPGAs, as integration makes less sense for the development of high dimensional adaptive filters like [^159] that do not need to be embedded within the body. Interestingly the work in [^40] proposes an application specific instruction set processor (ASIP) that similarly argues for high performance computation for these structures with a high degree of flexibility that reflects the different models used for spike encoding. Additionally we see the advantage of using off-chip microcontrollers like the MSP430 that interface with a highly reconfigurable instrumentation front-end to leverage both adaptive and involved noise shaping to perform more intricate operations such as seizure detection or artefact removal [^160]. While these works may not be viable for high channel count implantable devices, they do highlight the considerations for designing fully integrated prosthetics that are in line with this work.
Here we will consider a particular type of microcontroller topology that can support reconfigurable functionality and reflects the fact that although multichannel BMIs are highly parallel in nature the associated processing can also be algorithmically intensive. The feasibility of this notion has been estimated to an extent but many components are subject to implementation. In essence we optimistically approach this design problem with a strategy that exploits both the homogeneity in processing and the information locality in order to realize a feasible solution. This lets us focus on the in-channel operation where efficiency is maximized through the topology of the execution unit. Regardless of the end result this proposed system will be one step towards the goal for more effective chronic neural implants.
## 39 Distributed System
The system illustrated in Figure 58 represents the distributed microcontroller architecture. The primary mechanism of operation is the program memory that continuously feeds the stored instructions into the pipelined array of processors that operate locally on the recorded data. The execution of these instructions is handled by what is essentially an instruction decoder, a memory module, and an arithmetic unit that are interfaced with four analogue recording channels. This approach keeps the energy required for communicating recorded data to a minimum, as the information is processed and consolidated to its elementary component at the quantization interface.
{{< figure src="technical_2/Sys_sH.pdf" title="Figure 58: Illustration of the proposed distributed \\(\mu\\)C array for homogeneous program execution at the sensor interface." width="500" >}}
Inherently this implementation will sacrifice the availability of the more intricate functionality found in DSPs since the data is not funnelled into one processing unit that can be very elaborate in complexity. The distributed structure is rationalized by the fact that intensive operations such as clustering methods operate at a much lower speed due to the sporadic spiking activity that makes statistical convergence slow. Furthermore these adaptations need to be performed on the order of minutes, so such functions may also be implemented through the redundancy of elementary operations. Moreover multiplexing loses effectiveness in memory intensive applications as it does not mitigate the power & area scaling associated with memory allocation.
Also consider that the program control that gives this implementation its capability for generic computation does not scale with the number of processing units. This is an important distinction when addressing hundreds of channels on chip that will allow this implementation to outperform any other architecture and leverage the fully integrated form factor. We also note that whether this architecture is realized by synthesized logic, FPGA fabric, or more custom logic cells is insignificant to the extent that the memory structure plays a more profound role. This claim is based on the algorithms in Section 32 that allocate significantly more resources to memory than to algorithmic operators. In particular memory density and efficiency are critical to the success of this type of large scale sensor system. Here 3-T eDRAM is employed, which is more effective than alternative memory solutions and can still be realized on a standard CMOS process [^161]. When compared to an SRAM equivalent we find it can readily achieve a factor of 8 improvement in density [^162].
{{< figure src="technical_2/NPI_TLT2.pdf" title="Figure 59: System architecture for NPI sensing platform with digital interfaces annotated." width="500" >}}
The high-level interfaces are illustrated in Figure 59. There are multiple layers with respect to how internal resources are accessed for reconfiguration. This is primarily for robustness, where each layer increases in complexity and chance of failure. The low-speed interface is the simplest element, which acquires commands from an external device with very relaxed requirements on input timing. These commands allow us to reconfigure the high level sub-blocks, for example tuning the reference voltages generated by the power management, controlling the reset/power of individual sub-blocks, and selecting which digital test signals should be monitored. In particular the processor array and program memory layers operate almost in isolation from the peripherals. These blocks are timed by the internal PLL structure, which drives significantly higher data rates that do not need to propagate to the pad level, in order to save power. The back-end of the system similarly communicates data uni-directionally between two different clock domains to send data packets off-chip using a number of handshaking protocols.
The implementation of the analogue circuitry has been discussed in Section 23, where we additionally constrain all algorithms to a maximum of 1024 cycles per sample while allocating at most \\(128\\) words of memory. With respect to our previous discussion this amount of hardware should accommodate a large set of algorithms that are resource efficient. If not, the topology will promote the construction of processing with more aggressive memory efficiency and the use of feedback dynamics to implement more complex operators such as division. It should be noted that these specifications have some flexibility: one can sample multiple times per program cycle, or reduce the system clock using the configurable phase locked loop in order to reduce power.
{{< figure src="technical_2/Lay_sH.pdf" width="500" >}}
{{< figure src="technical_2/Lay_sH.png" title="Figure 60: Physical implementation of NPI system using a 6-metal $0.18 \mu m$ CMOS process." width="500" >}}
Figure 60 presents the fabricated prototype device. It can be seen that integrating many peripheral blocks such as a phase locked loop, voltage supply regulators, and program memory on chip minimizes the pad count required for the digital and power domains. However even for a 64 channel system the number of analogue pads required for the sensor interface plays a significant role in the top level organization. In addition careful consideration has to be given to how the digital signals propagate, where minimizing track length not only reduces digital noise coupled to the substrate but more significantly the associated power dissipation. The number of processing elements can in fact quite easily be scaled up by extending the instruction pipeline, where the system level timing constraint for speed and fanout lies with the program memory, which has an internal pipeline that needs to connect the program memory together.
## 40 Processing Core
In order to allow the hardware to provide generic processing capabilities in a distributed fashion a number of considerations have to be made. In particular we need to reflect the typical operations with certain modalities of operation. It is clear that although all recording channels should execute the same algorithm they will typically not share the same state of operation. This state dependency is exemplified by intermittent processing during bursting neural activity and idling during quiet periods. This is an inherent limitation of sharing the program memory, as dynamic execution of the code, where each core has its own program counter or a top level scheduler, is not feasible for an arbitrary number of channels. The quasi-out-of-order execution makes it challenging for us to adopt scalable tile structures found in image processing [^105] that excel in maximizing area and power efficiency in a scalable sense.
The design is led by maximizing the locality of data execution [^163], where branch control or conditional execution is mediated by skipping a section of the incoming instructions if a condition is not met. The approach of skipping sections of code upon branching is relatively inefficient with respect to throughput. However, this approach is optimal at the system level when individual cores may need to execute any section of code, and branching will only be limited by the dissipation related to the registers pipelining the instructions across the chip.
{{< figure src="technical_2/Sys_uC.pdf" title="Figure 61: Organization of the distributed execution unit detailing components and the interconnect." width="500" >}}
The individual components of the execution unit are shown in Figure 61, which also details the main data buses used for exchanging data. The majority of operations revolve around manipulating data in the registers R1-R16 as the A operand in association with any other data source that can be used as the B operand. The result of the operation performed by the arithmetic logic unit (ALU) will always overwrite the location of the A operand but can by extension also be written to other locations (i.e. memory, periphery, etc.). This implies that in terms of instructions there are always two components, where the first is simply the operation executed by the ALU together with the two memory sources. The second component optionally extends this simple functionality by writing these intermediate values to multiple other locations or by arbitrary branching operations that will take the unit out of sleep.
On that note we mention that the local execution controller consists of three registers that assist in branching operations or conditional execution. When any of these registers holds a logic one the instruction is gated to a null operation before execution. One of these registers will self reset, allowing for if-else functionality by skipping a single instruction. The two other registers need to be cleared actively, but in combination this will allow for nested conditioning of up to three levels. While in the idle state no internal registers are clocked with the exception of the instruction pipeline and the branch controller, which saves a significant amount of power as the instruction does not need to be decoded.
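
The gating behaviour described above can be illustrated with a small behavioural sketch. It is not the hardware implementation: the mnemonics (`SKIP_UNLESS_NEG`, `NEG`, `ADD`, `CLR`), the single accumulator, and the choice of which register self-resets are assumptions made purely for illustration. The sketch only shows how one shared instruction stream can produce different results in each core when instructions are gated by local branch registers.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy model of one execution unit with three branch registers."""
    acc: int
    # br[0] is assumed to be the self-resetting register; br[1], br[2]
    # must be cleared actively with a CLR instruction.
    br: list = field(default_factory=lambda: [False, False, False])

    def step(self, op, arg=None):
        # Assumption: the branch controller stays clocked while idle,
        # so an explicit clear is honoured even when the core is gated.
        if op == "CLR":
            self.br[arg] = False
            return
        gated = any(self.br)
        self.br[0] = False  # self-reset after a single instruction slot
        if gated:
            return  # instruction is replaced by a null operation
        if op == "SKIP_UNLESS_NEG":
            self.br[arg] = self.acc >= 0
        elif op == "NEG":
            self.acc = -self.acc
        elif op == "ADD":
            self.acc += arg
        else:
            raise ValueError(f"unknown operation: {op}")


def broadcast(cores, program):
    """Feed the same instruction stream to every core."""
    for op, arg in program:
        for core in cores:
            core.step(op, arg)
    return [core.acc for core in cores]


# Skip a single instruction using the self-resetting register.
single_skip = [("SKIP_UNLESS_NEG", 0), ("NEG", None), ("ADD", 1)]
# Skip a block of instructions until the register is cleared actively.
block_skip = [("SKIP_UNLESS_NEG", 1), ("NEG", None), ("ADD", 10),
              ("CLR", 1), ("ADD", 1)]

print(broadcast([Core(5), Core(-3)], single_skip))  # [6, 4]
print(broadcast([Core(5), Core(-3)], block_skip))   # [6, 14]
```

Every core consumes the same number of instruction slots regardless of the path taken, which reflects the throughput cost of skipping code rather than jumping over it.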
The digital data interface provides the means for communicating data either off chip or to adjacent execution units. This functionality allows granular consolidation of features or signal structure and the correlation of measurements with system level parameters. For example, if each execution unit is listening to its most informative analogue instrumentation channel, it is conceivable to compare its spike train with that of an adjacent unit to evaluate neural interconnect level features. The asynchronous data bus on the other hand is a key feature that allows this system to appear as a slave at the network level that does not need to be coherent with the system or off-chip clock. This bus is in essence a large buffer distributed across many channels that utilises asynchronous handshake protocols to funnel the data to an SPI module that is clocked either externally or internally [^164]. This solves a number of coherence problems and removes the need for an FPGA to drive this system, as the SPI module is not timing critical. Furthermore this alleviates clock distribution, as the timing constraints are always local to each execution unit and not imposed by the data bus distributed across the chip, which could otherwise be either very restricting or power intensive.
The dynamic control with respect to the analogue channel is enabled by one designated 8 bit register per analogue channel. In this particular case 4 bits are used to specify gain, 2 bits for configuring the biasing current as 0x, 1x, 2x, or 3x, and 1 bit for the reset function. In particular the reset phase will temporarily boost the transconductance of the band-limited filtering stage to allow sub-microsecond auto-zero for active noise shaping. For both the ADC and the amplifiers there is one bit that controls a multiplexer at the input that can switch in the sensor or a global differential test net for calibration or verification. Similarly the ADC has 2 bits to select the analogue channel and another bit to clock at the full rate or half the rate of the adjoining microcontroller. In addition there are 3 bits to control how the chopper frequency is divided from the sampling signal, which is the final control bit. Understandably the analogue configuration will remain static after being set appropriately. The ADC configuration register is considerably more dynamic as the multiplexer needs to be reconfigured and sampling needs to toggle persistently.
There are two modes of getting quantized data from the ADC depending on the desired functionality. The first is simply reading the 8 bit quantization register that shadows the 7 bits quantized by successive approximation and the LSB from the first integration result. In order to utilize the higher resolution capability, the comparator output is used to integrate coefficients from the instruction onto a local register, where the comparator will decrement or increment the register accordingly. If no calibration data is stored locally this operation first integrates binary weights on one register during the SAR cycles and then integrates the FIR window onto another register. This is a large investment of cycles to perform high resolution quantization but it can be optimized for specific applications when necessary. If calibration data is available for the 7 SAR weights then the ADC must be configured to run at half the system speed, and before quantization these weights are loaded from the memory onto registers R2-R7. This is followed by the usual process of SAR quantization while these weights are simultaneously integrated on a second register. After the integration phase three registers will then contain quantized data. The scaling of coefficients is key and should be such that the $\Sigma \Delta$ result simply copies the sign bit of the SAR operation and its lower 7 bits can be concatenated with the SAR result. Then the calibration data is scaled appropriately and added to the 14 bit signed double with carry logic. Clearly there are a number of conventions suggested here that will best exploit the capabilities of the design.
The memory module local to each execution unit holds 128 words of data which can be shared across the analogue channels with 32 locations each. Particularly when the DSP is mainly performing filtering, the recorded data can be buffered for FIR filtering or the memory can keep high precision filter state variables for IIR structures. The filter and program coefficients are stored in the shared program memory such that the execution unit does not experience an overhead in memory requirement. However for other memory intensive algorithms such as template matching, serving the most informative of the four analogue channels will have to suffice because the memory requirement is beyond the capabilities of this configuration. The DRAM architecture has a refresh-upon-read mechanism, which implies that the used memory locations will have to be read systematically to keep the stored data valid. Fortunately this requirement is self fulfilling as the program recycles itself every $100 \mu s$ and the DRAM retention time is on the order of $1 ms$, implying that as long as there is a guaranteed read on the memory location it will stay valid. The physical read mechanism however does require a minimum of two cycles. The first is in the background, which simply prepares the internal registers of the module while a different instruction is being executed, and the second is in the foreground, where the location is read and the data bus is driven by the DRAM.
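
As a sketch of how the analogue channel register could be handled in software, the snippet below packs and unpacks the fields listed above. The field widths (4 bit gain, 2 bit bias, 1 bit reset, 1 bit input multiplexer) come from the text, but the bit positions and the function names are assumptions chosen for illustration only.

```python
def pack_channel_config(gain, bias, reset=False, test_input=False):
    """Pack the analogue channel settings into one 8 bit word.

    Assumed layout (hypothetical): bits 0-3 gain, bits 4-5 bias
    multiplier, bit 6 reset, bit 7 input multiplexer (sensor / test net).
    """
    if not 0 <= gain < 16:
        raise ValueError("gain must fit in 4 bits")
    if not 0 <= bias < 4:
        raise ValueError("bias must be 0, 1, 2 or 3 (0x..3x)")
    return gain | (bias << 4) | (int(reset) << 6) | (int(test_input) << 7)


def unpack_channel_config(word):
    """Recover the individual fields from an 8 bit configuration word."""
    return {
        "gain": word & 0xF,
        "bias": (word >> 4) & 0x3,
        "reset": bool((word >> 6) & 1),
        "test_input": bool((word >> 7) & 1),
    }


word = pack_channel_config(gain=5, bias=2, reset=True)
print(f"{word:08b}")                # 01100101
print(unpack_channel_config(word))
```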
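
The memory budget and the refresh condition above can be expressed as a short sketch. The contiguous per-channel address layout and the function names are assumptions; only the figures (128 words, four channels, a program period of 100 µs and a retention time on the order of 1 ms) are taken from the text.

```python
WORDS_PER_UNIT = 128
CHANNELS = 4
WORDS_PER_CHANNEL = WORDS_PER_UNIT // CHANNELS  # 32 locations each


def word_address(channel, offset):
    """Map (channel, offset) to a word address, assuming each channel
    owns one contiguous block of 32 words (hypothetical layout)."""
    if not 0 <= channel < CHANNELS:
        raise ValueError("channel out of range")
    if not 0 <= offset < WORDS_PER_CHANNEL:
        raise ValueError("offset out of range")
    return channel * WORDS_PER_CHANNEL + offset


def stays_valid(read_period_s, retention_s=1e-3):
    """A refresh-upon-read location stays valid if it is read again
    before the retention time has elapsed."""
    return read_period_s < retention_s


print(word_address(3, 31))   # 127, the last word of the module
print(stays_valid(100e-6))   # True: one read per 100 us program cycle
print(stays_valid(2e-3))     # False: read too infrequently
```

With these figures a location that is read once per program cycle is refreshed roughly ten times within one retention period.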
{{< figure src="./technical_2/Lay_uC.pdf" width="500" >}}
{{< figure src="./technical_2/uCm.png" title="Figure 62: Physical implementation of execution unit using a 6-metal $0.18 \mu m$ CMOS process." width="500" >}}
As the illustration in Figure 62 shows, keeping the 8 bit structure in terms of parallel operations maintains a very compact floor plan. This is typical of data flow intensive designs where the digital logic should be placed underneath the associated data buses. This is difficult to replicate with automated synthesis tools, where signal congestion is the most stringent aspect. The digital signals for the two operands and the data line span horizontally, and the sub-blocks extensively take advantage of gated output buffers that are controlled by the decoders. The full custom approach taken here trades additional design effort for performance in terms of reduced parasitics and more aggressive power gating.
$$ \langle \mathbf{C} \rangle, [\langle \mathbf{CE} \rangle], \langle \mathbf{A} \rangle, \langle \mathbf{B} \rangle, [\langle \mathbf{OE} \rangle^{\ast}] $$
The syntax for constructing instructions follows the Backus Normal Form [^165] as formatted in Equation 35, with reference to Table 8 which summarises all possible compositions. A parser is implemented that translates an ordered set of these instructions directly into hardware specific machine code that needs to be fed into the instruction pipeline. Any violations or exceptions will be caught by this script automatically. Although there is no dedicated multiplication hardware there are specialized registers that allow shift-add based multiplication over eight cycles. Any other primitive logical or arithmetic function can be realized with this instruction set as it is Turing-complete. This assertion is made by noting that it can evaluate the operation "subtract and branch if less than or equal to zero", which is sufficient for a one instruction set computer [^166].
Table 8: Overview of instruction sub-components.
| **Index** | **Operation Subset** | **Summary of Possible Entries** |
|----|----|----|
| C | Logic Operation: | Logical Shift Left/Right, Arithmetic Shift Left/Right, XOR, XNOR, AND, OR, MOVE-A, MOVE-B |
| C | Arith. Operation: | Compare, Add, Carry Add, Multiply, Complete Multiply |
| CE | Compare Option: | >, =, <, Overflow |
| CE | Add Option: | Subtract, Absolute Value, Increment Overflow Bit |
| CE | Mov Option: | Mem. Address is from Data line. Default is from Instruction |
| A | Operand A: | R1-R8, R9-R13, ID, Count, Memory |
| B | Operand B: | R1-R8, Left uC, Right uC, Instruction, ADC, Memory, Null |
| OE | IO Extension: | Write to Left uC, Write to Right uC, ADC Sample enable, Write Output Buffer |
| OE | Branch Extension: | Write to Branch Register BR1-BR3, Invert Branch Result |
| OE | Memory Extension: | Write Address, Write Data, Read Data |
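
The shift-add multiplication over eight cycles mentioned for this instruction set can be sketched behaviourally as follows. This is the generic textbook algorithm for unsigned 8 bit operands, not a description of the specialized registers or the Multiply / Complete Multiply instructions themselves, whose exact behaviour is not specified here.

```python
def shift_add_multiply(a, b):
    """Multiply two unsigned 8 bit values in eight shift-add steps."""
    if not (0 <= a < 256 and 0 <= b < 256):
        raise ValueError("operands must be unsigned 8 bit values")
    product = 0
    multiplicand = a
    multiplier = b
    for _ in range(8):            # one step per multiplier bit
        if multiplier & 1:        # add only when the current bit is set
            product += multiplicand
        multiplicand <<= 1        # weight of the next bit doubles
        multiplier >>= 1          # move on to the next bit
    return product                # result needs up to 16 bits


print(shift_add_multiply(113, 114))  # 12882
```

The loop always runs for eight iterations regardless of the operand values, which matches a fixed cycle budget for the operation.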
It should be mentioned that there are a number of hardware specific details in how certain instructions behave that need careful consideration during implementation. For example if no comparison is made but a branch register is accessed, the output of the comparator will be treated as false no matter what the logic level of the overflow bit is. This allows us to clear or set branch registers while simultaneously performing an operation. Another example is that by default the instruction data is ready at the input of the memory address to prepare a read in the background. In most cases the behaviour is intuitive and we simply strive to maximize the cycle efficiency. At all times the execution unit is capable of dealing with the compute aspects while performing branching and memory access simultaneously.
This work also provides an elaborate set of test tools that allows compilation of instruction code and the generation of piece-wise-linear 'csv' files for test sources that can be used in circuit simulators. These can be used in association with the transistor or Verilog implementation of the processing core. The behavioural models in particular are important for the translation of this architecture to other implementations.
{{< figure src="./technical_2/uC_PS.pdf" title="Figure 63: Power dissipation with respect to specific operations for the same operand A=113 & B=114 in randomized order." width="500" >}}
The results in Figure 63 exemplify the dependency of power dissipation on the operator for the same operands A and B. It should be expected that there is also a strong operand dependency in the power consumption, but these results follow our expectations closely. Generally the simpler the operation, the lower the current consumption, because less logic is involved in the switching losses. Here again we observe that when the unit is in a sleep or branching state the power dissipation is mainly associated with the instruction pipeline. As this 32 bit pipeline traverses the entire execution unit it contributes significantly to the baseline power consumption. The typical current consumption for full activity will lie around $45 \mu A$. It should be noted that sporadic spiking activity will gate the majority of operations and it is likely that running at half the designed rate with 512 cycles is more than sufficient. Note that the typical energy figure is \\(2.7 pJ/Cycle\\) or $2.7 \mu W/MIPS$, which is several orders of magnitude better than 16-bit microcontrollers such as the MSP-430[^167].
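
The quoted energy figure follows from the typical current together with the 1.2 V supply and 20 MHz clock listed in Table 9. The short calculation below reproduces it, assuming one instruction is executed per clock cycle when converting to a power per MIPS figure; the function name is illustrative.

```python
def energy_per_cycle(current_a, supply_v, clock_hz):
    """Energy dissipated per clock cycle in joules."""
    return current_a * supply_v / clock_hz


e_cycle = energy_per_cycle(current_a=45e-6, supply_v=1.2, clock_hz=20e6)
print(f"{e_cycle * 1e12:.2f} pJ/cycle")       # 2.70 pJ/cycle

# Assuming one instruction per cycle, 1 MIPS corresponds to 1e6 cycles/s.
power_per_mips = e_cycle * 1e6
print(f"{power_per_mips * 1e6:.2f} uW/MIPS")  # 2.70 uW/MIPS
```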
Table 9: Summary of performance specifications for the NPI system and state-of-the-art specialized integrated processing architectures. \\(^\star\\) Reconfigurable topology.
| Parameter | Unit | This Work | Markovic [^168] | Arimoto [^105]|
|----|----|----|----|----|
| Architecture | | Distributed \\(\mu\\)C Array | Multi-Grain FPGA | Dedicated Tile Array |
| Technology | [nm] | 180 | 40 | 65 |
| Supply Voltage | [V] | 1.2 | 1 | 1.2 |
| Parallel Units | | 64 | 16\\(^\star\\) | 2048 |
| Instruction Size | [bits] | 32 | - | 32 |
| Operational Frequency | [MHz] | 20 | 400 | 300 |
| Sampling Frequency | [S/s] | 32k | 100M | - |
| Operations per Sample | [Cycles] | 256 | 4 | - |
| \\(P_{Digital}\\) per Channel | [\\(\mu\\)A] | 44 | - | - |
| \\(P_{Analogue}\\) per Channel | [\\(\mu\\)A] | 16 | - | - |
| System Power | [mA] | 1.42 | 11.6 | 300 |
| Program Memory Capacity | [kb] | 32 | - | - |
| Processor Memory Capacity | [kb] | 1 | 36 | 1 |
| Processor Array Area | [mm\\(^2\\)] | \\(1.04 \times 1.32\\) | \\(3.8 \times 5.4\\) | \\(1.60 \times 3.19\\) |
| Power Efficiency | [GOPS/mW] | 1.52 | 0.86 | 0.31 |
| Area Efficiency | [GOPS/mm\\(^2\\)] | 0.88 | 2.34 | 36.1 |
The specifications given in Table 9 summarize the main features associated with this system on chip for processing neural data at the sensor interface. As the total power consumption is on the order of \\(1.5 mW\\) there is some concern with respect to the power density of the system in full operation, which in this particular case is \\(26 mW/cm^2\\). In fact if the number of channels is scaled up beyond 64 this power density will tend to \\(29 mW/cm^2\\) but will not exceed it. Either figure will likely be smaller depending on the physical & software implementations, but more importantly will not result in thermal agitation or heating of cortical tissue that exceeds \\(2^{\circ}C\\) [^68]. More generally we have the advantage of tuning processing capabilities to the heat capacity of the implanted package. In fact, comparing this work to state of the art FPGA topologies[^168] and highly parallel ASIC structures[^105] that follow the same design methodology, we find power and area efficiencies that exceed those of stand-alone microprocessors by orders of magnitude. These figures also reflect the expectation that technology scaling should lead to even more compact configurations, although gate leakage may introduce some diminishing returns with respect to power efficiency. We mention that these figures are extrapolated from the performance of a single execution unit and we expect more overhead from other components that is not accounted for in this comparison.
$$ R_{D} = \frac{P_{\mu C} \cdot A_{\mu C}}{N^{2}_{chan} \cdot Cycles} = \frac{44 \mu W \cdot 196 \times 158 \mu m^2} {4^2 \cdot 256}\approx 3.3 \cdot 10^{-16} \: \left[W mm^2 \: per \: OP \right] $$
Re-evaluating our power/area figure of merit in Figure 49 with Equation 36, we observe that in practice we lose a factor of ten in efficiency when compared to a dedicated ASIC implementation because resource utilization inside the execution unit cannot be maximized. This was expected given that we attain high-level reconfigurability instead. However this does give a very good understanding of how the system scales from this point with respect to both area and power requirements.
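
As a numerical check of Equation 36, the sketch below evaluates the expression with the quoted values (44 µW, a 196 µm by 158 µm execution unit, 4 channels and 256 cycles). The function name is illustrative. Because the magnitude of the result depends strongly on the area unit, it is reported in several units: with these inputs the value is about \\(3.3 \cdot 10^{-10}\\) when the area is expressed in mm\\(^2\\), and about \\(3.3 \cdot 10^{-16}\\) when it is expressed in m\\(^2\\).

```python
def resource_figure(power_w, area, n_chan, cycles):
    """Power-area product per operation, R_D = P * A / (N^2 * cycles).

    The result carries the unit of `area` multiplied by watts.
    """
    return power_w * area / (n_chan ** 2 * cycles)


area_um2 = 196 * 158  # execution unit footprint in square micrometres
r_um2 = resource_figure(power_w=44e-6, area=area_um2, n_chan=4, cycles=256)

r_mm2 = r_um2 * 1e-6   # 1 um^2 = 1e-6 mm^2
r_m2 = r_um2 * 1e-12   # 1 um^2 = 1e-12 m^2

print(f"{r_um2:.2e} W um^2 per op")  # 3.33e-04
print(f"{r_mm2:.2e} W mm^2 per op")  # 3.33e-10
print(f"{r_m2:.2e} W m^2 per op")    # 3.33e-16
```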
## 41 Testing Platform
As this system is directed at generic use by the neuroscience community, high level programming and interfaces are essential for end user adoption. The testing platform presented here is organized in such a fashion that its fundamental components can be extended greatly to serve a multitude of needs. This ambitious design criterion is primarily met by the real-time platform illustrated in Figure 64 that supports a standard Linux operating system. The three components comprise the custom NPI system on chip, the Raspberry Pi platform, and networked resources.
{{< figure src="technical_2/Sys_iP.pdf" title="Figure 64: Block diagram of the instrumentation platform developed as framework for real-time applications." width="500" >}}
The software stack running on the Raspberry Pi primarily handles the high speed SPI link that fetches data from the NPI system at \\(10 Mb/s\\) and stores it to a local buffer for some of the data visualization. This data stream is then forwarded to a network routine that is connected to a server over the local area network via a UDP protocol to allow large quantities of data to be stored in a scalable fashion. The graphical user interface is built on top of this process in order to give a means to both configure the device actively and provide some form of interactive interrogation with respect to the recorded data and the algorithm being executed.
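
A minimal sketch of the forwarding step is given below, using only the Python standard library. The datagram layout (a 32 bit sequence counter followed by a slice of the buffered stream), the payload size, and the function names are all assumptions for illustration; the format actually used by the platform is not specified here. Since UDP guarantees neither delivery nor ordering, a counter of this kind lets the receiving server detect lost or reordered datagrams.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # hypothetical 32 bit sequence number


def packetize(stream, payload_size=1024, first_seq=0):
    """Split a buffered byte stream into sequence-numbered datagrams."""
    for i, start in enumerate(range(0, len(stream), payload_size)):
        seq = (first_seq + i) & 0xFFFFFFFF
        yield HEADER.pack(seq) + stream[start:start + payload_size]


def forward(stream, address, payload_size=1024):
    """Send the buffered stream to `address` = (host, port) over UDP."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for datagram in packetize(stream, payload_size):
            sock.sendto(datagram, address)
            sent += 1
    return sent


# Example: ten bytes of buffered data split into 4 byte payloads.
for datagram in packetize(bytes(range(10)), payload_size=4):
    print(datagram.hex())
```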
The application of a generic internet of things platform plays an important role with respect to long term development objectives. It signifies that the ASIC is there to provide a specialized interface with the sensor and a generic digital interface with the external control to allow rapid adoption of new techniques or other components as software extensions. This substantiates the modular approach where design effort is explicitly focused on specialized hardware for the sensor and software development at the system level. This is important given the complexity of these systems, where overspecialisation limits the versatility of existing designs and thereby the utility of other commercially available tools/devices.
The advantage here is that a multitude of procedures can be run on the real-time platform without supervision, detailed in high-level programming code with fast development and turn-around capabilities. In this case it significantly improved test procedures by enabling automated exhaustive characterization of logical integrity. In fact the standalone module of the microcontroller structure can run randomly generated operations on the fly at 1 MIPS. This can be seen in Figure 65 where the Saleae logic analyser is used to probe the internal data bus of one particular core.
{{< figure src="technical_2/Scope.png" title="Figure 65: Digital waveform of the internal data bus BIT 1-8 as new instructions are being loaded into the device using the clocked Latch and Configure signals." width="500" >}}
Table 10: Section of Instructions and recorded outputs from $\mu C$ structure with the associated machine code.
| **BITLINE**| **INSTRUCTION** | **Machine Code** |
|----|----|----|
| 00011010 | MOVB R5 DINST 26 |0011111100000000011000000011010|
| 11101111 | MOVA R3 DINST -17 |0011011000000000011010011101111|
| 00001010 | AND R5 R3 |1110111100000000000000000000000|
| 00011010 | MOVB R5 DINST 26 |0011111100000000011000000011010|
| 11101111 | MOVA R3 DINST -17 |0011011000000000011010011101111|
| 11111111 | OR R5 R3 |1110111100000000000010000000000|
| 00011010 | MOVB R5 DINST 26 |0011111100000000011000000011010|
| 11101111 | MOVA R3 DINST -17 |0011011000000000011010011101111|
| 11110101 | XOR R5 R3 |1110111100000000000100000000000|
This is partly shown in Table 10 where the internal bit-line of one such execution unit could be directly accessed. Because it is not viable for us to exhaustively simulate the hardware in various conditions we use a physical test bench in order to record the performance tolerance with respect to voltage supply and operating frequency. Moreover what the user sees is reduced to latent frames of data over several milliseconds and the corresponding instruction code executed by the platform. The physical interfacing protocols are very much transparent. By construction each core has a hard wired ID that will allow the active supervision of internal variables for development and debugging of single units. Due to the specialized hardware the instrumentation programs currently still require careful tailoring of the instruction code but this can be extended towards compiling directly from C++ code that is also used to construct the rest of the platform.
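
The bit-line values recorded in Table 10 can be reproduced in software as a cross-check of the logical operations. The helper below is an illustrative sketch that formats results as 8 bit two's complement words; only the operand values 26 and -17 and the expected bit patterns are taken from the table.

```python
def to_bits(value, width=8):
    """Format an integer as a two's complement bit string."""
    return format(value & ((1 << width) - 1), f"0{width}b")


r5, r3 = 26, -17  # immediates loaded by the MOVB / MOVA instructions

print(to_bits(r5))       # 00011010
print(to_bits(r3))       # 11101111
print(to_bits(r5 & r3))  # 00001010  (AND R5 R3)
print(to_bits(r5 | r3))  # 11111111  (OR R5 R3)
print(to_bits(r5 ^ r3))  # 11110101  (XOR R5 R3)
```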
{{< figure src="technical_2/TPlat.pdf" title="Figure 66: Graphical user interface used for configuring the NPI system showing test data." width="500" >}}
Figure 66 depicts the GPU accelerated graphical set-up used for testing the device, where the functionality is mainly associated with reconfiguring and powering different system sub-blocks for validation. From an engineering point of view it is more of a convenience to have automated reconfiguration of the device as one interacts with the various settings, particularly in association with probing the supply voltages or analogue reference signals generated on chip. It would be more typical that during experimentation this functionality is reduced to simply selecting from a set of predetermined programs.
{{< figure src="technical_2/TPhw.pdf" title="Figure 67: Test bed used for characterization with various components illustrated." width="500" >}}
In order to move towards fully isolated operation, which will be the case for an implanted device, the system on chip architecture relies on a minimum number of off-chip components in order to bring the resource requirements of the topology into scope. This is shown in Figure 67. These feasibility considerations are generally based on reasonable assumptions associated with a wireless implant that is hermetically sealed. In this particular case we will allow a number of off-chip decoupling capacitors, a reference resistor and a reference voltage, which may very well be integrated on chip in one way or another if necessary. The system also uses a \\(1 MHz\\) external clock reference, which may be realized at the wireless power carrier frequency and is locked onto with a phase locked loop to generate the internal \\(20 MHz\\) system clock. Three linear LDOs were integrated to provide a \\(1.2 V\\) supply to the digital, analogue, and memory domains separately. The analogue supply voltage is used to derive the internal ADC voltage references of \\(1.2 V, 0.9 V, 0.6 V, 0.3 V\\) from the unregulated supply using high speed buffers.
# 42 Conclusion
This chapter substantiates a scalable and long-term approach for the development of programmable neural interfaces. In particular we discuss why moving away from the fixed-purpose DSP architectures seen in many conventional systems is of significance with respect to performance and reliability. In addition we provide indicators that show that, in the majority of modern CMOS technologies, using dedicated on-chip processing hardware is viable for performing local signal analysis. Furthermore we highlight the importance of efficient algorithm construction, where operators should revolve around per-sample execution and processing structures that improve scalability for systems with many recording channels, in association with the near-data-processing paradigm. PCA & template matching methods are proposed for embedded systems that require 57 operations per sample and 680 bits of memory with entirely unsupervised operation, and that can achieve over 80% accuracy during spike detection & classification.
A distributed micro-controller structure is proposed in an effort to realize these characteristics and reveal the underlying constraints. The topology reflects the nature of processing neural data in the context of achieving generic computational capacity. This discussion details both low-level and system-level considerations that address the software stack. The impact of the memory requirement that results from being able to execute arbitrary algorithms in isolation is evident at both the channel level and the chip level. In the proposed configuration the amount of resources allocated for this function is comparable to that of the signal processing but depends very much on the number of channels that are integrated together. We point out that if the number of channels is increased this component does not change, which allows the topology to become more effective. The distributed processing architecture operates with an efficiency of 1.52 GOPS/mW and each core only requires a 0.02 mm\\(^2\\) silicon footprint with fully reconfigurable 8-bit processing capabilities.
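To put the quoted figures in perspective, the efficiency of 1.52 GOPS/mW can be converted into an energy per operation and combined with the 57 operations per sample cited for the spike sorting methods. The sketch below is a rough estimate rather than a reported result: it assumes the quoted efficiency applies uniformly to those operations, and the sample rate `ASSUMED_SAMPLE_RATE_HZ` is a hypothetical value chosen purely for illustration.

```python
# Values quoted in the text.
EFFICIENCY_GOPS_PER_MW = 1.52
OPS_PER_SAMPLE = 57

# Assumption: hypothetical per-channel sample rate, not taken from the text.
ASSUMED_SAMPLE_RATE_HZ = 16_000


def energy_per_op_joules(gops_per_mw: float) -> float:
    """Convert an efficiency in GOPS/mW into energy per operation in joules."""
    ops_per_joule = gops_per_mw * 1e9 / 1e-3  # (operations per second) per watt
    return 1.0 / ops_per_joule


def power_per_channel_watts(sample_rate_hz: float, ops_per_sample: int,
                            gops_per_mw: float) -> float:
    """Estimate the processing power for one channel at a given sample rate."""
    return sample_rate_hz * ops_per_sample * energy_per_op_joules(gops_per_mw)


e_op = energy_per_op_joules(EFFICIENCY_GOPS_PER_MW)
e_sample = OPS_PER_SAMPLE * e_op
p_channel = power_per_channel_watts(ASSUMED_SAMPLE_RATE_HZ, OPS_PER_SAMPLE,
                                    EFFICIENCY_GOPS_PER_MW)

print(f"energy per operation: {e_op * 1e12:.3f} pJ")
print(f"energy per sample:    {e_sample * 1e12:.1f} pJ")
print(f"power per channel:    {p_channel * 1e6:.2f} uW at {ASSUMED_SAMPLE_RATE_HZ} S/s")
```

Under these assumptions each operation costs roughly 0.66 pJ and each processed sample roughly 37.5 pJ, so the per-channel processing power scales linearly with whatever sample rate is actually used.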
The foregoing discussion has depicted the intricate complexity associated with these sensing systems and revealed the diversity of aspects that should be taken into consideration. Sustainable development of these systems will need long-term solutions because the excessive design effort prevents rapid turnaround and progress. Moreover innovation needs to be contextualized at the system level to ascertain whether new techniques and methods have significant impact. This requires the abstraction and modelling of these implementations to gauge impact using empirical indicators.
# References:
[^1]: R.Q. Quiroga, Z.Nadasdy, and Y.Ben-Shaul, ''Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering,'' Neural Computation, vol.16, pp. 1661--1687, April 2004. [Online]: http://dx.doi.org/10.1162/089976604774201631
[^2]: R.A. Normann, ''Technology insight: future neuroprosthetic therapies for disorders of the nervous system,'' Nature Clinical Practice Neurology, vol.3, pp. 444--452, August 2007. [Online]: http://dx.doi.org/10.1038/ncpneuro0556
[^3]: K.Birmingham, V.Gradinaru, P.Anikeeva, W.M. Grill, V.Pikov, B.McLaughlin, P.Pasricha, D.Weber, K.Ludwig, and K.Famm, ''Bioelectronic medicines: a research roadmap,'' Nature Reviews Drug Discovery, vol.13, pp. 399--400, May 2014. [Online]: http://dx.doi.org/10.1038/nrd4351
[^4]: ''Bridging the bio-electronic divide,'' Defense Advanced Research Projects Agency, Arlington, Virginia, January 2016. [Online]: http://www.darpa.mil/news-events/2015-01-19
[^5]: G.Fritsch and E.Hitzig, ''Über die elektrische Erregbarkeit des Grosshirns,'' Archiv für Anatomie, Physiologie und Wissenschaftliche Medicin., vol.37, pp. 300--332, 1870.
[^6]: G.E. Loeb, ''Cochlear prosthetics,'' Annual Review of Neuroscience, vol.13, no.1, pp. 357--371, 1990, pMID: 2183680. [Online]: http://dx.doi.org/10.1146/annurev.ne.13.030190.002041
[^7]: ''Annual update bcig uk cochlear implant provision,'' British Cochlear Implant Group, London WC1X 8EE, UK, pp. 1--2, March 2015. [Online]: http://www.bcig.org.uk/wp-content/uploads/2015/12/CI-activity-2015.pdf
[^8]: M.Alexander, ''Neuro-numbers,'' Association of British Neurologists (ABN), London SW9 6WY, UK, pp. 1--12, April 2003. [Online]: http://www.neural.org.uk/store/assets/files/20/original/NeuroNumbers.pdf
[^9]: A.Jackson and J.B. Zimmermann, ''Neural interfaces for the brain and spinal cord — restoring motor function,'' Nature Reviews Neurology, vol.8, pp. 690--699, December 2012. [Online]: http://dx.doi.org/10.1038/nrneurol.2012.219
[^10]: M.Gilliaux, A.Renders, D.Dispa, D.Holvoet, J.Sapin, B.Dehez, C.Detrembleur, T.M. Lejeune, and G.Stoquart, ''Upper limb robot-assisted therapy in cerebral palsy: A single-blind randomized controlled trial,'' Neurorehabilitation and Neural Repair, vol.29, no.2, pp. 183--192, February 2015. [Online]: http://nnr.sagepub.com/content/29/2/183.abstract
[^11]: P.Osten and T.W. Margrie, ''Mapping brain circuitry with a light microscope,'' Nature Methods, vol.10, pp. 515--523, June 2013. [Online]: http://dx.doi.org/10.1038/nmeth.2477
[^12]: S.M. Gomez-Amaya, M.F. Barbe, W.C. deGroat, J.M. Brown, G.F. Tuite, J.Corcos, S.B. Fecho, A.S. Braverman, and M.R. RuggieriSr, ''Neural reconstruction methods of restoring bladder function,'' Nature Reviews Urology, vol.12, pp. 100--118, February 2015. [Online]: http://dx.doi.org/10.1038/nrurol.2015.4
[^13]: H.Yu, W.Xiong, H.Zhang, W.Wang, and Z.Li, ''A parylene self-locking cuff electrode for peripheral nerve stimulation and recording,'' IEEE/ASME Journal of Microelectromechanical Systems, vol.23, no.5, pp. 1025--1035, Oct 2014. [Online]: http://dx.doi.org/10.1109/JMEMS.2014.2333733
[^14]: J.S. Ho, S.Kim, and A.S.Y. Poon, ''Midfield wireless powering for implantable systems,'' Proceedings of the IEEE, vol. 101, no.6, pp. 1369--1378, June 2013. [Online]: http://dx.doi.org/10.1109/JPROC.2013.2251851
[^15]: R.D. KEYNES, ''Excitable membranes,'' Nature, vol. 239, pp. 29--32, September 1972. [Online]: http://dx.doi.org/10.1038/239029a0
[^16]: A.D. Grosmark and G.Buzsáki, ''Diversity in neural firing dynamics supports both rigid and learned hippocampal sequences,'' Science, vol. 351, no. 6280, pp. 1440--1443, March 2016. [Online]: http://science.sciencemag.org/content/351/6280/1440
[^17]: B.Sakmann and E.Neher, ''Patch clamp techniques for studying ionic channels in excitable membranes,'' Annual Review of Physiology, vol.46, no.1, pp. 455--472, October 1984, pMID: 6143532. [Online]: http://dx.doi.org/10.1146/annurev.ph.46.030184.002323
[^18]: M.P. Ward, P.Rajdev, C.Ellison, and P.P. Irazoqui, ''Toward a comparison of microelectrodes for acute and chronic recordings,'' Brain Research, vol. 1282, pp. 183 -- 200, July 2009. [Online]: http://www.sciencedirect.com/science/article/pii/S0006899309010841
[^19]: J.E.B. Randles, ''Kinetics of rapid electrode reactions,'' Discuss. Faraday Soc., vol.1, pp. 11--19, 1947. [Online]: http://dx.doi.org/10.1039/DF9470100011
[^20]: M.E. Spira and A.Hai, ''Multi-electrode array technologies for neuroscience and cardiology,'' Nature Nanotechnology, vol.8, pp. 83 -- 94, February 2013. [Online]: http://dx.doi.org/10.1038/nnano.2012.265
[^21]: G.E. Moore, ''Cramming more components onto integrated circuits,'' Proceedings of the IEEE, vol.86, no.1, pp. 82--85, January 1998. [Online]: http://dx.doi.org/10.1109/JPROC.1998.658762
[^22]: I.Ferain, C.A. Colinge, and J.-P. Colinge, ''Multigate transistors as the future of classical metal-oxide-semiconductor field-effect transistors,'' Nature, vol. 479, pp. 310--316, November 2011. [Online]: http://dx.doi.org/10.1038/nature10676
[^23]: I.H. Stevenson and K.P. Kording, ''How advances in neural recording affect data analysis,'' Nature neuroscience, vol.14, no.2, pp. 139--142, February 2011. [Online]: http://dx.doi.org/10.1038/nn.2731
[^24]: C.Thomas, P.Springer, G.Loeb, Y.Berwald-Netter, and L.Okun, ''A miniature microelectrode array to monitor the bioelectric activity of cultured cells,'' Experimental cell research, vol.74, no.1, pp. 61--66, September 1972. [Online]: http://dx.doi.org/10.1016/0014-4827(72)90481-8
[^25]: R.A. Andersen, E.J. Hwang, and G.H. Mulliken, ''Cognitive neural prosthetics,'' Annual review of Psychology, vol.61, pp. 169--190, December 2010, pMID: 19575625. [Online]: http://dx.doi.org/10.1146/annurev.psych.093008.100503
[^26]: L.A. Jorgenson, W.T. Newsome, D.J. Anderson, C.I. Bargmann, E.N. Brown, K.Deisseroth, J.P. Donoghue, K.L. Hudson, G.S. Ling, P.R. MacLeish etal., ''The brain initiative: developing technology to catalyse neuroscience discovery,'' Philosophical Transactions of the Royal Society of London B: Biological Sciences, vol. 370, no. 1668, p. 20140164, 2015.
[^27]: E.DAngelo, G.Danese, G.Florimbi, F.Leporati, A.Majani, S.Masoli, S.Solinas, and E.Torti, ''The human brain project: High performance computing for brain cells hw/sw simulation and understanding,'' in Proceedings of the Digital System Design Conference, August 2015, pp. 740--747. [Online]: http://dx.doi.org/10.1109/DSD.2015.80
[^28]: K.Famm, B.Litt, K.J. Tracey, E.S. Boyden, and M.Slaoui, ''Drug discovery: a jump-start for electroceuticals,'' Nature, vol. 496, no. 7444, pp. 159--161, April 2013. [Online]: http://dx.doi.org/10.1038/496159a
[^29]: K.Deisseroth, ''Optogenetics,'' Nature methods, vol.8, no.1, pp. 26--29, January 2011. [Online]: http://dx.doi.org/10.1038/nmeth.f.324
[^30]: M.Velliste, S.Perel, M.C. Spalding, A.S. Whitford, and A.B. Schwartz, ''Cortical control of a prosthetic arm for self-feeding,'' Nature, vol. 453, no. 7198, pp. 1098--1101, June 2008. [Online]: http://dx.doi.org/10.1038/nature06996
[^31]: T.N. Theis and P.M. Solomon, ''In quest of the "next switch" prospects for greatly reduced power dissipation in a successor to the silicon field-effect transistor,'' Proceedings of the IEEE, vol.98, no.12, pp. 2005--2014, December 2010. [Online]: http://dx.doi.org/10.1109/JPROC.2010.2066531
[^32]: G.M. Amdahl, ''Validity of the single processor approach to achieving large scale computing capabilities, reprinted from the afips conference proceedings, vol. 30 (atlantic city, n.j., apr. 18-20), afips press, reston, va., 1967, pp. 483-485, when dr. amdahl was at international business machines corporation, sunnyvale, california,'' in AFIPS Conference Proceedings, Vol. 30 (Atlantic City, N.J., Apr. 18-20), vol.12, no.3. IEEE, Summer 2007, pp. 19--20. [Online]: http://dx.doi.org/10.1109/N-SSC.2007.4785615
[^33]: J.G. Koller and W.C. Athas, ''Adiabatic switching, low energy computing, and the physics of storing and erasing information,'' in IEEE Proceedings of the Workshop on Physics and Computation. IEEE, October 1992, pp. 267--270. [Online]: http://dx.doi.org/10.1109/PHYCMP.1992.615554
[^34]: E.P. DeBenedictis, J.E. Cook, M.F. Hoemmen, and T.S. Metodi, ''Optimal adiabatic scaling and the processor-in-memory-and-storage architecture (oas :pims),'' in IEEE Proceedings of the International Symposium on Nanoscale Architectures. IEEE, July 2015, pp. 69--74. [Online]: http://dx.doi.org/10.1109/NANOARCH.2015.7180589
[^35]: S.Houri, G.Billiot, M.Belleville, A.Valentian, and H.Fanet, ''Limits of cmos technology and interest of nems relays for adiabatic logic applications,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.62, no.6, pp. 1546--1554, June 2015. [Online]: http://dx.doi.org/10.1109/TCSI.2015.2415177
[^36]: S.K. Arfin and R.Sarpeshkar, ''An energy-efficient, adiabatic electrode stimulator with inductive energy recycling and feedback current regulation,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.1, pp. 1--14, February 2012. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6036003&isnumber=6138606
[^37]: P.R. Kinget, ''Scaling analog circuits into deep nanoscale cmos: Obstacles and ways to overcome them,'' in IEEE Proceedings of the Custom Integrated Circuits Conference. IEEE, September 2015, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2015.7338394
[^38]: K.Bernstein, D.J. Frank, A.E. Gattiker, W.Haensch, B.L. Ji, S.R. Nassif, E.J. Nowak, D.J. Pearson, and N.J. Rohrer, ''High-performance cmos variability in the 65-nm regime and beyond,'' IBM Journal of Research and Development, vol.50, no. 4.5, pp. 433--449, July 2006. [Online]: http://dx.doi.org/10.1147/rd.504.0433
[^39]: L.L. Lewyn, T.Ytterdal, C.Wulff, and K.Martin, ''Analog circuit design in nanoscale cmos technologies,'' Proceedings of the IEEE, vol.97, no.10, pp. 1687--1714, October 2009. [Online]: http://dx.doi.org/10.1109/JPROC.2009.2024663
[^40]: Y.Xin, W.X.Y. Li, Z.Zhang, R.C.C. Cheung, D.Song, and T.W. Berger, ''An application specific instruction set processor (asip) for adaptive filters in neural prosthetics,'' IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.12, no.5, pp. 1034--1047, September 2015. [Online]: http://dx.doi.org/10.1109/TCBB.2015.2440248
[^41]: G.Schalk, P.Brunner, L.A. Gerhardt, H.Bischof, and J.R. Wolpaw, ''Brain-computer interfaces (bcis): detection instead of classification,'' Journal of neuroscience methods, vol. 167, no.1, pp. 51--62, 2008, brain-Computer Interfaces (BCIs). [Online]: http://www.sciencedirect.com/science/article/pii/S0165027007004116
[^42]: Z.Li, J.E. O'Doherty, T.L. Hanson, M.A. Lebedev, C.S. Henriquez, and M.A. Nicolelis, ''Unscented kalman filter for brain-machine interfaces,'' PloS one, vol.4, no.7, pp. 1--18, 2009. [Online]: http://dx.doi.org/10.1371/journal.pone.0006243
[^43]: A.L. Orsborn, H.G. Moorman, S.A. Overduin, M.M. Shanechi, D.F. Dimitrov, and J.M. Carmena, ''Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control,'' Neuron, vol.82, pp. 1380 -- 1393, March 2016. [Online]: http://dx.doi.org/10.1016/j.neuron.2014.04.048
[^44]: Y.Yan, X.Qin, Y.Wu, N.Zhang, J.Fan, and L.Wang, ''A restricted boltzmann machine based two-lead electrocardiography classification,'' in IEEE Proceedings of the International Conference on Wearable and Implantable Body Sensor Networks. IEEE, June 2015, pp. 1--9. [Online]: http://dx.doi.org/10.1109/BSN.2015.7299399
[^45]: B.M. Yu and J.P. Cunningham, ''Dimensionality reduction for large-scale neural recordings,'' Nature Neuroscience, vol.17, pp. 1500 -- 1509, November 2014. [Online]: http://dx.doi.org/10.1038/nn.3776
[^46]: S.Makeig, C.Kothe, T.Mullen, N.Bigdely-Shamlo, Z.Zhang, and K.Kreutz-Delgado, ''Evolving signal processing for brain: Computer interfaces,'' Proceedings of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1567--1584, May 2012. [Online]: http://dx.doi.org/10.1109/JPROC.2012.2185009
[^47]: G.Indiveri and S.C. Liu, ''Memory and information processing in neuromorphic systems,'' Proceedings of the IEEE, vol. 103, no.8, pp. 1379--1397, August 2015. [Online]: http://dx.doi.org/10.1109/JPROC.2015.2444094
[^48]: Y.Chen, E.Yao, and A.Basu, ''A 128-channel extreme learning machine-based neural decoder for brain machine interfaces,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 679--692, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2483618
[^49]: V.Karkare, S.Gibson, and D.Marković, ''A 75- $\mu$w, 16-channel neural spike-sorting processor with unsupervised clustering,'' IEEE Journal of Solid-State Circuits, vol.48, no.9, pp. 2230--2238, September 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2264616
[^50]: T.C. Chen, W.Liu, and L.G. Chen, ''128-channel spike sorting processor with a parallel-folding structure in 90nm process,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2009, pp. 1253--1256. [Online]: http://dx.doi.org/10.1109/ISCAS.2009.5117990
[^51]: G.Baranauskas, ''What limits the performance of current invasive brain machine interfaces?'' Frontiers in Systems Neuroscience, vol.8, no.68, April 2014. [Online]: http://www.frontiersin.org/systems_neuroscience/10.3389/fnsys.2014.00068
[^52]: E.F. Chang, ''Towards large-scale, human-based, mesoscopic neurotechnologies,'' Neuron, vol.86, pp. 68--78, March 2016. [Online]: http://dx.doi.org/10.1016/j.neuron.2015.03.037
[^53]: M.A.L. Nicolelis and M.A. Lebedev, ''Principles of neural ensemble physiology underlying the operation of brain-machine,'' Nature Reviews Neuroscience, vol.10, pp. 530--540, July 2009. [Online]: http://dx.doi.org/10.1038/nrn2653
[^54]: Z.Fekete, ''Recent advances in silicon-based neural microelectrodes and microsystems: a review,'' Sensors and Actuators B: Chemical, vol. 215, pp. 300 -- 315, 2015. [Online]: http://www.sciencedirect.com/science/article/pii/S092540051500386X
[^55]: N.Saeidi, M.Schuettler, A.Demosthenous, and N.Donaldson, ''Technology for integrated circuit micropackages for neural interfaces, based on gold-silicon wafer bonding,'' Journal of Micromechanics and Microengineering, vol.23, no.7, p. 075021, June 2013. [Online]: http://stacks.iop.org/0960-1317/23/i=7/a=075021
[^56]: K.Seidl, S.Herwik, T.Torfs, H.P. Neves, O.Paul, and P.Ruther, ''Cmos-based high-density silicon microprobe arrays for electronic depth control in intracortical neural recording,'' IEEE Journal of Microelectromechanical Systems, vol.20, no.6, pp. 1439--1448, December 2011. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6033040&isnumber=6075219
[^57]: T.D.Y. Kozai, N.B. Langhals, P.R. Patel, X.Deng, H.Zhang, K.L. Smith, J.Lahann, N.A. Kotov, and D.R. Kipke, ''Ultrasmall implantable composite microelectrodes with bioactive surfaces for chronic neural interfaces,'' Nature Materials, vol.11, pp. 1065--1073, December 2012. [Online]: http://dx.doi.org/10.1038/nmat3468
[^58]: D.A. Schwarz, M.A. Lebedev, T.L. Hanson, D.F. Dimitrov, G.Lehew, J.Meloy, S.Rajangam, V.Subramanian, P.J. Ifft, Z.Li, A.Ramakrishnan, A.Tate, K.Z. Zhuang, and M.A.L. Nicolelis, ''Chronic, wireless recordings of large-scale brain activity in freely moving rhesus monkeys,'' Nature Methods, vol.11, pp. 670--676, April 2014. [Online]: http://dx.doi.org/10.1038/nmeth.2936
[^59]: P.Ruther, S.Herwik, S.Kisban, K.Seidl, and O.Paul, ''Recent progress in neural probes using silicon mems technology,'' IEEJ Transactions on Electrical and Electronic Engineering, vol.5, no.5, pp. 505--515, 2010. [Online]: http://dx.doi.org/10.1002/tee.20566
[^60]: H.-W. Kang, S.J. Lee, I.K. Ko, C.Kengla, J.J. Yoo, and A.Atala, ''A 3d bioprinting system to produce human-scale tissue constructs with structural integrity,'' Nature Biotechnology, vol.34, pp. 312--319, March 2016. [Online]: http://dx.doi.org/10.1038/nbt.3413
[^61]: C.Xie, J.Liu, T.-M. Fu, X.Dai, W.Zhou, and C.M. Lieber, ''Three-dimensional macroporous nanoelectronic networks as minimally invasive brain probes,'' Nature Materials, vol.14, pp. 1286--1292, May 2015. [Online]: http://dx.doi.org/10.1038/nmat4427
[^62]: R.R. Harrison, P.T. Watkins, R.J. Kier, R.O. Lovejoy, D.J. Black, B.Greger, and F.Solzbacher, ''A low-power integrated circuit for a wireless 100-electrode neural recording system,'' IEEE Journal of Solid-State Circuits, vol.42, no.1, pp. 123--133, Jan 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2006.886567
[^63]: J.Guo, W.Ng, J.Yuan, S.Li, and M.Chan, ''A 200-channel area-power-efficient chemical and electrical dual-mode acquisition ic for the study of neurodegenerative diseases,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 567--578, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2468052
[^64]: W.Biederman, D.J. Yeager, N.Narevsky, J.Leverett, R.Neely, J.M. Carmena, E.Alon, and J.M. Rabaey, ''A 4.78 mm 2 fully-integrated neuromodulation soc combining 64 acquisition channels with digital compression and simultaneous dual stimulation,'' IEEE Journal of Solid-State Circuits, vol.50, no.4, pp. 1038--1047, April 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2384736
[^65]: R.Muller, S.Gambini, and J.M. Rabaey, ''A 0.013mm$^2$, $5 \mu w$, dc-coupled neural signal acquisition ic with 0.5v supply,'' IEEE Journal of Solid-State Circuits, vol.47, no.1, pp. 232--243, Jan 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2163552
[^66]: H.Kassiri, A.Bagheri, N.Soltani, K.Abdelhalim, H.M. Jafari, M.T. Salam, J.L.P. Velazquez, and R.Genov, ''Battery-less tri-band-radio neuro-monitor and responsive neurostimulator for diagnostics and treatment of neurological disorders,'' IEEE Journal of Solid-State Circuits, vol.51, no.5, pp. 1274--1289, May 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2528999
[^67]: M.Ballini, J.Müller, P.Livi, Y.Chen, U.Frey, A.Stettler, A.Shadmani, V.Viswam, I.L. Jones, D.Jäckel, M.Radivojevic, M.K. Lewandowska, W.Gong, M.Fiscella, D.J. Bakkum, F.Heer, and A.Hierlemann, ''A 1024-channel cmos microelectrode array with 26,400 electrodes for recording and stimulation of electrogenic cells in vitro,'' IEEE Journal of Solid-State Circuits, vol.49, no.11, pp. 2705--2719, Nov 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2359219
[^68]: P.D. Wolf, Thermal considerations for the design of an implanted cortical brain--machine interface (BMI). CRC Press Boca Raton, FL, 2008, pMID: 21204402. [Online]: http://www.ncbi.nlm.nih.gov/books/NBK3932
[^69]: T.Denison, K.Consoer, W.Santa, A.T. Avestruz, J.Cooley, and A.Kelly, ''A 2 $\mu$w 100 nv/rthz chopper-stabilized instrumentation amplifier for chronic measurement of neural field potentials,'' IEEE Journal of Solid-State Circuits, vol.42, no.12, pp. 2934--2945, December 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2007.908664
[^70]: B.Johnson, S.T. Peace, A.Wang, T.A. Cleland, and A.Molnar, ''A 768-channel cmos microelectrode array with angle sensitive pixels for neuronal recording,'' IEEE Sensors Journal, vol.13, no.9, pp. 3211--3218, Sept 2013. [Online]: http://dx.doi.org/10.1109/JSEN.2013.2266894
[^71]: C.M. Lopez, A.Andrei, S.Mitra, M.Welkenhuysen, W.Eberle, C.Bartic, R.Puers, R.F. Yazicioglu, and G.G.E. Gielen, ''An implantable 455-active-electrode 52-channel cmos neural probe,'' IEEE Journal of Solid-State Circuits, vol.49, no.1, pp. 248--261, January 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2284347
[^72]: J.Scholvin, J.P. Kinney, J.G. Bernstein, C.Moore-Kochlacs, N.Kopell, C.G. Fonstad, and E.S. Boyden, ''Close-packed silicon microelectrodes for scalable spatially oversampled neural recording,'' IEEE Transactions on Biomedical Engineering, vol.63, no.1, pp. 120--130, Jan 2016. [Online]: http://dx.doi.org/10.1109/TBME.2015.2406113
[^73]: M.Han, B.Kim, Y.A. Chen, H.Lee, S.H. Park, E.Cheong, J.Hong, G.Han, and Y.Chae, ''Bulk switching instrumentation amplifier for a high-impedance source in neural signal recording,'' IEEE Transactions on Circuits and Systems---Part II: Express Briefs, vol.62, no.2, pp. 194--198, Feb 2015. [Online]: http://dx.doi.org/10.1109/TCSII.2014.2368615
[^74]: R.Muller, S.Gambini, and J.M. Rabaey, ''A 0.013$ $mm$^2$, 5$ \mu$w, dc-coupled neural signal acquisition ic with 0.5 v supply,'' IEEE Journal of Solid-State Circuits, vol.47, no.1, pp. 232--243, Jan 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2163552
[^75]: ''Rhd2164 digital electrophysiology interface chip - data sheet,'' Intan Technologies, Los Angeles, California, December 2013. [Online]: http://www.intantech.com/files/Intan_RHD2164_datasheet.pdf
[^76]: K.M. Al-Ashmouny, S.I. Chang, and E.Yoon, ''A 4 $\mu$w/ch analog front-end module with moderate inversion and power-scalable sampling operation for 3-d neural microsystems,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.5, pp. 403--413, October 2012. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2218105
[^77]: D.Han, Y.Zheng, R.Rajkumar, G.S. Dawe, and M.Je, ''A 0.45 v 100-channel neural-recording ic with sub-$\mu$w/channel consumption in 0.18$\mu$m cmos,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.6, pp. 735--746, December 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2298860
[^78]: S.B. Lee, H.M. Lee, M.Kiani, U.M. Jow, and M.Ghovanloo, ''An inductively powered scalable 32-channel wireless neural recording system-on-a-chip for neuroscience applications,'' IEEE Transactions on Biomedical Circuits and Systems, vol.4, no.6, pp. 360--371, Dec 2010. [Online]: http://dx.doi.org/10.1109/TBCAS.2010.2078814
[^79]: J.Yoo, L.Yan, D.El-Damak, M.A.B. Altaf, A.H. Shoeb, and A.P. Chandrakasan, ''An 8-channel scalable eeg acquisition soc with patient-specific seizure classification and recording processor,'' IEEE Journal of Solid-State Circuits, vol.48, no.1, pp. 214--228, Jan 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2221220
[^80]: M.A.B. Altaf and J.Yoo, ''A 1.83$ \mu$j/classification, 8-channel, patient-specific epileptic seizure classification soc using a non-linear support vector machine,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.1, pp. 49--60, Feb 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2386891
[^81]: K.Abdelhalim, H.M. Jafari, L.Kokarovtseva, J.L.P. Velazquez, and R.Genov, ''64-channel uwb wireless neural vector analyzer soc with a closed-loop phase synchrony-triggered neurostimulator,'' IEEE Journal of Solid-State Circuits, vol.48, no.10, pp. 2494--2510, Oct 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2272952
[^82]: A.Bagheri, S.R.I. Gabran, M.T. Salam, J.L.P. Velazquez, R.R. Mansour, M.M.A. Salama, and R.Genov, ''Massively-parallel neuromonitoring and neurostimulation rodent headset with nanotextured flexible microelectrodes,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 601--609, Oct 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2281772
[^83]: H.G. Rhew, J.Jeong, J.A. Fredenburg, S.Dodani, P.G. Patil, and M.P. Flynn, ''A fully self-contained logarithmic closed-loop deep brain stimulation soc with wireless telemetry and wireless power management,'' IEEE Journal of Solid-State Circuits, vol.49, no.10, pp. 2213--2227, Oct 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2346779
[^84]: W.Biederman, D.J. Yeager, N.Narevsky, J.Leverett, R.Neely, J.M. Carmena, E.Alon, and J.M. Rabaey, ''A 4.78 mm 2 fully-integrated neuromodulation soc combining 64 acquisition channels with digital compression and simultaneous dual stimulation,'' IEEE Journal of Solid-State Circuits, vol.50, no.4, pp. 1038--1047, April 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2384736
[^85]: A.Mendez, A.Belghith, and M.Sawan, ''A dsp for sensing the bladder volume through afferent neural pathways,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 552--564, Aug 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2282087
[^86]: T.T. Liu and J.M. Rabaey, ''A 0.25 v 460 nw asynchronous neural signal processor with inherent leakage suppression,'' IEEE Journal of Solid-State Circuits, vol.48, no.4, pp. 897--906, April 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2239096
[^87]: D.Han, Y.Zheng, R.Rajkumar, G.S. Dawe, and M.Je, ''A 0.45 v 100-channel neural-recording ic with sub-$\mu$w/channel consumption in 0.18$ \mu$m cmos,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.6, pp. 735--746, Dec 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2298860
[^88]: R.Muller, H.P. Le, W.Li, P.Ledochowitsch, S.Gambini, T.Bjorninen, A.Koralek, J.M. Carmena, M.M. Maharbiz, E.Alon, and J.M. Rabaey, ''A minimally invasive 64-channel wireless $\mu$ecog implant,'' IEEE Journal of Solid-State Circuits, vol.50, no.1, pp. 344--359, Jan 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2364824
[^89]: B.Vigraham, J.Kuppambatti, and P.R. Kinget, ''Switched-mode operational amplifiers and their application to continuous-time filters in nanoscale cmos,'' IEEE Journal of Solid-State Circuits, vol.49, no.12, pp. 2758--2772, December 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2354641
[^90]: V.Karkare, H.Chandrakumar, D.Rozgić, and D.Marković, ''Robust, reconfigurable, and power-efficient biosignal recording systems,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, Sept 2014, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2014.6946018
[^91]: L.B. Leene and T.G. Constandinou, ''A 0.45v continuous time-domain filter using asynchronous oscillator structures,'' in IEEE Proceedings of the International Conference on Electronics, Circuits and Systems, December 2016.
[^92]: R.Mohan, L.Yan, G.Gielen, C.V. Hoof, and R.F. Yazicioglu, ''0.35 v time-domain-based instrumentation amplifier,'' Electronics Letters, vol.50, no.21, pp. 1513--1514, October 2014. [Online]: http://dx.doi.org/10.1049/el.2014.2471
[^93]: X.Zhang, Z.Zhang, Y.Li, C.Liu, Y.X. Guo, and Y.Lian, ''A 2.89$ \mu$w dry-electrode enabled clockless wireless ecg soc for wearable applications,'' IEEE Journal of Solid-State Circuits, vol.51, no.10, pp. 2287--2298, Oct 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2582863
[^94]: M.Elia, L.B. Leene, and T.G. Constandinou, ''Continuous-time micropower interface for neural recording applications,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016, pp. 534--537. [Online]: http://dx.doi.org/10.1109/ISCAS.2016.7527295
[^95]: N.Guo, Y.Huang, T.Mai, S.Patil, C.Cao, M.Seok, S.Sethumadhavan, and Y.Tsividis, ''Energy-efficient hybrid analog/digital approximate computation in continuous time,'' IEEE Journal of Solid-State Circuits, vol.51, no.7, pp. 1514--1524, July 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2543729
[^96]: B.Bozorgzadeh, D.R. Schuweiler, M.J. Bobak, P.A. Garris, and P.Mohseni, ''Neurochemostat: A neural interface soc with integrated chemometrics for closed-loop regulation of brain dopamine,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 654--667, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2453791
[^97]: E.B. Myers and M.L. Roukes, ''Comparative advantages of mechanical biosensors,'' Nature nanotechnology, vol.6, no.4, pp. 1748--3387, April 2011. [Online]: http://dx.doi.org/10.1038/nnano.2011.44
[^98]: R.Machado, N.Soltani, S.Dufour, M.T. Salam, P.L. Carlen, R.Genov, and M.Thompson, ''Biofouling-resistant impedimetric sensor for array high-resolution extracellular potassium monitoring in the brain,'' Biosensors, vol.6, no.4, p.53, October 2016. [Online]: http://dx.doi.org/10.3390/bios6040053
[^99]: J.Guo, W.Ng, J.Yuan, S.Li, and M.Chan, ''A 200-channel area-power-efficient chemical and electrical dual-mode acquisition ic for the study of neurodegenerative diseases,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 567--578, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2468052
[^100]: D.A. Dombeck, A.N. Khabbaz, F.Collman, T.L. Adelman, and D.W. Tank, ''Imaging large-scale neural activity with cellular resolution in awake, mobile mice.'' Neuron, vol.56, no.1, pp. 43--57, October 2007. [Online]: http://dx.doi.org/10.1016/j.neuron.2007.08.003
[^101]: T.York, S.B. Powell, S.Gao, L.Kahan, T.Charanya, D.Saha, N.W. Roberts, T.W. Cronin, J.Marshall, S.Achilefu, S.P. Lake, B.Raman, and V.Gruev, ''Bioinspired polarization imaging sensors: From circuits and optics to signal processing algorithms and biomedical applications,'' Proceedings of the IEEE, vol. 102, no.10, pp. 1450--1469, Oct 2014. [Online]: http://dx.doi.org/10.1109/JPROC.2014.2342537
[^102]: K.Paralikar, P.Cong, O.Yizhar, L.E. Fenno, W.Santa, C.Nielsen, D.Dinsmoor, B.Hocken, G.O. Munns, J.Giftakis, K.Deisseroth, and T.Denison, ''An implantable optical stimulation delivery system for actuating an excitable biosubstrate,'' IEEE Journal of Solid-State Circuits, vol.46, no.1, pp. 321--332, Jan 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2010.2074110
[^103]: N.Ji and S.L. Smith, ''Technologies for imaging neural activity in large volumes,'' Nature Neuroscience, vol.19, pp. 1154--1164, September 2016. [Online]: http://dx.doi.org/10.1038/nn.4358
[^104]: S.Song, K.D. Miller, and L.F. Abbott, ''Competitive hebbian learning through spike-timing-dependent synaptic plasticity,'' Nature Neuroscience, vol.3, pp. 919--926, September 2000. [Online]: http://dx.doi.org/10.1038/78829
[^105]: T.Kurafuji, M.Haraguchi, M.Nakajima, T.Nishijima, T.Tanizaki, H.Yamasaki, T.Sugimura, Y.Imai, M.Ishizaki, T.Kumaki, K.Murata, K.Yoshida, E.Shimomura, H.Noda, Y.Okuno, S.Kamijo, T.Koide, H.J. Mattausch, and K.Arimoto, ''A scalable massively parallel processor for real-time image processing,'' IEEE Journal of Solid-State Circuits, vol.46, no.10, pp. 2363--2373, October 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2159528
[^106]: J.Y. Kim, M.Kim, S.Lee, J.Oh, K.Kim, and H.J. Yoo, ''A 201.4 gops 496 mw real-time multi-object recognition processor with bio-inspired neural perception engine,'' IEEE Journal of Solid-State Circuits, vol.45, no.1, pp. 32--45, Jan 2010. [Online]: http://dx.doi.org/10.1109/JSSC.2009.2031768
[^107]: C.C. Cheng, C.H. Lin, C.T. Li, and L.G. Chen, ''ivisual: An intelligent visual sensor soc with 2790 fps cmos image sensor and 205 gops/w vision processor,'' IEEE Journal of Solid-State Circuits, vol.44, no.1, pp. 127--135, Jan 2009. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2007158
[^108]: H.Noda, M.Nakajima, K.Dosaka, K.Nakata, M.Higashida, O.Yamamoto, K.Mizumoto, T.Tanizaki, T.Gyohten, Y.Okuno, H.Kondo, Y.Shimazu, K.Arimoto, K.Saito, and T.Shimizu, ''The design and implementation of the massively parallel processor based on the matrix architecture,'' IEEE Journal of Solid-State Circuits, vol.42, no.1, pp. 183--192, Jan 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2006.886545
[^109]: M.S. Chae, W.Liu, and M.Sivaprakasam, ''Design optimization for integrated neural recording systems,'' IEEE Journal of Solid-State Circuits, vol.43, no.9, pp. 1931--1939, September 2008. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2001877
[^110]: K.J. Miller, L.B. Sorensen, J.G. Ojemann, and M.den Nijs, ''Power-law scaling in the brain surface electric potential,'' PLoS Comput Biol, vol.5, no.12, pp. 1--10, December 2009. [Online]: http://dx.doi.org/10.1371%2Fjournal.pcbi.1000609
[^111]: R.Harrison and C.Charles, ''A low-power low-noise cmos amplifier for neural recording applications,'' IEEE Journal of Solid-State Circuits, vol.38, no.6, pp. 958--965, June 2003. [Online]: http://dx.doi.org/10.1109/JSSC.2003.811979
[^112]: W.Sansen, ''1.3 analog cmos from 5 micrometer to 5 nanometer,'' in IEEE Proceedings of the International Solid-State Circuits Conference. IEEE, February 2015, pp. 1--6. [Online]: http://dx.doi.org/10.1109/ISSCC.2015.7062848
[^113]: M.S.J. Steyaert and W.M.C. Sansen, ''A micropower low-noise monolithic instrumentation amplifier for medical purposes,'' IEEE Journal of Solid-State Circuits, vol.22, no.6, pp. 1163--1168, December 1987. [Online]: http://dx.doi.org/10.1109/JSSC.1987.1052869
[^114]: W.Wattanapanitch, M.Fee, and R.Sarpeshkar, ''An energy-efficient micropower neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.1, no.2, pp. 136--147, June 2007. [Online]: http://dx.doi.org/10.1109/TBCAS.2007.907868
[^115]: B.Johnson and A.Molnar, ''An orthogonal current-reuse amplifier for multi-channel sensing,'' IEEE Journal of Solid-State Circuits, vol.48, no.6, pp. 1487--1496, June 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2257478
[^116]: C.Qian, J.Parramon, and E.Sanchez-Sinencio, ''A micropower low-noise neural recording front-end circuit for epileptic seizure detection,'' IEEE Journal of Solid-State Circuits, vol.46, no.6, pp. 1392--1405, June 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2126370
[^117]: X.Zou, L.Liu, J.H. Cheong, L.Yao, P.Li, M.-Y. Cheng, W.L. Goh, R.Rajkumar, G.Dawe, K.-W. Cheng, and M.Je, ''A 100-channel 1-mw implantable neural recording ic,'' IEEE Transactions on Circuits and Systems---Part I: Regular Papers, vol.60, no.10, pp. 2584--2596, October 2013. [Online]: http://dx.doi.org/10.1109/TCSI.2013.2249175
[^118]: V.Majidzadeh, A.Schmid, and Y.Leblebici, ''Energy efficient low-noise neural recording amplifier with enhanced noise efficiency factor,'' IEEE Transactions on Biomedical Circuits and Systems, vol.5, no.3, pp. 262--271, June 2011. [Online]: http://dx.doi.org/10.1109/TBCAS.2010.2078815
[^119]: C.C. Enz and E.A. Vittoz, Charge-based MOS transistor modeling: the EKV model for low-power and RF IC design. John Wiley & Sons, August 2006. [Online]: http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470855452.html
[^120]: Y.Yasuda, T.-J.K. Liu, and C.Hu, ''Flicker-noise impact on scaling of mixed-signal cmos with hfsion,'' IEEE Transactions on Electron Devices, vol.55, no.1, pp. 417--422, January 2008. [Online]: http://dx.doi.org/10.1109/TED.2007.910759
[^121]: S.-Y. Wu, C.Lin, M.Chiang, J.Liaw, J.Cheng, S.Yang, M.Liang, T.Miyashita, C.Tsai, B.Hsu, H.Chen, T.Yamamoto, S.Chang, V.Chang, C.Chang, J.Chen, H.Chen, K.Ting, Y.Wu, K.Pan, R.Tsui, C.Yao, P.Chang, H.Lien, T.Lee, H.Lee, W.Chang, T.Chang, R.Chen, M.Yeh, C.Chen, Y.Chiu, Y.Chen, H.Huang, Y.Lu, C.Chang, M.Tsai, C.Liu, K.Chen, C.Kuo, H.Lin, S.Jang, and Y.Ku, ''A 16nm finfet cmos technology for mobile soc and computing applications,'' in IEEE Proceedings of the International Electron Devices Meeting, December 2013, pp. 9.1.1--9.1.4. [Online]: http://dx.doi.org/10.1109/IEDM.2013.6724591
[^122]: L.B. Leene, Y.Liu, and T.G. Constandinou, ''A compact recording array for neural interfaces,'' in IEEE Proceedings of the Biomedical Circuits and Systems Conference, October 2013, pp. 97--100. [Online]: http://dx.doi.org/10.1109/BioCAS.2013.6679648
[^123]: Q.Fan, F.Sebastiano, J.Huijsing, and K.Makinwa, ''A $1.8 \mu w\:60 nv/√Hz$ capacitively-coupled chopper instrumentation amplifier in 65 nm cmos for wireless sensor nodes,'' IEEE Journal of Solid-State Circuits, vol.46, no.7, pp. 1534--1543, July 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2143610
[^124]: H.Chandrakumar and D.Markovic, ''A simple area-efficient ripple-rejection technique for chopped biosignal amplifiers,'' IEEE Transactions on Circuits and Systems---Part II: Express Briefs, vol.62, no.2, pp. 189--193, February 2015. [Online]: http://dx.doi.org/10.1109/TCSII.2014.2387686
[^125]: H.Chandrakumar and D.Markovic, ''A 2$\mu$w 40mvpp linear-input-range chopper-stabilized bio-signal amplifier with boosted input impedance of 300mohm and electrode-offset filtering,'' in IEEE Proceedings of the International Solid-State Circuits Conference. IEEE, January 2016, pp. 96--97. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7417924
[^126]: H.Rezaee-Dehsorkh, N.Ravanshad, R.Lotfi, K.Mafinezhad, and A.M. Sodagar, ''Analysis and design of tunable amplifiers for implantable neural recording applications,'' IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, vol.1, no.4, pp. 546--556, December 2011. [Online]: http://dx.doi.org/10.1109/JETCAS.2011.2174492
[^127]: X.Zou, X.Xu, L.Yao, and Y.Lian, ''A 1-v 450-nw fully integrated programmable biomedical sensor interface chip,'' IEEE Journal of Solid-State Circuits, vol.44, no.4, pp. 1067--1077, April 2009. [Online]: http://dx.doi.org/10.1109/JSSC.2009.2014707
[^128]: L.Leene and T.Constandinou, ''Ultra-low power design strategy for two-stage amplifier topologies,'' Electronics Letters, vol.50, no.8, pp. 583--585, April 2014. [Online]: http://dx.doi.org/10.1049/el.2013.4196
[^129]: H.G. Rey, C.Pedreira, and R.Q. Quiroga, ''Past, present and future of spike sorting techniques,'' Brain Research Bulletin, vol. 119, Part B, pp. 106--117, October 2015, advances in electrophysiological data analysis. [Online]: http://www.sciencedirect.com/science/article/pii/S0361923015000684
[^130]: Y.Chen, A.Basu, L.Liu, X.Zou, R.Rajkumar, G.S. Dawe, and M.Je, ''A digitally assisted, signal folding neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 528--542, August 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2288680
[^131]: X.Yue, ''Determining the reliable minimum unit capacitance for the dac capacitor array of sar adcs,'' Microelectronics Journal, vol.44, no.6, pp. 473 -- 478, 2013. [Online]: http://www.sciencedirect.com/science/article/pii/S0026269213000815
[^132]: Y.Zhu, C.-H. Chan, U.-F. Chio, S.-W. Sin, S.-P. U, R.Martins, and F.Maloberti, ''Split-sar adcs: Improved linearity with power and speed optimization,'' IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.22, no.2, pp. 372--383, February 2014. [Online]: http://dx.doi.org/10.1109/TVLSI.2013.2242501
[^133]: L.Xie, G.Wen, J.Liu, and Y.Wang, ''Energy-efficient hybrid capacitor switching scheme for sar adc,'' Electronics Letters, vol.50, no.1, pp. 22--23, January 2014. [Online]: http://dx.doi.org/10.1049/el.2013.2794
[^134]: P.Nuzzo, F.DeBernardinis, P.Terreni, and G.Vander Plas, ''Noise analysis of regenerative comparators for reconfigurable adc architectures,'' IEEE Transactions on Circuits and Systems---Part I: Regular Papers, vol.55, no.6, pp. 1441--1454, July 2008. [Online]: http://dx.doi.org/10.1109/TCSI.2008.917991
[^135]: G.Heinzel, A.Rüdiger, and R.Schilling, ''Spectrum and spectral density estimation by the discrete fourier transform (dft), including a comprehensive list of window functions and some new flat-top windows,'' pp. 25--27, February 2002. [Online]: http://hdl.handle.net/11858/00-001M-0000-0013-557A-5
[^136]: F.Gerfers, M.Ortmanns, and Y.Manoli, ''A 1.5-v 12-bit power-efficient continuous-time third-order $\Sigma\Delta$ modulator,'' IEEE Journal of Solid-State Circuits, vol.38, no.8, pp. 1343--1352, Aug 2003. [Online]: http://dx.doi.org/10.1109/JSSC.2003.814432
[^137]: Y.Chae, K.Souri, and K.A.A. Makinwa, ''A 6.3 $\mu$w 20 bit incremental zoom-adc with 6 ppm inl and 1 $\mu$v offset,'' IEEE Journal of Solid-State Circuits, vol.48, no.12, pp. 3019--3027, Dec 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2278737
[^138]: Y.S. Shu, L.T. Kuo, and T.Y. Lo, ''An oversampling sar adc with dac mismatch error shaping achieving 105db sfdr and 101db sndr over 1khz bw in 55nm cmos,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 458--459. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7418105
[^139]: P.Harpe, E.Cantatore, and A.van Roermund, ''An oversampled 12/14b sar adc with noise reduction and linearity enhancements achieving up to 79.1db sndr,'' in IEEE Proceedings of the International Solid-State Circuits Conference, February 2014, pp. 194--195. [Online]: http://dx.doi.org/10.1109/ISSCC.2014.6757396
[^140]: M.Braverman, J.Schneider, and C.Rojas, ''Space-bounded church-turing thesis and computational tractability of closed systems,'' Physical Review Letters, vol. 115, August 2015. [Online]: http://link.aps.org/doi/10.1103/PhysRevLett.115.098701
[^141]: M.Verhelst and A.Bahai, ''Where analog meets digital: Analog-to-information conversion and beyond,'' IEEE Solid-State Circuits Magazine, vol.7, no.3, pp. 67--80, September 2015. [Online]: http://dx.doi.org/10.1109/MSSC.2015.2442394
[^142]: H.A. Marblestone, M.B. Zamft, G.Y. Maguire, G.M. Shapiro, R.T. Cybulski, I.J. Glaser, D.Amodei, P.B. Stranges, R.Kalhor, A.D. Dalrymple, D.Seo, E.Alon, M.M. Maharbiz, M.J. Carmena, M.J. Rabaey, S.E. Boyden, M.G. Church, and P.K. Kording, ''Physical principles for scalable neural recording,'' Frontiers in Computational Neuroscience, vol.7, no. 137, 2013. [Online]: http://www.frontiersin.org/computational_neuroscience/10.3389/fncom.2013.00137
[^143]: L.Traver, C.Tarin, P.Marti, and N.Cardona, ''Adaptive-threshold neural spike by noise-envelope tracking,'' Electronics Letters, vol.43, no.24, pp. 1333--1335, November 2007. [Online]: http://dx.doi.org/10.1049/el:20071631
[^144]: I.Obeid and P.Wolf, ''Evaluation of spike-detection algorithms for a brain-machine interface application,'' IEEE Transactions on Biomedical Engineering, vol.51, no.6, pp. 905--911, June 2004. [Online]: http://dx.doi.org/10.1109/TBME.2004.826683
[^145]: P.Watkins, G.Santhanam, K.Shenoy, and R.Harrison, ''Validation of adaptive threshold spike detector for neural recording,'' in IEEE Proceedings of the International Conference on Engineering in Medicine and Biology Society, vol.2, September 2004, pp. 4079--4082. [Online]: http://dx.doi.org/10.1109/IEMBS.2004.1404138
[^146]: T.Takekawa, Y.Isomura, and T.Fukai, ''Accurate spike sorting for multi-unit recordings,'' European Journal of Neuroscience, vol.31, no.2, pp. 263--272, 2010. [Online]: http://dx.doi.org/10.1111/j.1460-9568.2009.07068.x
[^147]: A.Zviagintsev, Y.Perelman, and R.Ginosar, ''Low-power architectures for spike sorting,'' in IEEE Proceedings of the International Conference on Neural Engineering, March 2005, pp. 162--165. [Online]: http://dx.doi.org/10.1109/CNE.2005.1419579
[^148]: A.Rodriguez-Perez, J.Ruiz-Amaya, M.Delgado-Restituto, and A.Rodriguez-Vazquez, ''A low-power programmable neural spike detection channel with embedded calibration and data compression,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.2, pp. 87--100, April 2012. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2187352
[^149]: U.Rutishauser, E.M. Schuman, and A.N. Mamelak, ''Online detection and sorting of extracellularly recorded action potentials in human medial temporal lobe recordings, in vivo,'' Journal of Neuroscience Methods, vol. 154, no. 12, pp. 204 -- 224, 2006. [Online]: http://www.sciencedirect.com/science/article/pii/S0165027006000033
[^150]: F.Franke, M.Natora, C.Boucsein, M.Munk, and K.Obermayer, ''An online spike detection and spike classification algorithm capable of instantaneous resolution of overlapping spikes,'' Journal of Computational Neuroscience, vol.29, no. 1-2, pp. 127--148, 2010. [Online]: http://dx.doi.org/10.1007/s10827-009-0163-5
[^151]: M.S. Chae, Z.Yang, M.Yuce, L.Hoang, and W.Liu, ''A 128-channel 6 mw wireless neural recording ic with spike feature extraction and uwb transmitter,'' IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol.17, no.4, pp. 312--321, August 2009. [Online]: http://dx.doi.org/10.1109/TNSRE.2009.2021607
[^152]: P.H. Thakur, H.Lu, S.S. Hsiao, and K.O. Johnson, ''Automated optimal detection and classification of neural action potentials in extra-cellular recordings,'' Journal of Neuroscience Methods, vol. 162, no. 12, pp. 364 -- 376, 2007. [Online]: http://www.sciencedirect.com/science/article/pii/S0165027007000477
[^153]: J.Zhang, Y.Suo, S.Mitra, S.Chin, S.Hsiao, R.Yazicioglu, T.Tran, and R.Etienne-Cummings, ''An efficient and compact compressed sensing microsystem for implantable neural recordings,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 485--496, August 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2284254
[^154]: Y.Suo, J.Zhang, T.Xiong, P.S. Chin, R.Etienne-Cummings, and T.D. Tran, ''Energy-efficient multi-mode compressed sensing system for implantable neural recordings,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.5, pp. 648--659, October 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2359180
[^155]: B.Yu, T.Mak, X.Li, F.Xia, A.Yakovlev, Y.Sun, and C.S. Poon, ''Real-time fpga-based multichannel spike sorting using hebbian eigenfilters,'' IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, vol.1, no.4, pp. 502--515, December 2011. [Online]: http://dx.doi.org/10.1109/JETCAS.2012.2183430
[^156]: V.Ventura, ''Automatic spike sorting using tuning information,'' Neural computation, vol.21, no.9, pp. 2466--2501, September 2009. [Online]: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167425/
[^157]: D.Y. Barsakcioglu, A.Eftekhar, and T.G. Constandinou, ''Design optimisation of front-end neural interfaces for spike sorting systems,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2013, pp. 2501--2504. [Online]: http://dx.doi.org/10.1109/ISCAS.2013.6572387
[^158]: A.M. Sodagar, K.D. Wise, and K.Najafi, ''A fully integrated mixed-signal neural processor for implantable multichannel cortical recording,'' IEEE Transactions on Biomedical Engineering, vol.54, no.6, pp. 1075--1088, June 2007. [Online]: http://dx.doi.org/10.1109/TBME.2007.894986
[^159]: Y.Xin, W.X. Li, R.C. Cheung, R.H. Chan, H.Yan, D.Song, and T.W. Berger, ''An fpga based scalable architecture of a stochastic state point process filter (ssppf) to track the nonlinear dynamics underlying neural spiking,'' Microelectronics Journal, vol.45, no.6, pp. 690 -- 701, June 2014. [Online]: http://www.sciencedirect.com/science/article/pii/S0026269214000913
[^160]: C.Qian, J.Shi, J.Parramon, and E.Sánchez-Sinencio, ''A low-power configurable neural recording system for epileptic seizure detection,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.4, pp. 499--512, August 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2228857
[^161]: K.C. Chun, P.Jain, J.H. Lee, and C.H. Kim, ''A 3t gain cell embedded dram utilizing preferential boosting for high density and low power on-die caches,'' IEEE Journal of Solid-State Circuits, vol.46, no.6, pp. 1495--1505, June 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2128150
[^162]: R.E. Matick and S.E. Schuster, ''Logic-based edram: Origins and rationale for use,'' IBM Journal of Research and Development, vol.49, no.1, pp. 145--165, January 2005. [Online]: http://dx.doi.org/10.1147/rd.491.0145
[^163]: R.Nair, ''Evolution of memory architecture,'' Proceedings of the IEEE, vol. 103, no.8, pp. 1331--1345, August 2015. [Online]: http://dx.doi.org/10.1109/JPROC.2015.2435018
[^164]: C.E. Molnar and I.W. Jones, ''Simple circuits that work for complicated reasons,'' in IEEE Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2000, pp. 138--149. [Online]: http://dx.doi.org/10.1109/ASYNC.2000.836995
[^165]: H.Schorr, ''Computer-aided digital system design and analysis using a register transfer language,'' IEEE Transactions on Electronic Computers, vol. EC-13, no.6, pp. 730--737, December 1964. [Online]: http://dx.doi.org/10.1109/PGEC.1964.263907
[^166]: D.Wang, A.Rajendiran, S.Ananthanarayanan, H.Patel, M.Tripunitara, and S.Garg, ''Reliable computing with ultra-reduced instruction set coprocessors,'' IEEE Micro, vol.34, no.6, pp. 86--94, November 2014. [Online]: http://dx.doi.org/10.1109/MM.2013.130
[^167]: ''Msp430g2x53 mixed signal microcontroller - data sheet,'' Texas Instruments Incorporated, Dallas, Texas, pp. 403--413, May 2013. [Online]: http://www.ti.com/lit/ds/symlink/msp430g2553.pdf
[^168]: F.L. Yuan, C.C. Wang, T.H. Yu, and D.Marković, ''A multi-granularity fpga with hierarchical interconnects for efficient and flexible mobile computing,'' IEEE Journal of Solid-State Circuits, vol.50, no.1, pp. 137--149, January 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2372034
[^169]: B.Vigraham, J.Kuppambatti, and P.R. Kinget, ''Switched-mode operational amplifiers and their application to continuous-time filters in nanoscale cmos,'' IEEE Journal of Solid-State Circuits, vol.49, no.12, pp. 2758--2772, December 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2354641
[^170]: Y.Tsividis, ''Event-driven data acquisition and continuous-time digital signal processing,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, September 2010, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2010.5617618
[^171]: I.Lee, D.Sylvester, and D.Blaauw, ''A constant energy-per-cycle ring oscillator over a wide frequency range for wireless sensor nodes,'' IEEE Journal of Solid-State Circuits, vol.51, no.3, pp. 697--711, March 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2517133
[^172]: B.Drost, M.Talegaonkar, and P.K. Hanumolu, ''Analog filter design using ring oscillator integrators,'' IEEE Journal of Solid-State Circuits, vol.47, no.12, pp. 3120--3129, December 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2225738
[^173]: V.Unnikrishnan and M.Vesterbacka, ''Time-mode analog-to-digital conversion using standard cells,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.61, no.12, pp. 3348--3357, December 2014. [Online]: http://dx.doi.org/10.1109/TCSI.2014.2340551
[^174]: K.Yang, D.Blaauw, and D.Sylvester, ''An all-digital edge racing true random number generator robust against pvt variations,'' IEEE Journal of Solid-State Circuits, vol.51, no.4, pp. 1022--1031, April 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2519383
[^175]: S.Chatterjee, Y.Tsividis, and P.Kinget, ''0.5-v analog circuit techniques and their application in ota and filter design,'' IEEE Journal of Solid-State Circuits, vol.40, no.12, pp. 2373--2387, December 2005. [Online]: http://dx.doi.org/10.1109/JSSC.2005.856280
[^176]: M.Alioto, ''Understanding dc behavior of subthreshold cmos logic through closed-form analysis,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.57, no.7, pp. 1597--1607, July 2010. [Online]: http://dx.doi.org/10.1109/TCSI.2009.2034233
[^177]: A.Hajimiri and T.Lee, ''A general theory of phase noise in electrical oscillators,'' IEEE Journal of Solid-State Circuits, vol.33, no.2, pp. 179--194, February 1998. [Online]: http://dx.doi.org/10.1109/4.658619
[^178]: A.Demir, A.Mehrotra, and J.Roychowdhury, ''Phase noise in oscillators: a unifying theory and numerical methods for characterization,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.47, no.5, pp. 655--674, May 2000. [Online]: http://dx.doi.org/10.1109/81.847872
[^179]: A.Hajimiri, S.Limotyrakis, and T.Lee, ''Phase noise in multi-gigahertz cmos ring oscillators,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, May 1998, pp. 49--52. [Online]: http://dx.doi.org/10.1109/CICC.1998.694905
[^180]: W.Jiang, V.Hokhikyan, H.Chandrakumar, V.Karkare, and D.Markovic, ''A ±50mv linear-input-range vco-based neural-recording front-end with digital nonlinearity correction,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 484--485. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7418118
[^181]: C.Weltin-Wu and Y.Tsividis, ''An event-driven clockless level-crossing adc with signal-dependent adaptive resolution,'' IEEE Journal of Solid-State Circuits, vol.48, no.9, pp. 2180--2190, September 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2262738
[^182]: H.Y. Yang and R.Sarpeshkar, ''A bio-inspired ultra-energy-efficient analog-to-digital converter for biomedical applications,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.53, no.11, pp. 2349--2356, November 2006. [Online]: http://dx.doi.org/10.1109/TCSI.2006.884463
[^183]: F.Corradi and G.Indiveri, ''A neuromorphic event-based neural recording system for smart brain-machine-interfaces,'' IEEE Transactions on Biomedical Circuits and Systems, vol.9, no.5, pp. 699--709, October 2015. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2479256
[^184]: K.A. Ng and Y.P. Xu, ''A compact, low input capacitance neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 610--620, October 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2280066
[^185]: J.Agustin and M.Lopez-Vallejo, ''An in-depth analysis of ring oscillators: Exploiting their configurable duty-cycle,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.62, no.10, pp. 2485--2494, October 2015. [Online]: http://dx.doi.org/10.1109/TCSI.2015.2476300
[^186]: K.Ng and Y.P. Xu, ''A compact, low input capacitance neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 610--620, October 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2280066
[^187]: M.Elia, L.B. Leene, and T.G. Constandinou, ''Continuous-time micropower interface for neural recording applications,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016.
[^188]: Y.W. Li, K.L. Shepard, and Y.P. Tsividis, ''A continuous-time programmable digital fir filter,'' IEEE Journal of Solid-State Circuits, vol.41, no.11, pp. 2512--2520, November 2006. [Online]: http://dx.doi.org/10.1109/JSSC.2006.883314
[^189]: B.Schell and Y.Tsividis, ''A continuous-time adc/dsp/dac system with no clock and with activity-dependent power dissipation,'' IEEE Journal of Solid-State Circuits, vol.43, no.11, pp. 2472--2481, November 2008. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2005456
[^190]: S.Aouini, K.Chuai, and G.W. Roberts, ''Anti-imaging time-mode filter design using a pll structure with transfer function dft,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.59, no.1, pp. 66--79, January 2012. [Online]: http://dx.doi.org/10.1109/TCSI.2011.2161411
[^191]: X.Xing and G.G.E. Gielen, ''A 42 fj/step-fom two-step vco-based delta-sigma adc in 40 nm cmos,'' IEEE Journal of Solid-State Circuits, vol.50, no.3, pp. 714--723, March 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2015.2393814
[^192]: K.Reddy, S.Rao, R.Inti, B.Young, A.Elshazly, M.Talegaonkar, and P.K. Hanumolu, ''A 16-mw 78-db sndr 10-mhz bw ct $\Delta\Sigma$ adc using residue-cancelling vco-based quantizer,'' IEEE Journal of Solid-State Circuits, vol.47, no.12, pp. 2916--2927, December 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2218062
[^193]: J.Daniels, W.Dehaene, M.S.J. Steyaert, and A.Wiesbauer, ''A/d conversion using asynchronous delta-sigma modulation and time-to-digital conversion,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.57, no.9, pp. 2404--2412, September 2010. [Online]: http://dx.doi.org/10.1109/TCSI.2010.2043169
[^194]: F.M. Yaul and A.P. Chandrakasan, ''A sub-$\mu$w 36nv/$√Hz$ chopper amplifier for sensors using a noise-efficient inverter-based 0.2v-supply input stage,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 94--95. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7417923
[^195]: S.Patil, A.Ratiu, D.Morche, and Y.Tsividis, ''A 3-10 fj/conv-step error-shaping alias-free continuous-time adc,'' IEEE Journal of Solid-State Circuits, vol.51, no.4, pp. 908--918, April 2016. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7433385&isnumber=7446371
[^196]: J.M. Duarte-Carvajalino and G.Sapiro, ''Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,'' IEEE Transactions on Image Processing, vol.18, no.7, pp. 1395--1408, July 2009. [Online]: http://dx.doi.org/10.1109/TIP.2009.2022459

View File

@ -0,0 +1,561 @@
---
title: "Brain machine interfaces: Time Domain Techniques"
date: 2016-08-08T15:26:46+01:00
draft: false
toc: true
math: true
type: posts
tags:
- chapter
- thesis
- CMOS
- biomedical
---
Lieuwe B. Leene, Yan Liu, Timothy G. Constandinou
Department of Electrical and Electronic Engineering, Imperial College London, SW7 2BT, UK
Centre for Bio-Inspired Technology, Institute of Biomedical Engineering, Imperial College London, SW7 2AZ, UK
# 43 Time Domain Techniques
Thus far our work has detailed numerous design techniques that extend contemporary work in which the classical analogue approach with digital processing has demonstrated its capabilities. However, we have also shown analytically that, although we can still strive to improve area and efficiency, a number of factors prevent significant progress in improving system characteristics. Moreover, there is a strict need for more efficient computational processing, the cost of which appears overwhelming if it is to be made robust and adaptive. If we keep the current processing methodology, this component can only be made viable with smaller technologies and voltage scaling, both of which can substantially diminish the performance of analogue operations. Here we will attempt to address the two factors that have the most significant impact on improving sensing electronics, based on the observations made in the foregoing discussions, to consolidate this work. The first is introducing all-digital instrumentation that is not diminished by technology-related scaling and the characteristics of nanometre transistors. The second is developing a mixed signal topology for analogue-to-information conversion where feature extraction is performed adaptively in the analogue domain.
This chapter will focus on exploring the emerging time domain processing modality in order to leverage the increased digital performance associated with modern CMOS processes. This motivation is carefully addressed in the literature [^169], where logic-gate based topologies demonstrate better scalability with respect to linearity and bandwidth. Here we will demonstrate how the fundamental limits of noise efficiency can be approached by proposing several topologies and design techniques. Further, we will elaborate on the characteristic relations between analogue performance and resource requirements that enable these structures. The organization of this chapter is as follows. Section 44 will introduce the essential design considerations for continuous time-domain circuits by considering the phase characteristics of oscillators in relation to the driving transistors that are used for analogue feedback. This structure will be used to implement both amplifying and filtering structures to compose the instrumentation front end. This is followed by Section 48, where we propose a mixed signal topology for analogue domain classification.
# 44 Principles for time domain processing
There are two driving factors for approaching time domain concepts, where signals are represented in terms of delays between pulse edges or phase components in oscillators. The first benefit is the inherently digital operation, where continuous valued signals are represented by digital events with respect to a global or local reference [^170]. This implies that the typical analogue processing has the same power scaling and advantages as digital processing in terms of technology parameters. This allows oscillator structures to approach very efficient operation irrespective of the oscillation frequency or supply voltage [^171]. The second is that many operations are not restrained by the non-linearity of individual transistors, giving way to ideal integrators and other operators [^172]. The overall result is that even with limited power budgets these topologies have an overwhelming excess in bandwidth, where performance can scale with digital gate delay or its switching energy. The abundance of digital operations in such systems gives these topologies the potential for digital synthesis using standard cells and a digital design flow to directly process analogue signals [^173]. Moreover, an event based representation of continuous valued signals often allows a surprisingly efficient implementation with reduced complexity for a variety of elementary operations. For example, [^174] presents a clock-less PVT invariant true random number generator based on the collapse of a ring oscillator structure.
{{< figure src="technical_3/BW-VDD.pdf" width="500" >}}
{{< figure src="technical_3/mLP.pdf" title="Figure 68: Voltage supply relationship with respect to the bandwidth and linearity requirements for different technologies" width="500" >}}
Let us elaborate quantitatively on the notion of scaling analogue with digital characteristics. Figure 68 illustrates the drawback of conventional analogue techniques from first principles by looking more closely at the voltage scaling characteristics. Here transitioning to nanometre technologies gives us the capability of reducing our voltage supply because the desired bandwidth can be achieved with a smaller inversion coefficient or equivalent gate voltage. However, the transconductance, and consequently linearity and noise efficiency, can degrade as the drain voltage is reduced. This dependency arises because transistor gain requires a large channel resistance, which is a function of \\((1-e^{-V_{DS}/U_T})\\) in addition to any DIBL. Together these introduce an asymptotic limit, and the former is in fact not process dependent [^176]. This limits the output swing \\(V_{max}\\) with an overhead that is \\(5 U_T\\) [^119]. Figure 68 b) demonstrates the resulting class-A power efficiency measured as \\(V_{max}/V_{DD}\\) to reflect how efficiently we can use the provided voltage supply. This is evaluated in terms of \\(P_{out}/P_{vdd}\\), where \\(P_{out}\\) and \\(P_{vdd}\\) are the output signal power and the power dissipated by the voltage supply respectively. We can conclude that conventional low noise amplification structures can no longer benefit from technology scaling unless we adopt topologies that do not rely on amplification in the voltage domain. Because the input referred noise of a circuit relies only on its current dissipation, scaling supply voltages remains a viable means to reduce power if time-domain structures can mitigate the need for voltage gain.
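The headroom argument can be illustrated numerically. The following is a minimal sketch, not the model used to generate Figure 68: it assumes a thermal voltage of about 25.8 mV (room temperature) and assumes that the usable swing is simply \\(V_{max} = V_{DD} - 5 U_T\\). The function names `saturation_factor` and `swing_efficiency` are hypothetical and introduced only for this illustration.

```python
import math

# Assumed thermal voltage kT/q near 300 K, in volts (not taken from the text).
U_T = 0.0258


def saturation_factor(v_ds, u_t=U_T):
    """Weak-inversion drain term (1 - exp(-V_DS / U_T)) quoted in the text."""
    return 1.0 - math.exp(-v_ds / u_t)


def swing_efficiency(v_dd, overhead_ut=5.0, u_t=U_T):
    """V_max / V_DD under the assumption V_max = V_DD - overhead_ut * U_T."""
    v_max = max(v_dd - overhead_ut * u_t, 0.0)
    return v_max / v_dd


# Illustrative supply voltages; the fixed 5 U_T overhead consumes a growing
# fraction of the supply as V_DD is reduced.
for v_dd in (1.2, 0.8, 0.45, 0.2):
    print(f"V_DD = {v_dd:4.2f} V  ->  V_max/V_DD = {swing_efficiency(v_dd):.2f}")
```

Under these assumptions the drain term has essentially saturated at \\(V_{DS} = 5 U_T\\), which is consistent with treating \\(5 U_T\\) as a fixed overhead, and the ratio \\(V_{max}/V_{DD}\\) falls monotonically as the supply is scaled down.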
## 45 Sub-threshold Ring Oscillators
The understanding and interpretation of the principal elements for a given modality has the most influential impact on how well it can be utilised. Moreover encoding signals in the time domain will influence how flexibly certain objectives are approached either using digital operations or analogue feedback. Here we will review some basic understanding of ring oscillator structures that are biased in weak inversion. This component will provide a fundamental basis for the topologies proposed here because it connects analogue signals at the input to phase and time domain signals at its output. The interest here specifically lies with using current controlled oscillators that have a well defined linear relationship between any injected charge and the resulting shift in output phase. This will lead to the small signal transfer function that relates to the biasing current of the oscillator. Moreover we must be able to evaluate the different components of phase noise and refer them back to the input of the transconductive element because we will apply this structure to instrumentation.
$$ V_{out} (t) = A(t) \cdot f\left[ \omega_0 t + \phi(t) \right] $$
A generalized time dependent model for an oscillator is represented by Equation 37 where \\(A\\) and \\(\phi\\) represent the amplitude and phase state variables of the system. \\(f\\) describes the limit cycle of the oscillator over time and maps the steady state output voltage \\(V_{out}\\) as a function of phase. The challenge for sub-threshold current biased ring oscillators is that the non-linearity in \\(f\\) is difficult to predict analytically without well informed priors. This is of significance as it will determine how noise sources couple to and perturb the output phase state.
{{< figure src="technical_3/impulse.png" title="Figure 69: " width="500" >}}
Many principal aspects of phase dependencies in oscillators have been well described in a generalized form using numerical methods [^178] and by approximation [^177]. The underlying characteristics however are illustrated in Figure 69 where charge perturbations integrate onto the phase state of the oscillator with respect to the impulse sensitivity function (ISF) \\(\Gamma(x)\\). This factor is a cyclostationary function that describes how the coupling changes as a function of the phase state \\(\phi\\) subject to the source of perturbation. Moreover this allows us to predict the accumulated phase noise due to a time varying process according to Equation 38.
$$ \phi (t) = \int_{-\infty}^{\infty} h_{\phi}(t,\tau) i(\tau) \: d\tau = \int_{-\infty}^{t} \Gamma(\omega_0 \tau) i(\tau) \: d\tau $$
The integral dependency on accumulated phase is what leads to the infinite open loop gain for oscillator based amplifiers. This also implies that any white noise source that is incoherent with the oscillator fundamental frequency will translate to the output phase as \\(\Gamma_{rms}\\). This depends on the assertion that incoherence implies uncorrelated which is subject to the beat frequencies of the two sources. In practical cases this is a fair approximation not only because the oscillator frequency drifts freely but also because we explicitly consider closed loop implementations that aggressively shape in band perturbations. The utility of Equation 38 lies with its ability to predict the single-sideband output noise spectrum due to a white noise current source with spectral density $i_n^2 / \Delta f$ at carrier offset frequency \\(f_{off}\\) according to Equation 39 [^179].
$$ L(f_{off}) = \left( \frac{\Gamma_{rms}}{2\pi f_{off}} \right)^2 \cdot \frac{i_n^2 / \Delta f}{2} $$
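To make the scaling in Equation 39 concrete, the following sketch evaluates the single-sideband phase noise in dBc/Hz. It assumes \\(\Gamma_{rms}\\) is expressed in rad/C, so that the charge normalization is already folded into the ISF as in the expression above. The numerical values and the function name are hypothetical and serve only as an illustration.

```python
import math

def phase_noise_dbc(gamma_rms, i_n_sq, f_off):
    """Single-sideband phase noise from Equation 39 in dBc/Hz.

    gamma_rms : rms ISF in rad/C (charge normalization folded in)
    i_n_sq    : white current noise density i_n^2/df in A^2/Hz
    f_off     : offset from the carrier in Hz
    """
    l_lin = (gamma_rms / (2.0 * math.pi * f_off)) ** 2 * i_n_sq / 2.0
    return 10.0 * math.log10(l_lin)

# Hypothetical values chosen only for illustration.
gamma = 2.0 * math.pi / 1e-12      # uniform ISF for an assumed q_max of 1 pC
i_n_sq = 1e-26                     # A^2/Hz
for f_off in (1e3, 1e4, 1e5):
    print(f"{f_off:8.0f} Hz: {phase_noise_dbc(gamma, i_n_sq, f_off):.1f} dBc/Hz")
```

The \\(1/f_{off}^2\\) dependence means that each decade of offset frequency lowers the predicted noise by 20 dB, which reflects the integrating behaviour of the phase state.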
The \\(N\\) stage ring oscillator structure of interest is illustrated in Figure 70. As opposed to voltage controlled structures this configuration is current biased and the oscillating ring is isolated from the supplies. Here the oscillating frequency in sub-threshold operation can be approximated as $f_0 = I_B / (N C_{gate} V_{RS})$ where \\(I_B\\) is the biasing current and \\(C_{gate}\\) is the input capacitance of the delay element. \\(V_{RS}\\) is the voltage across the oscillating structure and is evaluated using Equation 40 where \\(V_{th}\\) is the transistor threshold voltage. In this case \\(M_2\\) provides a biasing current from the PMOS side and \\(M_1\\), if designed appropriately, will allow isolation from the ground supply. This is particularly useful in differential configurations where capacitance on a common \\(V_R\\) or \\(V_S\\) can minimize high frequency noise from coupling directly to the differential phase component through the common mode feedback.
$$ V_{RS} = V_{th} + \eta U_T \ln \left( \frac{2 I_B}{2\eta U_T^2 \mu C_{ox}} \frac{L}{W} \right) $$
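The frequency approximation $f_0 = I_B / (N C_{gate} V_{RS})$ can be evaluated directly once \\(V_{RS}\\) is known. The sketch below takes \\(V_{RS}\\) as an input rather than computing it from Equation 40, and the operating point is a hypothetical example rather than a value from this work.

```python
def ring_frequency(i_bias, n_stages, c_gate, v_rs):
    """Sub-threshold ring oscillator frequency f_0 = I_B / (N * C_gate * V_RS)."""
    return i_bias / (n_stages * c_gate * v_rs)

# Hypothetical operating point: 100 nA bias, 5 stages, 10 fF per stage, V_RS = 0.3 V.
f0 = ring_frequency(100e-9, 5, 10e-15, 0.3)
print(f"f0 = {f0 / 1e6:.2f} MHz")
```

The relation is linear in the bias current, which is the property that the current controlled structure relies on for integrating injected charge onto the phase.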
{{< figure src="technical_3/schematic_RO.pdf" title="Figure 70: Schematic of current regulated ring oscillator with capacitively coupled noise source." width="500" >}}
The defining characteristic of the current biased oscillator is that the conduction of the NMOS and PMOS devices in each delay element is strictly non-overlapping. This is different when compared to oscillators biased in strong inversion and implies maximum current efficiency in a large signal sense. In addition it leads to the respective NMOS and PMOS ISF being non-negative. Thus the focus should lie with optimizing its rms value by balancing the pull-up and pull-down conductance. In fact we can empirically demonstrate that despite the intricacies of non-linear phenomena a current starved ring oscillator presents a significantly superior noise excess factor when compared to that of a transistor biased with the same weak inversion conditions due to charge being retained in the high impedance nodes.
{{< figure src="technical_3/state_variables.pdf" width="500" >}}
{{< figure src="technical_3/ISF_bias_nmos_pmos.pdf" title="Figure 71: Simulation results outlining the dependency of parameter dynamics as a function of oscillator phase" width="500" >}}
Figure 71 exemplifies the challenge of being able to predict internal parameter dependency analytically. Specifically in **a)**, where the NMOS and PMOS of a single delay slice are evaluated, both the saturation and linear conduction phases contribute towards accumulated phase noise. It is instructive to note that the bias transistor has a near uniform ISF equal to \\(2\pi/q_{max}\\) independent of phase state as expected from the linear phase to charge relation. Here \\(q_{max}\\) simply represents the total charge dissipated by the ring oscillator each cycle which is \\(2N V_{RS} C_{gate}\\). In particular this phase independent sensitivity is surprisingly independent of oscillator configuration in terms of number of stages and delay cell input capacitance. Instead the characteristic relies on the capacitance and channel resistance seen at the drain of M2 such that increasing impedance improves linearity.
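The linear phase to charge relation of the bias transistor can be sketched numerically. The functions below use \\(q_{max} = 2N V_{RS} C_{gate}\\) and the uniform ISF \\(2\pi/q_{max}\\) as stated above. The function names and the example values are hypothetical.

```python
import math

def q_max(n_stages, v_rs, c_gate):
    """Charge dissipated per oscillation cycle, q_max = 2 * N * V_RS * C_gate."""
    return 2.0 * n_stages * v_rs * c_gate

def bias_isf(n_stages, v_rs, c_gate):
    """Uniform ISF of the bias transistor, 2*pi / q_max, in rad/C."""
    return 2.0 * math.pi / q_max(n_stages, v_rs, c_gate)

def phase_shift(delta_q, n_stages, v_rs, c_gate):
    """Phase shift in radians caused by injecting a charge delta_q at the bias node."""
    return bias_isf(n_stages, v_rs, c_gate) * delta_q

# Hypothetical example: 5 stages, V_RS = 0.3 V, 10 fF per stage, 1 fC injected.
print(f"q_max = {q_max(5, 0.3, 10e-15) * 1e15:.1f} fC")
print(f"phase shift for 1 fC = {phase_shift(1e-15, 5, 0.3, 10e-15):.3f} rad")
```

Injecting exactly \\(q_{max}\\) advances the phase by one full cycle, which is a convenient sanity check for the normalization.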
When the aggregate contribution of all delay elements is taken into account as well as the increased noise excess factor in the linear region the ISF in **b)** appears predictable when normalized to that of M2. One may expect the aggregate ISF of the ring oscillator to exceed the sensitivity of M2 as its contribution should have a similar profile and more noisy elements are involved. However the soft-switching of each delay element filters out a significant component of injected noise in addition to the fact that the nodes \\(V_R\\) and \\(V_S\\) retain accumulated current noise that feeds back on the following stage.
Insight into optimizing the oscillator is drawn from considering the lossy integration phases on \\(V_X\\), specifically as the transistors M1 and M2 present high impedance when considering the injection of charge or integration of a noisy current. We can infer that the resulting voltage fluctuations fall into one of two cases: coupled to \\(V_R\\) or \\(V_S\\) through a transistor in the linear region, or coupled to the switching capacitance during a transition. Rejection of the former will rely on increasing \\(q_{max}\\) and minimizing coupling factors as the ISF is equivalent to that of the bias current.
{{< figure src="technical_3/ISF_M2_compensate.pdf" width="500" >}}
{{< figure src="technical_3/ISF_M2_injct.pdf" title="Figure 72: The compensation effect of M1 on the ISF for capacitively coupled noise sources with reference to Figure 70" width="500" >}}
It is well known that the dominant factor of noise in ring oscillators comes from supply variations that are capacitively coupled as illustrated in Figure 70. This represents the coupling expected from substrate noise and supply noise that is not generated by the transistors themselves. The impact of introducing M1 as opposed to grounding \\(V_T\\) is shown by Figure 72 with a dramatic improvement in ISF characteristics. Moreover the large drain resistance of M1 allows the peak to peak ISF to be adjusted by exploiting the dynamics previously discussed. On that note it is important to realize that unlike Gm-C differential implementations the rejection of common mode signals is not present due to the coupling dependency on phase. The matching and minimization of these factors can still allow a considerable improvement in performance in practice but the process of optimization is challenging due to the fact that these components cannot be well predicted a priori. More generally incoherent perturbations in differential implementations will scale with \\((\Gamma_{rms}-\Gamma_{dc})^2\\).
It may be obvious that there are no high impedance analogue nodes in this configuration that could introduce undesirable poles. But more importantly we do not need to provide extra voltage headroom or a second gain stage to let our output signal vary with maximum amplitude. In this case the oscillator mostly reuses the \\(V_{RS}\\) voltage headroom. This raises an interesting question: what limits the required voltage headroom for this circuit? Typically the complementary structure necessitates that the source drain voltage of the current bias transistors and differential pairs is sufficient to provide good channel resistance. However there is another component with regard to the noise generated by the oscillator that should be considered in terms of the oscillator voltage overhead \\(V_{RS}\\). This leads us to evaluate the dependency on sampling noise with respect to the loading capacitor of each delay cell, considering that scaling the technology can result in higher oscillator frequencies or equivalently a smaller loading capacitance for the same power budget. It is important to realize that a charge is induced as sampling noise on each capacitor before each up/down transition as residue from the previous cycle. This sampling noise can be referred to the input of M2 which leads to the expression in Equation 41.
$$ v^2_{smp} = \frac{1}{Gm^2_{M2}} \cdot \underbrace{2N f^2_{osc} kT C_{gate}}_{Noise \: power} $$
As Equation 41 suggests this noisy charge injection occurs for every transition in a delay element which is \\(2N\\) times per period. When we expand this expression in terms of the oscillator power dissipation we can show its underlying dependency in Equation 42.
$$ v^2_{smp} = \frac{4kT}{P_{osc}} \cdot \left( \eta U_T \right)^2 $$
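Equation 42 can be evaluated numerically to get a feel for the noise levels involved. The sketch below assumes a sub-threshold slope factor \\(\eta = 1.3\\) and a temperature of 300 K, neither of which is stated in the text, and uses hypothetical power budgets.

```python
K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def sampling_noise_density(p_osc, eta=1.3, temp_k=300.0):
    """Equation 42: v_smp^2 = 4kT/P_osc * (eta*U_T)^2, in V^2/Hz.

    eta and temp_k are assumed values, not taken from the text.
    """
    u_t = K_B * temp_k / Q_E
    return 4.0 * K_B * temp_k / p_osc * (eta * u_t) ** 2

for p in (10e-9, 100e-9, 1e-6):   # hypothetical power budgets in W
    v_n = sampling_noise_density(p) ** 0.5
    print(f"P_osc = {p * 1e9:6.0f} nW -> {v_n * 1e9:.1f} nV/sqrt(Hz)")
```

Since neither \\(N\\), \\(C_{gate}\\) nor \\(f_{osc}\\) appears as an argument, the function makes explicit that only the oscillator power sets this contribution.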
Now it should be clear from Equation 42 that this contribution only depends on the total power dissipation of the oscillator \\(P_{osc}\\). This profound result confirms that without considering band-limiting factors all transistor generated noise densities are in fact independent of the frequency or total capacitance when referred to the gate of the biasing transistor. As expected, the dominant factor for noise is the total biasing current of the structure which is fundamentally identical to that of a conventional amplifier.
## 46 Time Domain Sensor interface
A principal element of these systems is associated with achieving effective conversion from continuous analogue signals to time encoded binary signals without distortion or excess signal corruption. It is typical to see the removal of VCO non-linearity through LMS post-processing[^180], however this level of in-channel DSP can also be avoided through feedback utilizing the linearity of passive components. Our endeavour here lies with applying the discussion and topology selection in Section 23 to VCO based structures that follow closely to our optimization methodology. We suggest thinking of the oscillator's phase as an analogue memory that represents the state variable of the system which we can freely adjust by injecting charge.
This approach is different from that currently seen in the literature for time-domain based instrumentation of low frequency signals. The time domain encoding concept is predominantly used in asynchronous ADCs that aim to avoid quantization noise from being introduced [^181][^182]. There is some motivation here to approach a neuromorphic amplifier topology that generates tokens with time-domain events that encode the input signal intensity [^183]. Many of these structures leverage signal dependent power dissipation that reduces as the input signal varies more slowly. However they are typically open-loop topologies to avoid a complicated feedback DAC where events are generated upon asynchronous level crossings that reset internal integration nodes or toggle the reference voltages. Linearity and dynamic range can become difficult to achieve while maintaining aggressive power efficiency because resetting integrators or changing references are large signal discontinuities.
{{< figure src="technical_3/LNTI.pdf" width="500" >}}
{{< figure src="technical_3/TDFB.pdf" title="Figure 73: Time domain instrumentation topology for low noise voltage to time-domain conversion." width="500" >}}
The proposed implementation is shown in Figure 73. This structure opts for a direct conversion of analogue to phase domain signals by relying on the integration to filter out oscillator harmonics present in the feedback signal. Abstractly the topology is seen as an ideal integrator with integration factor \\(\frac{Gm}{q_{max}}\\) followed by a non-linear element that introduces spurs around N times the oscillator frequency when feeding back. Here N is the number of taps in the ring oscillator used to simultaneously evaluate the phase difference of the differential structure. This allows us to freely adjust N for improving \\(\Gamma\\) through increasing \\(q_{max}\\) without sacrificing the ability to suppress the harmonics. Since the signals at the output of the phase frequency detector, which represent the phase difference between the two oscillators, are full scale, the capacitive network needs to scale them down by a relatively large factor to assure \\(V_{x}\\) does not exceed the linear range of the transconductor. This is implemented using a capacitance area reduction technique [^184]. When the closed loop gain is large however this concern can be dismissed since the quantization levels scale with $\frac{V_{DD}}{A_{cl} N}$ which will typically be the same order of magnitude as the input signal.
While we are free to adjust the transconductance for noise requirements there is a limit to the increase in complexity resulting from the capacitive feedback DAC and parallel digital phase processing. Because digital power dissipation scales with $N f_{osc}$ which is bounded by \\(I_B\\) it is independent of \\(N\\) for a fixed capacitive load in the oscillator delay cell. In fact increasing \\(N\\) reduces the total power of the oscillator harmonics as we effectively increase the number of quantization levels. This can be seen at the output of the capacitive DAC but this aspect will not be evident with respect to the processing performed in the time domain.
Note that when using ring oscillators with a large number of stages in order to reduce leakage and non-linearities in the limit cycle we can to some extent retain a small factor of \\(N\\) by sub-sampling the output taps of the structure. This does require an integer ratio between the total number of stages and \\(N\\) in order to position the harmonics beyond \\(N f_{osc}\\). Also consider that the relation between the phases of the oscillator will imply a specific frequency shaping and harmonic modulation at high frequencies [^185].
The primary design criteria for the phase detector structure and its respective time domain encoding should be related to maximizing the power-bandwidth efficiency of digital cells. This is because the time-domain characteristics of the detector could introduce an inverse relation between signal level and required logic gate bandwidth. Using conventional \\(1.5 b\\) encoding with up/down signals for example would give rise to this unwanted discontinuity. This is because the encoding scheme will generate narrower pulses for smaller signals that require exceedingly more bandwidth to process and feed into the time domain memory. It is conceivable that if this bandwidth is insufficient a dead-zone is introduced that is characteristically similar to that of class-B amplifiers.
Using a single bit representation that results from an XOR phase detector inverts this problem such that for small error the minimum bandwidth is required, which successively increases as the loop error increases. By extension any asymmetric switching and delays in driving the capacitive feedback that are expected from process variation additively exacerbate any capacitive mismatch in the different phases of the feedback. These components primarily excite distortion on the output depending on the ratio of gate delay to oscillation period. Here chopping the input will remove offset and mismatch related components to a certain extent by up modulating them.
The motivation for using the single stage structure or allocating all the gain to the first stage is also associated with how the supply noise couples to the signal. In this respect we suggest that this structure should be thought of as equivalent to an ADC, particularly with respect to the digital feedback where providing asymmetric feedback implies that supply noise coupling cannot be cancelled out. In addition capacitive mismatch between the positive and negative branches will also contribute to supply noise coupling. Since supply noise sources couple to the output of the amplifier, providing the maximum closed loop gain minimizes the input referred component. It should be noted that this type of supply sensitivity and capacitive mismatch is equivalent to that found in analogue to digital converters hence this drawback is only with reference to an all analogue solution. Furthermore once our signal has been encoded in the time domain we expect it to exhibit improved resilience to supply noise because its influence is proportional to the gate delay of the technology used.
{{< figure src="technical_3/schematic_TDI.pdf" width="500" >}}
{{< figure src="technical_3/schematic_PR.pdf" title="Figure 74: Transistor level implementation of the phase domain integrator structure with phase detector feedback." width="500" >}}
The schematic implementation of the VCO is shown in Figure 74 which is derived from the complementary amplifier structure used in prior work. The fact that both ring oscillators are isolated from the supplies and floating in the middle of the rails presents an improved ISF as well as assuring that the buffer that amplifies the clock phases to full scale is centred around the switching point of a balanced inverter. The most crucial component for effective operation however lies with the sizing of the input NMOS M2 with respect to the loading ring oscillator. At the DC operating point M2 and R1 will present a load equivalent to that of a diode connected transistor. If the delay element is balanced the current bias of the oscillator is evaluated with \\(K_{M2}\\) and \\(K_{N}\\) representing the \\(W/L\\) ratio of transistor M2 and the NMOS in the delay cell respectively.
{{< figure src="technical_3/DIG2.pdf" title="Figure 75: Simulated transient behaviour of the differential oscillator and the generated digital output. " width="500" >}}
Figure 75 clarifies the principle of operation of this topology. We can see that as two currents are being integrated on the differential oscillator a phase shift will start to emerge when the two waveforms are compared. This phase difference on node $\Delta \phi$ represents our system output where the signal is encoded in the pulse width of the digital signal. This signal is applied to the capacitor array for feedback.
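The pulse width encoding described above can be illustrated with an idealized XOR phase detector. The sketch below compares two 50% duty cycle square waves offset by a phase \\(\Delta\phi\\) and measures the average duty cycle of their XOR, which for \\(0 \le \Delta\phi \le \pi\\) equals \\(\Delta\phi/\pi\\). It ignores gate delay and mismatch, and the function name is hypothetical.

```python
import math

def xor_duty(dphi, samples=10000):
    """Average duty cycle of the XOR of two 50% square waves offset by dphi radians."""
    def square(theta):
        # Logic high during the first half of each period.
        return (theta % (2.0 * math.pi)) < math.pi

    high = 0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples   # sample at bin midpoints
        high += square(theta) != square(theta - dphi)  # XOR of the two waveforms
    return high / samples

for frac in (0.1, 0.25, 0.5, 0.75):
    dphi = frac * math.pi
    print(f"dphi = {frac:.2f}*pi -> duty = {xor_duty(dphi):.3f}")
```

A small phase error therefore produces a narrow but continuous pulse train at the output, and the duty cycle grows linearly with the error up to half a period.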
$$ f_{osc} \approx \frac{\alpha I_{M1}}{N C_{gate} V_{th}} \quad \text{where} \quad \alpha = \frac{K_{M2}}{2 K_{N}} $$
The factor \\(\alpha\\) in Equation 43 dominates the noise performance when referred to the input which would ideally approach the \\(NEF\\) of that without the oscillator. Similarly the corner frequency of the oscillator flicker noise which is not rejected by the chopper scales with this factor. It follows that the transistor length of the oscillator has a strong influence through $f_{cor} \propto 1/L^{2}$. Fortunately it is easy to diminish this contribution as only a small bias is needed to result in an oscillation frequency several orders of magnitude outside the signal bandwidth.
$$ H_{sys}(s) \approx \frac{\eta f_{osc} }{s U_T} \cdot \frac{2-\alpha}{\alpha} N \quad \text{and} \quad f \approx \frac{C_I}{C_D} \cdot (N+2) $$
The overall open loop system characteristics \\(H_{sys}\\) are evaluated in Equation 44. This reflects the single pole nature of the topology that scales with the oscillator frequency and the number of phases tapped out as one may expect. Notably the capacitive feedback structure used can represent a very small feedback factor \\(1/f\\) without excess input capacitance that accommodates a large number of oscillator taps [^186]. Evaluating the low pass 3dB point of the system reveals the dependency shown in Equation 45.
$$ f_{3dB} = \frac{\eta f_{osc} }{\alpha U_T} \cdot \frac{N}{N+2} \cdot \frac{C_D}{C_I} $$
This expression is primarily dominated by the oscillator frequency which even for a small bias current can result in a considerable bandwidth. Although this is partially expected due to the fact that there is no explicit load capacitance it also illustrates the benefits in FOM that can be achieved with this configuration of current-mode time domain architecture. There is an instinctive concern for the stability of the system as a result of the excessive bandwidth driven by maximizing efficiency. The non-dominant pole introduced in the voltage domain is due to the parasitic capacitance on node \\(V_Q\\) and typically will not compromise stability due to coupling to the input of the transconductance at higher frequencies. The non-dominant pole in the time domain is introduced by any delay \\(t_d\\) from the VCO to the output buffers of the PFD as $e^{-j\omega t_d}$. This component can be more restrictive for small loop gain as it does not scale with the power of the input stage but with the supply voltage.
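The restriction imposed by the delay term $e^{-j\omega t_d}$ can be estimated with a simplified loop model. The sketch below assumes the loop is an ideal integrator with unity-gain frequency \\(f_u\\) in series with a pure delay, so the phase margin is \\(90^\circ - 360^\circ f_u t_d\\). This ignores the voltage domain pole, and both the function names and the example delay are hypothetical.

```python
def phase_margin_deg(f_unity, t_delay):
    """Phase margin of an ideal integrator loop with a pure delay exp(-j*w*t_d)."""
    return 90.0 - 360.0 * f_unity * t_delay

def max_unity_gain_freq(t_delay, pm_target_deg=60.0):
    """Largest unity-gain frequency that still meets the target phase margin."""
    return (90.0 - pm_target_deg) / (360.0 * t_delay)

# Hypothetical example: 50 ns total delay from the VCO to the PFD output buffers.
t_d = 50e-9
print(f"PM at 1 MHz: {phase_margin_deg(1e6, t_d):.1f} deg")
print(f"max f_u for 60 deg: {max_unity_gain_freq(t_d) / 1e6:.2f} MHz")
```

Because gate delay grows as the supply voltage is reduced, the usable loop bandwidth in this model shrinks with the supply rather than with the power of the input stage.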
The voltage requirement of this structure is improved by biasing the NWell of the PMOS pseudo-differential pair to \\(V_{XN}\\) & \\(V_{XP}\\) in a cross coupled fashion to reject the differential loading component. The forward biasing reduces the threshold voltage of the devices allowing a supply voltage down to \\(0.6 V\\) without any considerable impact from leakage currents. This configuration also implies that the common mode at \\(V_X\\) is well regulated by the body transconductance of M4 & M5 rejecting common mode input fluctuation. The main voltage requirement actually comes from the switches of the chopper that feeds the ring oscillator which need good on-resistance to prevent noise injection, implying a minimum voltage of approximately \\(2V_{th}+V_{ov}\\). Back-gate biasing will allow us to reduce the impact of \\(V_{th}\\).
The pseudo-resistive feedback structure in Figure 74 b) extracts the signal component from up modulated aggressors using a current DAC which is resistively coupled to the input to close the loop. This allows us to feed back the full swing digital signals to cancel a DC offset and sets the input common mode by matching the cross coupled transistor with the input pair. This primarily prevents having to use a cascaded resistor structure in order to deal with the large voltage swing on the output that can significantly degrade performance. While stability is guaranteed by the capacitive feed forward signal [^187], it is important to note the design choice associated with the two poles in this feedback loop. One pole lies at the input of the complementary pair associated with \\(C_{fb}\\) and the other is at the gate of the cross-coupled pair \\(C_{x}\\).
$$ \tau_{hp1}(s) \approx C_{fb} \cdot R_{pseudo} \quad \text{and} \quad \tau_{hp2}(s) \approx C_{x} \cdot \frac{\eta U_{T}}{I_{M1}} \cdot \frac{W_{M6} + W_{M7}}{W_{M6} - W_{M7}} $$
Equation 46 describes the dependency of the two time constants in addition to the capacitive feedback. The reduction in capacitance of the feedback network implies that the high pass filter needs careful design in terms of the resulting pole location as the noise expected from the pseudo-resistor will appear increasingly wider band as we try to reduce the total capacitance. Here we allow the second pole of \\(C_{x}\\) to approach DC by having $W_{M6} \approx W_{M7}$ resulting in an integration node. This means that the noise around the chopper frequency is strongly related to the amount of capacitance we can allocate to \\(C_{x}\\) and the 1/f aggressors are now shaped by the VCO integrator and this capacitance. The bias of \\(I_{M1}\\) in the current DAC should be adjusted to set the pole location close to but smaller than the chopper fundamental similar to the conventional design approach.
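The way the second time constant of Equation 46 moves towards DC as the cross-coupled widths are matched can be seen numerically. The sketch below assumes \\(\eta = 1.3\\) and 300 K, and the component values are hypothetical placeholders rather than values from this design.

```python
K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def tau_hp1(c_fb, r_pseudo):
    """First time constant of Equation 46: C_fb * R_pseudo."""
    return c_fb * r_pseudo

def tau_hp2(c_x, i_m1, w_m6, w_m7, eta=1.3, temp_k=300.0):
    """Second time constant of Equation 46; grows without bound as W_M7 -> W_M6."""
    u_t = K_B * temp_k / Q_E
    return c_x * eta * u_t / i_m1 * (w_m6 + w_m7) / (w_m6 - w_m7)

# Hypothetical values: C_x = 1 pF, I_M1 = 1 nA, W_M6 normalized to 1.
for w_m7 in (0.5, 0.9, 0.99):
    tau = tau_hp2(1e-12, 1e-9, 1.0, w_m7)
    print(f"W_M7/W_M6 = {w_m7:.2f} -> tau_hp2 = {tau * 1e3:.2f} ms")
```

The width ratio term acts as a multiplier on the time constant, so a near-matched pair turns the node into an integrator without requiring a large \\(C_{x}\\).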
When we compare this structure to the conventional topology we realize a number of significant advantages. Primarily the inversion coefficient of the transistors is not bound as in a complementary input stage where the \\(V_{GS}\\) voltage for both the NMOS and PMOS has to be sufficiently large to allow the drain voltage to fluctuate by several \\(100 mV\\). This is particularly significant because the minimum feature size is inversely proportional to the optimal inversion coefficient confirming again that conventional means do not work in nanometre technologies. Here the threshold voltage can be arbitrarily small and we still retain a topology that is independent of supply voltage in the sense that it is strictly current biased. This will lead to improving the tolerance towards wafer level variations of the threshold voltage and carrier mobility which many sub \\(1 V\\) structures do not have. Similarly this implies class-A type power dissipation that minimizes switching current seen at the analogue supplies.
The excess in bandwidth from the VCO despite operating with a very small inversion coefficient has enabled us to achieve \\(40-50 dB\\) closed loop gain while still retaining excess loop gain that easily exceeds \\(30 dB\\). This excess loop gain in the signal band is facilitated by the near ideal VCO integration of this topology that shapes a number of external noise sources and non-idealities. In particular technology scaling allows us to minimize the noise gain due to the input capacitance \\(C_g\\) according to the expression $1 + C_{g}/C_{in}+ N/A_{cl}$ [^63]. Hence the VCO topology can allow a reduction of the input capacitance by a very significant factor relating to an impedance enhancement that scales with technology.
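The noise gain expression $1 + C_{g}/C_{in}+ N/A_{cl}$ is simple to evaluate for a candidate design. The sketch below uses hypothetical capacitances, tap count and closed loop gain purely for illustration.

```python
def noise_gain(c_g, c_in, n_taps, a_cl):
    """Noise gain 1 + C_g/C_in + N/A_cl from the expression in the text."""
    return 1.0 + c_g / c_in + n_taps / a_cl

# Hypothetical example: C_in = 10 pF, 5 taps, 40 dB (x100) closed loop gain.
for c_g in (5e-12, 1e-12, 0.2e-12):
    print(f"C_g = {c_g * 1e12:.1f} pF -> noise gain = {noise_gain(c_g, 10e-12, 5, 100.0):.3f}")
```

As \\(C_g\\) shrinks with technology the gate capacitance term becomes negligible, leaving the \\(N/A_{cl}\\) term as the remaining penalty in this expression.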
{{< figure src="technical_3/Sim_Inband.pdf" width="500" >}}
{{< figure src="technical_3/Sim_Outband.pdf" title="Figure 76: Transient noise simulation result of the 180nm CMOS time domain instrumentation topology with a \\(6 mV\\) peak to peak sine input at \\(1 kHz\\)." width="500" >}}
Transient noise simulation results are shown in Figure 76. These demonstrate that the nonlinearity is mainly due to DAC mismatch components which are modulated by the oscillator frequency spurs. Secondly the noise floor and corner frequency characteristics closely follow analytic predictions. In addition for the same current bias as a conventional implementation the structure can achieve an equivalent noise floor but at a reduced voltage overhead. Noticeably in the full spectrum there is a considerable amount of harmonics outside of the band induced by the chopper and oscillator aggressors. These components will need to be filtered out in order to approach a \\(60 dB\\) signal to noise ratio. Interestingly there is an observable gain in noise floor as we approach the point where there is no excess loop gain. Note that the spurious free dynamic range of this structure almost exceeds that of the structure used in Section 23 by a factor of 10 for the same power budget due to the increased input range.
## 47 Time Domain analogue filter
Now that we have addressed the aspects of achieving low noise and linear instrumentation we must proceed to address the mechanisms for filtering to implement a band limiting characteristic necessary for the processing algorithms. There is some diversity in the number of approaches used to filter time domain signals. Most notable is the continuous-time FIR based structure that offers a number of low power characteristics that scale well with technology without sampling or clocking [^188][^189]. However similarly to conventional FIR structures it is limited to signals where the frequency dynamic range is small in order to keep the filter order small. Other examples are found in PLL structures that lock using coherent phase domain signals which are inherently second order due to the analogue integration node, resulting in a loss of noise efficiency at low frequencies [^190]. We do mention that VCO-based ADCs have been very successful in achieving efficient high-order noise shaping [^191][^192].
It is important to realize that our proposed instrumentation topology converges on incoherent phase domain signals and neglects the modulation products through construction similar to that of asynchronous \\(\Delta\Sigma\\) modulators[^193]. Here we will take a similar approach to construct a first order phase domain integrator where the time-domain signals are also incoherent. Through simplicity the structure achieves a significantly better dynamic range and voltage scaling capability than its analogue domain counterpart. The premise will lie with our assumption that the intermodulation products of the incoherent frequencies are sufficiently out of band to allow construction of higher order filter structures.
{{< figure src="technical_3/TD_ABST.pdf" title="Figure 77: Closed loop time-domain analogue filtering structure" width="500" >}}
The topology used for analogue filtering of time-domain signals is illustrated in Figure 77. This is similarly based on the phase difference of two ring oscillators that integrate a switched current which is generated by evaluating the difference in duty cycle between the input and feedback. The logic simply advances or recedes the phase difference of the oscillators when there is an excess or lack of pulse width respectively when comparing the two inputs. This behaviour is shown in Figure 78. The use of logic gates avoids any drawbacks that arise from limited linearity and mismatch when approaching this design with current mixing techniques. Moreover the efficiency of these operations allows miniaturized reconfiguration in the digital domain with a minimal analogue structure.
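The closed loop behaviour described above can be approximated with an averaged behavioural model. The sketch below is not the gate level circuit: it assumes that, averaged over many oscillator cycles, the output duty cycle \\(d_q\\) integrates the duty cycle error as \\(\dot{d_q} = k (d_{in} - d_q)\\), where the lumped gain \\(k\\) is a hypothetical parameter standing in for the charge pump current and oscillator charge. Under that assumption the loop is a first order low pass filter with time constant \\(1/k\\).

```python
import math

def simulate_td_lowpass(duty_in, k_int, dt):
    """Forward-Euler simulation of d(d_q)/dt = k_int * (d_in - d_q).

    duty_in : sequence of input duty cycles (0..1), one per time step
    k_int   : lumped integration gain in 1/s (hypothetical parameter)
    dt      : time step in s, should satisfy k_int * dt << 1
    """
    d_q = 0.0
    out = []
    for d_in in duty_in:
        d_q += k_int * (d_in - d_q) * dt   # phase advances or recedes with the error
        out.append(d_q)
    return out

k_int = 2.0 * math.pi * 1e3              # places the corner at 1 kHz
dt = 1e-7
steps = int(round(1.0 / (k_int * dt)))   # simulate one time constant
response = simulate_td_lowpass([0.8] * steps, k_int, dt)
print(f"after one time constant: {response[-1]:.3f} for a 0.8 duty cycle step")
```

The model deliberately omits the switching harmonics and the incoherent modulation products, so it only indicates the in-band dynamics of the loop.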
{{< figure src="technical_3/DIG1.pdf" title="Figure 78: Simulated transient behaviour of the differential oscillator and the time encoded digital signals internal to the feedback loop." width="500" >}}
Most design considerations here are similar to those of conventional filters. We expect the analogue in-band noise components to scale with the biasing current \\(I_b\\) which will determine the input referred noise relative to the transconductance element $\Delta I$. This however does represent a fundamental drawback since the charge pump transconductance element does not benefit from the sub-threshold slope gain factor. $\Delta I$ will typically be larger for the same bandwidth requirements by a factor of \\(1/U_T\\), although this factor is essential for achieving smaller cut off frequencies while maintaining large oscillation frequencies. The decreased noise efficiency is the fundamental drawback of using digital logic instead of the capacitive feedback network. However in this scenario the processed signal will already be at full dynamic range with reduced noise requirements.
{{< figure src="technical_3/TD_SUB.pdf" width="500" >}}
{{< figure src="technical_3/TD_FLT.pdf" title="Figure 79: Schematic sub-blocks for first-order time domain analogue filter" width="500" >}}
The gate level implementation is elaborated in Figure 79. The charge pump structure here uses a cascaded current source and dummy load for bandwidth improvement. This configuration is important because this operator precedes the integration and consequently has a substantial influence on off-set or distortion near the cut-off. The self referenced bias of the charge pump through M11-M14 should allow good matching independent of the configuration of the biasing transistors M2-M3. Similarly the noise figure is improved by sharing the drain voltage of M12-M13 as its noise is coupled to the common mode.
{{< figure src="technical_3/gmc_eqv.pdf" width="500" >}}
{{< figure src="technical_3/TD_FLT2.pdf" title="Figure 80: Bandpass time-domain analogue filter which is cascaded to realize a 4\\(^{th}\\) order TD-BPF." width="500" >}}
The two digital components in Figure 79(a) represent the subtraction for feedback and a gain factor \\(G\\) when the phases of \\(Q\\) are mapped to the output. The subtraction logic is determined by considering the XOR-PWM waveform as a two state input of $\pm 1$ and similarly the DAC input states, which in this case are \\(+1/0/-1\\). The configurations of both components should complement each other. This is exemplified when we consider the summing node for another case where two integrators are cascaded with both outputs fed back to achieve a bandpass response. This configuration is shown in Figure 80. The boolean operation required is shown in the Karnaugh map below, where four levels are needed to include carry signals. Here we compromise with two DAC structures with input states \\(\pm 1\\) and \\(+2/0/-2\\).
Karnaugh map associated with subtracting the XOR type PWM signals \\(Q\\) & \\(X\\) from \\(D\\), where logic 0 and 1 represent the states \\(-1\\) and \\(+1\\):

| \\(F=D-(Q+X)\\) | \\(QX=00\\) | \\(QX=01\\) | \\(QX=11\\) | \\(QX=10\\) |
|----|----|----|----|----|
| \\(D=0\\) | 1 | -1 | -3 | -1 |
| \\(D=1\\) | 3 | 1 | -1 | 1 |
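
The four output levels of the summing node can be checked by enumerating the map. This is a minimal sketch, assuming each XOR-PWM input is treated as a two state value of \\(\pm 1\\); the function and variable names are illustrative and do not come from the original design.

```python
from itertools import product

def summing_node(d, q, x):
    """Return F = D - (Q + X) for two-state inputs encoded as +1 / -1."""
    return d - (q + x)

# Enumerate all eight input combinations and collect the distinct output levels.
levels = sorted({summing_node(d, q, x) for d, q, x in product((-1, 1), repeat=3)})

# Every level should be reachable as the sum of one +/-1 DAC and one +2/0/-2 DAC.
dac_sums = {a + b for a in (-1, 1) for b in (-2, 0, 2)}

print(levels)
print(set(levels) <= dac_sums)
```

Under this encoding the node only ever takes four distinct values, which is why two DAC structures with states \\(\pm 1\\) and \\(+2/0/-2\\) are sufficient to represent it.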
The logic that sums the different phases relies on the coherence of its inputs. Because we know that the different taps of the ring oscillator will not overlap over a certain signal range at the output, we may use the \\(AND\\) operation of two different phases to isolate small variations in pulse width and combine them. Analogous to a variable gain amplifier, if the signal variations exceed the section where the two phases overlap the output will saturate. The significance here is that the gain achieved with this operation has an arbitrary gain bandwidth product with negligible power dissipation. Once blocking or interfering signals have been removed we may give the signal reconfigurable gain using only a handful of gates. It simply relies on increasing the number of oscillator taps from the previous stage while maintaining its feedback configuration, which is independent of noise & linearity performance. In fact for a gain \\(G\\) on the PWM signal \\(D\\) we require \\(2G\\) taps that are summed according to Equation 47.
$$ Q = \bigcup_{k=0}^{G-1} \left\\{ D_{R k} \cap D_{R k+N/2-R/2} \right\\} \quad \text{where} \quad R= \left\lfloor \frac{N}{G} \right\rfloor $$
\\(AND\\) & \\(OR\\) will perform equivalent operations that retain the oscillator phase difference \\(\Phi\\) but subtract from the pulse width a signal independent component, which is the gate delay between the two phases being operated on. Here \\(\Phi\\) is normalized such that it represents \\(1\\) and \\(0\\) when the oscillator phase difference is \\(\pi\\) and \\(0\\) respectively. This implies that for a positive number of gate delays $\Delta T$ we will have an output off-set given by $Q=\Phi - \frac{\Delta T}{N}$. If an \\(XOR\\) gate is used we extract a signal independent component as $Q=2 \cdot \frac{\Delta T}{N}$. Both statements will hold true as long as $\Phi < 1-2\Delta T$, which implies that the pulse section used for computation is signal independent. Using this rather simple construction of logic one may sum phases that are one radian apart with an \\(XNOR\\) gate to realize the absolute value operator, which exemplifies the rich utility of this time domain processing.
$$ H_{sys}(s) = \frac{G}{1 + s/p_1} \quad \text{where} \quad p_1 = \frac{N}{q_{tot}} \cdot \Delta I = \frac{N \omega_{osc}}{K} $$
When the primitive topology in Figure 77 is analysed in the Laplace domain we can derive Equation 48. This demonstrates a first order characteristic similar to the amplifier and shows a close relationship between the oscillation frequency and the filter bandwidth, with the addition of the gain factor \\(G\\).
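
The first order behaviour of Equation 48 can be explored numerically. The sketch below evaluates the magnitude and phase of \\(H_{sys}\\) on the \\(j\omega\\) axis and computes \\(p_1\\) from the bias quantities; the gain of 4 and the 6 kHz pole used in the example are illustrative values only, and the function names are hypothetical.

```python
import math

def pole_from_bias(n_stages, delta_i, q_tot):
    """p1 = N * delta_I / q_tot, in rad/s when delta_I is in A and q_tot in C."""
    return n_stages * delta_i / q_tot

def first_order_response(freq_hz, gain, pole_hz):
    """Magnitude (dB) and phase (deg) of H(s) = G / (1 + s/p1) at s = j*2*pi*f.

    The pole is given in Hz, so the ratio f / pole_hz equals omega / p1.
    """
    h = gain / complex(1.0, freq_hz / pole_hz)
    mag_db = 20.0 * math.log10(abs(h))
    phase_deg = math.degrees(math.atan2(h.imag, h.real))
    return mag_db, phase_deg

# Illustrative values: pass-band gain of 4 and a pole at 6 kHz.
mag_dc, _ = first_order_response(1.0, gain=4.0, pole_hz=6e3)
mag_p1, phase_p1 = first_order_response(6e3, gain=4.0, pole_hz=6e3)
print(round(mag_dc - mag_p1, 2), round(phase_p1, 1))
```

At the pole frequency the response sits about 3 dB below the pass-band gain with a phase lag of 45 degrees, as expected of any single pole system.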
This filter configuration is specifically designed for a $0.18 \mu m$ process. Considering that digital filters will become more viable as the technology node decreases, it should be acknowledged that the proposed time-domain filter structure will only be advantageous when the frequency dynamic range is large and memory is limited. This is primarily because the \\(kT/C\\) relations inhibit very aggressive sizing in the oscillator structure, particularly if no excess loop gain is available. We still require a large amount of energy storage \\(q_{max}\\) to prevent external noise sources from perturbing the output. We point out that the proposed topology discussed here provides the means by which instrumentation can successfully scale with technology characteristics, particularly as it is robust towards transistor non-linearity and imperfections. A large component of performance enhancement will rely on calibration components that improve the resilience of the capacitive feedback structure and filter parameters to allow miniaturization. While this specific time-domain topology will not allow the absolute minimum supply voltage, this configuration does take advantage of the transistor sub-threshold slope, which implies a fundamentally superior noise performance that can also circumvent supply noise.
{{< figure src="technical_3/TD_sys_sim.pdf" title="Figure 81: Transient noise simulation of proposed instrumentation amplifier with time-domain filter structure with a \\(6 mV\\) peak to peak sine input at \\(1 kHz\\)" width="500" >}}
{{< figure src="technical_3/65nQ.pdf" title="Figure 82: Transient noise simulation result of the proposed instrumentation topology in 65nm CMOS with \\(2 mV\\) peak to peak sine input at \\(2 kHz\\)" width="500" >}}
As shown in Figure 81 both a low noise floor and band limiting behaviour are achieved. In particular we see a \\(40 dB\\) roll off in the noise floor at the \\(6 kHz\\) cut-off frequency from the \\(4^{th}\\) order bandpass filter. Table 11 reveals performance characteristics similar to the conventional implementation in Section 23, which is the result of using the same optimization strategy. However this work is the first to consider NEF maximization for the design of time-domain circuits. As a result we are much more confident about the power efficiency of this implementation. As a reference this topology was also implemented in a 65 nm CMOS process without the filtering structure to confirm its scalability, with the transient noise simulation shown in Figure 82. The compact capacitive feedback may not allow linearity beyond 60 dB but given the scalability and efficiency of this design there is a significant advantage over the current state-of-the-art.
Table 11:
| Parameter | Units | This work | This work | Chandrakasan [^194] | Tsividis [^195] | Markovic [^180] |
|----|----|----|----|----|----|----|
| Modality | | Time | Time | Voltage | Time | Time |
| Technology | [nm] | 180 | 65 | 180 | 28 | 40 |
| Supply Voltage | [V] | 0.6 | 0.5 | $0.2 \| 0.8$ | 0.65 | 1.2 |
| Total Current | [\\(\mu\\)A] | 0.8 | 1.5 | 1.8 | 36.92 | 5.8 |
| Bandwidth | [Hz] | 375-6k | 1-6k | 1-1k | 40M | 3-1.5k |
| Filter Order | | IIR 4\\(^{th}\\) | - | IIR 2\\(^{nd}\\) | FIR 8\\(^{th}\\) | - |
| Noise Floor | [nV/\\(\sqrt{Hz}\\)] | 69.4 | 57.5 | 36 | 514 | 427 |
| Noise Corner | [Hz] | <10 | <10 | 0.5 | - | <1 |
| SFDR | [dB] | 58 | 54 | 50.4 | 30 | 78 |
| Area | [mm\\(^2\\)] | $115 \times 100 ^\star$ | $64 \times 69 ^\star$ | $800 \times 775$ | $72 \times 45$ | $280 \times 360$ |
| NEF | | 1.18 | 1.22 | 2.1 | 8.13 | 12.9 |
| Chopped Input Capacitance | [pF] | 0.04 | 0.03 | 21.5 | 0.01\\(^\dagger\\) | - |
This brings us finally to finding a satisfactory answer to how the area-power product figure of merit is limited or bounded in some sense. Section \ref{ch:T1_model} argued that linearity and quantization were crucial constraints in the conventional structures, which is not the case for the proposed structures. Instead we observe from Equation 41 that we must dissipate a certain power level in the oscillator, which we know is biased by a fractional current related to the input referred noise through \\(\alpha\\). As a result the current of the oscillator is fixed and thus when the voltage \\(V_T\\) is scaled down this sampling contribution will progressively become larger to the point that it is comparable to that of the thermal noise. This equality reveals that the oscillator voltage should not approach $\alpha \cdot 2U_T$ or the NEF efficiency factor will degrade. We can assert that although at first it appeared that the sampling noise limited the minimum size of the instrumentation circuit, here it limits the minimum power of the circuit in a much more explicit manner. The area requirement is more simply proportional to \\(K_F/\alpha\\), reflecting the location of the flicker noise corner and the target oscillator frequency. Some details remain with regard to choosing \\(\alpha\\), which represents one degree of freedom for trading off oscillator area for minimum voltage but is also strongly dependent on the system bandwidth.
# 48 Analogue Signal Classification
Now our interests will be redirected towards the methods and hardware implementation of neural spike analogue classification in order to faithfully demonstrate that continuous time instrumentation can provide a substantial improvement over the conventional approach at the system level. In particular we demonstrate an unsupervised method that will allow the classification of spikes without requiring signal quantization at any stage of the adaptive process with empirical results.
{{< figure src="technical_3/Nsample.pdf" width="500" >}}
{{< figure src="technical_3/Fsample.pdf" title="Figure 83: Illustration of Nyquist rate feature extraction and using feature enhancement in order to operate at sub-Nyquist rates." width="500" >}}
The abstract motivation here is illustrated in Figure 83. Utilizing digitized recordings as the basis for feature extraction implies the necessity of operating with excessive data rates in order to capture the full bandwidth of features in the signal such that $f_s > 2 f_{BW}$. By using mechanisms that enhance & extract prominent features directly in the analogue domain this sampling constraint is eliminated [^141]. Instead we may sample at the rate of spikes present in the recording. Even by approximation we can assert that $f_n \ll f_s$. In some sense this motivation is inspired by that of adaptive compressed sensing or sparse representation methods [^196]. Here we will introduce a perspective based on realizing a less generalized method that can be integrated effortlessly, which has not yet been attempted in the literature. The challenge for this approach is finding efficient analogue operators that allow direct feature extraction and, more importantly, feedback mechanisms that adaptively improve the feature extraction process without substantial resource requirements or supervision.
The notion that motivated this specific classification structure is that relatively high sampling rates are needed in order to improve alignment and thereby reduce how noise couples to features. This implies that high temporal resolution is desirable for these spread-spectrum signals, but such an approach has unavoidable implications for memory and power requirements. As we shall see, analysing \\(K\\) features for \\(M\\) centroids with mixed signal methods will require \\(KM+K\\) registers and \\(max(K,M)\\) integrators, where register depth has a logarithmic dependency on temporal resolution. This is in contrast to PCA, template matching, and many all-digital methods where temporal resolution or window size is linearly proportional to register count. While it is still vital to align the analogue operators with the spike waveform, an increased clock rate does not influence the analogue power dissipation since signal quantization is not performed.
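
The resource scaling stated above can be written out as a small helper. This is only a sketch of the counting argument: the function names are hypothetical, and modelling register depth as the base-2 logarithm of the number of resolvable time steps is an assumption made here to illustrate the logarithmic dependency.

```python
import math

def mixed_signal_resources(k_features, m_centroids):
    """Registers (K*M + K) and integrators (max(K, M)) for the mixed signal method."""
    registers = k_features * m_centroids + k_features
    integrators = max(k_features, m_centroids)
    return registers, integrators

def register_depth_bits(n_time_steps):
    """Assumed depth model: bits needed to address n_time_steps temporal positions."""
    return math.ceil(math.log2(n_time_steps))

# Example: two features, three centroids, 256 resolvable time steps per section.
registers, integrators = mixed_signal_resources(k_features=2, m_centroids=3)
depth = register_depth_bits(256)
print(registers, integrators, depth)
```

Doubling the temporal resolution in this model adds one bit to each register rather than doubling the number of registers, which is the contrast being drawn with window based digital methods.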
It is relatively rare to see analogue or mixed signal implementations of machine learning classifiers due to the convoluted impact that circuit imperfections have on learning dynamics [^197]. This makes it difficult to successfully realize more complex architectures, but the methodology can significantly increase the information storage density at very low power budgets. This is exemplified by the system in [^198] that not only achieves 1.04 GOPS/mW but also an area efficiency of 0.03 GOPS/mm\\(^2\\) using a 130 nm technology. Here the continuous valued charge on floating gates was used to preserve learned features without quantization of errors.
## 49 Feature Selection
It should be expected that when a spike event is sampled at a relatively high rate we typically only need a select few samples, once it is aligned inside a window, in order to tell different classes apart. Only when noise becomes a considerable component must we consider multiple samples before we can make an accurate distinction. In such a scenario we would like to use the samples that maximally distinguish classes. That is, we would like to maximize the quality of our feature \\(Q_F\\) by maximizing for instance a simple sum of \\(N\\) maximally separating samples \\(c\\) in the window \\(W\\) with the linear distance operator \\(D\\).
$$ Q_F = \frac{\sum_{i=1}^{N} D(W[c_i]) }{N} + \frac{en_{rms}}{\sqrt{N}} + \frac{BG(t)}{N} $$
The expression in Equation 49 primarily tells us that, in the presence of white noise \\(en_{rms}\\) and background activity \\(BG\\), increasing the number of samples reduces the contribution of white noise by \\(\sqrt{N}\\). However if the new samples have negligible signal quality our aggregate distance will decrease fractionally by \\(1/(N+1)\\). This suggests that we should avoid relying on a large ensemble of samples, because doing so avoids complexity and may very well improve classification accuracy. Implementing an effective analogue solution requires the evaluation of the optimal section in the spike window. To analyse this problem let us define a distance to noise ratio \\((DNR)\\) with respect to the mean spikes of \\(M\\) classes and their standard deviation for each sample in the spike window as:
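
The trade-off described for Equation 49 can be illustrated with a few numbers. The sketch below uses made-up unit distances and a unit white noise level purely for illustration; it compares four well separating samples against the same set with one uninformative sample added.

```python
import math

def mean_distance(distances):
    """Average separation over the selected samples (first term of Equation 49)."""
    return sum(distances) / len(distances)

def averaged_white_noise(en_rms, n):
    """White noise term of Equation 49 for n selected samples."""
    return en_rms / math.sqrt(n)

good = [1.0, 1.0, 1.0, 1.0]   # four samples that separate the classes well
diluted = good + [0.0]        # one extra sample carrying no class separation

# Fractional loss in aggregate distance caused by the extra sample: 1/(N+1).
loss = 1.0 - mean_distance(diluted) / mean_distance(good)

# Corresponding improvement in the white noise term: sqrt((N+1)/N).
noise_gain = averaged_white_noise(1.0, len(good)) / averaged_white_noise(1.0, len(diluted))
print(loss, noise_gain)
```

For \\(N=4\\) the extra sample costs a fifth of the aggregate distance while the noise term improves by only about 12 percent, so the uninformative sample is a net loss in this example.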
$$ DNR[n] = \frac{\sum_{i=2}^{M} |\mu_{0}[n] - \mu_{i}[n]|}{\sqrt{\sum_{k=1}^{M}\sigma_k^2[n]}} $$
Where \\(\mu_0\\), \\(\mu_k\\), and \\(\sigma_k\\) are the mean of all spike waveforms, the mean of spike waveforms in class \\(k\\), and the aggregate standard deviation of class \\(k\\) respectively. In a more general form, choosing samples that maximize Equation 50 can be seen as classification by expectation maximization[^199]. In Figure 84 we exemplify this quality factor for a number of training data sets alongside the first principal component of each set respectively.
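
A direct evaluation of Equation 50 on labelled spike windows could look as follows. This is a sketch under stated assumptions: the data layout (a list of classes, each a list of equal length windows) is hypothetical, the population standard deviation is used for \\(\sigma_k\\), and the numerator follows the printed summation limits \\(i=2 \ldots M\\) literally.

```python
import math
from statistics import fmean, pstdev

def dnr(spikes_by_class):
    """Distance to noise ratio per sample index for M classes of aligned windows."""
    n_samples = len(spikes_by_class[0][0])
    all_spikes = [w for cls in spikes_by_class for w in cls]
    result = []
    for n in range(n_samples):
        mu0 = fmean(w[n] for w in all_spikes)                     # mean of all spikes
        mus = [fmean(w[n] for w in cls) for cls in spikes_by_class]
        variances = [pstdev([w[n] for w in cls]) ** 2 for cls in spikes_by_class]
        numerator = sum(abs(mu0 - mu) for mu in mus[1:])          # i = 2..M as printed
        denominator = math.sqrt(sum(variances))
        if denominator == 0.0:
            result.append(0.0 if numerator == 0.0 else math.inf)
        else:
            result.append(numerator / denominator)
    return result

# Toy data: two classes, two windows each, two samples per window.
class_a = [[1.0, 1.0], [-1.0, 3.0]]
class_b = [[1.0, 5.0], [-1.0, 7.0]]
scores = dnr([class_a, class_b])
print(scores)
```

In this toy data the first sample has identical class means and therefore a score of zero, while the second sample separates the classes and scores higher.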
{{< figure src="technical_3/S2PcorA.pdf" width="500" >}}
{{< figure src="technical_3/S2PcorB.pdf" width="500" >}}
{{< figure src="technical_3/S2PcorC.pdf" title="Figure 84: Illustration of feature dependency for windowed neural spikes for data sets with \\(-16 dB\\) background noise and \\(-20 dB\\) added white Gaussian noise." width="500" >}}
A number of observations can be made. In particular, a variety of spike shapes show PCA peaks in the first moment which are convex with respect to the depolarizing and repolarizing sections. In contrast the DNR plot shows multiple local minima and thus optimization in this space is not considered trivial. What should be pointed out is that the peaks in the PCA curve will typically correspond to good DNR points as they relate to sections of high variance due to maximum class separation on top of the noise variance. It should also be clear that our interest does not lie in the refractory period as the slow-wave component is small in magnitude and typically corrupted by high-pass filters that reject environmental interference. Hence we inherently expect poor DNR in the latent refractory period of the recorded action potential. One of the more significant implications of not quantizing the signal is that features can generally not be extracted before the detection event. This implies that the operator used for detection should avoid introducing group delay that could result in completely missing the most energetic features in the spiking event.
In actuality this hints at the seemingly contradictory advantage of analogue filter detection over an FIR filter equivalent. That is, this group delay, which is exclusive to IIR filters, is highly dependent on the frequency content of the spike waveform. This can to some extent be observed in Figure 84 where class 2 is deliberately delayed with respect to the alignment point since it has a smaller derivative or equivalently less high frequency content. Here the alignment is achieved by conditioning the signal with a narrow bandpass filter and looking for a peak above the threshold. While relatively simple in implementation, it can be quite effective if designed to maximize output signal to noise power.
The approach of self mixing spike classes with the sampling or alignment strategy is effective at improving features of already very similar spike shapes. In fact this alignment in some sense captures the features existent before the detection event, which are mixed with latent features. As a result, although analogue techniques are less effective at demonstrating resilience towards background noise, they can efficiently mix different features to improve discrimination between spike shapes.
## 50 Mixed Signal Implementation
The adaptive method employed here is to section the spike waveform depending on the redundancy in the number of features required, noting that features in the same polarizing phase will correlate strongly since they emerge from the same phenomena. Each section after the detection event is bounded a priori to guarantee non-overlapping features (i.e. $0-150 \mu s$ and $170-310 \mu s$) and the maximum variance is found in each section respectively. These are assumed to be the optimal DNR features, referred to as \\(\Omega\\)s. Our classification will integrate around these points and perform k-means clustering in the resulting space.
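
The selection of one maximum variance point per section can be sketched offline as below. Note that this is a batch computation over stored windows, used here only to illustrate the selection criterion; it is not the closed loop hardware search. The function name, the data layout, and the \\(10 \mu s\\) sample period in the example are assumptions.

```python
from statistics import pvariance

def max_variance_offsets(windows, sample_period_us, sections_us):
    """For each (start, stop) section in microseconds, return the time offset of the
    sample whose value varies most across the aligned spike windows."""
    offsets = []
    for start_us, stop_us in sections_us:
        first = int(round(start_us / sample_period_us))
        last = min(int(round(stop_us / sample_period_us)), len(windows[0]) - 1)
        best = max(range(first, last + 1),
                   key=lambda n: pvariance([w[n] for w in windows]))
        offsets.append(best * sample_period_us)
    return offsets

# Toy data: two aligned windows of 32 samples that differ at 50 us and 200 us only.
flat = [0.0] * 32
spike = [0.0] * 32
spike[5] = 2.0
spike[20] = 4.0
omegas = max_variance_offsets([flat, spike], sample_period_us=10,
                              sections_us=[(0, 150), (170, 310)])
print(omegas)
```

Because the sections are bounded so that they do not overlap, each search is independent of the other and the two offsets can never coincide.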
{{< figure src="technical_3/TD_SYS.pdf" title="Figure 85: System abstraction showing the configuration for pre-filtering detection and reconfigurable integrators. " width="500" >}}
The topology is summarized in Figure 85 where the predominant active component lies with the state machine reconfiguring the integrators to optimize the classification. The noise rejection performed by three bandpass filters primarily removes out of band aggressors that prevent accurate classification. Notice that the feedback loop for detection relies on a long term average of a narrow band component in addition to an indication of a localized peak above the threshold. Filters \\(F1\\) & \\(F2\\) are band pass filters with first order roll-off and a \\(0.4-7 kHz\\) bandwidth whereas \\(F3\\) is a narrow band filter of $2.5-4.5 kHz$ to maximize group delay sensitivity. There is an additional advantage of pre-sectioning the waveform, which is that these optimal points can be found independently in sequence. During initialization we seed the starting point in the middle of the section.
{{< figure src="technical_3/TD_VM.pdf" title="Figure 86: Illustration of closed loop control for finding point of maximum variance. " width="500" >}}
The control loop for implementing this method is shown in Figure 86. Here \\(\Omega_{1/2}\\) represents the temporal off-set separated by $20 \mu s$ with respect to the detection event around which the signal is integrated in the analogue domain. The digital integrators primarily average the aggregate statistics to assist long term convergence by tracking the mean value of both integration results. This allows us to compare the deviation from the mean for each spike event and evaluate which has a larger variance. Note that a key characteristic is that the signal being analysed is band limited to such an extent that we do not expect local minima within each separate section. Since the digital integrators are accumulating boolean results to reject the uncorrelated noise, the factor \\(a_{1}\\) represents the register depth of the counter. To be more precise, the \\(\Omega\\)s are taken as a first order difference between the detection event and an off-set.
$$ X(t) = \underbrace{\int_{t_0+\Omega}^{t_0+\Omega+\Delta +MA} Q_{out}(t) \, dt}_{\text{Integrated signal}} - \underbrace{ 2 \cdot \int_{t_0}^{t_0+\Delta} Q_{out}(t) \, dt}_{\text{Off-set}} $$
This first order difference operation is shown in Equation 51 where \\(t_0\\), \\(\Delta\\), and \\(Q_{out}\\) are the time at which the spike event is aligned, the static integration time of $20 \mu s$, and the analogue signal containing the spike respectively. This clarifies that the moving average \\(MA\\) is mixed with the signal when it evaluates the mean of the \\(\Omega_{1}\\) section. This approach primarily helps with a self referenced gain that allows smaller register depth.
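
Equation 51 can be approximated on a uniformly sampled signal with simple rectangular sums. The sketch below follows the integration limits exactly as printed, with \\(MA\\) extending the upper limit of the first integral; the function name, the sampled representation of \\(Q_{out}\\), and the numeric values in the example are illustrative assumptions.

```python
def first_order_difference(q_out, dt, t0, omega, delta, ma):
    """Approximate X = integral(t0+omega .. t0+omega+delta+ma) - 2 * integral(t0 .. t0+delta)
    for a signal q_out sampled every dt seconds, using rectangular sums."""
    def integrate(t_start, t_stop):
        i_start = int(round(t_start / dt))
        i_stop = int(round(t_stop / dt))
        return sum(q_out[i_start:i_stop]) * dt

    integrated_signal = integrate(t0 + omega, t0 + omega + delta + ma)
    offset = 2.0 * integrate(t0, t0 + delta)
    return integrated_signal - offset

# Constant unit signal sampled at 1 us: X reduces to (delta + ma) - 2 * delta.
q_const = [1.0] * 400
x = first_order_difference(q_const, dt=1e-6, t0=100e-6,
                           omega=50e-6, delta=20e-6, ma=5e-6)
print(x)
```

For a constant input the result depends only on the difference between the two integration lengths, which gives a quick sanity check of the window arithmetic before applying it to spike data.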
{{< figure src="technical_3/TD_CC.pdf" title="Figure 87: Illustration of closed loop control for tracking two-centroids with the \\(\Omega_{1}\\) feature. " width="500" >}}
After several seconds of training data or an equivalent spike count we can assume that the sections that maximize variance have been approached. At this point the \\(\Omega\\)s are fixed and the centroids need to be generated to complete the adaptive process for classification. As illustrated in Figure 87 a similar feedback mechanism is used to adjust the mean centroids based on boolean results. This particular configuration is the adjustment of two centroids based on one feature in the \\(\Omega_1\\) section. By adjusting the centroid \\(MA_{\mu}\\) when it is the closest to the new data point we realize a k-means clustering method with an \\(l_0\\) norm distance operator.
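
The centroid tracking described above can be mimicked in software with a sign based update. This is a simplified sketch, not the circuit behaviour: a fixed unit step stands in for the boolean accumulation in the counters, a single scalar feature is used, and the function name and example values are hypothetical.

```python
def update_centroids(centroids, sample, step=1):
    """Move only the centroid nearest to the sample by one fixed step towards it.

    Returns the index of the nearest centroid, which doubles as the class label.
    """
    nearest = min(range(len(centroids)), key=lambda i: abs(sample - centroids[i]))
    if sample > centroids[nearest]:
        centroids[nearest] += step
    elif sample < centroids[nearest]:
        centroids[nearest] -= step
    return nearest

# Two centroids tracking one feature whose two classes sit at 20 and 80.
centroids = [0, 100]
for _ in range(50):
    update_centroids(centroids, 20)
    update_centroids(centroids, 80)
print(centroids)
```

Because each update depends only on the sign of the error, the magnitude of the feature never needs to be quantized; the step size plays a role comparable to the register depth in setting how quickly the centroids settle.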
Since we bound the \\(\Omega\\) sections to be strictly non-overlapping the same analogue integrator can be used to evaluate the accumulated error of all features to each centroid. Moreover for a small \\(a_1\\) the centroid adjustment can be time multiplexed, reducing the total number of integrators required. This implies that \\(K\\) integrators are needed to iteratively adjust all the centroids and \\(M\\) integrators are needed to evaluate the distance from each centroid. Because this adaptive process is performed in isolation we may perform the training in phases that update clusters and features separately, so we will only need \\(M\\) or \\(K\\) integrators concurrently, whichever is more demanding. However \\(KM\\) registers are needed to specify the location of each centroid, which should be converted to a time-domain signal by using reconfigurable delay lines.
{{< figure src="technical_3/TD_ADR.pdf" title="Figure 88: Delay line configuration for evaluating the absolute difference between the asynchronous time domain signal \\(D\\) and the registered centroid position \\(X\\). " width="500" >}}
Such a configuration is illustrated in Figure 88 where there is coarse control by selecting different phases of \\(D\\) and fine control with a conventional multiplexed delay structure. Again the reduction in complexity and rejection of quantization noise by performing time domain computation, as opposed to the equivalent \\(8b\\) full adder, is typical of this processing modality. Finally the question remains of how centroids are initialized without requiring quantization. This requirement is avoided by using an iterative method for centroid generation. After having a single centroid converge to the mean of the feature space we iteratively split centroids in two while training, similar to the approach discussed in Section 36. We presume that redundant clusters will be generated, which are removed if supervision is allowed to intervene or if high level control is used to analyse which clusters are significant after several iterations. The results presented here however do not consider this supervision.
## 51 Validation
In order to demonstrate the viability of this approach we will simulate a linearised model that is constructed using Matlab. Here we aim to show to what extent unsupervised methods are constrained with respect to classification performance. The original data sets used in Section 33 have been up sampled from \\(24 kS/s\\) to \\(240 kS/s\\) after the band limiting filters to emulate the continuous time logic that will operate at a high clock rate.
{{< figure src="technical_3/feature_clean.pdf" width="500" >}}
{{< figure src="technical_3/feature_noisy.pdf" title="Figure 89: Comparing PCA and \\(\Omega\\) feature distribution for the Difficult2 data set. Ground truth for spike classes annotated as cyan, maroon, yellow and blue for false positives." width="500" >}}
The feature space resulting from this method is exemplified in Figure 89. Here we compare it to the two component PCA feature space since the \\(\Omega\\) features represent its approximation. It is typical to see multiple additional clusters form either due to the detection of false positives or misalignment of a spike class in the presence of noise. Initializing extra clusters can typically retain classification accuracy in noisy environments but degrades precision in pristine conditions.
{{< figure src="technical_3/A05.pdf" width="500" >}}
{{< figure src="technical_3/B05.pdf" title="Figure 90: \\(\Omega_2\\) and \\(\Omega_3\\) classification for data sets with \\(-26 dB\\) background activity." width="500" >}}
{{< figure src="technical_3/A01.pdf" width="500" >}}
{{< figure src="technical_3/B01.pdf" title="Figure 91: \\(\Omega_2\\) and \\(\Omega_3\\) classification for data sets with \\(-20 dB\\) background activity." width="500" >}}
{{< figure src="technical_3/A02.pdf" width="500" >}}
{{< figure src="technical_3/B02.pdf" title="Figure 92: \\(\Omega_2\\) and \\(\Omega_3\\) classification for data sets with \\(-16 dB\\) background activity." width="500" >}}
The results in Figure 91 demonstrate classification accuracy in terms of the percentage of all correctly classified events with respect to the ground truth including false positives and false negatives. This indicates that when the signal to noise ratio exceeds \\(20 dB\\) the conditions are quite forgiving towards the simplicity of the algorithm. The two features used here require very little effort to adapt and classify activity. It is important to mention that a fixed filtering configuration is maintained for all test points in order to demonstrate adaptive characteristics.
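
One way to read the accuracy metric described above is as the fraction of correctly classified events over every event that either exists in the ground truth or was detected. The sketch below encodes that reading; the exact bookkeeping used for the figures is not given in the text, so the denominator, the function name, and the counts in the example are assumptions.

```python
def classification_accuracy(correct, misclassified, false_positives, false_negatives):
    """Fraction of correctly classified events, counting misclassified spikes,
    false positives, and false negatives (missed spikes) as errors."""
    total = correct + misclassified + false_positives + false_negatives
    return correct / total if total else 0.0

# Made-up counts for illustration only.
accuracy = classification_accuracy(correct=80, misclassified=10,
                                   false_positives=5, false_negatives=5)
print(accuracy)
```

Counting detection errors in the denominator means the figure reflects the combined detection and classification chain rather than the clustering stage alone.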
The results in Figure 92 show improvement in noisy conditions if the number of sections is increased to 3, implying a three dimensional feature space to improve centroid distance. Using the same algorithm for feature selection the configuration can deal with twice the amount of background noise without supervision. In some sense the fact that the signal is not quantized does not have a significant impact on classification accuracy. This highlights the importance of closed loop algorithms whether resources are constrained or not. As such constructing a convex search space or extracting well reasoned features from underlying phenomena is crucial to reducing complexity.
{{< figure src="technical_3/CA01.pdf" title="Figure 93: False alarm rates normalized by true positives for the analogue detection." width="500" >}}
Figure 93 shows that at these noise levels detection is relatively consistent but not as adaptive as the digital approach. The threshold for detection will favour generating false positives over false negatives. The main point of failure for noise levels beyond that point lies with the inability to perform feature selection based on localized variance maximization. This is partially expected as PCA will similarly perform poorly when noise levels become comparable to the signal.
In a practical case it may be difficult to ascertain whether the signal to noise level is adequate to trust classification unless there is confidence to do so in the sense that there may be redundancy in the recording taken. However this approach is significantly more viable for realizing sub $1 \mu W$ neural spike classification for large scale recordings considering the resource requirements for adaptive classification, given that each integrator consumes less than \\(50 nW\\) in $0.18 \mu m$ CMOS and each structure needs minimal supervision.
Table 12: Overview of Detection & Classification performance for data sets from [^1] for different methods. \\(\star\\) Requires off-chip Supervision. \\(\dagger\\) White noise is also added at -20dB of the signal power.
| Method | Analogue | Digital Registers | Cycles / Sample | Data Set | Background @ -16dB\\(^\dagger\\) | Background @ -20dB\\(^\dagger\\) |
|----|----|----|----|----|----|----|
| RVD | 1\\(\times\\)ADC | 83 | 172 | Easy 2 | 0.734 | 0.842 |
| | | | | Diff. 1 | 0.729 | 0.871 |
| | | | | Diff. 2 | 0.748 | 0.848 |
| Template | 1\\(\times\\)ADC | 105 | 90 | Easy 2 | 0.820 | 0.876 |
| | | | | Diff. 1 | 0.860 | 0.835 |
| | | | | Diff. 2 | 0.803 | 0.875 |
| WDF [^200] | 2\\(\times\\)BP-Filter 1\\(\times\\)ADC | 41 | 104\\(^\star\\) | Easy 2 | 0.951 | 0.991 |
| | | | | Diff. 1 | 0.850 | 0.929 |
| | | | | Diff. 2 | 0.846 | 0.916 |
| \\(\Omega_3\\) Features | 3\\(\times\\)BP-Filter 4\\(\times\\)Integrator 4\\(\times\\)DAC | 16 | 1 | Easy 2 | 0.800 | 0.946 |
| | | | | Diff. 1 | 0.723 | 0.931 |
| | | | | Diff. 2 | 0.798 | 0.886 |
A number of methods are shown in Table 12 where we see classification accuracy and the corresponding hardware requirements in both the analogue and digital domain. The RVD and template methods presented in Section 33 represent the digital approach where few analogue components are needed besides the quantizer. Allocating more processing power or memory resources would imply choosing one over the other. As expected supervised intervention allows methods like WDF [^200] to leverage a substantial improvement with respect to resource efficient classification. From this perspective we see using \\(\Omega_3\\) features as distributing our resources in the analogue domain while still maintaining comparable classification accuracy with less reliance on digital scaling factors. We provide more comparison details in Section 62 for the digital and analogue methods proposed by this work as well as the equivalent Matlab implementation used for evaluation.
# 52 Conclusion
This chapter has proposed a number of time-domain constructs that encourage mixed signal design for instrumentation, where we derived the underlying concepts from the phase state of a ring oscillator in order to represent continuous valued time domain memory as the equivalent of a clocked flip-flop or sampled capacitor. In addition we have discussed the means to analytically evaluate and optimize the characteristics of these topologies. Overall there are clear benefits over conventional implementations in areas such as instrumentation and the functional manipulation of continuous valued signals. Moreover these structures will scale performance with technology due to the extensive use of digital gates. The instrumentation structure in particular gives way to fully synthesized platforms. Performing processing and filtering in the digital domain remains critical for robust sensing of LFPs and EAPs in very poor signal to noise conditions. A 0.6 V 58 dB SNDR time domain instrumentation architecture is demonstrated with a NEF of 1.18 that generates multiphase PWM encoded digital signals using a sub 0.01 mm\\(^2\\) footprint and employing bandpass filtering with 40 dB/Dec roll off.
By extension, we demonstrated the capacity for mixed signal analogue-to-information conversion for unsupervised classification, using adaptive techniques to converge towards specific signal characteristics. Reconfigurable integration of selected temporal sections of the spike shape lets us focus resources on feature and cluster evaluation without open loop quantization. This mitigates the trade-off between resolution and digital complexity. The main challenge, as pointed out, is establishing which dynamics will allow convergence to optimal feature extraction with reduced hardware requirements. Here we exploit certain phenomena in the principal components of spike shapes, together with the sensitivity of the group delay of analogue detection to the frequency content of spike waveforms, to achieve direct classification.
Techniques for instrumentation and signal acquisition are typically more mature in development and direction than those for other processing modalities. This is especially true when realizing mixed signal methods for machine learning, where a multitude of convoluted factors impact performance. Much remains to be addressed when adaptive techniques are evaluated with respect to their resource efficiency, and this will likely be an important aspect that emerges in many intelligent sensor systems.
# References:
[^1]: R.Q. Quiroga, Z.Nadasdy, and Y.Ben-Shaul, ''Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering,'' Neural Computation, vol.16, pp. 1661--1687, April 2004. [Online]: http://dx.doi.org/10.1162/089976604774201631
[^2]: R.A. Normann, ''Technology insight: future neuroprosthetic therapies for disorders of the nervous system,'' Nature Clinical Practice Neurology, vol.3, pp. 444--452, August 2007. [Online]: http://dx.doi.org/10.1038/ncpneuro0556
[^3]: K.Birmingham, V.Gradinaru, P.Anikeeva, W.M. Grill, V.Pikov, B.McLaughlin, P.Pasricha, D.Weber, K.Ludwig, and K.Famm, ''Bioelectronic medicines: a research roadmap,'' Nature Reviews Drug Discovery, vol.13, pp. 399--400, May 2014. [Online]: http://dx.doi.org/10.1038/nrd4351
[^4]: ''Bridging the bio-electronic divide,'' Defense Advanced Research Projects Agency, Arlington, Virginia, January 2016. [Online]: http://www.darpa.mil/news-events/2015-01-19
[^5]: G.Fritsch and E.Hitzig, ''Über die elektrische Erregbarkeit des Grosshirns,'' Archiv für Anatomie, Physiologie und Wissenschaftliche Medicin., vol.37, pp. 300--332, 1870.
[^6]: G.E. Loeb, ''Cochlear prosthetics,'' Annual Review of Neuroscience, vol.13, no.1, pp. 357--371, 1990, pMID: 2183680. [Online]: http://dx.doi.org/10.1146/annurev.ne.13.030190.002041
[^7]: ''Annual update bcig uk cochlear implant provision,'' British Cochlear Implant Group, London WC1X 8EE, UK, pp. 1--2, March 2015. [Online]: http://www.bcig.org.uk/wp-content/uploads/2015/12/CI-activity-2015.pdf
[^8]: M.Alexander, ''Neuro-numbers,'' Association of British Neurologists (ABN), London SW9 6WY, UK, pp. 1--12, April 2003. [Online]: http://www.neural.org.uk/store/assets/files/20/original/NeuroNumbers.pdf
[^9]: A.Jackson and J.B. Zimmermann, ''Neural interfaces for the brain and spinal cord — restoring motor function,'' Nature Reviews Neurology, vol.8, pp. 690--699, December 2012. [Online]: http://dx.doi.org/10.1038/nrneurol.2012.219
[^10]: M.Gilliaux, A.Renders, D.Dispa, D.Holvoet, J.Sapin, B.Dehez, C.Detrembleur, T.M. Lejeune, and G.Stoquart, ''Upper limb robot-assisted therapy in cerebral palsy: A single-blind randomized controlled trial,'' Neurorehabilitation and Neural Repair, vol.29, no.2, pp. 183--192, February 2015. [Online]: http://nnr.sagepub.com/content/29/2/183.abstract
[^11]: P.Osten and T.W. Margrie, ''Mapping brain circuitry with a light microscope,'' Nature Methods, vol.10, pp. 515--523, June 2013. [Online]: http://dx.doi.org/10.1038/nmeth.2477
[^12]: S.M. Gomez-Amaya, M.F. Barbe, W.C. de Groat, J.M. Brown, G.F. Tuite, J.Corcos, S.B. Fecho, A.S. Braverman, and M.R. Ruggieri Sr, ''Neural reconstruction methods of restoring bladder function,'' Nature Reviews Urology, vol.12, pp. 100--118, February 2015. [Online]: http://dx.doi.org/10.1038/nrurol.2015.4
[^13]: H.Yu, W.Xiong, H.Zhang, W.Wang, and Z.Li, ''A parylene self-locking cuff electrode for peripheral nerve stimulation and recording,'' IEEE/ASME Journal of Microelectromechanical Systems, vol.23, no.5, pp. 1025--1035, Oct 2014. [Online]: http://dx.doi.org/10.1109/JMEMS.2014.2333733
[^14]: J.S. Ho, S.Kim, and A.S.Y. Poon, ''Midfield wireless powering for implantable systems,'' Proceedings of the IEEE, vol. 101, no.6, pp. 1369--1378, June 2013. [Online]: http://dx.doi.org/10.1109/JPROC.2013.2251851
[^15]: R.D. Keynes, ''Excitable membranes,'' Nature, vol. 239, pp. 29--32, September 1972. [Online]: http://dx.doi.org/10.1038/239029a0
[^16]: A.D. Grosmark and G.Buzsáki, ''Diversity in neural firing dynamics supports both rigid and learned hippocampal sequences,'' Science, vol. 351, no. 6280, pp. 1440--1443, March 2016. [Online]: http://science.sciencemag.org/content/351/6280/1440
[^17]: B.Sakmann and E.Neher, ''Patch clamp techniques for studying ionic channels in excitable membranes,'' Annual Review of Physiology, vol.46, no.1, pp. 455--472, October 1984, pMID: 6143532. [Online]: http://dx.doi.org/10.1146/annurev.ph.46.030184.002323
[^18]: M.P. Ward, P.Rajdev, C.Ellison, and P.P. Irazoqui, ''Toward a comparison of microelectrodes for acute and chronic recordings,'' Brain Research, vol. 1282, pp. 183 -- 200, July 2009. [Online]: http://www.sciencedirect.com/science/article/pii/S0006899309010841
[^19]: J.E.B. Randles, ''Kinetics of rapid electrode reactions,'' Discuss. Faraday Soc., vol.1, pp. 11--19, 1947. [Online]: http://dx.doi.org/10.1039/DF9470100011
[^20]: M.E. Spira and A.Hai, ''Multi-electrode array technologies for neuroscience and cardiology,'' Nature Nanotechnology, vol.8, pp. 83 -- 94, February 2013. [Online]: http://dx.doi.org/10.1038/nnano.2012.265
[^21]: G.E. Moore, ''Cramming more components onto integrated circuits,'' Proceedings of the IEEE, vol.86, no.1, pp. 82--85, January 1998. [Online]: http://dx.doi.org/10.1109/JPROC.1998.658762
[^22]: I.Ferain, C.A. Colinge, and J.-P. Colinge, ''Multigate transistors as the future of classical metal-oxide-semiconductor field-effect transistors,'' Nature, vol. 479, pp. 310--316, November 2011. [Online]: http://dx.doi.org/10.1038/nature10676
[^23]: I.H. Stevenson and K.P. Kording, ''How advances in neural recording affect data analysis,'' Nature neuroscience, vol.14, no.2, pp. 139--142, February 2011. [Online]: http://dx.doi.org/10.1038/nn.2731
[^24]: C.Thomas, P.Springer, G.Loeb, Y.Berwald-Netter, and L.Okun, ''A miniature microelectrode array to monitor the bioelectric activity of cultured cells,'' Experimental cell research, vol.74, no.1, pp. 61--66, September 1972. [Online]: http://dx.doi.org/10.1016/0014-4827(72)90481-8
[^25]: R.A. Andersen, E.J. Hwang, and G.H. Mulliken, ''Cognitive neural prosthetics,'' Annual review of Psychology, vol.61, pp. 169--190, December 2010, pMID: 19575625. [Online]: http://dx.doi.org/10.1146/annurev.psych.093008.100503
[^26]: L.A. Jorgenson, W.T. Newsome, D.J. Anderson, C.I. Bargmann, E.N. Brown, K.Deisseroth, J.P. Donoghue, K.L. Hudson, G.S. Ling, P.R. MacLeish et al., ''The brain initiative: developing technology to catalyse neuroscience discovery,'' Philosophical Transactions of the Royal Society of London B: Biological Sciences, vol. 370, no. 1668, p. 20140164, 2015.
[^27]: E.D'Angelo, G.Danese, G.Florimbi, F.Leporati, A.Majani, S.Masoli, S.Solinas, and E.Torti, ''The human brain project: High performance computing for brain cells hw/sw simulation and understanding,'' in Proceedings of the Digital System Design Conference, August 2015, pp. 740--747. [Online]: http://dx.doi.org/10.1109/DSD.2015.80
[^28]: K.Famm, B.Litt, K.J. Tracey, E.S. Boyden, and M.Slaoui, ''Drug discovery: a jump-start for electroceuticals,'' Nature, vol. 496, no. 7444, pp. 159--161, April 2013. [Online]: http://dx.doi.org/10.1038/496159a
[^29]: K.Deisseroth, ''Optogenetics,'' Nature methods, vol.8, no.1, pp. 26--29, January 2011. [Online]: http://dx.doi.org/10.1038/nmeth.f.324
[^30]: M.Velliste, S.Perel, M.C. Spalding, A.S. Whitford, and A.B. Schwartz, ''Cortical control of a prosthetic arm for self-feeding,'' Nature, vol. 453, no. 7198, pp. 1098--1101, June 2008. [Online]: http://dx.doi.org/10.1038/nature06996
[^31]: T.N. Theis and P.M. Solomon, ''In quest of the "next switch" prospects for greatly reduced power dissipation in a successor to the silicon field-effect transistor,'' Proceedings of the IEEE, vol.98, no.12, pp. 2005--2014, December 2010. [Online]: http://dx.doi.org/10.1109/JPROC.2010.2066531
[^32]: G.M. Amdahl, ''Validity of the single processor approach to achieving large scale computing capabilities, reprinted from the AFIPS conference proceedings, vol. 30 (Atlantic City, N.J., Apr. 18-20), AFIPS Press, Reston, Va., 1967, pp. 483-485, when Dr. Amdahl was at International Business Machines Corporation, Sunnyvale, California,'' in AFIPS Conference Proceedings, Vol. 30 (Atlantic City, N.J., Apr. 18-20), vol.12, no.3. IEEE, Summer 2007, pp. 19--20. [Online]: http://dx.doi.org/10.1109/N-SSC.2007.4785615
[^33]: J.G. Koller and W.C. Athas, ''Adiabatic switching, low energy computing, and the physics of storing and erasing information,'' in IEEE Proceedings of the Workshop on Physics and Computation. IEEE, October 1992, pp. 267--270. [Online]: http://dx.doi.org/10.1109/PHYCMP.1992.615554
[^34]: E.P. DeBenedictis, J.E. Cook, M.F. Hoemmen, and T.S. Metodi, ''Optimal adiabatic scaling and the processor-in-memory-and-storage architecture (OAS:PIMS),'' in IEEE Proceedings of the International Symposium on Nanoscale Architectures. IEEE, July 2015, pp. 69--74. [Online]: http://dx.doi.org/10.1109/NANOARCH.2015.7180589
[^35]: S.Houri, G.Billiot, M.Belleville, A.Valentian, and H.Fanet, ''Limits of cmos technology and interest of nems relays for adiabatic logic applications,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.62, no.6, pp. 1546--1554, June 2015. [Online]: http://dx.doi.org/10.1109/TCSI.2015.2415177
[^36]: S.K. Arfin and R.Sarpeshkar, ''An energy-efficient, adiabatic electrode stimulator with inductive energy recycling and feedback current regulation,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.1, pp. 1--14, February 2012. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6036003&isnumber=6138606
[^37]: P.R. Kinget, ''Scaling analog circuits into deep nanoscale cmos: Obstacles and ways to overcome them,'' in IEEE Proceedings of the Custom Integrated Circuits Conference. IEEE, September 2015, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2015.7338394
[^38]: K.Bernstein, D.J. Frank, A.E. Gattiker, W.Haensch, B.L. Ji, S.R. Nassif, E.J. Nowak, D.J. Pearson, and N.J. Rohrer, ''High-performance cmos variability in the 65-nm regime and beyond,'' IBM Journal of Research and Development, vol.50, no. 4.5, pp. 433--449, July 2006. [Online]: http://dx.doi.org/10.1147/rd.504.0433
[^39]: L.L. Lewyn, T.Ytterdal, C.Wulff, and K.Martin, ''Analog circuit design in nanoscale cmos technologies,'' Proceedings of the IEEE, vol.97, no.10, pp. 1687--1714, October 2009. [Online]: http://dx.doi.org/10.1109/JPROC.2009.2024663
[^40]: Y.Xin, W.X.Y. Li, Z.Zhang, R.C.C. Cheung, D.Song, and T.W. Berger, ''An application specific instruction set processor (asip) for adaptive filters in neural prosthetics,'' IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.12, no.5, pp. 1034--1047, September 2015. [Online]: http://dx.doi.org/10.1109/TCBB.2015.2440248
[^41]: G.Schalk, P.Brunner, L.A. Gerhardt, H.Bischof, and J.R. Wolpaw, ''Brain-computer interfaces (bcis): detection instead of classification,'' Journal of neuroscience methods, vol. 167, no.1, pp. 51--62, 2008, brain-Computer Interfaces (BCIs). [Online]: http://www.sciencedirect.com/science/article/pii/S0165027007004116
[^42]: Z.Li, J.E. O'Doherty, T.L. Hanson, M.A. Lebedev, C.S. Henriquez, and M.A. Nicolelis, ''Unscented kalman filter for brain-machine interfaces,'' PloS one, vol.4, no.7, pp. 1--18, 2009. [Online]: http://dx.doi.org/10.1371/journal.pone.0006243
[^43]: A.L. Orsborn, H.G. Moorman, S.A. Overduin, M.M. Shanechi, D.F. Dimitrov, and J.M. Carmena, ''Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control,'' Neuron, vol.82, pp. 1380--1393, June 2014. [Online]: http://dx.doi.org/10.1016/j.neuron.2014.04.048
[^44]: Y.Yan, X.Qin, Y.Wu, N.Zhang, J.Fan, and L.Wang, ''A restricted boltzmann machine based two-lead electrocardiography classification,'' in IEEE Proceedings of the International Conference on Wearable and Implantable Body Sensor Networks. IEEE, June 2015, pp. 1--9. [Online]: http://dx.doi.org/10.1109/BSN.2015.7299399
[^45]: B.M. Yu and J.P. Cunningham, ''Dimensionality reduction for large-scale neural recordings,'' Nature Neuroscience, vol.17, pp. 1500 -- 1509, November 2014. [Online]: http://dx.doi.org/10.1038/nn.3776
[^46]: S.Makeig, C.Kothe, T.Mullen, N.Bigdely-Shamlo, Z.Zhang, and K.Kreutz-Delgado, ''Evolving signal processing for brain: Computer interfaces,'' Proceedings of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1567--1584, May 2012. [Online]: http://dx.doi.org/10.1109/JPROC.2012.2185009
[^47]: G.Indiveri and S.C. Liu, ''Memory and information processing in neuromorphic systems,'' Proceedings of the IEEE, vol. 103, no.8, pp. 1379--1397, August 2015. [Online]: http://dx.doi.org/10.1109/JPROC.2015.2444094
[^48]: Y.Chen, E.Yao, and A.Basu, ''A 128-channel extreme learning machine-based neural decoder for brain machine interfaces,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 679--692, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2483618
[^49]: V.Karkare, S.Gibson, and D.Marković, ''A 75-\\(\mu\\)W, 16-channel neural spike-sorting processor with unsupervised clustering,'' IEEE Journal of Solid-State Circuits, vol.48, no.9, pp. 2230--2238, September 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2264616
[^50]: T.C. Chen, W.Liu, and L.G. Chen, ''128-channel spike sorting processor with a parallel-folding structure in 90nm process,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2009, pp. 1253--1256. [Online]: http://dx.doi.org/10.1109/ISCAS.2009.5117990
[^51]: G.Baranauskas, ''What limits the performance of current invasive brain machine interfaces?'' Frontiers in Systems Neuroscience, vol.8, no.68, April 2014. [Online]: http://www.frontiersin.org/systems_neuroscience/10.3389/fnsys.2014.00068
[^52]: E.F. Chang, ''Towards large-scale, human-based, mesoscopic neurotechnologies,'' Neuron, vol.86, pp. 68--78, April 2015. [Online]: http://dx.doi.org/10.1016/j.neuron.2015.03.037
[^53]: M.A.L. Nicolelis and M.A. Lebedev, ''Principles of neural ensemble physiology underlying the operation of brain-machine,'' Nature Reviews Neuroscience, vol.10, pp. 530--540, July 2009. [Online]: http://dx.doi.org/10.1038/nrn2653
[^54]: Z.Fekete, ''Recent advances in silicon-based neural microelectrodes and microsystems: a review,'' Sensors and Actuators B: Chemical, vol. 215, pp. 300--315, 2015. [Online]: http://www.sciencedirect.com/science/article/pii/S092540051500386X
[^55]: N.Saeidi, M.Schuettler, A.Demosthenous, and N.Donaldson, ''Technology for integrated circuit micropackages for neural interfaces, based on gold-silicon wafer bonding,'' Journal of Micromechanics and Microengineering, vol.23, no.7, p. 075021, June 2013. [Online]: http://stacks.iop.org/0960-1317/23/i=7/a=075021
[^56]: K.Seidl, S.Herwik, T.Torfs, H.P. Neves, O.Paul, and P.Ruther, ''Cmos-based high-density silicon microprobe arrays for electronic depth control in intracortical neural recording,'' IEEE Journal of Microelectromechanical Systems, vol.20, no.6, pp. 1439--1448, December 2011. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6033040&isnumber=6075219
[^57]: T.D.Y. Kozai, N.B. Langhals, P.R. Patel, X.Deng, H.Zhang, K.L. Smith, J.Lahann, N.A. Kotov, and D.R. Kipke, ''Ultrasmall implantable composite microelectrodes with bioactive surfaces for chronic neural interfaces,'' Nature Materials, vol.11, pp. 1065--1073, December 2012. [Online]: http://dx.doi.org/10.1038/nmat3468
[^58]: D.A. Schwarz, M.A. Lebedev, T.L. Hanson, D.F. Dimitrov, G.Lehew, J.Meloy, S.Rajangam, V.Subramanian, P.J. Ifft, Z.Li, A.Ramakrishnan, A.Tate, K.Z. Zhuang, and M.A.L. Nicolelis, ''Chronic, wireless recordings of large-scale brain activity in freely moving rhesus monkeys,'' Nature Methods, vol.11, pp. 670--676, April 2014. [Online]: http://dx.doi.org/10.1038/nmeth.2936
[^59]: P.Ruther, S.Herwik, S.Kisban, K.Seidl, and O.Paul, ''Recent progress in neural probes using silicon mems technology,'' IEEJ Transactions on Electrical and Electronic Engineering, vol.5, no.5, pp. 505--515, 2010. [Online]: http://dx.doi.org/10.1002/tee.20566
[^60]: H.-W. Kang, S.J. Lee, I.K. Ko, C.Kengla, J.J. Yoo, and A.Atala, ''A 3d bioprinting system to produce human-scale tissue constructs with structural integrity,'' Nature Biotechnology, vol.34, pp. 312--319, March 2016. [Online]: http://dx.doi.org/10.1038/nbt.3413
[^61]: C.Xie, J.Liu, T.-M. Fu, X.Dai, W.Zhou, and C.M. Lieber, ''Three-dimensional macroporous nanoelectronic networks as minimally invasive brain probes,'' Nature Materials, vol.14, pp. 1286--1292, May 2015. [Online]: http://dx.doi.org/10.1038/nmat4427
[^62]: R.R. Harrison, P.T. Watkins, R.J. Kier, R.O. Lovejoy, D.J. Black, B.Greger, and F.Solzbacher, ''A low-power integrated circuit for a wireless 100-electrode neural recording system,'' IEEE Journal of Solid-State Circuits, vol.42, no.1, pp. 123--133, Jan 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2006.886567
[^63]: J.Guo, W.Ng, J.Yuan, S.Li, and M.Chan, ''A 200-channel area-power-efficient chemical and electrical dual-mode acquisition ic for the study of neurodegenerative diseases,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 567--578, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2468052
[^64]: W.Biederman, D.J. Yeager, N.Narevsky, J.Leverett, R.Neely, J.M. Carmena, E.Alon, and J.M. Rabaey, ''A 4.78 mm\\(^2\\) fully-integrated neuromodulation soc combining 64 acquisition channels with digital compression and simultaneous dual stimulation,'' IEEE Journal of Solid-State Circuits, vol.50, no.4, pp. 1038--1047, April 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2384736
[^65]: R.Muller, S.Gambini, and J.M. Rabaey, ''A 0.013 mm\\(^2\\), 5 \\(\mu\\)W, dc-coupled neural signal acquisition ic with 0.5 V supply,'' IEEE Journal of Solid-State Circuits, vol.47, no.1, pp. 232--243, Jan 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2163552
[^66]: H.Kassiri, A.Bagheri, N.Soltani, K.Abdelhalim, H.M. Jafari, M.T. Salam, J.L.P. Velazquez, and R.Genov, ''Battery-less tri-band-radio neuro-monitor and responsive neurostimulator for diagnostics and treatment of neurological disorders,'' IEEE Journal of Solid-State Circuits, vol.51, no.5, pp. 1274--1289, May 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2528999
[^67]: M.Ballini, J.Müller, P.Livi, Y.Chen, U.Frey, A.Stettler, A.Shadmani, V.Viswam, I.L. Jones, D.Jäckel, M.Radivojevic, M.K. Lewandowska, W.Gong, M.Fiscella, D.J. Bakkum, F.Heer, and A.Hierlemann, ''A 1024-channel cmos microelectrode array with 26,400 electrodes for recording and stimulation of electrogenic cells in vitro,'' IEEE Journal of Solid-State Circuits, vol.49, no.11, pp. 2705--2719, Nov 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2359219
[^68]: P.D. Wolf, Thermal considerations for the design of an implanted cortical brain--machine interface (BMI). CRC Press, Boca Raton, FL, 2008, pMID: 21204402. [Online]: http://www.ncbi.nlm.nih.gov/books/NBK3932
[^69]: T.Denison, K.Consoer, W.Santa, A.T. Avestruz, J.Cooley, and A.Kelly, ''A 2 \\(\mu\\)W 100 nV/rtHz chopper-stabilized instrumentation amplifier for chronic measurement of neural field potentials,'' IEEE Journal of Solid-State Circuits, vol.42, no.12, pp. 2934--2945, December 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2007.908664
[^70]: B.Johnson, S.T. Peace, A.Wang, T.A. Cleland, and A.Molnar, ''A 768-channel cmos microelectrode array with angle sensitive pixels for neuronal recording,'' IEEE Sensors Journal, vol.13, no.9, pp. 3211--3218, Sept 2013. [Online]: http://dx.doi.org/10.1109/JSEN.2013.2266894
[^71]: C.M. Lopez, A.Andrei, S.Mitra, M.Welkenhuysen, W.Eberle, C.Bartic, R.Puers, R.F. Yazicioglu, and G.G.E. Gielen, ''An implantable 455-active-electrode 52-channel cmos neural probe,'' IEEE Journal of Solid-State Circuits, vol.49, no.1, pp. 248--261, January 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2284347
[^72]: J.Scholvin, J.P. Kinney, J.G. Bernstein, C.Moore-Kochlacs, N.Kopell, C.G. Fonstad, and E.S. Boyden, ''Close-packed silicon microelectrodes for scalable spatially oversampled neural recording,'' IEEE Transactions on Biomedical Engineering, vol.63, no.1, pp. 120--130, Jan 2016. [Online]: http://dx.doi.org/10.1109/TBME.2015.2406113
[^73]: M.Han, B.Kim, Y.A. Chen, H.Lee, S.H. Park, E.Cheong, J.Hong, G.Han, and Y.Chae, ''Bulk switching instrumentation amplifier for a high-impedance source in neural signal recording,'' IEEE Transactions on Circuits and Systems---Part II: Express Briefs, vol.62, no.2, pp. 194--198, Feb 2015. [Online]: http://dx.doi.org/10.1109/TCSII.2014.2368615
[^74]: R.Muller, S.Gambini, and J.M. Rabaey, ''A 0.013 mm\\(^2\\), 5 \\(\mu\\)W, dc-coupled neural signal acquisition ic with 0.5 V supply,'' IEEE Journal of Solid-State Circuits, vol.47, no.1, pp. 232--243, Jan 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2163552
[^75]: ''Rhd2164 digital electrophysiology interface chip - data sheet,'' Intan Technologies, Los Angeles, California, December 2013. [Online]: http://www.intantech.com/files/Intan_RHD2164_datasheet.pdf
[^76]: K.M. Al-Ashmouny, S.I. Chang, and E.Yoon, ''A 4 \\(\mu\\)W/ch analog front-end module with moderate inversion and power-scalable sampling operation for 3-d neural microsystems,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.5, pp. 403--413, October 2012. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2218105
[^77]: D.Han, Y.Zheng, R.Rajkumar, G.S. Dawe, and M.Je, ''A 0.45 V 100-channel neural-recording ic with sub-\\(\mu\\)W/channel consumption in 0.18 \\(\mu\\)m cmos,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.6, pp. 735--746, December 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2298860
[^78]: S.B. Lee, H.M. Lee, M.Kiani, U.M. Jow, and M.Ghovanloo, ''An inductively powered scalable 32-channel wireless neural recording system-on-a-chip for neuroscience applications,'' IEEE Transactions on Biomedical Circuits and Systems, vol.4, no.6, pp. 360--371, Dec 2010. [Online]: http://dx.doi.org/10.1109/TBCAS.2010.2078814
[^79]: J.Yoo, L.Yan, D.El-Damak, M.A.B. Altaf, A.H. Shoeb, and A.P. Chandrakasan, ''An 8-channel scalable eeg acquisition soc with patient-specific seizure classification and recording processor,'' IEEE Journal of Solid-State Circuits, vol.48, no.1, pp. 214--228, Jan 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2221220
[^80]: M.A.B. Altaf and J.Yoo, ''A 1.83 \\(\mu\\)J/classification, 8-channel, patient-specific epileptic seizure classification soc using a non-linear support vector machine,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.1, pp. 49--60, Feb 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2386891
[^81]: K.Abdelhalim, H.M. Jafari, L.Kokarovtseva, J.L.P. Velazquez, and R.Genov, ''64-channel uwb wireless neural vector analyzer soc with a closed-loop phase synchrony-triggered neurostimulator,'' IEEE Journal of Solid-State Circuits, vol.48, no.10, pp. 2494--2510, Oct 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2272952
[^82]: A.Bagheri, S.R.I. Gabran, M.T. Salam, J.L.P. Velazquez, R.R. Mansour, M.M.A. Salama, and R.Genov, ''Massively-parallel neuromonitoring and neurostimulation rodent headset with nanotextured flexible microelectrodes,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 601--609, Oct 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2281772
[^83]: H.G. Rhew, J.Jeong, J.A. Fredenburg, S.Dodani, P.G. Patil, and M.P. Flynn, ''A fully self-contained logarithmic closed-loop deep brain stimulation soc with wireless telemetry and wireless power management,'' IEEE Journal of Solid-State Circuits, vol.49, no.10, pp. 2213--2227, Oct 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2346779
[^84]: W.Biederman, D.J. Yeager, N.Narevsky, J.Leverett, R.Neely, J.M. Carmena, E.Alon, and J.M. Rabaey, ''A 4.78 mm\\(^2\\) fully-integrated neuromodulation soc combining 64 acquisition channels with digital compression and simultaneous dual stimulation,'' IEEE Journal of Solid-State Circuits, vol.50, no.4, pp. 1038--1047, April 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2384736
[^85]: A.Mendez, A.Belghith, and M.Sawan, ''A dsp for sensing the bladder volume through afferent neural pathways,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 552--564, Aug 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2282087
[^86]: T.T. Liu and J.M. Rabaey, ''A 0.25 v 460 nw asynchronous neural signal processor with inherent leakage suppression,'' IEEE Journal of Solid-State Circuits, vol.48, no.4, pp. 897--906, April 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2239096
[^87]: D.Han, Y.Zheng, R.Rajkumar, G.S. Dawe, and M.Je, ''A 0.45 V 100-channel neural-recording ic with sub-\\(\mu\\)W/channel consumption in 0.18 \\(\mu\\)m cmos,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.6, pp. 735--746, Dec 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2298860
[^88]: R.Muller, H.P. Le, W.Li, P.Ledochowitsch, S.Gambini, T.Bjorninen, A.Koralek, J.M. Carmena, M.M. Maharbiz, E.Alon, and J.M. Rabaey, ''A minimally invasive 64-channel wireless \\(\mu\\)ECoG implant,'' IEEE Journal of Solid-State Circuits, vol.50, no.1, pp. 344--359, Jan 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2364824
[^89]: B.Vigraham, J.Kuppambatti, and P.R. Kinget, ''Switched-mode operational amplifiers and their application to continuous-time filters in nanoscale cmos,'' IEEE Journal of Solid-State Circuits, vol.49, no.12, pp. 2758--2772, December 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2354641
[^90]: V.Karkare, H.Chandrakumar, D.Rozgić, and D.Marković, ''Robust, reconfigurable, and power-efficient biosignal recording systems,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, Sept 2014, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2014.6946018
[^91]: L.B. Leene and T.G. Constandinou, ''A 0.45v continuous time-domain filter using asynchronous oscillator structures,'' in IEEE Proceedings of the International Conference on Electronics, Circuits and Systems, December 2016.
[^92]: R.Mohan, L.Yan, G.Gielen, C.V. Hoof, and R.F. Yazicioglu, ''0.35 v time-domain-based instrumentation amplifier,'' Electronics Letters, vol.50, no.21, pp. 1513--1514, October 2014. [Online]: http://dx.doi.org/10.1049/el.2014.2471
[^93]: X.Zhang, Z.Zhang, Y.Li, C.Liu, Y.X. Guo, and Y.Lian, ''A 2.89 \\(\mu\\)W dry-electrode enabled clockless wireless ecg soc for wearable applications,'' IEEE Journal of Solid-State Circuits, vol.51, no.10, pp. 2287--2298, Oct 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2582863
[^94]: M.Elia, L.B. Leene, and T.G. Constandinou, ''Continuous-time micropower interface for neural recording applications,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016, pp. 534--537. [Online]: http://dx.doi.org/10.1109/ISCAS.2016.7527295
[^95]: N.Guo, Y.Huang, T.Mai, S.Patil, C.Cao, M.Seok, S.Sethumadhavan, and Y.Tsividis, ''Energy-efficient hybrid analog/digital approximate computation in continuous time,'' IEEE Journal of Solid-State Circuits, vol.51, no.7, pp. 1514--1524, July 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2543729
[^96]: B.Bozorgzadeh, D.R. Schuweiler, M.J. Bobak, P.A. Garris, and P.Mohseni, ''Neurochemostat: A neural interface soc with integrated chemometrics for closed-loop regulation of brain dopamine,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 654--667, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2453791
[^97]: E.B. Myers and M.L. Roukes, ''Comparative advantages of mechanical biosensors,'' Nature nanotechnology, vol.6, no.4, pp. 1748--3387, April 2011. [Online]: http://dx.doi.org/10.1038/nnano.2011.44
[^98]: R.Machado, N.Soltani, S.Dufour, M.T. Salam, P.L. Carlen, R.Genov, and M.Thompson, ''Biofouling-resistant impedimetric sensor for array high-resolution extracellular potassium monitoring in the brain,'' Biosensors, vol.6, no.4, p.53, October 2016. [Online]: http://dx.doi.org/10.3390/bios6040053
[^99]: J.Guo, W.Ng, J.Yuan, S.Li, and M.Chan, ''A 200-channel area-power-efficient chemical and electrical dual-mode acquisition ic for the study of neurodegenerative diseases,'' IEEE Transactions on Biomedical Circuits and Systems, vol.10, no.3, pp. 567--578, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2468052
[^100]: D.A. Dombeck, A.N. Khabbaz, F.Collman, T.L. Adelman, and D.W. Tank, ''Imaging large-scale neural activity with cellular resolution in awake, mobile mice.'' Neuron, vol.56, no.1, pp. 43--57, October 2007. [Online]: http://dx.doi.org/10.1016/j.neuron.2007.08.003
[^101]: T.York, S.B. Powell, S.Gao, L.Kahan, T.Charanya, D.Saha, N.W. Roberts, T.W. Cronin, J.Marshall, S.Achilefu, S.P. Lake, B.Raman, and V.Gruev, ''Bioinspired polarization imaging sensors: From circuits and optics to signal processing algorithms and biomedical applications,'' Proceedings of the IEEE, vol. 102, no.10, pp. 1450--1469, Oct 2014. [Online]: http://dx.doi.org/10.1109/JPROC.2014.2342537
[^102]: K.Paralikar, P.Cong, O.Yizhar, L.E. Fenno, W.Santa, C.Nielsen, D.Dinsmoor, B.Hocken, G.O. Munns, J.Giftakis, K.Deisseroth, and T.Denison, ''An implantable optical stimulation delivery system for actuating an excitable biosubstrate,'' IEEE Journal of Solid-State Circuits, vol.46, no.1, pp. 321--332, Jan 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2010.2074110
[^103]: N.Ji and S.L. Smith, ''Technologies for imaging neural activity in large volumes,'' Nature Neuroscience, vol.19, pp. 1154--1164, September 2016. [Online]: http://dx.doi.org/10.1038/nn.4358
[^104]: S.Song, K.D. Miller, and L.F. Abbott, ''Competitive hebbian learning through spike-timing-dependent synaptic plasticity,'' Nature Neuroscience, vol.3, pp. 919--926, September 2000. [Online]: http://dx.doi.org/10.1038/78829
[^105]: T.Kurafuji, M.Haraguchi, M.Nakajima, T.Nishijima, T.Tanizaki, H.Yamasaki, T.Sugimura, Y.Imai, M.Ishizaki, T.Kumaki, K.Murata, K.Yoshida, E.Shimomura, H.Noda, Y.Okuno, S.Kamijo, T.Koide, H.J. Mattausch, and K.Arimoto, ''A scalable massively parallel processor for real-time image processing,'' IEEE Journal of Solid-State Circuits, vol.46, no.10, pp. 2363--2373, October 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2159528
[^106]: J.Y. Kim, M.Kim, S.Lee, J.Oh, K.Kim, and H.J. Yoo, ''A 201.4 gops 496 mw real-time multi-object recognition processor with bio-inspired neural perception engine,'' IEEE Journal of Solid-State Circuits, vol.45, no.1, pp. 32--45, Jan 2010. [Online]: http://dx.doi.org/10.1109/JSSC.2009.2031768
[^107]: C.C. Cheng, C.H. Lin, C.T. Li, and L.G. Chen, ''ivisual: An intelligent visual sensor soc with 2790 fps cmos image sensor and 205 gops/w vision processor,'' IEEE Journal of Solid-State Circuits, vol.44, no.1, pp. 127--135, Jan 2009. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2007158
[^108]: H.Noda, M.Nakajima, K.Dosaka, K.Nakata, M.Higashida, O.Yamamoto, K.Mizumoto, T.Tanizaki, T.Gyohten, Y.Okuno, H.Kondo, Y.Shimazu, K.Arimoto, K.Saito, and T.Shimizu, ''The design and implementation of the massively parallel processor based on the matrix architecture,'' IEEE Journal of Solid-State Circuits, vol.42, no.1, pp. 183--192, Jan 2007. [Online]: http://dx.doi.org/10.1109/JSSC.2006.886545
[^109]: M.S. Chae, W.Liu, and M.Sivaprakasam, ''Design optimization for integrated neural recording systems,'' IEEE Journal of Solid-State Circuits, vol.43, no.9, pp. 1931--1939, September 2008. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2001877
[^110]: K.J. Miller, L.B. Sorensen, J.G. Ojemann, and M.den Nijs, ''Power-law scaling in the brain surface electric potential,'' PLoS Comput Biol, vol.5, no.12, pp. 1--10, December 2009. [Online]: http://dx.doi.org/10.1371%2Fjournal.pcbi.1000609
[^111]: R.Harrison and C.Charles, ''A low-power low-noise cmos amplifier for neural recording applications,'' IEEE Journal of Solid-State Circuits, vol.38, no.6, pp. 958--965, June 2003. [Online]: http://dx.doi.org/10.1109/JSSC.2003.811979
[^112]: W.Sansen, ''1.3 analog cmos from 5 micrometer to 5 nanometer,'' in IEEE Proceedings of the International Solid-State Circuits Conference. IEEE, February 2015, pp. 1--6. [Online]: http://dx.doi.org/10.1109/ISSCC.2015.7062848
[^113]: M.S.J. Steyaert and W.M.C. Sansen, ''A micropower low-noise monolithic instrumentation amplifier for medical purposes,'' IEEE Journal of Solid-State Circuits, vol.22, no.6, pp. 1163--1168, December 1987. [Online]: http://dx.doi.org/10.1109/JSSC.1987.1052869
[^114]: W.Wattanapanitch, M.Fee, and R.Sarpeshkar, ''An energy-efficient micropower neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.1, no.2, pp. 136--147, June 2007. [Online]: http://dx.doi.org/10.1109/TBCAS.2007.907868
[^115]: B.Johnson and A.Molnar, ''An orthogonal current-reuse amplifier for multi-channel sensing,'' IEEE Journal of Solid-State Circuits, vol.48, no.6, pp. 1487--1496, June 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2257478
[^116]: C.Qian, J.Parramon, and E.Sanchez-Sinencio, ''A micropower low-noise neural recording front-end circuit for epileptic seizure detection,'' IEEE Journal of Solid-State Circuits, vol.46, no.6, pp. 1392--1405, June 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2126370
[^117]: X.Zou, L.Liu, J.H. Cheong, L.Yao, P.Li, M.-Y. Cheng, W.L. Goh, R.Rajkumar, G.Dawe, K.-W. Cheng, and M.Je, ''A 100-channel 1-mw implantable neural recording ic,'' IEEE Transactions on Circuits and Systems---Part I: Regular Papers, vol.60, no.10, pp. 2584--2596, October 2013. [Online]: http://dx.doi.org/10.1109/TCSI.2013.2249175
[^118]: V.Majidzadeh, A.Schmid, and Y.Leblebici, ''Energy efficient low-noise neural recording amplifier with enhanced noise efficiency factor,'' IEEE Transactions on Biomedical Circuits and Systems, vol.5, no.3, pp. 262--271, June 2011. [Online]: http://dx.doi.org/10.1109/TBCAS.2010.2078815
[^119]: C.C. Enz and E.A. Vittoz, Charge-based MOS transistor modeling: the EKV model for low-power and RF IC design. John Wiley & Sons, August 2006. [Online]: http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470855452.html
[^120]: Y.Yasuda, T.-J.K. Liu, and C.Hu, ''Flicker-noise impact on scaling of mixed-signal cmos with hfsion,'' IEEE Transactions on Electron Devices, vol.55, no.1, pp. 417--422, January 2008. [Online]: http://dx.doi.org/10.1109/TED.2007.910759
[^121]: S.-Y. Wu, C.Lin, M.Chiang, J.Liaw, J.Cheng, S.Yang, M.Liang, T.Miyashita, C.Tsai, B.Hsu, H.Chen, T.Yamamoto, S.Chang, V.Chang, C.Chang, J.Chen, H.Chen, K.Ting, Y.Wu, K.Pan, R.Tsui, C.Yao, P.Chang, H.Lien, T.Lee, H.Lee, W.Chang, T.Chang, R.Chen, M.Yeh, C.Chen, Y.Chiu, Y.Chen, H.Huang, Y.Lu, C.Chang, M.Tsai, C.Liu, K.Chen, C.Kuo, H.Lin, S.Jang, and Y.Ku, ''A 16nm finfet cmos technology for mobile soc and computing applications,'' in IEEE Proceedings of the International Electron Devices Meeting, December 2013, pp. 9.1.1--9.1.4. [Online]: http://dx.doi.org/10.1109/IEDM.2013.6724591
[^122]: L.B. Leene, Y.Liu, and T.G. Constandinou, ''A compact recording array for neural interfaces,'' in IEEE Proceedings of the Biomedical Circuits and Systems Conference, October 2013, pp. 97--100. [Online]: http://dx.doi.org/10.1109/BioCAS.2013.6679648
[^123]: Q.Fan, F.Sebastiano, J.Huijsing, and K.Makinwa, ''A $1.8\,\mu\mathrm{W}$ $60\,\mathrm{nV}/\sqrt{\mathrm{Hz}}$ capacitively-coupled chopper instrumentation amplifier in 65 nm cmos for wireless sensor nodes,'' IEEE Journal of Solid-State Circuits, vol.46, no.7, pp. 1534--1543, July 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2143610
[^124]: H.Chandrakumar and D.Markovic, ''A simple area-efficient ripple-rejection technique for chopped biosignal amplifiers,'' IEEE Transactions on Circuits and Systems---Part II: Express Briefs, vol.62, no.2, pp. 189--193, February 2015. [Online]: http://dx.doi.org/10.1109/TCSII.2014.2387686
[^125]: H.Chandrakumar and D.Markovic, ''A 2$\mu$w 40mvpp linear-input-range chopper-stabilized bio-signal amplifier with boosted input impedance of 300mohm and electrode-offset filtering,'' in IEEE Proceedings of the International Solid-State Circuits Conference. IEEE, January 2016, pp. 96--97. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7417924
[^126]: H.Rezaee-Dehsorkh, N.Ravanshad, R.Lotfi, K.Mafinezhad, and A.M. Sodagar, ''Analysis and design of tunable amplifiers for implantable neural recording applications,'' IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, vol.1, no.4, pp. 546--556, December 2011. [Online]: http://dx.doi.org/10.1109/JETCAS.2011.2174492
[^127]: X.Zou, X.Xu, L.Yao, and Y.Lian, ''A 1-v 450-nw fully integrated programmable biomedical sensor interface chip,'' IEEE Journal of Solid-State Circuits, vol.44, no.4, pp. 1067--1077, April 2009. [Online]: http://dx.doi.org/10.1109/JSSC.2009.2014707
[^128]: L.Leene and T.Constandinou, ''Ultra-low power design strategy for two-stage amplifier topologies,'' Electronics Letters, vol.50, no.8, pp. 583--585, April 2014. [Online]: http://dx.doi.org/10.1049/el.2013.4196
[^129]: H.G. Rey, C.Pedreira, and R.Q. Quiroga, ''Past, present and future of spike sorting techniques,'' Brain Research Bulletin, vol. 119, Part B, pp. 106--117, October 2015, advances in electrophysiological data analysis. [Online]: http://www.sciencedirect.com/science/article/pii/S0361923015000684
[^130]: Y.Chen, A.Basu, L.Liu, X.Zou, R.Rajkumar, G.S. Dawe, and M.Je, ''A digitally assisted, signal folding neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 528--542, August 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2288680
[^131]: X.Yue, ''Determining the reliable minimum unit capacitance for the dac capacitor array of sar adcs,'' Microelectronics Journal, vol.44, no.6, pp. 473 -- 478, 2013. [Online]: http://www.sciencedirect.com/science/article/pii/S0026269213000815
[^132]: Y.Zhu, C.-H. Chan, U.-F. Chio, S.-W. Sin, S.-P. U, R.Martins, and F.Maloberti, ''Split-sar adcs: Improved linearity with power and speed optimization,'' IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.22, no.2, pp. 372--383, February 2014. [Online]: http://dx.doi.org/10.1109/TVLSI.2013.2242501
[^133]: L.Xie, G.Wen, J.Liu, and Y.Wang, ''Energy-efficient hybrid capacitor switching scheme for sar adc,'' Electronics Letters, vol.50, no.1, pp. 22--23, January 2014. [Online]: http://dx.doi.org/10.1049/el.2013.2794
[^134]: P.Nuzzo, F.DeBernardinis, P.Terreni, and G.Vander Plas, ''Noise analysis of regenerative comparators for reconfigurable adc architectures,'' IEEE Transactions on Circuits and Systems---Part I: Regular Papers, vol.55, no.6, pp. 1441--1454, July 2008. [Online]: http://dx.doi.org/10.1109/TCSI.2008.917991
[^135]: G.Heinzel, A.Rüdiger, and R.Schilling, ''Spectrum and spectral density estimation by the discrete fourier transform (dft), including a comprehensive list of window functions and some new flat-top windows,'' pp. 25--27, February 2002. [Online]: http://hdl.handle.net/11858/00-001M-0000-0013-557A-5
[^136]: F.Gerfers, M.Ortmanns, and Y.Manoli, ''A 1.5-v 12-bit power-efficient continuous-time third-order $\Sigma\Delta$ modulator,'' IEEE Journal of Solid-State Circuits, vol.38, no.8, pp. 1343--1352, Aug 2003. [Online]: http://dx.doi.org/10.1109/JSSC.2003.814432
[^137]: Y.Chae, K.Souri, and K.A.A. Makinwa, ''A 6.3 $\mu$w 20 bit incremental zoom-adc with 6 ppm inl and 1 $\mu$v offset,'' IEEE Journal of Solid-State Circuits, vol.48, no.12, pp. 3019--3027, Dec 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2278737
[^138]: Y.S. Shu, L.T. Kuo, and T.Y. Lo, ''An oversampling sar adc with dac mismatch error shaping achieving 105db sfdr and 101db sndr over 1khz bw in 55nm cmos,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 458--459. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7418105
[^139]: P.Harpe, E.Cantatore, and A.van Roermund, ''An oversampled 12/14b sar adc with noise reduction and linearity enhancements achieving up to 79.1db sndr,'' in IEEE Proceedings of the International Solid-State Circuits Conference, February 2014, pp. 194--195. [Online]: http://dx.doi.org/10.1109/ISSCC.2014.6757396
[^140]: M.Braverman, J.Schneider, and C.Rojas, ''Space-bounded church-turing thesis and computational tractability of closed systems,'' Physical Review Letters, vol. 115, August 2015. [Online]: http://link.aps.org/doi/10.1103/PhysRevLett.115.098701
[^141]: M.Verhelst and A.Bahai, ''Where analog meets digital: Analog-to-information conversion and beyond,'' IEEE Solid-State Circuits Magazine, vol.7, no.3, pp. 67--80, September 2015. [Online]: http://dx.doi.org/10.1109/MSSC.2015.2442394
[^142]: H.A. Marblestone, M.B. Zamft, G.Y. Maguire, G.M. Shapiro, R.T. Cybulski, I.J. Glaser, D.Amodei, P.B. Stranges, R.Kalhor, A.D. Dalrymple, D.Seo, E.Alon, M.M. Maharbiz, M.J. Carmena, M.J. Rabaey, S.E. Boyden, M.G. Church, and P.K. Kording, ''Physical principles for scalable neural recording,'' Frontiers in Computational Neuroscience, vol.7, no. 137, 2013. [Online]: http://www.frontiersin.org/computational_neuroscience/10.3389/fncom.2013.00137
[^143]: L.Traver, C.Tarin, P.Marti, and N.Cardona, ''Adaptive-threshold neural spike by noise-envelope tracking,'' Electronics Letters, vol.43, no.24, pp. 1333--1335, November 2007. [Online]: http://dx.doi.org/10.1049/el:20071631
[^144]: I.Obeid and P.Wolf, ''Evaluation of spike-detection algorithms for a brain-machine interface application,'' IEEE Transactions on Biomedical Engineering, vol.51, no.6, pp. 905--911, June 2004. [Online]: http://dx.doi.org/10.1109/TBME.2004.826683
[^145]: P.Watkins, G.Santhanam, K.Shenoy, and R.Harrison, ''Validation of adaptive threshold spike detector for neural recording,'' in IEEE Proceedings of the International Conference on Engineering in Medicine and Biology Society, vol.2, September 2004, pp. 4079--4082. [Online]: http://dx.doi.org/10.1109/IEMBS.2004.1404138
[^146]: T.Takekawa, Y.Isomura, and T.Fukai, ''Accurate spike sorting for multi-unit recordings,'' European Journal of Neuroscience, vol.31, no.2, pp. 263--272, 2010. [Online]: http://dx.doi.org/10.1111/j.1460-9568.2009.07068.x
[^147]: A.Zviagintsev, Y.Perelman, and R.Ginosar, ''Low-power architectures for spike sorting,'' in IEEE Proceedings of the International Conference on Neural Engineering, March 2005, pp. 162--165. [Online]: http://dx.doi.org/10.1109/CNE.2005.1419579
[^148]: A.Rodriguez-Perez, J.Ruiz-Amaya, M.Delgado-Restituto, and A.Rodriguez-Vazquez, ''A low-power programmable neural spike detection channel with embedded calibration and data compression,'' IEEE Transactions on Biomedical Circuits and Systems, vol.6, no.2, pp. 87--100, April 2012. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2187352
[^149]: U.Rutishauser, E.M. Schuman, and A.N. Mamelak, ''Online detection and sorting of extracellularly recorded action potentials in human medial temporal lobe recordings, in vivo,'' Journal of Neuroscience Methods, vol. 154, no. 12, pp. 204 -- 224, 2006. [Online]: http://www.sciencedirect.com/science/article/pii/S0165027006000033
[^150]: F.Franke, M.Natora, C.Boucsein, M.Munk, and K.Obermayer, ''An online spike detection and spike classification algorithm capable of instantaneous resolution of overlapping spikes,'' Journal of Computational Neuroscience, vol.29, no. 1-2, pp. 127--148, 2010. [Online]: http://dx.doi.org/10.1007/s10827-009-0163-5
[^151]: M.S. Chae, Z.Yang, M.Yuce, L.Hoang, and W.Liu, ''A 128-channel 6 mw wireless neural recording ic with spike feature extraction and uwb transmitter,'' IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol.17, no.4, pp. 312--321, August 2009. [Online]: http://dx.doi.org/10.1109/TNSRE.2009.2021607
[^152]: P.H. Thakur, H.Lu, S.S. Hsiao, and K.O. Johnson, ''Automated optimal detection and classification of neural action potentials in extra-cellular recordings,'' Journal of Neuroscience Methods, vol. 162, no. 12, pp. 364 -- 376, 2007. [Online]: http://www.sciencedirect.com/science/article/pii/S0165027007000477
[^153]: J.Zhang, Y.Suo, S.Mitra, S.Chin, S.Hsiao, R.Yazicioglu, T.Tran, and R.Etienne-Cummings, ''An efficient and compact compressed sensing microsystem for implantable neural recordings,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.4, pp. 485--496, August 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2284254
[^154]: Y.Suo, J.Zhang, T.Xiong, P.S. Chin, R.Etienne-Cummings, and T.D. Tran, ''Energy-efficient multi-mode compressed sensing system for implantable neural recordings,'' IEEE Transactions on Biomedical Circuits and Systems, vol.8, no.5, pp. 648--659, October 2014. [Online]: http://dx.doi.org/10.1109/TBCAS.2014.2359180
[^155]: B.Yu, T.Mak, X.Li, F.Xia, A.Yakovlev, Y.Sun, and C.S. Poon, ''Real-time fpga-based multichannel spike sorting using hebbian eigenfilters,'' IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, vol.1, no.4, pp. 502--515, December 2011. [Online]: http://dx.doi.org/10.1109/JETCAS.2012.2183430
[^156]: V.Ventura, ''Automatic spike sorting using tuning information,'' Neural computation, vol.21, no.9, pp. 2466--2501, September 2009. [Online]: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167425/
[^157]: D.Y. Barsakcioglu, A.Eftekhar, and T.G. Constandinou, ''Design optimisation of front-end neural interfaces for spike sorting systems,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2013, pp. 2501--2504. [Online]: http://dx.doi.org/10.1109/ISCAS.2013.6572387
[^158]: A.M. Sodagar, K.D. Wise, and K.Najafi, ''A fully integrated mixed-signal neural processor for implantable multichannel cortical recording,'' IEEE Transactions on Biomedical Engineering, vol.54, no.6, pp. 1075--1088, June 2007. [Online]: http://dx.doi.org/10.1109/TBME.2007.894986
[^159]: Y.Xin, W.X. Li, R.C. Cheung, R.H. Chan, H.Yan, D.Song, and T.W. Berger, ''An fpga based scalable architecture of a stochastic state point process filter (ssppf) to track the nonlinear dynamics underlying neural spiking,'' Microelectronics Journal, vol.45, no.6, pp. 690 -- 701, June 2014. [Online]: http://www.sciencedirect.com/science/article/pii/S0026269214000913
[^160]: C.Qian, J.Shi, J.Parramon, and E.Sánchez-Sinencio, ''A low-power configurable neural recording system for epileptic seizure detection,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.4, pp. 499--512, August 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2012.2228857
[^161]: K.C. Chun, P.Jain, J.H. Lee, and C.H. Kim, ''A 3t gain cell embedded dram utilizing preferential boosting for high density and low power on-die caches,'' IEEE Journal of Solid-State Circuits, vol.46, no.6, pp. 1495--1505, June 2011. [Online]: http://dx.doi.org/10.1109/JSSC.2011.2128150
[^162]: R.E. Matick and S.E. Schuster, ''Logic-based edram: Origins and rationale for use,'' IBM Journal of Research and Development, vol.49, no.1, pp. 145--165, January 2005. [Online]: http://dx.doi.org/10.1147/rd.491.0145
[^163]: R.Nair, ''Evolution of memory architecture,'' Proceedings of the IEEE, vol. 103, no.8, pp. 1331--1345, August 2015. [Online]: http://dx.doi.org/10.1109/JPROC.2015.2435018
[^164]: C.E. Molnar and I.W. Jones, ''Simple circuits that work for complicated reasons,'' in IEEE Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2000, pp. 138--149. [Online]: http://dx.doi.org/10.1109/ASYNC.2000.836995
[^165]: H.Schorr, ''Computer-aided digital system design and analysis using a register transfer language,'' IEEE Transactions on Electronic Computers, vol. EC-13, no.6, pp. 730--737, December 1964. [Online]: http://dx.doi.org/10.1109/PGEC.1964.263907
[^166]: D.Wang, A.Rajendiran, S.Ananthanarayanan, H.Patel, M.Tripunitara, and S.Garg, ''Reliable computing with ultra-reduced instruction set coprocessors,'' IEEE Micro, vol.34, no.6, pp. 86--94, November 2014. [Online]: http://dx.doi.org/10.1109/MM.2013.130
[^167]: ''Msp430g2x53 mixed signal microcontroller - data sheet,'' Texas Instruments Incorporated, Dallas, Texas, pp. 403--413, May 2013. [Online]: http://www.ti.com/lit/ds/symlink/msp430g2553.pdf
[^168]: F.L. Yuan, C.C. Wang, T.H. Yu, and D.Marković, ''A multi-granularity fpga with hierarchical interconnects for efficient and flexible mobile computing,'' IEEE Journal of Solid-State Circuits, vol.50, no.1, pp. 137--149, January 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2372034
[^169]: B.Vigraham, J.Kuppambatti, and P.R. Kinget, ''Switched-mode operational amplifiers and their application to continuous-time filters in nanoscale cmos,'' IEEE Journal of Solid-State Circuits, vol.49, no.12, pp. 2758--2772, December 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2354641
[^170]: Y.Tsividis, ''Event-driven data acquisition and continuous-time digital signal processing,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, September 2010, pp. 1--8. [Online]: http://dx.doi.org/10.1109/CICC.2010.5617618
[^171]: I.Lee, D.Sylvester, and D.Blaauw, ''A constant energy-per-cycle ring oscillator over a wide frequency range for wireless sensor nodes,'' IEEE Journal of Solid-State Circuits, vol.51, no.3, pp. 697--711, March 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2517133
[^172]: B.Drost, M.Talegaonkar, and P.K. Hanumolu, ''Analog filter design using ring oscillator integrators,'' IEEE Journal of Solid-State Circuits, vol.47, no.12, pp. 3120--3129, December 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2225738
[^173]: V.Unnikrishnan and M.Vesterbacka, ''Time-mode analog-to-digital conversion using standard cells,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.61, no.12, pp. 3348--3357, December 2014. [Online]: http://dx.doi.org/10.1109/TCSI.2014.2340551
[^174]: K.Yang, D.Blaauw, and D.Sylvester, ''An all-digital edge racing true random number generator robust against pvt variations,'' IEEE Journal of Solid-State Circuits, vol.51, no.4, pp. 1022--1031, April 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2519383
[^175]: S.Chatterjee, Y.Tsividis, and P.Kinget, ''0.5-v analog circuit techniques and their application in ota and filter design,'' IEEE Journal of Solid-State Circuits, vol.40, no.12, pp. 2373--2387, December 2005. [Online]: http://dx.doi.org/10.1109/JSSC.2005.856280
[^176]: M.Alioto, ''Understanding dc behavior of subthreshold cmos logic through closed-form analysis,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.57, no.7, pp. 1597--1607, July 2010. [Online]: http://dx.doi.org/10.1109/TCSI.2009.2034233
[^177]: A.Hajimiri and T.Lee, ''A general theory of phase noise in electrical oscillators,'' IEEE Journal of Solid-State Circuits, vol.33, no.2, pp. 179--194, February 1998. [Online]: http://dx.doi.org/10.1109/4.658619
[^178]: A.Demir, A.Mehrotra, and J.Roychowdhury, ''Phase noise in oscillators: a unifying theory and numerical methods for characterization,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.47, no.5, pp. 655--674, May 2000. [Online]: http://dx.doi.org/10.1109/81.847872
[^179]: A.Hajimiri, S.Limotyrakis, and T.Lee, ''Phase noise in multi-gigahertz cmos ring oscillators,'' in IEEE Proceedings of the Custom Integrated Circuits Conference, May 1998, pp. 49--52. [Online]: http://dx.doi.org/10.1109/CICC.1998.694905
[^180]: W.Jiang, V.Hokhikyan, H.Chandrakumar, V.Karkare, and D.Markovic, ''A ±50mv linear-input-range vco-based neural-recording front-end with digital nonlinearity correction,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 484--485. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7418118
[^181]: C.Weltin-Wu and Y.Tsividis, ''An event-driven clockless level-crossing adc with signal-dependent adaptive resolution,'' IEEE Journal of Solid-State Circuits, vol.48, no.9, pp. 2180--2190, September 2013. [Online]: http://dx.doi.org/10.1109/JSSC.2013.2262738
[^182]: H.Y. Yang and R.Sarpeshkar, ''A bio-inspired ultra-energy-efficient analog-to-digital converter for biomedical applications,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.53, no.11, pp. 2349--2356, November 2006. [Online]: http://dx.doi.org/10.1109/TCSI.2006.884463
[^183]: F.Corradi and G.Indiveri, ''A neuromorphic event-based neural recording system for smart brain-machine-interfaces,'' IEEE Transactions on Biomedical Circuits and Systems, vol.9, no.5, pp. 699--709, October 2015. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2479256
[^184]: K.A. Ng and Y.P. Xu, ''A compact, low input capacitance neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 610--620, October 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2280066
[^185]: J.Agustin and M.Lopez-Vallejo, ''An in-depth analysis of ring oscillators: Exploiting their configurable duty-cycle,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.62, no.10, pp. 2485--2494, October 2015. [Online]: http://dx.doi.org/10.1109/TCSI.2015.2476300
[^186]: K.Ng and Y.P. Xu, ''A compact, low input capacitance neural recording amplifier,'' IEEE Transactions on Biomedical Circuits and Systems, vol.7, no.5, pp. 610--620, October 2013. [Online]: http://dx.doi.org/10.1109/TBCAS.2013.2280066
[^187]: M.Elia, L.B. Leene, and T.G. Constandinou, ''Continuous-time micropower interface for neural recording applications,'' in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016.
[^188]: Y.W. Li, K.L. Shepard, and Y.P. Tsividis, ''A continuous-time programmable digital fir filter,'' IEEE Journal of Solid-State Circuits, vol.41, no.11, pp. 2512--2520, November 2006. [Online]: http://dx.doi.org/10.1109/JSSC.2006.883314
[^189]: B.Schell and Y.Tsividis, ''A continuous-time adc/dsp/dac system with no clock and with activity-dependent power dissipation,'' IEEE Journal of Solid-State Circuits, vol.43, no.11, pp. 2472--2481, November 2008. [Online]: http://dx.doi.org/10.1109/JSSC.2008.2005456
[^190]: S.Aouini, K.Chuai, and G.W. Roberts, ''Anti-imaging time-mode filter design using a pll structure with transfer function dft,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.59, no.1, pp. 66--79, January 2012. [Online]: http://dx.doi.org/10.1109/TCSI.2011.2161411
[^191]: X.Xing and G.G.E. Gielen, ''A 42 fj/step-fom two-step vco-based delta-sigma adc in 40 nm cmos,'' IEEE Journal of Solid-State Circuits, vol.50, no.3, pp. 714--723, March 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2015.2393814
[^192]: K.Reddy, S.Rao, R.Inti, B.Young, A.Elshazly, M.Talegaonkar, and P.K. Hanumolu, ''A 16-mw 78-db sndr 10-mhz bw ct $\Delta\Sigma$ adc using residue-cancelling vco-based quantizer,'' IEEE Journal of Solid-State Circuits, vol.47, no.12, pp. 2916--2927, December 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2218062
[^193]: J.Daniels, W.Dehaene, M.S.J. Steyaert, and A.Wiesbauer, ''A/d conversion using asynchronous delta-sigma modulation and time-to-digital conversion,'' IEEE Transactions on Circuits and Systems---Part I: Fundamental Theory and Applications, vol.57, no.9, pp. 2404--2412, September 2010. [Online]: http://dx.doi.org/10.1109/TCSI.2010.2043169
[^194]: F.M. Yaul and A.P. Chandrakasan, ''A sub-$\mu\mathrm{W}$ $36\,\mathrm{nV}/\sqrt{\mathrm{Hz}}$ chopper amplifier for sensors using a noise-efficient inverter-based 0.2v-supply input stage,'' in IEEE Proceedings of the International Solid-State Circuits Conference, January 2016, pp. 94--95. [Online]: http://dx.doi.org/10.1109/ISSCC.2016.7417923
[^195]: S.Patil, A.Ratiu, D.Morche, and Y.Tsividis, ''A 3-10 fj/conv-step error-shaping alias-free continuous-time adc,'' IEEE Journal of Solid-State Circuits, vol.51, no.4, pp. 908--918, April 2016. [Online]: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7433385&isnumber=7446371
[^196]: J.M. Duarte-Carvajalino and G.Sapiro, ''Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,'' IEEE Transactions on Image Processing, vol.18, no.7, pp. 1395--1408, July 2009. [Online]: http://dx.doi.org/10.1109/TIP.2009.2022459