CHAPTER III

METHODS (DATA ELABORATION ALGORITHMS)

Academic year: 2021
The formal scheme for the acquisition and analysis of the ENG signal for the control of prosthetic devices is composed of several modules [15] (Fig. 3.1):

• signal conditioning and pre-processing
• feature extraction
• dimensionality reduction
• pattern recognition
• offline and online learning

Fig.3.1. Formal scheme for acquisition and analysis of ENG for control of prosthetic devices.

The first module pre-processes the ENG signal in order to reduce noise artefacts and/or enhance spectral components that contain important information for data analysis. Moreover, it detects the onset of the activity and activates all the following modules. During the feature extraction phase, the measured ENG signal is processed in order to emphasize the relevant structures in the data while rejecting noise and irrelevant data, producing the so-called “original feature set”. Sometimes a reduction of the dimensionality is needed to simplify the task of the classifier. A pattern recognition algorithm is then applied to the (reduced) feature set, and the measured signal is classified into the output space. The learning modules are used to adapt the device to the ENG signals generated by the user, because of their time-variant and subject-variant characteristics.

3.1. ENG acquisition and pre-processing

Good acquisition of the ENG signal is a prerequisite for good signal processing. Several factors are essential for the quality of ENG acquisition:

• Correct electrode positioning
• Electromagnetic external noise
• EMG artefacts
• Instrumentation high-frequency artefacts
• Animal’s movement
• Correct stimuli application

We discuss these points below, together with ideas on how to overcome the related problems.

During the surgical implant the experimenter cannot know a priori whether the electrode (mainly in the case of intraneural electrodes) is correctly positioned (e.g. whether it is close to the axons). This can be assessed only post-implant, by inspecting the obtained SNR. In the case of a very poor SNR the electrode has to be repositioned. One new idea to overcome this problem is given in [11]: the authors propose a LIFE micro-actuation system which would eliminate the need to reposition the electrode surgically. Cuff electrodes present fewer implant problems, because the cuff is external to the nerve and its positioning can be verified visually.


The electrodes and the amplification system act as a kind of antenna for external radio-frequency (RF) noise. In order to decrease the RF noise, a Faraday shield has been used during the experimentation.

In EMG-based recordings the EMG signal is the Signal of Interest (SOI), while for ENG recordings it carries no useful information (it is noise) and can in fact corrupt the quality of the ENG signal acquisition.

Generally, about 95% of the power spectrum of the EMG is accounted for by harmonics up to 400 Hz, and most of the remainder is electrode and equipment noise; however, components of the EMG signal have been observed up to 800 Hz.

Considering that at high frequencies there is considerable amplifier-introduced noise, some kind of band-pass filtering has to be applied. Obviously, it has to take into account the trade-off between the better SNR of the SOI where the noise is cut off and the information lost by cutting off low and high frequencies. Hence, in the first place, the frequency content of the ENG recordings has to be examined. To do this, standard spectral-analysis instruments have been applied:

• Discrete Fourier Transform (DFT)
• Power spectrum
• Short-Time Fourier Transform / Gabor transform (STFT)
• Power Spectral Density (PSD)

The cuff signals have been divided into noise segments and SOI segments, and the instruments described above have been used in order to find the frequency band containing the main power of the signal of interest.

The DFT gives information only about the frequency components that are present in the signal, without their temporal localization.

Given a finite-length sequence x[k], k = 0, …, L−1, its discrete Fourier transform (DFT) is defined as:

X[n] = Σ_{k=0}^{L−1} x[k] e^{−j2πnk/L},  n = 0, …, L−1,

where L is the length of the sequence and F = 1/(L·Ts) is the frequency sampling step size (Ts being the sampling period). The power spectrum is directly derived from the DFT (as the squared magnitude of the Fourier transform), hence it suffers from similar limitations, but it is useful because the eventual presence of EMG frequency content is clearly indicated.

In fact, Parseval's theorem states:

Σ_{n} |x[n]|² = (1/2π) ∫_{−π}^{π} |X(e^{jφ})|² dφ,

where X is the discrete-time Fourier transform (DTFT) of x and φ represents the angular frequency (in radians per sample).

The interpretation of this form of the theorem is that the total energy contained in a waveform, summed across all of time, is equal to the total energy of the waveform's Fourier transform, summed across all of its frequency components.
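As an illustration, the DFT definition and Parseval's relation can be checked numerically. The following is a minimal sketch (NumPy assumed; the signal is a synthetic stand-in, not real ENG data):

```python
import numpy as np

# Synthetic zero-mean test signal standing in for an ENG segment
rng = np.random.default_rng(0)
x = rng.standard_normal(256)

# DFT: X[n] = sum_k x[k] * exp(-j*2*pi*n*k/L)
X = np.fft.fft(x)

# Parseval (DFT form): sum |x[k]|^2 == (1/L) * sum |X[n]|^2
time_energy = np.sum(np.abs(x) ** 2)
freq_energy = np.sum(np.abs(X) ** 2) / len(x)
print(np.allclose(time_energy, freq_energy))  # True
```

The same identity is what makes the power spectrum a faithful indicator of where the signal's energy lies in frequency.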

Gabor transform or short-time Fourier transform (STFT). Most transforms, in their original form, assume that the signal under consideration is stationary. This assumption generally fails for the ENG signal, except over short periods of time. The idea of the STFT is, in fact, to apply the DFT on limited windows (periods of time) over which the signal can be assumed stationary. The STFT consists of a series of DFTs, indexed with respect to Ts and F (giving both time and frequency information):

X[m, n] = Σ_{i} x[i] g[i − mK] e^{−j2πni/L},

where g[i] is the window function. The temporal sampling step size is T = K · Ts; if K = 1, the STFT is computed at every sample in time; if K = L, the successive analysis windows do not overlap. The resolution in time and frequency is lower-bounded by the time–bandwidth uncertainty principle, or Heisenberg inequality:

Δt · Δf ≥ 1/(4π).

A Gaussian window allows a balanced time resolution and frequency resolution. The STFT has, among its other useful properties, a well-developed theory and can be computed very efficiently. The main constraint is that each cell in the time–frequency plane must have identical shape (Fig.3.2.).

Fig. 3.2. Tiling of the STFT

In fact, as imposed by the temporal and frequency sampling steps, the time–frequency plane is divided into cells, each of which has a temporal width of T and a frequency height of F; clearly, the energy distribution of physical signals is not, in general, conveniently localized in regions of fixed aspect ratio.
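A minimal STFT along these lines can be sketched as follows (NumPy assumed; the window length L, hop K and the test tone are illustrative choices, not the values used in the study):

```python
import numpy as np

def stft(x, L=64, K=32):
    """Series of windowed DFTs: Gaussian window of length L, hop of K samples."""
    i = np.arange(L)
    g = np.exp(-0.5 * ((i - (L - 1) / 2) / (0.4 * L)) ** 2)  # Gaussian window
    frames = [x[m:m + L] * g for m in range(0, len(x) - L + 1, K)]
    return np.array([np.fft.rfft(f) for f in frames])  # rows: time, cols: freq

fs = 1000.0
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 50 * t)          # 50 Hz tone as a stand-in signal
S = stft(x)
peak_bin = np.abs(S).mean(axis=0).argmax()
print(peak_bin * fs / 64)               # peak near 50 Hz
```

With K = 32 and L = 64 the windows overlap by half, a common compromise between temporal resolution and computational cost.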

The PSD is a very useful instrument, which can be calculated in several ways.

The spectral density of the signal, when multiplied by an appropriate factor, gives the power carried by the signal per unit frequency. This is known as the power spectral density (PSD) or spectral power distribution (SPD) of the signal. The units of power spectral density are commonly expressed in watts per hertz (W/Hz).

The power spectral density [16] describes how the power of a signal or time series is distributed with frequency. If the signal can be treated as a stationary random process, the PSD is the Fourier transform of its autocorrelation function R(τ). This results in the formula:

S(f) = ∫_{−∞}^{∞} R(τ) e^{−j2πfτ} dτ,

which has been implemented in this data pre-processing.
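The Wiener–Khinchin relation underlying this formula can be illustrated numerically: the periodogram of a segment equals the DFT of its (circular) autocorrelation. A sketch, assuming NumPy and a synthetic signal:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(128)
N = len(x)

# Periodogram: |X[n]|^2 / N
periodogram = np.abs(np.fft.fft(x)) ** 2 / N

# Wiener-Khinchin: the same PSD is the DFT of the circular autocorrelation
R = np.fft.ifft(np.abs(np.fft.fft(x)) ** 2).real / N   # circular R(tau)
psd_from_R = np.fft.fft(R).real

print(np.allclose(periodogram, psd_from_R))  # True
```

In practice the autocorrelation-based estimate is usually windowed and averaged to reduce variance; the sketch only shows the identity itself.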

The animal’s movement is not usually a problem in the set-up used, because the animal is under anaesthesia, but it becomes relevant when the animal is close to waking up; it manifests as sudden EMG responses to pain stimuli.

The correct mode of stimuli application is essential for good recordings. In our experimental set-up all the stimuli have been applied ten times, with the longest possible duration for every type of stimulus. The stimulation levels of the different repetitions have to be as constant as possible in order to facilitate the subsequent training part of pattern recognition.

3.2. ENG Feature Extraction

The fundamental purpose of feature extraction is to emphasize the important information in the measured signal while rejecting noise and irrelevant data.

Several researchers [12],[17] found that there is useful information in the transient burst of the ENG signal; hence all the effort is directed at finding the optimal number and type of features to extract from the signal. Those which have been used during this work are presented here.

3.2.1. Time Domain Features

Features in the time domain are generally quick to calculate (an advantage for real-time applications), because they do not require a transformation. Some or all of these features have been widely used in research and in clinical practice.

Rectified bin value (RBI) is the most classic feature [12],[15], extrapolated from EMG and ENG signals:

RBI = Σ_{k=1}^{N} x̄_k,  where x̄_k = x_k if x_k > T_b, and x̄_k = 0 otherwise,

x being the rectified signal and T_b a noise threshold.

Mean absolute value (MAV) is an estimate of the mean absolute value of the signal in a segment i that is N samples in length:

MAV_i = (1/N) Σ_{k=1}^{N} |x_k|.

Willison amplitude (WAMP) is the number of counts for each change of the ENG signal amplitude that exceeds a predefined threshold. It is given by:

WAMP = Σ_{k=1}^{N−1} f(|x_k − x_{k+1}|),

with f(x) = 1 if x > threshold, 0 otherwise. It is a very useful feature in EMG elaboration, but in this ENG study it did not give good performance.


Variance of the ENG (VAR). For a zero-mean segment it is given by the formula:

VAR = (1/(N−1)) Σ_{k=1}^{N} x_k².

Zero crossing (ZC) is the number of times the waveform crosses zero. In order to reduce noise-induced zero crossings, a threshold must be included. Given two consecutive samples x_k and x_{k+1}, ZC is incremented if:

sgn(−x_k · x_{k+1}) = 1 and |x_k − x_{k+1}| ≥ threshold,

with sgn(x) = 1 if x > 0, 0 otherwise. This parameter provides a rough estimate of the frequency-domain properties. In the present study plenty of thresholds were tried, but without obtaining useful information from this feature.

Waveform length (WL) is the cumulative length of the waveform over the time segment. It is defined as:

WL = Σ_{k=1}^{N−1} |Δx_k|,

where Δx_k = x_{k+1} − x_k. This parameter gives a measure of waveform amplitude, frequency, and duration all in one. It has been found to be one of the most informative.
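As an illustration, the time-domain features above can be computed as follows (a NumPy sketch on an illustrative toy segment; the threshold values are arbitrary):

```python
import numpy as np

def mav(x):
    """Mean absolute value of the segment."""
    return np.mean(np.abs(x))

def wamp(x, thr):
    """Willison amplitude: count of consecutive-sample steps above thr."""
    return int(np.sum(np.abs(np.diff(x)) > thr))

def zero_crossings(x, thr):
    """Sign changes whose amplitude step also exceeds thr (noise guard)."""
    sign_change = x[:-1] * x[1:] < 0
    big_step = np.abs(np.diff(x)) >= thr
    return int(np.sum(sign_change & big_step))

def waveform_length(x):
    """Cumulative length of the waveform over the segment."""
    return float(np.sum(np.abs(np.diff(x))))

x = np.array([0.0, 1.0, -1.0, 0.5, -0.5])
print(mav(x), wamp(x, 1.0), zero_crossings(x, 0.1), waveform_length(x))
```

All four run in a single pass over the segment, which is why these features suit real-time use.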

The AR model has been used with success in EMG data elaboration. The ENG signal is non-stationary and nonlinear but, over a short time interval, it can be regarded as a stationary Gaussian process. The ENG time series can then be modelled as:

x_k = Σ_{i=1}^{n} a_i x_{k−i} + e_k,

where n is the order of the AR model, a_i are the estimates of the AR coefficients, and e_k is the residual white noise. Here, the first two coefficients of the model were tried, but without notable performance.
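A sketch of AR coefficient estimation via the Yule-Walker equations, assuming NumPy and a synthetic AR(2) process (the study applied this to ENG segments; the coefficients below are illustrative):

```python
import numpy as np

def ar_coeffs(x, order=2):
    """Estimate AR coefficients by solving the Yule-Walker equations."""
    x = x - x.mean()
    N = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

# Synthetic AR(2) process: x_k = 0.6 x_{k-1} - 0.3 x_{k-2} + e_k
rng = np.random.default_rng(2)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for k in range(2, len(e)):
    x[k] = 0.6 * x[k - 1] - 0.3 * x[k - 2] + e[k]

a = ar_coeffs(x, order=2)
print(a)  # close to [0.6, -0.3]
```

The first coefficients a_1, a_2 recovered this way are exactly the kind of feature pair tried above.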

3.2.2. Higher order statistics (HOS)

Recently, higher-order statistics have been used with success in the data elaboration of ENG [12] and EMG [17] signals.

These works [12],[17] suggest that statistical methods, due to their ability to separate the signal and noise subspaces, are superior. Spectral analysis (see the Results chapter) determined that the noise typically encountered in nerve-cuff electrode signals is normally (Gaussian) distributed. Therefore, third-order statistics can be applied to, ideally, completely reject the noise component.

Assuming zero-mean time-series segments, the cumulants, as in [17], are:

C_{2,x}(τ₁) = E{x(t) x(t+τ₁)}
C_{3,x}(τ₁, τ₂) = E{x(t) x(t+τ₁) x(t+τ₂)}
C_{4,x}(τ₁, τ₂, τ₃) = E{x(t) x(t+τ₁) x(t+τ₂) x(t+τ₃)} − C_{2,x}(τ₁) C_{2,x}(τ₂−τ₃) − C_{2,x}(τ₂) C_{2,x}(τ₃−τ₁) − C_{2,x}(τ₃) C_{2,x}(τ₁−τ₂)

where x(t) is the ENG signal, C is the cumulant and τ₁, τ₂ and τ₃ are time lags. In our study we chose those which showed the best performance in [12],[17]. These are:

• Variance, as defined above.
• The single-delay variance: var(x, 1) = C_{2,x}(1) = E{x(t) x(t+1)}.
• The kurtosis of the ENG signal (fourth-order zero-lag cumulant): kurt(x) = C_{4,x}(0, 0, 0) = E{x⁴(t)} − 3 (E{x²(t)})².

The third-order zero-lag cumulant has also been tried, but without great performance, hence confirming the conclusions of [12],[17].
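The chosen HOS features can be sketched as follows (NumPy assumed). On synthetic Gaussian noise the fourth-order cumulant indeed tends to zero, matching the noise-rejection argument above:

```python
import numpy as np

def hos_features(x):
    """Variance, single-delay variance, and fourth-order zero-lag cumulant
    (kurtosis) of a zero-mean segment."""
    x = x - x.mean()
    var = np.mean(x * x)                      # C2,x(0)
    var1 = np.mean(x[:-1] * x[1:])            # C2,x(1)
    kurt = np.mean(x ** 4) - 3.0 * var ** 2   # C4,x(0,0,0)
    return var, var1, kurt

# For Gaussian noise the fourth-order cumulant tends to zero
rng = np.random.default_rng(3)
g = rng.standard_normal(200000)
var, var1, kurt = hos_features(g)
print(round(var, 2), round(kurt, 2))  # ~1.0 and ~0.0
```

A non-Gaussian burst superimposed on such noise would therefore stand out in the kurtosis while the Gaussian background cancels.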

3.2.3. Time–Frequency Representation

Time–frequency representation can localize the energy of the signal both in time and in frequency, thus allowing a more accurate description of the physical phenomenon [15]. On the other hand, time–frequency representation (TFR) generally requires a transformation that could be computationally heavy.

Among the different TFRs available, the STFT and wavelet denoising have been used here.

The Short-Time Fourier Transform (STFT), illustrated above, has been implemented. The wavelet transform (WT) [13] has also been used. The WT overcomes the main drawback of the STFT by varying the time–frequency aspect ratio, producing a good frequency resolution Δf in long time windows (low frequencies) and a good time localization Δt at high frequencies. This produces a tiling of the time–frequency plane that is appropriate for most physical signals (Fig. 3.3).


The continuous wavelet transform is defined as:

W_x(a, b) = |a|^{−1/2} ∫_{−∞}^{∞} x(t) Ψ*((t − b)/a) dt,

where Ψ(t) is the mother wavelet, which has the property that the set {Ψ_{a,b}} forms an orthonormal basis in L²(ℜ) (a, b ∈ ℜ, a ≠ 0). In its discrete form, a = a₀^j and τ = n · a₀^{−j}, where n and j are integers (discrete wavelet transform, or DWT). A common choice is a₀ = 2 (dyadic wavelet basis), which allows great computational efficiency. In this study the wavelet-coefficient soft-threshold de-noising has been used [13].
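As an illustration of soft-threshold wavelet de-noising, the following sketch applies a single-level Haar DWT (a simplification: the mother wavelet, decomposition depth, threshold and test signal used in the study are not specified here, so all of them are illustrative):

```python
import numpy as np

def haar_denoise(x, thr):
    """One-level Haar DWT, soft-threshold the detail coefficients, invert."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)      # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)      # detail coefficients
    d = np.sign(d) * np.maximum(np.abs(d) - thr, 0.0)  # soft threshold
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)            # inverse Haar transform
    y[1::2] = (a - d) / np.sqrt(2)
    return y

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 512)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(512)
den = haar_denoise(noisy, thr=0.4)
print(np.mean((den - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```

A multi-level decomposition (as in standard wavelet shrinkage) repeats the same split-threshold-invert step on the approximation coefficients.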

Power Spectral Density (PSD), as defined above, has been implemented as a feature, giving the spectral information localized in time. Among the different methods for the estimation of the PSD (e.g. Yule-Walker, Burg), the classic implementation of the PSD as the Fourier transform of the autocorrelation function R(τ) was used here.

There are already some conclusions [13] about the optimal length of the observation window for the best classification results. In the present study different window lengths have been tested:

• 25 msec
• 50 msec
• 100 msec
• 150 msec
• 200 msec

In the search for the optimal classification strategy, three window overlaps have also been tested.

3.3. Dimensionality Reduction

The reduction of the dimensionality of the problem is generally fundamental to increasing the classification performance. This process should preserve as much of the relevant information as possible while reducing the number of dimensions. Four methods for feature reduction have been tested here:

• Principal components analysis [18]
• Sequential forward selection [17]
• HOS features choice
• Visibly most informative features

The first two are classical dimensionality-reduction algorithms [18],[17], while the third is a choice inspired by state-of-the-art articles [17],[12]. The fourth feature set was chosen after visual inspection of the extracted features.
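A minimal sketch of the first method, principal components analysis via SVD (NumPy assumed; the data are synthetic, approximately two-dimensional features embedded in six dimensions):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the feature matrix X (samples x features) onto its first
    principal components, computed via SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(5)
latent = rng.standard_normal((200, 2))
mix = rng.standard_normal((2, 6))
X = latent @ mix + 0.01 * rng.standard_normal((200, 6))  # ~2-D data in 6-D
Z = pca_reduce(X, 2)
print(Z.shape)  # (200, 2)
```

The retained components capture nearly all the variance of this data, which is the property the classifier benefits from.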

3.4. ENG Pattern Classification

There are several possible classification techniques. Among them, the most used are Bayesian pattern classifiers and artificial neural networks [1]. Recently, some authors have tried to use a fuzzy classifier, but other authors reported that, with the appropriate representation of the signal, a linear classifier performs better than a nonlinear one [1]. Four different classifiers were tried in order to find the optimal one:

1. Bayesian (linear) classifier
2. Parzen classifier
3. K-nearest neighbour classifier (KNN)
4. Artificial Neural Network (ANN)

3.4.1. Bayesian (Linear) Classifier

One of the standard statistical classification methods is the Bayes classifier. The measured input vector x ∈ X ⊆ ℜ^N and its response y ∈ Y = {y₁, …, y_K} may be considered in a probabilistic framework and viewed as a single observation of the random variables X and Y. The a-posteriori probability that pattern x comes from class y_k is P(y_k | x).

These probabilities are in general not known, but may be calculated from the a-priori probabilities P(y_k) and the conditional density functions p(x | y_k) using Bayes' theorem:

P(y_k | x) = p(x | y_k) P(y_k) / p(x),

where:

p(x) = Σ_{k=1}^{K} p(x | y_k) P(y_k).

Note that p(x) is the probability density function of the input space, and it remains constant for all P(y_k | x). Bayes' decision rule minimizes the probability of classification error. In this study, the statistical frequencies of the features and labels have been used as a-priori probabilities.
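A minimal Gaussian Bayes classifier along these lines can be sketched as follows (NumPy assumed; the diagonal covariances and the synthetic two-class data are illustrative simplifications, not the study's configuration):

```python
import numpy as np

def fit_gaussian_bayes(X, y):
    """Per-class Gaussian densities (diagonal covariance); priors taken
    as the empirical class frequencies."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return model

def predict_bayes(model, x):
    """Pick the class with the largest posterior (computed in log-domain)."""
    best, best_score = None, -np.inf
    for c, (prior, mu, var) in model.items():
        score = np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                             + (x - mu) ** 2 / var)
        if score > best_score:
            best, best_score = c, score
    return best

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(4, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)
model = fit_gaussian_bayes(X, y)
print(predict_bayes(model, np.zeros(3)), predict_bayes(model, 4 * np.ones(3)))
```

With equal, shared covariances the decision boundary becomes linear, which is the "Bayesian (linear)" case named above.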

3.4.2. Parzen classifier

Parzen classifiers estimate the densities for each class and classify a test instance by the label corresponding to the maximum posterior. The decision region of a Parzen-window classifier depends upon the choice of the window function. Kernel density estimation (the Parzen-window method, named after Emanuel Parzen) is a way of estimating the probability density function of a random variable: given data about a sample of a population, it makes it possible to extrapolate the density to the entire population. In the present study it has been used only in an automatic way, without exploring its nature.
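A sketch of a Parzen-window classifier with a Gaussian kernel (NumPy assumed; the bandwidth h and the synthetic data are illustrative assumptions):

```python
import numpy as np

def parzen_classify(X_train, y_train, x, h=0.5):
    """Gaussian-kernel density estimate per class; return the class with
    the largest estimated density at x."""
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d2 = np.sum((Xc - x) ** 2, axis=1)
        scores[c] = np.mean(np.exp(-d2 / (2 * h ** 2)))  # unnormalized KDE
    return max(scores, key=scores.get)

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(parzen_classify(X, y, np.array([0.0, 0.0])))  # 0
print(parzen_classify(X, y, np.array([3.0, 3.0])))  # 1
```

The bandwidth h plays the role of the window width: too small and the density estimate is spiky, too large and class boundaries blur.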

3.4.3. Artificial Neural Network

An artificial neural network (ANN) is a computational system inspired by the learning characteristics and structure of biological neural networks. The application of an ANN could reduce the amount of user training needed for appropriate use of the device. Moreover, a simple feed-forward neural network can ensure high recognition rates. ANNs possess several attractive features that make them suitable for difficult signal-processing problems: generalization and the ability to learn from experience without requiring an a-priori mathematical model of the underlying signal characteristics; adaptation to changing environmental conditions; and the ability to process degraded and incomplete data.

In the present study, a feed-forward neural network classifier trained with back-propagation, with two hidden layers and twenty hidden neurones, has been used.

As the decision rule, the maximum selector among the output states has been used.

3.4.4. KNN Classification

The K-nearest neighbour classifier is a suboptimal, nonlinear classifier: the K nearest samples in the training data set are found, and the majority class among them is determined. The algorithm can be summarized as follows [17]. Given an unknown feature vector x and a distance measure (here the simple Euclidean distance):

1. Out of the N training vectors, identify the k nearest neighbours, irrespective of class label, where k is chosen to be odd.

2. Out of these k samples, identify the number of vectors, k_i, that belong to class w_i, i = 1, 2, …, M. Obviously, Σ_i k_i = k.

3. Assign x to the class w_i with the maximum number k_i of samples.
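The three steps above can be sketched directly (NumPy assumed; the toy training set is illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    dist = np.linalg.norm(X_train - x, axis=1)   # step 1: distances
    nearest = y_train[np.argsort(dist)[:k]]      # labels of the k nearest
    classes, counts = np.unique(nearest, return_counts=True)  # step 2
    return classes[counts.argmax()]              # step 3: majority class

X = np.array([[0.0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.05])))  # 0
print(knn_predict(X, y, np.array([5.05, 5.05])))  # 1
```

Choosing k odd, as in step 1, avoids ties in the two-class case.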

3.5. Offline and online learning

ENG signal patterns differ among individuals. Moreover, the electrical impedance of the electrode–nerve contact changes (caused by connective tissue growing on the contact between electrode and nerve); electrode locations differ between implants; and time variations are caused by movement, the animal's wakefulness state, and so on. These factors differ from individual to individual and from time to time. It is clear that the ENG processing unit should adapt itself to these changes in order to minimize mis-discriminations. The device should “learn” how the user behaves and adjust its internal parameters to follow the operator's variation in real time.

In the present study all the classifier training has been performed offline, using the odd samples of the dataset (which is, in fact, the ENG signal with labels of the current stimulation state). The test phase has been done with the even samples of the same dataset, applied to the trained classifier.
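The odd/even split described above can be sketched as follows (NumPy assumed; the feature matrix and labels are placeholders for the real labelled ENG dataset):

```python
import numpy as np

# Hypothetical labelled dataset: one feature vector and one stimulation label
# per sample (stand-ins for the real ENG data)
features = np.arange(20).reshape(10, 2)   # 10 samples, 2 features each
labels = np.array([0, 1] * 5)

# Odd samples (1st, 3rd, ... -> 0-based even indices) train the classifier;
# even samples (2nd, 4th, ...) form the test set
train_X, train_y = features[0::2], labels[0::2]
test_X, test_y = features[1::2], labels[1::2]
print(len(train_X), len(test_X))  # 5 5
```

This interleaved split keeps the class balance and the temporal coverage of the recording roughly equal between the two halves.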
