

Department of Mathematics

Master's Thesis

EFFECTS OF NOISE ON NEURAL SIGNAL TRANSMISSION:
analysis of some simple models

Candidate: Claudia Cusseddu

Supervisor: Prof. Marco Romito


Contents

Introduction

1 Foundations of Neuroscience
  1.1 Neurons
  1.2 Action potential
  1.3 Stochastic resonance

2 Point processes and random signals: basic concepts
  2.1 Point processes
    2.1.1 Poisson process
    2.1.2 Random measures
    2.1.3 Point processes via conditioning and Cox processes
  2.2 Random signals
    2.2.1 Stationary processes
    2.2.2 Spectral analysis of random signals
    2.2.3 Information about a signal

3 Models of neural populations subject to common input and independent noise processes
  3.1 Addition and Deletion Model (AD Model)
  3.2 Spike-Time-Shifting Model (STS Model)
  3.3 Spectral analysis
    3.3.1 Input-Output cross-spectrum
    3.3.2 Cross-spectrum for two spike trains for the AD model
    3.3.3 Cross-spectrum for two spike trains for the STS model
    3.3.4 Single spike train power spectrum
    3.3.5 Information transmission properties
  3.4 AD model and STS model: simulations
  3.5 Mixing AD and STS: some new models
    3.5.1 AD → STS model
    3.5.2 STS → P-AD model
    3.5.3 P-AD/STS model


Introduction

Among all the cells of the body, neurons are peculiar in being highly specialized to transmit information. As the fundamental building blocks of our brain and nervous system, neurons allow us to understand the external world, to move our muscles, to think and much more. Rather than at the level of a single cell, the strength of neurons emerges through their communication network, a huge ensemble of billions of synaptic connections, which permits information to travel between neurons. A synapse may be thought of as a “bridge” between two neurons. A new bridge can be built or destroyed at any time, leading to a continuous evolution of the network. The more information is transmitted between two neurons, the stronger the synapse becomes, which makes it less likely to disappear.

How this kind of communication works was an unsolved question for a long time. Nowadays, we know that, to communicate with each other, neurons generate electro-chemical impulses, called action potentials, or spikes. A neuron sends between 5 and 50 spikes per second: this incredibly fast activity allows quick information transmission over long distances.

When subject to an external input, e.g. a signal received by our senses, neurons encode it in specific sequences of impulses, called spike trains. We may interpret spike trains as the “brain language”. When analyzing such sequences, the shape of the single spikes can be ignored: what counts in a spike train is the time at which each spike is generated.

This recalls the idea at the base of the Poisson process, the stochastic process that counts the occurrences of a specific event. Indeed, in mathematical models spike trains can be described as Poisson processes [9]. However, due to the presence of some intrinsic noise in the nervous system, each neuron generates spikes even without any external input. One consequence of this is that it is hard to determine the intensity of the Poisson process modelling a spike train, because it varies over time according to random quantities. In this case, we need to generalize the Poisson process to a stochastic process which is based on the same idea of counting occurrence times but, in addition, has an intensity that varies randomly. This kind of process is called a Cox process and will be presented in this work. Due to the randomness of the intensity of spike trains, it is hard to reproduce the response of a neuron to a stimulus in experiments. One then considers the average of the responses over the trials, which corresponds to averaging with respect to all the noise sources.

In the present work, we study a neural population encoding a Gaussian time-dependent signal s(t). We imagine that, besides the input, neurons are subject to a common uniform noise and to independent noise processes modelled as Gaussian processes. Our goal is to estimate the amount of information transmitted about the signal s(t). We will see that noise may actually have a beneficial role in the information transmission.

In particular, this dissertation is structured as follows.

• In Chapter 1 we introduce some concepts of Neuroscience, which are useful for a first approach to the subject: we first describe the anatomy and the different roles and features of neurons. We then briefly explain how neurons communicate, that is, how and when action potentials are generated. We conclude this chapter with a brief overview of stochastic resonance, the phenomenon by which the addition of noise increases the amount of information transmitted about a signal. In fact, although the word “noise” commonly suggests something negative, it has been shown that noise can have beneficial effects. The reason why we introduce this phenomenon is that it will be observed in the models proposed here.

• Chapter 2 is devoted to the concepts of probability theory at the base of our models, and it is divided into two parts. The first part presents the main concepts about point processes, of which the Poisson process is the pillar. We review the definition of both the homogeneous and the inhomogeneous Poisson process. Due to the way we design the models, an inhomogeneous Poisson process, whose intensity varies with time, is not enough: we need to randomize the intensity of this process, because that intensity in turn depends on random processes. Hence, we introduce the doubly stochastic Poisson process, also called Cox process, named after the statistician David Cox, who first published the model in 1955. In order to define it, we first provide an understanding of random measures, the family in which point processes, and in particular Cox processes, lie. Several theoretical results are presented, such as the law of a random measure and the probability generating functional, which are useful to define the Cox process. The Cox process is built via conditioning: first, we introduce an underlying stochastic process, e.g. a random measure, and then we build an inhomogeneous Poisson process whose intensity is conditioned on the realization of the former. The second part of this chapter focuses on random signals, providing the main concepts used in the following to study the relation between the signal and the neural population response and to quantify the information encoded by the population about a signal. We recall the definition of the correlation function and present the spectral analysis of signals; we then see how the Wiener-Khintchine theorem links the time-domain and frequency-domain analysis of signals. Lastly, we show a way to estimate the amount of information carried by the neural population, via the coherence function and a lower bound on the mutual information rate, two quantities that will be introduced as well.

• Chapter 3 builds on the previous chapters to present several mathematical models for the neural population described above. We first present two models, studied in the paper "Shifting Spike Times or Adding and Deleting Spikes" (S. Voronenko, W. Stannat, B. Lindner, 2015), in which independent noise sources influence the neural communication, as mentioned above. In one model the independent noise adds or deletes spikes (AD model); in the other, it shifts the spike times (STS model). The simplicity of the hypotheses in the two models allows the spectral analysis and the estimation of the information to be carried out analytically. It emerges that the amount of information transmitted increases in both cases. In the AD model, this phenomenon is independent of the parameters of the model; in the STS model, it depends on the specific parameters: the noise strongly enhances the transmission of the signal in some cases, while it does not improve it in others. Finally, we replicate the numerical simulations presented in the paper and compare them with the analytical results.

We conclude by presenting new models which combine the previous ones, together with their respective numerical results. We first propose a natural combination, in which both models are applied to the same population: spikes are first added and deleted and then shifted. If, on the other hand, we apply the STS model and then the AD model to the same spike trains, the results are not satisfactory, because a large number of spikes is deleted. We therefore decide to apply the AD model after the STS model only according to a probability. We finally propose a third way to combine them, in which either model is applied to each spike train within the neural population with equal probability. The results are compared with those of the previous models.


Chapter 1

Foundations of Neuroscience

The goal of this chapter is to provide some basic concepts of Neuroscience, such as the structure and the activity of neurons, the cells of our brain which are fundamental for the functioning of the whole body. The importance of neurons already emerges in the first section, where a brief description of their anatomy and their different roles is provided. We then describe, in the second section, how communication among neurons works. Such communication is always affected by some noise, which may nevertheless produce some benefits in the nervous system: this is the topic of the last section. The first two sections of this chapter are based on [16], whereas the last section is mostly based on [19].

1.1 Neurons

Neurons are nerve cells that constitute the basic building blocks of our brain and nervous system. About 86 billion neurons live in the human brain. The key property of such cells is that they are highly specialized to transmit information throughout the body. To communicate, they generate electrical impulses called action potentials, permitting quick information transmission over long distances. We say that the neurons fire, or spike. A typical neuron fires between 5 and 50 times every second. The structure that permits a neuron to pass a signal to another neuron is called a synapse. The more signals are sent between two neurons, the stronger the synaptic connection grows. Each neuron can form thousands of links with other neurons in this way, giving a typical brain more than 100 trillion synapses. Depending on the nature of the signals, synapses can be either electrical or chemical. This is the extraordinary characteristic that, among all the body cells, only neurons have: they form a huge, complex and constantly changing communication network, which allows us to move our muscles, feel the external world, think, form memories and so on. Another difference with the other body cells is that neurons stop reproducing shortly after birth. Because of this, some parts of the brain have more neurons at birth than later in life, because neurons die but are not replaced. Neurogenesis, that is, the formation of new nerve cells, does occur in some parts of the brain throughout life. However, what really counts is the number and the strength of the synaptic connections, rather than the number of neurons.

We now describe very briefly the anatomy of neurons. Like other cells, the neuron contains a central part called cell body or soma. The soma contains a nucleus that holds genetic information, and other organelles, including mitochondria, Golgi bodies, and cytoplasm, that support the life of the cell and play a major role in synthesizing protein. Each cell is surrounded by a cell membrane that protects it, and communication with other cells occurs at this level.

Unlike other body cells, neurons have rootlike extensions called dendrites, connected to the cell body, that receive input information from the outside through numerous receptors that bind chemical messengers called neurotransmitters. Dendrites can originate from the cell body or from other dendrites, and they create a thick network of ramifications, often called the “dendritic tree”.

From the cell body, another branch originates: a long slender fiber called the axon, which carries information away from the cell body and forks into many axon terminals, where the neuron ends. The axon is coated by a substance called myelin, which is essential for electrical insulation and protection, and it contains some intracellular structures, called microtubules, that speed up action potential propagation, so that the impulse travels faster along the axon. Finally, the axon terminals connect the neuron to other neurons, or directly to organs, through synaptic transmissions. The more synaptic transmissions are created, the stronger the link between neurons.

Summarizing, the basic parts of a neuron are the cell body, the dendrites, the axon, and the axon terminals. Dendrites receive the input signal, while the soma processes and sends it through the axon, to the axon terminals, and towards other cells. However, we must specify that neurons are not all the same.

Although it is hard to neatly classify neurons in the nervous system, due to the complexity and the diversity of their shape, size, and all the functions they perform, it is possible to provide some kind of classification. For instance, based upon the number of processes that extend out from the cell body, we can provide a structural classification. Four major groups arise from such a classification:

• Unipolar neurons have a single, short process that extends from the cell body and then branches into two more processes that extend in opposite directions. They are very rare.

• Bipolar neurons have only two processes that extend in opposite direc-tions from the cell body, a dendrite, and an axon.

• Multipolar neurons have a single axon and many dendrites and dendritic branches.

• Pseudo-unipolar neurons contain an axon that is split into two branches: a central branch (from the cell body to the spinal cord) and a peripheral branch (from the cell body to the periphery). These axonal branches should not be confused with dendrites, since pseudo-unipolar neurons do not have separate dendrites and an axonal process; however, one of the two branches serves a dendritic function.

Based on the functions and roles of neurons within the nervous system, we can distinguish three main classes:

• Sensory neurons, or afferent neurons, carry information from the sen-sory receptor cells throughout the body toward the brain.

• Motor neurons transmit information from the central nervous system to the muscles and organs of the body, called effectors, controlling their movements.

• Interneurons, also known as relay neurons, connect the motor neurons and the sensory neurons, and they are responsible for communicating and integrating information between different neurons in the body.

In this work, we shall focus on sensory neurons in the peripheral nervous system, most of which are pseudo-unipolar. Although rare, bipolar neurons can be found as well, for instance in the retina of the eye and in the olfactory system. Sensory neurons are activated by sensory inputs from the environment, which can be physical or chemical, corresponding to our five senses: sound, touch, heat or light, taste, and smell. An impulse is generated at the sensory organs (ears, skin, eyes, tongue or nose) and conducted to the central nervous system via sensory receptors, organelles that convert the stimuli into internal electrical impulses (action potentials), and then via sensory neurons.


Figure 1.1: Sketch of neurons’ functions and signal transmission. The sensory receptors receive an input stimulus and convert it into an electrical impulse, which starts to travel in the peripheral nervous system (PNS, the lighter side) through the sensory neurons, here exemplified by the blue neuron. The impulse carried by the sensory neurons arrives at the central nervous system (CNS, the darker side) and is passed to the interneurons, here exemplified by the green neurons, which connect the sensory neurons to the motor neurons. Motor neurons, here represented by the red neuron, finally carry the information to the effectors, i.e. the muscles and the organs.

1.2 Action potential

We said in the previous section that communication among neurons occurs through action potentials and synapses. We have yet to describe how such communication works. Roughly speaking, an action potential is a very short event that occurs when the incoming stimulation is strong enough for a neuron to “decide” to pass it on towards other cells. It consists of a very rapid (around 2 milliseconds) change of charge between the inside and the outside of the cell. It thereby allows the propagation of signals along the neurons’ axons and lets neurons communicate. We now look at this in some detail.


The key players in this mechanism are ions, electrically charged particles, such as sodium and potassium, located between the inside and the outside of the neuron, specifically around the cell membrane. How these ions are distributed is important for the neuron both during activity and at rest (when the cell is not sending a signal). Typically, for a neuron in its resting state, the concentration of sodium ions is higher outside the cell than inside, while for potassium the concentration is the opposite, with more ions inside the neuron than outside. We also need to consider the charge of the ions when thinking about their distribution across the membrane. The difference in total charge between the inside and the outside of the cell is called the membrane potential, and it is the key to how the action potential works. When a neuron is at rest, there are more positively charged ions outside the cell than inside the cell, so the membrane potential is negative; in particular, it measures around −70 millivolts, i.e. the inside of the cell is approximately 70 millivolts less positive than the outside. Action potentials are nothing more than a temporary shift – from negative to positive and back to negative – in the neuron’s membrane potential, caused by ions suddenly flowing in and out of the neuron. Indeed, the concentration of ions is not static: ions move in and out of the neuron constantly, and this lets the membrane potential change its value.

However, ions cannot simply move across the membrane at will. Most ions cross the membrane through “holes” in the cell membrane, called ion channels. Some of them are always open, but many require signals telling them to open or close. The so-called voltage-gated ion channels are very important in this process, and they are always shut during the resting state. As ions move through a channel and cross from one side of the cell membrane to the other, they cause the membrane potential to move away from its resting value. When cell depolarisation (i.e. a positive change in the concentration and charge of ions inside the cell) occurs, after an external stimulation, the membrane potential starts to rise. If it reaches a certain “threshold value”, which is around −55 millivolts, the voltage-gated ion channels open, allowing a rapid inward flow of sodium ions, which produces a further, explosive rise of the membrane potential, until the polarity of the membrane reverses and reaches a positive value of around +40 millivolts. At this point, an action potential, or spike, is generated, and this activates the signal propagation. After that, the sodium channels close and the potassium channels are activated, producing an outward current of potassium ions. The membrane potential rapidly falls and hyperpolarization occurs: the potential drops to about −90 millivolts. Finally, it is re-established at its resting value of −70 millivolts. The whole process lasts about 2 milliseconds.
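For intuition only, the threshold behaviour just described can be caricatured in a few lines of code. The following sketch is ours and uses a crude leaky integrate-and-fire style update with arbitrary parameters; it is not a biophysical model and not one of the models studied in this thesis. The potential relaxes towards −70 mV, is pushed up by a noisy drive, and whenever it crosses −55 mV a spike is registered and the potential is reset to the hyperpolarized value.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy leaky integrate-and-fire caricature of the threshold mechanism.
# All constants are illustrative, not fitted to real neurons.
v_rest, v_thresh, v_hyper = -70.0, -55.0, -90.0   # mV
tau, dt, t_max = 10.0, 0.1, 200.0                 # ms

v, t, spike_times = v_rest, 0.0, []
while t < t_max:
    drive = 20.0 + 5.0 * rng.standard_normal()    # noisy input drive (in mV)
    v += dt / tau * (v_rest - v + drive)          # leak towards rest + drive
    if v >= v_thresh:                             # threshold crossed:
        spike_times.append(t)                     # register a spike,
        v = v_hyper                               # then hyperpolarize/reset
    t += dt

print(len(spike_times), "spikes in", t_max, "ms")
```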


Figure 1.2: Schematic mechanism of an action potential. When a neuron is at rest, the membrane potential is around −70 mV. When the membrane potential reaches the “threshold value” (blue dashed line), it rises explosively, an action potential is generated, and then the membrane potential rapidly falls, to around −90 mV. Finally, it comes back to its resting state. The whole process lasts about 2 milliseconds and is represented by the red line. The yellow lines show some failed initiations, when the membrane potential starts to rise but does not reach the threshold value.

Whatever the stimulus received, neurons always communicate in a standard way, that is, through sequences of action potentials. We can think of these sequences as constituting the “brain language”.

Although the impulses vary in amplitude, duration and shape, we shall look at them as if they were indistinguishable: this is known as the “all-or-none law”, according to which the strength with which a neuron responds to a stimulus is independent of the strength of the stimulus; either the stimulus exceeds a certain “threshold value”, in which case the neuron responds with an impulse, or it does not reach such value and the neuron does not respond, that is, no impulse is generated. To obtain a satisfactory description of the transmitted information, we assume we can ignore the amplitude, duration and shape of the impulses, and focus only on the total number of impulses within a time interval and on the instants at which they are generated. The sequences of such times are called spike trains. The impulses constituting a spike train are fired not only in response to a stimulus but also during spontaneous activity, that is, with no external stimuli. Moreover, it has been observed in experiments that, when repeating trials with a fixed stimulus, the response of the neurons is not perfectly reproducible but exhibits trial-to-trial variability ([25]). Such variability is typically interpreted in terms of “noise”. Noise processes can have various origins and affect the transmission of signals in neurons. We shall not study why all this happens; we rather study how to describe this phenomenon mathematically. Moreover, noise need not be interpreted negatively. In the next section, we present a short overview of some benefits of noise in the nervous system, while in the next chapter we shall look for a suitable model that mathematically represents the neurons’ activity under the influence of noise sources.

1.3 Stochastic resonance

We conclude this chapter with a very brief overview of some benefits of noise in the nervous system. By “noise”, we mean random disturbances of signals that produce some variability, quantitatively measured through the changes in spike timing. Although this word commonly suggests a detrimental variability, it has been shown that it can also be beneficial. Noise can have various origins ([25]). Our goal is to underline that noise sources can have positive effects on neural signal transmission.

For instance, it has been shown that the presence of noise can enhance the transmission of weak signals in neurons. By weak signal, we refer to subthreshold signals, i.e. signals not reaching a certain threshold level. An example of a subthreshold signal is an external stimulation of a neuron that stays below the already mentioned threshold value of −55 millivolts. By adding random noise, it can happen that the neuron still detects such a signal.

This effect, whereby the addition of a certain level of random noise improves a neuron’s ability to detect and transmit weak signals, is known as stochastic resonance (SR). At low noise levels, the sensory signal does not cause the system to cross the threshold and few signals are detected. On the other hand, for large noise levels, the response is dominated by the noise. Intermediate noise levels allow the signal to reach the threshold without being covered by the noise. At the single-neuron level, stochastic resonance can occur only in the presence of subthreshold signals, so supra-threshold signals generally do not give rise to stochastic resonance. However, at the level of the neural population, noise can maximize the output performance even with strong (supra-threshold) signals. This phenomenon is a particular form of stochastic resonance and is known as supra-threshold stochastic resonance (SSR). Supra-threshold stochastic resonance differs from stochastic resonance because it does not require the input signal to be below a “threshold level” and it does not disappear when the signal is no longer “subthreshold”. Noise not only impacts the total transmitted information, but also affects which frequencies of the sensory signal are preferentially encoded by a neural system. The suppression of information about the input signal in certain frequency bands can be regarded as a form of information filtering: we may ask whether the neural system preferentially encodes slow (low-frequency) or fast (high-frequency) components of a signal.
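The basic effect can be illustrated with a toy threshold detector; the sketch below is ours (names and parameters are arbitrary) and uses the correlation between a subthreshold sinusoid and the binary output of a threshold unit as a crude stand-in for the information measures introduced later. Detection is essentially absent for very small noise, degrades again for very large noise, and is best at an intermediate noise level.

```python
import numpy as np

rng = np.random.default_rng(4)

def detection_score(noise_std, n_steps=20_000, rng=rng):
    """Toy threshold unit: a subthreshold sinusoid plus Gaussian noise is
    compared with a fixed threshold; the score is the correlation between
    the signal and the binary output."""
    t = np.arange(n_steps) * 1e-3
    signal = 0.8 * np.sin(2 * np.pi * 5.0 * t)               # amplitude below threshold
    output = (signal + noise_std * rng.standard_normal(n_steps)) > 1.0
    out = output.astype(float)
    if out.std() == 0.0:                                      # never fired: nothing detected
        return 0.0
    return abs(np.corrcoef(signal, out)[0, 1])

for noise_std in (0.05, 0.2, 0.5, 1.0, 3.0):
    print(noise_std, round(detection_score(noise_std), 3))
```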

We do not go further into why and how these mechanisms work; we are rather interested in how to quantify them mathematically. Supra-threshold stochastic resonance is commonly described through the mutual information between the input and the output, a quantity that we will introduce in the next chapter. Information filtering can be studied through the coherence function, which is related to the former and which will be introduced in the next chapter as well.

We will study different models in which independent noise sources shape signal information transmission within a neural population and we will show that supra-threshold stochastic resonance can be observed.


Chapter 2

Point processes and random signals: basic concepts

This chapter is organised as follows. The first section introduces the theory of point processes, which are often used to describe spike trains. A well-known example of this kind of process is the Poisson process, from which we start and which we then generalize to the Cox process. In order to talk about Cox processes, we first need to introduce some concepts about random measures and their probability law, focusing on the case of point processes, which are a special case of random measures.

In the second section, we introduce the main concepts of random signals: they constitute, for instance, the mathematical representation of noise sources and neural stimuli. We start from the definition and the main features of a random signal, such as stationarity and ergodicity. We then define the autocorrelation function of a signal and the cross-correlation between two signals, in order to briefly present the spectral analysis of a signal and the Wiener-Khintchine theorem. Lastly, we discuss how to quantify the information about a signal, and the coherence function is defined.

The part about the theory of point processes is mainly based on [7, 8], while the concepts introduced about random signals can be found in more detail in [18, 20].

2.1 Point processes

As we mentioned in the previous chapter, neurons communicate randomly, through action potentials, and we are only interested in their occurrence times rather than their form, amplitude and duration. In probability theory, phenomena characterised by the positions or occurrence times of events – such as spike trains – are described by a special class of stochastic processes, called point processes. Heuristically, a point process is a tool we use to randomly allocate points on a state space. In the following we introduce the theory of point processes, beginning from the simplest case, the Poisson process.

2.1.1 Poisson process

The archetype and the simplest point process is the well-known Poisson process, the definition of which we recall, together with some basic features. Recall that a random variable X is said to follow the Poisson distribution with parameter α if it assumes non-negative integer values only and if

\[
P(X = k) = \frac{\alpha^k e^{-\alpha}}{k!}, \qquad k = 0, 1, \dots
\]

We can easily calculate the expected value E[X] = α and the variance Var(X) = α.

Definition 2.1. A random process (N_t)_{t≥0} is called a (homogeneous) Poisson process with intensity λ > 0 if:

1. N_0 = 0;

2. for t ≥ s ≥ 0, N_t − N_s is a Poisson r.v. with parameter λ(t − s);

3. for each n ≥ 1 and 0 ≤ t_0 ≤ t_1 ≤ · · · ≤ t_n, the increments N_{t_n} − N_{t_{n−1}}, ..., N_{t_1} − N_{t_0} are independent.

We recall that the Poisson process can be equivalently described by means of sums of independent and identically distributed random variables with exponential distribution: let us consider events that occur at random distances τ_j from one another, where the τ_j are i.i.d. r.v.’s and τ_1 ∼ Exp(λ), that is, P(τ_1 ≤ t) = (1 − e^{−λt}) Θ(t), with
\[
\Theta(t) =
\begin{cases}
0 & \text{if } t < 0 \\
1 & \text{if } t \geq 0
\end{cases}
\]
being the Heaviside function.

Let us introduce the sequence of times at which events occur:
\[
T_0 = 0, \qquad T_n = \sum_{j=1}^{n} \tau_j .
\]

Figure 2.1: Sequences of interarrival times τj and arrival times Tn.

For each n, T_n, being the sum of independent exponential r.v.’s, has an Erlang distribution. Fixing an instant t, we ask how many events occurred in the time interval [0, t]; we call this quantity N_t. The equivalence between the events
\[
\{N_t \leq k\} \iff \{T_{k+1} > t\}
\]
implies that N_t is a Poisson r.v. with parameter λt.

If we observe this random variable over time t, that is, as a random process, we obtain the Poisson process (N_t)_{t≥0} with intensity λ > 0. This process counts the number of jumps up to time t, starting from N_0 = 0 at t = 0 and increasing its value by 1 at each T_n. Hence, the process can be written as a sum of Heaviside functions evaluated at t − T_n:
\[
N_t = \sum_{n=1}^{\infty} \Theta(t - T_n).
\]
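This construction translates directly into a simulation recipe: draw i.i.d. exponential interarrival times, accumulate them to obtain the T_n, and count how many fall in [0, t]. The following minimal sketch (plain NumPy; function names and parameter values are ours, not taken from the thesis) also checks empirically that N_t has mean and variance close to λt.

```python
import numpy as np

rng = np.random.default_rng(0)

def homogeneous_poisson(rate, t_max, rng):
    """Event times of a homogeneous Poisson process on [0, t_max],
    built from i.i.d. Exp(rate) interarrival times."""
    times = []
    t = rng.exponential(1.0 / rate)
    while t < t_max:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return np.array(times)

rate, t_max = 20.0, 1.0   # e.g. 20 spikes per second, observed for 1 second
counts = [homogeneous_poisson(rate, t_max, rng).size for _ in range(5000)]
print(np.mean(counts), np.var(counts))   # both should be close to rate * t_max = 20
```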

Considering the features of spike trains, one could think of describing them mathematically using a Poisson process. Unfortunately, the (homogeneous) Poisson process is a very particular point process, and when a neuron receives an external stimulus, the requirement that the intensity be constant in time is very limiting. We therefore generalize to the inhomogeneous case, in which the process intensity λ is replaced by a function of time λ(t): in Definition 2.1, we just replace condition 2 by

2’. for t ≥ s ≥ 0, N_t − N_s is a Poisson r.v. with parameter \(\int_s^t \lambda(u)\,du\).

It has been observed ([1]) that even such a model, based on the inhomogeneous Poisson process – although much appreciated and often used because of its simplicity – is not always acceptable for spike trains. An increasingly common view in the literature is to model a spike train by means of a Poisson process whose intensity is, in turn, a stochastic process. The point process obtained by randomizing the intensity of a Poisson process is called a doubly stochastic Poisson process or, more briefly, a Cox process. In order to define it formally, we first give a brief introduction to the theory of random measures and point processes in general.
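Before turning to random measures, we note in passing that, conditional on a given intensity function λ(t) bounded by some λ_max, an inhomogeneous Poisson process can be simulated by thinning a homogeneous one (the thinning algorithm of Lewis and Shedler): generate candidate points at rate λ_max and keep a candidate at time t with probability λ(t)/λ_max. A minimal sketch of ours, with an intensity chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def inhomogeneous_poisson(intensity, lam_max, t_max, rng):
    """Thinning: candidates arrive at rate lam_max; each candidate at time t
    is accepted with probability intensity(t) / lam_max."""
    t, accepted = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t >= t_max:
            break
        if rng.random() < intensity(t) / lam_max:
            accepted.append(t)
    return np.array(accepted)

lam = lambda t: 10.0 + 8.0 * np.sin(2 * np.pi * t)        # example time-varying rate
spikes = inhomogeneous_poisson(lam, lam_max=18.0, t_max=5.0, rng=rng)
print(len(spikes))   # expected count = integral of lam over [0, 5] = 50
```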

2.1.2 Random measures

Let X be a Polish space (i.e. a complete separable metric space) and let BX = B(X ) be the σ-field of its Borel sets.

Definition 2.2. A Borel measure µ on a Polish space X is called boundedly finite if µ(B) < ∞ for each bounded Borel set B.

A counting measure N on the space X is a boundedly finite measure that assumes only integer values.

A counting measure N is simple if

N {x} = 0 or 1 for each x ∈ X .

We denote by M_X the space of all boundedly finite measures on the space X, by N_X the space of all counting measures and by Ñ_X the space of all simple counting measures.

We can define natural distances on the spaces M_X and N_X, so that they can be interpreted as Polish spaces in their own right. Moreover, the Borel σ-field B(M_X) is the smallest σ-field with respect to which the mappings Φ_A : M_X → ℝ given by
\[
\Phi_A(\mu) = \mu(A)
\]
are measurable for all bounded Borel sets A ∈ B_X. Analogously, the Borel σ-field B(N_X) is the smallest σ-field with respect to which the mappings N ↦ N(A) are measurable for all bounded Borel sets A ∈ B_X. All these facts allow us to define random measures and point processes as measurable mappings on the spaces M_X and N_X; we refer to [8] for a comprehensive proof.

Definition 2.3. A random measure ζ with state space X is a measurable mapping from a probability space (Ω, F, P) into the metric space (M_X, B(M_X)).
A point process N with state space X is a measurable mapping from a probability space (Ω, F, P) into the metric space (N_X, B(N_X)). The point process is called simple when P(N ∈ Ñ_X) = 1, i.e. when its realizations are almost surely simple counting measures.

Therefore, a random measure is a particular case of stochastic process, indexed on the Borel σ-field B_X of X, with values in the space M_X of boundedly finite measures, and a point process is a particular case of the former, taking only integer values.

A realization ζ(·, ω) of a random measure ζ is a boundedly finite measure on X for fixed ω ∈ Ω, and it takes the value ζ(A, ω) on the set A ∈ B_X (similarly N(A) for a point process N). If we fix A, ζ(A, ·) is a function from Ω into ℝ₊, and this suggests the following proposition.

Proposition 2.1.1. Let ζ be a mapping from a probability space (Ω, F, P) into M_X. Then ζ is a random measure if and only if for each fixed A ∈ B_X, ζ(A, ·) is a random variable.

Proof. Let A be the σ-field of subsets of M_X whose inverse images under ζ belong to F, and define
\[
\Phi_A : M_{\mathcal X} \to \mathbb{R}_+, \qquad \zeta(\cdot, \omega) \mapsto \zeta_A(\omega) := \zeta(A, \omega),
\]
so that, for any B ∈ B(ℝ₊), we have
\[
\zeta^{-1}\big(\Phi_A^{-1}(B)\big) = (\zeta_A)^{-1}(B).
\]
If ζ_A ≡ ζ(A, ·) is a random variable, then (ζ_A)^{-1}(B) ∈ F and by definition Φ_A^{-1}(B) ∈ A. Since B(M_X) is the smallest σ-field with respect to which the mappings Φ_A(µ) = µ(A) are measurable, B(M_X) ⊆ A, and thus ζ is a random measure. Conversely, if ζ is a random measure, then ζ^{-1}(Φ_A^{-1}(B)) ∈ F because Φ_A^{-1}(B) ∈ B(M_X) by definition; thus ζ_A is a measurable mapping from Ω into ℝ₊, i.e. a random variable.

Corollary 2.1.2. Let N be a mapping from a probability space (Ω, F, P) into N_X. Then N is a point process if and only if for each fixed A ∈ B_X, N(A, ·) is a random variable.

We now present a useful result for random measures, from which an important characterization of point processes follows.

Definition 2.4. Let X be a Polish space. Given a random measure ζ with state space X, we say that x_0 ∈ X is an atom of ζ if
\[
P\big(\zeta(\{x_0\}) > 0\big) > 0.
\]
A random measure with only atoms is called purely atomic, whereas one with no atoms is called diffuse.


Every random measure ζ ∈ M_X can be decomposed into atomic and diffuse components. To see this, we first recall the Dirac measure δ_x,
\[
\delta_x(A) = \mathbf{1}_A(x) =
\begin{cases}
1 & \text{if } x \in A \\
0 & \text{if } x \notin A.
\end{cases}
\]

Proposition 2.1.3. Given a random measure ζ ∈ M_X, the set of its atoms is at most countable.
Moreover, denoting the set of its atoms by D = {x_k}_{k=1,2,...}, the random measure ζ admits the decomposition
\[
\zeta = \zeta_d + \zeta_a := \zeta_d + \sum_{k=1}^{\infty} \zeta(\{x_k\})\,\delta_{x_k},
\]
where ζ_d is a diffuse measure and ζ_a := Σ_k ζ({x_k}) δ_{x_k} is a purely atomic measure.

Proof. We just need to prove that the set D of atoms of a random measure ζ is at most countably infinite. Indeed, in this case we can write D = {x_k}_{k=1,2,...} and define, for any Borel set A, the function
\[
\zeta_d(A) = \zeta(A) - \sum_{x_k \in D} \zeta(\{x_k\})\,\delta_{x_k}(A).
\]
ζ_d is positive and σ-additive in A because ζ is, and for each such A it defines a random variable, and thus a random measure as well, which has no atoms by construction.

Let us prove that D has countably many elements. Suppose, by contradiction, that D is uncountable. Since X can be covered by the union of at most countably many bounded Borel sets, there exists a bounded set Ā that contains uncountably many atoms.
Define, for ε > 0, the subset
\[
D_\varepsilon := \big\{ x : P\big(\zeta(\{x\}) > \varepsilon\big) > \varepsilon \big\} \subseteq D \cap \bar A.
\]
By monotonicity, D ∩ Ā = lim_{ε↓0} D_ε. If D_ε were finite for each ε > 0, then D ∩ Ā would be countable; thus there exists an ε̄ > 0 such that D_{ε̄} is infinite. We can extract from D_{ε̄} an infinite sequence of distinct points x_n and consider the events E_n = {ζ({x_n}) > ε̄}, each with probability P(E_n) > ε̄. Because ζ is boundedly finite,
\[
0 = P\big(\zeta(\bar A) = \infty\big)
\geq P\big(\zeta(\{x\}) > \bar\varepsilon \text{ for infinitely many } x \in D_{\bar\varepsilon}\big)
\geq P(\text{infinitely many } E_n \text{ occur})
= P\Big(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k\Big)
= \lim_{n\to\infty} P\Big(\bigcup_{k=n}^{\infty} E_k\Big)
\geq \bar\varepsilon > 0,
\]
yielding a contradiction.

Proposition 2.1.4. An element N ∈ M_X belongs to N_X if and only if it can be written as
\[
N = \sum_i \kappa_i \delta_{x_i}, \tag{2.1}
\]
where all κ_i are positive integers and {x_i}_i is a countable set with at most finitely many x_i in any bounded Borel set.

Proof. If (2.1) holds, the measure N belongs to N_X.
Conversely, if N is integer-valued, any atom of N must have positive mass by definition, and since N is boundedly finite, there can be at most countably many atoms, because we can cover X by at most countably many bounded sets, each of them containing at most finitely many atoms. Thus, it is enough to show that the measure N does not have a diffuse component.
Let y be arbitrary in X and let (ε_j)_{j=1,2,...} be a decreasing sequence of positive real numbers such that ε_j → 0 and B_{ε_j}(y) ↓ {y} for j → ∞, with B_{ε_j}(y) being the spheres of centre y and radius ε_j. Then, by continuity,
\[
N(\{y\}) = \lim_{j\to\infty} N\big(B_{\varepsilon_j}(y)\big).
\]
N(B_{ε_j}(y)) is a non-negative integer for each j, and so is N({y}). Thus, if y is not an atom of N, it must be the limit of a sequence of spheres B_{ε_j}(y) such that N(B_{ε_j}(y)) = 0, and in particular it must be the centre of a sphere with this property. This proves that the support of N consists exclusively of atoms of N, namely N is purely atomic.

We now move to the study of the probability distribution of the random measures and point processes.


The law of a random measure

As with stochastic processes in general, the law of a random measure ζ, i.e. the probability measure it induces on the space (M_X, B(M_X)), is completely determined by its finite-dimensional distributions. These are the joint distributions, for all finite families of bounded Borel sets A_1, ..., A_n, of the random variables ζ(A_1), ..., ζ(A_n):
\[
F_n(A_1, \dots, A_n; x_1, \dots, x_n) = P\big(\zeta(A_i) \leq x_i,\ i = 1, \dots, n\big). \tag{2.2}
\]
As in the general theory of stochastic processes, there exist necessary and sufficient conditions on a set of finite-dimensional distributions (2.2) that ensure that they are the finite-dimensional distributions of a random measure. We need the family of finite-dimensional distributions (2.2) to be consistent. In this situation, the concept of consistency requires not only the usual conditions of the Kolmogorov existence theorem (invariance under index permutations and consistency of marginals), but also conditions that ensure that the realizations are measures, that is, additivity and continuity:
\[
\zeta(A_1 \cup A_2) = \zeta(A_1) + \zeta(A_2) \quad \text{a.s.}
\]
for each pair of disjoint Borel sets A_1, A_2, and
\[
\zeta(A_n) \to 0 \quad \text{a.s.}
\]
for each sequence of Borel sets {A_n}_n decreasing to ∅.

Consider a set of functions p(A_1, ..., A_n; r_1, ..., r_n), where n, r_1, ..., r_n are non-negative integers and the A_i, i = 1, ..., n, are Borel sets. If the family p(·,·) satisfies the four consistency conditions, then there is a unique probability measure P, defined on the σ-field generated by the cylinder sets {N(·) : N(A_1) = r_1, ..., N(A_n) = r_n}, for which
\[
P\big(N(\cdot) : N(A_1) = r_1, \dots, N(A_n) = r_n\big) = p(A_1, \dots, A_n; r_1, \dots, r_n).
\]

We do not expand on this result, and focus instead on a characterization of the law of a point process, exploiting the property of point processes of being non-negative integer-valued.

The probability generating functional

Just as the law of a non-negative integer-valued random variable is uniquely determined by its probability generating function, the law of a point process is uniquely determined by its probability generating functional, a generalization of the former which we shall introduce shortly.


Definition 2.5. Given a discrete random variable X with non-negative integer values, the probability generating function of X is the function
\[
g_X(z) = \mathbb{E}\big[z^X\big] = \sum_{k=0}^{+\infty} z^k p(k),
\]
where p(k) := P(X = k) is the probability mass function of X.

It is known that the series absolutely converges at least for any z ∈ C with |z| ≤ 1.

If X = (X_1, ..., X_d) is a d-dimensional discrete random variable taking values in {0, 1, ...}^d, the probability generating function of X is
\[
g_X(z_1, \dots, z_d) = \mathbb{E}\big[z_1^{X_1} \cdots z_d^{X_d}\big] = \sum_{k_1,\dots,k_d=0}^{\infty} z_1^{k_1} \cdots z_d^{k_d}\, p(k_1, \dots, k_d), \tag{2.3}
\]
where p(k_1, ..., k_d) is the probability mass function of X. The series converges at least for any z = (z_1, ..., z_d) ∈ ℂ^d with max{|z_1|, ..., |z_d|} ≤ 1.

As an example, take a Poisson random variable X with parameter λ. Its probability generating function is
\[
g_X(z) = \sum_{k=0}^{+\infty} z^k \frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda}\sum_{k=0}^{+\infty} \frac{(\lambda z)^k}{k!} = e^{\lambda(z-1)}.
\]
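As a quick numerical sanity check of this identity (a small Monte Carlo experiment of ours, not part of the thesis), one can compare an empirical estimate of E[z^X] with e^{λ(z−1)}:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, z = 3.0, 0.7
x = rng.poisson(lam, size=200_000)
print(np.mean(z ** x))         # empirical estimate of E[z^X]
print(np.exp(lam * (z - 1)))   # e^{lambda (z-1)}; the two numbers should nearly coincide
```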

What follows is a fundamental result for the probability generating function.

Proposition 2.1.5. Given two random variables X and Y with non-negative integer values,
\[
g_X(z) = g_Y(z) \ \text{for each } z \iff X \text{ and } Y \text{ have the same law.}
\]

Proof. We prove the statement in the one-dimensional case only, since the multidimensional case is an immediate generalization. The (⇐) part is trivial. For the other direction, since the radius of convergence of the probability generating function is ≥ 1, it admits a unique power series expansion around zero:
\[
g_X(z) = \sum_{k=0}^{+\infty} z^k P(X=k) = \sum_{k=0}^{+\infty} z^k \frac{g_X^{(k)}(0)}{k!},
\qquad
g_Y(z) = \sum_{k=0}^{+\infty} z^k P(Y=k) = \sum_{k=0}^{+\infty} z^k \frac{g_Y^{(k)}(0)}{k!}.
\]
Hence, if g_X = g_Y, the coefficients of the two expansions coincide, that is, P(X = k) = P(Y = k) for every k.


We aim to generalize this concept to find an analogous characterization of the law of point processes.

Consider first a generalization of the Poisson process of Definition 2.1, defined on a Polish space X. Let µ(·) ∈ M_X be a boundedly finite measure defined for every Borel set A ∈ B_X. We can then define the law of the general Poisson process N through the following finite-dimensional distributions:
\[
P\big(N(A_i) = k_i,\ i = 1, \dots, n\big) = \prod_{i=1}^{n} \frac{\mu(A_i)^{k_i}}{k_i!}\, e^{-\mu(A_i)}, \tag{2.4}
\]
for any finite family {A_i, i = 1, ..., n} of bounded and mutually disjoint Borel sets. The measure µ(·) is called the parameter measure of the process. Observe that, if X = ℝ, the general Poisson process (2.4) reduces both to the homogeneous and to the inhomogeneous Poisson process: the former, with intensity λ, is obtained by taking µ(A) = λ m(A), where m(A) denotes the Lebesgue measure of A; the latter, with intensity λ(t), by taking µ(A) = ∫_A λ(u) du.

We can easily calculate the probability generating function for the law of the general Poisson process by applying (2.3) and (2.4):
\[
g(A_1, \dots, A_n; z_1, \dots, z_n) = \exp\left(-\sum_{j=1}^{n} (1 - z_j)\,\mu(A_j)\right), \tag{2.5}
\]
where the sets A_1, ..., A_n are mutually disjoint and µ(·) is the parameter measure of the process.

Now, let ζ be a random measure with state space the Polish space X, and let f be a Borel function defined on X. Suppose that f is bounded and has bounded support. Then the integral
\[
\zeta f := \int_{\mathcal X} f(x)\,\zeta(dx)
\]
is finite, since every realization of ζ is boundedly finite by definition. Moreover, such an integral is a random variable: consider first f as the indicator function of a bounded Borel set, then apply the usual approximation results based on linear combinations and the monotone convergence theorem. If we take a point process N as the random measure with respect to which we integrate, then the integral reduces to the sum Σ_i f(x_i), where the sum is taken over the points at which N realizes.

Finally, we denote by V(X) the class of real-valued Borel functions h : X → ℝ such that 0 ≤ h(x) ≤ 1 for any x ∈ X and 1 − h has bounded support.


We can finally introduce the following generalization of the probability generating function.

Definition 2.6. The probability generating functional of a point process N on the space X is defined for any h ∈ V(X) as
\[
G[h] \equiv G_N[h] = \mathbb{E}\left[\exp\left(\int_{\mathcal X} \log h(x)\, N(dx)\right)\right], \tag{2.6}
\]
where we take the exponential in the formula above to be zero if h(x) equals zero on some set A ∈ B_X, unless N(A) = 0, in which case we take the exponential to be one.

From Proposition 2.1.4, we know that N is almost surely finite on the bounded Borel set on which 1 − h does not vanish; thus the integral in (2.6) can be written as a finite sum of non-positive terms, and we have an alternative way to define the probability generating functional:
\[
G[h] = \mathbb{E}\left[\prod_i h(x_i)\right],
\]
where the product is taken over the points at which N realizes. We define the product in the previous formula to be zero when h(x_i) = 0 for some x_i, and we take the empty product to be unity.

The probability generating functional is a fundamental tool because, like the probability generating function, it carries all the information about the law of the process.

Theorem 2.1.6. Let G[h] be a real-valued functional, defined for all h ∈ V(X). Then G is the probability generating functional of a point process N if and only if the following conditions hold:

(i) for every h of the form
\[
1 - h(x) = \sum_{k=1}^{n} (1 - z_k)\,\mathbf{1}_{A_k}(x),
\]
where A_1, ..., A_n are mutually disjoint bounded Borel sets and |z_k| ≤ 1, the probability generating functional G[h] reduces to the joint probability generating function P(A_1, ..., A_n; z_1, ..., z_n) of an n-dimensional random vector with non-negative integer components;

(ii) for every sequence {h_n} ⊂ V(X) such that h_n ↓ h pointwise, G[h_n] → G[h], whenever 1 − h has bounded support;

(iii) G[1] = 1, where 1 denotes the function identically equal to 1.

Moreover, when these conditions are satisfied, the functional G uniquely determines the law of the process.

Proof. Assume that N is a point process, with probability generating functional defined as in (2.6). Then conditions (i) and (iii) are immediate:
\[
\text{(i)}\quad G[h] = \mathbb{E}\Big[e^{\int_{\cup A_k} \log h(x)\,N(dx)}\Big] = \mathbb{E}\Big[e^{\sum_k \int_{A_k} \log(z_k)\,N(dx)}\Big] = \mathbb{E}\Big[\prod_k z_k^{N(A_k)}\Big];
\]
\[
\text{(iii)}\quad G[\mathbf 1] = \mathbb{E}\Big[e^{\int_{\mathcal X} \log(1)\,N(dx)}\Big] = e^0 = 1.
\]
Now suppose that h_n ↓ h pointwise; since h_n ∈ V(X) for every n, they are bounded by 1 on X. Thus, if we set g_n := − log(h_n),

• g_n ≥ 0;

• g_n ↑ g := − log(h) pointwise, by continuity.

Then it follows from the monotone convergence theorem that, for each realization of N,
\[
\int \log(h_n)\, dN \to \int \log(h)\, dN \quad \text{a.s.}
\]
Now it is enough to note that \(\big|\exp\int_{\mathcal X} \log(h_n)\,dN\big| \le 1\), in order to apply the dominated convergence theorem and obtain G[h_n] → G[h].

Conversely, suppose that (i)-(iii) hold.

Let p(A_1, ..., A_n; r_1, ..., r_n) be the distribution associated with the probability generating function of (i). It can be proved ([7]) that the consistency conditions for p(·,·) are satisfied, so we may extend it uniquely to a consistent set of functions which are the finite-dimensional distributions of a unique point process N(·). Its probability generating functional G*[h] must then agree with G[h] over simple functions h. Further, an arbitrary h ∈ V can be uniformly approximated by a decreasing sequence of simple functions, and G, G* are continuous by hypothesis and by point (ii) of the previous implication, respectively. Therefore they agree for all h ∈ V, and thus G is the probability generating functional of N.


As an example, we calculate the probability generating functional of the general Poisson process, whose probability generating function we already computed in (2.5):
\[
g(A_j; z_j,\ j = 1, \dots, n) = \exp\left(-\sum_{j=1}^{n} (1 - z_j)\,\mu(A_j)\right),
\]
with A_1, ..., A_n mutually disjoint and µ(·) the parameter measure of the process. If we write
\[
h(x) = 1 - \sum_{j=1}^{n} (1 - z_j)\,\mathbf{1}_{A_j}(x),
\]
then h equals z_j on A_j and equals 1 on (∪_j A_j)^C, and (2.5) can be expressed as
\[
G[h] = \exp\left(\int_{\mathcal X} \big(h(x) - 1\big)\,\mu(dx)\right), \tag{2.7}
\]
which is the required form of the probability generating functional.

Formula (2.7) can also be obtained through the following heuristic idea. Assume that the support of 1 − h(x) is partitioned into subsets ∆A_i, and that x_i ∈ ∆A_i is a “representative point” for each i. Then
\[
\int_{\mathcal X} \log h(x)\, N(dx) \approx \sum_i \log h(x_i)\, N(\Delta A_i).
\]
By the independence hypothesis, the random variables N(∆A_i) are independent, and thus
\[
\mathbb{E}\left[\exp\left(\int_{\mathcal X} \log h(x)\,N(dx)\right)\right]
\approx \mathbb{E}\left[\prod_i \exp\big(\log h(x_i)\,N(\Delta A_i)\big)\right]
= \prod_i \mathbb{E}\left[h(x_i)^{N(\Delta A_i)}\right]
= \prod_i \exp\big(-(1 - h(x_i))\,\mu(\Delta A_i)\big)
\approx \exp\left(-\int_{\mathcal X} \big(1 - h(x)\big)\,\mu(dx)\right).
\]

2.1.3 Point processes via conditioning and Cox processes

The main goal of this section is to define point processes via conditioning, in particular the Cox process. In order to build these processes we follow two steps: first, we introduce an underlying process (a random measure, for instance), and then we build a secondary process whose distribution is conditioned on the realization of the former.

The existence of such processes is based on the construction of a bivariate distribution, and since the realization of a random measure can be thought of as a point in a metric space, it is enough to follow the same basic path we usually apply for dealing with bivariate distributions in the space ℝ², which we quickly recall.

In general, given two measure spaces (X, F, µ) and (Y, G, ν), we denote by Z = X × Y the Cartesian product of the sets X and Y and by H = F ⊗ G the product σ-field, that is, the σ-field generated by the measurable rectangles A × B, where A ∈ F and B ∈ G. We then define the product space as the measurable space (Z, H). We can equip the space (Z, H), for instance, with the product measure µ × ν, that is, a probability measure such that

(µ × ν)(A × B) = µ(A)ν(B).

Not all measures on a product space are product measures. Given a probability measure π on the space (Z, H), we aim to decompose it as a mixture of marginal and conditional distributions defined on the spaces X and Y. Recall that the marginal measures π_X and π_Y are the projections of π on the spaces (X, F, µ) and (Y, G, ν) respectively, that is,
\[
\pi_{\mathcal X}(A) = \pi(A \times \mathcal Y) \quad \text{and} \quad \pi_{\mathcal Y}(B) = \pi(\mathcal X \times B).
\]

Given A ∈ F and B ∈ G, we aim to write π(A × B) as the integral
\[
\pi(A \times B) = \int_A Q(B \mid x)\, d\pi_{\mathcal X}(x), \tag{2.8}
\]
where we interpret Q(B | x) as the conditional probability of the event B given the occurrence of x. This procedure is known as the disintegration of π. The existence of such a family of measures is related to the problem of regular conditional probabilities.

Theorem 2.1.7. Let (Y, B_Y) be a complete separable metric space, (X, F) an arbitrary measurable space, and π a probability measure on the product space (Z, H). Then, with π_X(A) = π(A × Y) for all A ∈ F, there exists a family of functions Q(B|x), defined on B_Y × X, such that

• Q(·|x) is a probability measure on B_Y for each fixed x ∈ X;

• Q(B|·) is an F-measurable function of x for each fixed B ∈ B_Y;

• (2.8) is satisfied for all A ∈ F and B ∈ B_Y.

Such a family is called a family of regular conditional probabilities.

We omit the proof of this theorem and turn to the converse problem of constructing a measure on the product space from the conditional and marginal probability measures.

Proposition 2.1.8. Given a family {Q(·|x), x ∈ X} of probability measures on the space (Y, B_Y), and given a probability measure π_X on the space (X, F), if Q(B|x) is F-measurable as a function of x for any fixed B ∈ B_Y, then (2.8) defines a probability measure on the product space (Z, H). When this condition is fulfilled, for every non-negative H-measurable function f(·,·) we have
\[
\int_{\mathcal Z} f\, d\pi = \int_{\mathcal X} d\pi_{\mathcal X}(x) \int_{\mathcal Y} f(x, y)\, Q(dy \mid x). \tag{2.9}
\]

Proof. If the function Q(B|x), thought of as a function of x, is F-measurable for each fixed B ∈ B_Y, the integral in (2.8) is well defined. Furthermore, this set function, defined on the measurable rectangles of H, can be extended to a finitely additive set function on the algebra A of finite unions of disjoint rectangles. Countable additivity follows using monotone approximation arguments. Hence, applying Carathéodory’s extension theorem, we can extend π to a measure for which the Fubini-Tonelli formula (2.9) holds.

The projection of π on the space (Y, B_Y), i.e. the measure defined by
\[
\pi_{\mathcal Y}(B) = \int_{\mathcal X} Q(B \mid x)\, d\pi_{\mathcal X}(x),
\]
is called the mixture measure of the Q(·|x) with respect to π_X. It “recreates” the measure on Y from Q(B|x) and π_X.

Consider a family of probability distributions {P(·|x), x ∈ X} on the space M_Y, and a random variable X(·) with law Π(·) on B_X. The previous proposition implies that, if P(A|x) is measurable for any fixed A ∈ B(M_Y), then there exists a process with probability measure P on M_Y (the mixture measure of the P(·|x) with respect to Π), defined as
\[
P(A) = \mathbb{E}\big[P(A \mid X)\big] = \int_{\mathcal X} P(A \mid x)\, \Pi(dx).
\]
Observe that in this case what counts is not the random variable itself, but rather its distribution, so we may take as the random variable a random measure or a point process.


Definition 2.7. Let {N(·|x) : x ∈ X} be a family of point processes on the Polish space Y. The family is said to be measurable if, for any A ∈ B(N_Y), the function
\[
P(A \mid x) = \mathbb{P}\big[N(\cdot \mid x) \in A\big]
\]
is B_X-measurable.

Lemma 2.1.9. Let {N(·|x) : x ∈ X} be a family of point processes and let {G[h|x], x ∈ X} be the family of their probability generating functionals, defined for h ∈ V(Y), corresponding to the probability measures {P(·|x), x ∈ X}. The family {N(·|x)} is measurable – in the sense of Definition 2.7 – if and only if, for any fixed h ∈ V(Y), G[h|x] is a measurable function of x.

Proof. Denote by A the family of subsets A of N_Y for which P(A|x) is B_X-measurable in x. We can choose as h a linear combination of indicator functions. Taking derivatives, we obtain the finite-dimensional distributions
\[
P_k(B_1, \dots, B_k; n_1, \dots, n_k \mid x), \tag{2.10}
\]
for any k > 0, n_1, ..., n_k ≥ 0, and for all finite families of disjoint sets (B_1, ..., B_k). If the (2.10) are B_X-measurable, then A contains the cylinder sets and, being defined through measurable functions, it is closed under monotone limits; thus it contains the σ-field B(N_Y) generated by the cylinder sets.
Conversely, if the family {N(·|x)} is measurable, it is enough to note that, given an arbitrary element h ∈ V(Y), G[h|x] can be obtained from the case in which h is a simple function through operations which preserve measurability in x.

Theorem 2.1.10. Given:

(a) a measurable family (in the sense of Definition 2.7) of point processes defined on the Polish space Y, with probability generating functionals {G[h|x], x ∈ X} defined for any h ∈ V(Y);

(b) a random variable X : Ω → X with law Π(·) on B_X;

the functional G[·] given by
\[
G[h] = \mathbb{E}\big(G[h \mid X]\big) = \int_{\mathcal X} G[h \mid x]\, \Pi(dx)
\]
is the probability generating functional of a point process on Y.

Proof. From Proposition 2.1.8 we know that
\[
P(A) = \int_{\mathcal X} P(A \mid x)\, \Pi(dx)
\]
is a probability measure on B(N_Y), and thus it has a probability generating functional given by
\[
G[h] = \mathbb{E}\left[\exp\left(\int_{\mathcal Y} \log h(y)\,N(dy)\right)\right]
= \int_{\mathcal N_{\mathcal Y}} \exp\left(\int_{\mathcal Y} \log h(y)\,N(dy)\right) P(dN)
= \int_{\mathcal X} \Pi(dx) \int_{\mathcal N_{\mathcal Y}} \exp\left(\int_{\mathcal Y} \log h(y)\,N(dy)\right) P(dN \mid x)
= \int_{\mathcal X} G[h \mid x]\, \Pi(dx).
\]

We finally introduce the Cox process, or doubly stochastic Poisson process, which will be the basis of our models in the next chapter. This kind of process is obtained by randomizing the parameter measure of a Poisson process: for any fixed realization ζ of a random measure on X, we build an inhomogeneous Poisson process with parameter measure ζ. It is easy to see that the probabilities in a Poisson process N(·|ζ) are measurable functions of ζ. For instance,
\[
P(A; n) = \frac{[\zeta(A)]^n}{n!}\, e^{-\zeta(A)}
\]
is a measurable function of ζ(A), which in turn is a measurable function of ζ as an element of the metric space M_X of boundedly finite measures on X. Hence, we can apply Proposition 2.1.8 and take the expectation with respect to the law of ζ to obtain a “mixed” point process on X.

The probability generating functional of the Poisson process can be written, for h ∈ V(X), as a function of the parameter measure ζ:
\[
G[h \mid \zeta] = \exp\left(\int_{\mathcal X} \big(h(x) - 1\big)\,\zeta(dx)\right). \tag{2.11}
\]
For fixed h, (2.11) is a measurable function of ζ seen as an element of M_X, thus the family of probability generating functionals is a measurable family. We can then apply Theorem 2.1.10 and build a point process by taking the expectation of (2.11) with respect to a probability measure for ζ on M_X.


Definition 2.8. Given a random measure ζ on the space X,
\[
G[h] = \mathbb{E}_\zeta\left[\exp\left(\int_{\mathcal X} \big(h(x) - 1\big)\,\zeta(dx)\right)\right] \tag{2.12}
\]
defines the probability generating functional of a point process on X, called the Cox process, or doubly stochastic Poisson process, directed by the random measure ζ.

Therefore a point process N is a Cox process directed by the random measure ζ when, conditional on ζ, its realizations are those of a Poisson process N(·|ζ) with parameter measure ζ.

We now present a simple example of Cox process, that will be useful in the next chapter for an explicit construction of our models.

Let r(t) be a positive and integrable stochastic process, with a probability law P. In this case, the most natural choice to build a random measure with state-space R it is to consider a multiple of the Lebesgue measure, given by ζ(du) = r(u)du, that is

   ζ(s, t) ≡ ∫_s^t r(u) du.

We call r(t) the intensity and ζ(s, t) the parameter measure of the Cox process.

Using (2.11), we can define

   G[h|ζ] = exp( ∫_{−∞}^{+∞} (h(t) − 1) r(t) dt )

and finally

   G[h] = E_P[ exp( ∫_{−∞}^{+∞} (h(t) − 1) r(t) dt ) ],

which represents the probability generating functional of a Cox process directed by ζ.

Since the Cox process is (conditional on ζ) nothing but an inhomogeneous Poisson process, we can write it as

   N(t) = Σ_{n=0}^{+∞} 1_{T_n ≤ t},   (2.13)

or, equivalently, we can express it by its derivative (in the sense of distributions)

   x(t) = Σ_{n=0}^{+∞} δ(t − T_n),   (2.14)

where the T_n represent the times at which the events occur and δ(·) is the Dirac δ.
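As a concrete illustration, the following minimal Python sketch (assuming NumPy is available; the sinusoidal intensity with random phase is only an illustrative choice, not the intensity used in our models) simulates one realization of such a Cox process on [0, T] by thinning: first a path of the random intensity r(t) is drawn and then, conditional on it, the event times T_n of an inhomogeneous Poisson process are generated.

import numpy as np

rng = np.random.default_rng(0)

# a simple random intensity: sinusoid with random phase (illustrative choice)
T = 10.0           # observation window [0, T], in seconds
r0 = 20.0          # baseline rate (events per second)
f = 1.0            # modulation frequency (Hz)
phase = rng.uniform(0.0, 2.0 * np.pi)   # the randomness that makes the measure zeta random

def r(t):
    # random intensity r(t) >= 0; conditional on `phase` it is a deterministic function
    return r0 * (1.0 + 0.5 * np.sin(2.0 * np.pi * f * t + phase))

r_max = 1.5 * r0   # upper bound of r(t) on [0, T], needed for thinning

# thinning: conditional on r, this produces an inhomogeneous Poisson process
n_candidates = rng.poisson(r_max * T)                   # homogeneous candidates at rate r_max
candidates = np.sort(rng.uniform(0.0, T, n_candidates))
keep = rng.uniform(0.0, 1.0, n_candidates) < r(candidates) / r_max
event_times = candidates[keep]                          # these play the role of the T_n

print(f"{event_times.size} events; about {r0 * T:.0f} expected on average over realizations")

Conditional on the drawn phase, the accepted times are those of an inhomogeneous Poisson process with intensity r(t); drawing a new phase at every run reproduces the doubly stochastic character of the process.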

2.2 Random signals

In the first chapter, we defined neurons as cells that are highly specialized in signal transmission: they encode information about input stimuli in sequences of action potentials, called spike trains. In the previous section, we justified the description of spike trains through Cox processes. This section gives a brief introduction to random signals. We present the main features of signals and their spectral analysis. At the end of the section, we present a way to measure and estimate the amount of information that a system transmits about particular types of signals received as input. These results will be applied in the next chapter, allowing us to quantify the information about a stimulus transmitted by the neural population.

A signal is any physical phenomenon, varying with respect to time, which conveys information of interest. For instance, the signal measured by an electrocardiograph, which from a physical point of view is nothing but a weak electrical voltage, is associated with information of a medical nature; the acoustic signal produced by a guitar, that is, a variation of the air pressure detected by our ear, contains information of an aesthetic nature, and so on. The evolution of a signal occurs over time and can be known a priori (deterministic signal) or only a posteriori (random signal). We shall focus on the latter, using concepts from probability theory. A random signal is represented by a stochastic process parametrised by a time set.

A random signal can be “desired”, in the sense that it conveys desired information, or “undesired”, in the sense that it is involuntarily added to the desired signal, disturbing it and damaging the information transmission. We shall call this kind of undesired signal noise. It conveys information too, but in general this information is undesired and not completely distinguishable from the signal. However, as we mentioned in the first chapter, noise can also have a beneficial role in information transmission, and this will be seen in our models.

Since the evolution of a random signal is not known in advance, it is necessary to describe its features from a probabilistic perspective, for instance through the mean, variance, and temporal correlation functions of the process.

2.2.1 Stationary processes

An important feature of signals, and of stochastic processes in general, is stationarity. There are different kinds of stationarity, depending on the statistical properties that one considers. We shall say that a process is stationary (in a certain sense) whenever the considered properties do not depend on the temporal parameter.

Definition 2.9. A stochastic process (X_t)_{t≥0} is called:

• first-order stationary if the probability distribution of the first order does not depend on time, i.e. if

   f_X(x; t) = f_X(x; t + τ) ≡ f_X(x)   for each τ,

where f_X(x; t) is the first-order density function of X;

• n-th order stationary, for a given n ≥ 2, if the probability distribution of the n-th order does not depend on time, i.e. if

   f_X(x_1, . . . , x_n; t_1, . . . , t_n) = f_X(x_1, . . . , x_n; t_1 + τ, . . . , t_n + τ)   for each τ,

where f_X(x_1, . . . , x_n; t_1, . . . , t_n) is the joint density function of order n of X.

Note that if a stochastic process X_t is first-order stationary then its mean is constant, and that stationarity of order n ≥ 2 implies stationarity of all orders k ≤ n.

When stationarity holds for every n, i.e. the whole family of finite-dimensional distributions is time-independent, the process is called strictly stationary. Such a definition is very strong, and for many results it is actually enough to require much less. In order to find a weaker notion of stationarity, we recall the concept of correlation of the process, which describes how the process, considered at a given instant, is related to itself at a different instant.

Let X_t be an integrable stochastic process with finite mean function µ(t).

We define:

1. the covariance function:

   C_XX(t, s) = E[(X_t − µ(t))(X_s − µ(s))]   (2.15)

2. the autocorrelation function:

   ρ_XX(t, s) = C_XX(t, s) / (σ_X(t) σ_X(s)),   (2.16)

with σ_X^2(t) = E[(X_t − µ(t))^2]. Namely, the autocorrelation function is the normalized covariance. In the sequel, however, we shall use a different definition of the autocorrelation function.

Actually, when dealing with signal analysis, there is some confusion about the definition of the quantities referred to above: some refer to C_XX as the autocorrelation function, while others refer to E[X_t X_s] as the autocorrelation function. In signal processing, these definitions are often used interchangeably. Since C_XX(t, s) = K_XX(t, s) − µ(t)µ(s), we will mostly focus on K_XX, which is the object that captures the idea of the relation between values of the process at different times. Hence, we define

2′. the autocorrelation function:

   K_XX(t, s) = E[X_t X_s]   (2.17)

When more convenient in calculations, we shall refer to C_XX as the autocorrelation function.

Taking τ = s − t, we can write

   K_XX(t, t + τ) = E[X_t X_{t+τ}].   (2.18)

Note that if the process is second-order stationary, then the autocorrelation function depends only on the temporal difference τ between the two instants s and t, and not on the specific instants. Hence, in this case, eq. (2.18) reduces to

   K_XX(t, t + τ) = K_XX(τ).

In some cases, the mean and the autocorrelation functions completely characterize the law of a process (the Gaussian case). Hence, second-order stationarity, which implies the temporal independence of these two quantities, would simplify the analysis of the process, but it is still a strong requirement. We then introduce another kind of stationarity.

Definition 2.10. A stochastic process (X_t)_{t≥0} is called wide-sense stationary if:

• E[X_t] = µ(t) = µ(t + τ) = µ, for each τ ∈ R;

• E[X_t X_{t+τ}] = K_XX(τ), for each τ ∈ R;

that is, if the process has constant mean and the autocorrelation function depends only on the difference between two instants and not on the absolute times.
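A classical illustration, not specific to our models, is the random-phase sinusoid: let X_t = A cos(2π f_0 t + Θ), with A and f_0 constants and Θ uniformly distributed on [0, 2π). Then E[X_t] = (A/2π) ∫_0^{2π} cos(2π f_0 t + θ) dθ = 0 for every t, and

   K_XX(t, t + τ) = E[X_t X_{t+τ}] = (A^2/2) E[ cos(2π f_0 τ) + cos(2π f_0 (2t + τ) + 2Θ) ] = (A^2/2) cos(2π f_0 τ),

since the second cosine averages to zero over Θ. The mean is constant and K_XX depends only on τ, so the process is wide-sense stationary, even though each single realization is a deterministic sinusoid once Θ is drawn.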

Clearly, every second-order stationary process is also wide-sense stationary. Observe that if the process is Gaussian the converse is also true.

Remark. When wide-sense stationarity is not guaranteed, we shall consider the following quantity as the correlation function:

   K̂_XX(τ) = lim_{t→∞} E[X_t X_{t+τ}].   (2.19)

If the limit in the above equation exists, then K̂_XX(τ) does not depend on t, and this will considerably simplify the rest of the signal analysis. Hence, when necessary, we shall consider the limit of large times to ensure stationarity.

Ergodic processes

In addition to the expectation E[X_t], that is, the statistical average over the set of realizations of the process at a fixed time t, we can define a temporal average for each fixed realization of the process,

   M[X_t] := lim_{T→∞} (1/(2T)) ∫_{−T}^{T} x(t) dt.

Since M[X_t] depends on the realization, in general it will be different for each realization, and different from the statistical average.

Definition 2.11. A wide-sense stationary process X_t with mean µ is called (wide-sense) ergodic if

   M[X_t] = E[X_t]   (2.20)

   M[X_t X_{t+τ}] = E[X_t X_{t+τ}]   (2.21)

that is, if all realizations of the process have the same temporal average, which coincides with the statistical expectation µ of the process, and the same temporal autocorrelation function, which coincides with the statistical autocorrelation function K_XX(τ). Therefore, when ergodicity holds, every realization of the process is representative of all realizations.

Ergodicity is a stronger condition than stationarity, and it is often impossible to prove that a process is ergodic. In practice, however, in order to carry out a reasonable analysis, this hypothesis must often be assumed, because it is impossible to observe all the possible realizations of a random process.
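As a numerical illustration of how ergodicity is used in practice, the following minimal Python sketch (assuming NumPy; the AR(1) recursion below is only a convenient stand-in for a stationary, ergodic signal) compares the temporal average of a single long realization with the ensemble average over many independent realizations.

import numpy as np

rng = np.random.default_rng(1)

a, sigma, mu = 0.9, 1.0, 2.0    # AR(1): X_{n+1} = mu + a*(X_n - mu) + sigma*noise

def ar1_path(n_steps, x0):
    # one realization of the AR(1) recursion
    x = np.empty(n_steps)
    x[0] = x0
    for n in range(1, n_steps):
        x[n] = mu + a * (x[n - 1] - mu) + sigma * rng.standard_normal()
    return x

# temporal average M[X] over one long realization (what can actually be measured)
time_avg = ar1_path(200_000, x0=mu).mean()

# ensemble average E[X] at a fixed (large) time over many independent realizations
ensemble_avg = np.mean([ar1_path(500, x0=mu)[-1] for _ in range(2_000)])

print(f"temporal average ~ {time_avg:.3f}")
print(f"ensemble average ~ {ensemble_avg:.3f}   (both close to mu = {mu})")

The same comparison can be carried out for the temporal autocorrelation against K_XX(τ); when the two kinds of averages agree, a single long recording is enough to estimate the statistical quantities of the process.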


Cross-correlation between two processes

One of the main purposes when studying random signals is to describe their relationship with each other. The cross-covariance function and the cross-correlation function measure the “similarity” of a process (X_t)_{t≥0} to a time-shifted version of another process (Y_t)_{t≥0}:

1. the cross-covariance function:

   C_XY(t, t + τ) := E[(X_t − E[X_t])(Y_{t+τ} − E[Y_{t+τ}])]   (2.22)

2. the cross-correlation function:

   K_XY(t, t + τ) := E[X_t Y_{t+τ}]   (2.23)

What we said about the interchangeability of definitions (2.16) and (2.17) holds analogously in this case for definitions (2.22) and (2.23).

Definition 2.12. We say that (X_t)_{t≥0} and (Y_t)_{t≥0} are jointly wide-sense stationary if both are wide-sense stationary and if the cross-correlation function K_XY(t, t + τ) is independent of t:

   K_XY(t, t + τ) = E[X_t Y_{t+τ}] = K_XY(τ).

If the two processes are not at least jointly wide-sense stationary, one can use a definition analogous to (2.19).

Two stochastic processes X_t and Y_t are called jointly ergodic if they are both ergodic and if

   M[X_t Y_{t+τ}] = E[X_t Y_{t+τ}] = K_XY(τ),

that is, if the temporal average of the cross-correlation coincides with the cross-correlation function.
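A minimal Python sketch (assuming NumPy; the signals are synthetic and chosen only for illustration) shows a typical use of the cross-correlation: when Y is a delayed, noisy copy of X, the time-averaged estimate of K_XY(τ) peaks at the true delay.

import numpy as np

rng = np.random.default_rng(2)

dt = 1e-3                 # sampling step, in seconds
n = 20_000
# a smooth, non-periodic random signal (white noise passed through a moving average)
x = np.convolve(rng.standard_normal(n), np.ones(50) / 50.0, mode="same")

delay_steps = 150         # Y is X delayed by 150*dt = 0.15 s, plus independent noise
y = np.roll(x, delay_steps) + 0.05 * rng.standard_normal(n)   # np.roll wraps around; negligible here

# time-averaged estimate of K_XY(tau) = E[X_t Y_{t+tau}] on a grid of lags
lags = np.arange(-400, 401)
k_xy = np.array([np.mean(x * np.roll(y, -lag)) for lag in lags])

best_lag = lags[np.argmax(k_xy)]
print(f"estimated delay: {best_lag * dt:.3f} s (true value {delay_steps * dt:.3f} s)")

This is exactly the kind of quantity that, in the next chapter, relates the input signal to the output spike trains, there expressed in the frequency domain through the cross-spectrum.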

So far, we have described random signals through an analysis in the time domain, that is, with respect to their temporal evolution in the real world. In the next paragraph we turn to the spectral analysis of signals, which is their description in the frequency domain.

2.2.2 Spectral analysis of random signals

The spectral analysis of a signal refers to the study of the signal in the frequency domain, and helps to observe and capture the essence of the signal while requiring much less data than time-domain analysis. Roughly speaking, the frequency represents how fast a signal is changing. We first describe the spectral analysis of deterministic signals and then move to the spectral analysis of random signals.


Fourier analysis of deterministic signals

Let x(t) be a periodic signal, i.e. such that there exists a constant T_0 > 0, called the period, such that x(t) = x(t + T_0) for all t. The frequency f_0 is defined to be the inverse of the period T_0 and is measured in Hertz (cycles per second), provided that the period is measured in seconds. A periodic signal is converted from the time representation to the frequency domain through its Fourier series, which we briefly recall: every periodic signal can be expressed as the sum of sinusoidal oscillations with suitable amplitude, phase and frequency:

   x(t) = A_0 + 2 Σ_{k=1}^{∞} A_k cos(2π k f_0 t + θ_k)   (real form)

where A_0 is a constant, A_k and θ_k respectively denote the amplitude and the initial phase of the k-th oscillation, f_0 is the fundamental frequency, and f_k = k f_0 is the k-th harmonic frequency, an integer multiple of f_0.

An alternative form of the Fourier series is the complex form, which is often preferred in practice:

   x(t) = Σ_{k∈Z} X_k e^{−i2πk f_0 t}   (complex form)

where the X_k are known as the Fourier coefficients:

   X_k = (1/T_0) ∫_{−T_0/2}^{T_0/2} x(t) e^{i2πk f_0 t} dt.

Knowing the temporal evolution of the signal x(t) is equivalent to knowing the Fourier coefficients X_k of the series.¹
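As a quick numerical check of these formulas, the following minimal Python sketch (assuming NumPy; the test signal is an arbitrary choice) approximates the coefficients X_k by discretizing the integral above, with the same sign convention used here.

import numpy as np

# test signal: x(t) = A0 + 2*A1*cos(2*pi*f0*t + theta1), an arbitrary illustrative choice
A0, A1, theta1, f0 = 1.0, 0.5, np.pi / 4.0, 2.0
T0 = 1.0 / f0

t = np.linspace(-T0 / 2.0, T0 / 2.0, 4096, endpoint=False)   # one period, uniform grid
x = A0 + 2.0 * A1 * np.cos(2.0 * np.pi * f0 * t + theta1)

def fourier_coefficient(k):
    # X_k = (1/T0) * integral over one period of x(t)*exp(+i*2*pi*k*f0*t) dt,
    # approximated by the mean over the uniform grid
    return np.mean(x * np.exp(1j * 2.0 * np.pi * k * f0 * t))

for k in (0, 1, -1):
    Xk = fourier_coefficient(k)
    print(f"k = {k:+d}: |X_k| = {abs(Xk):.4f}, arg(X_k) = {np.angle(Xk):+.4f} rad")

# expected: |X_0| = A0 with zero phase, |X_1| = |X_-1| = A1, and arg(X_1) = -theta1

Comparing the printed values with A_0, A_1 and θ_1 confirms that the real and complex forms describe the same signal.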

When dealing with non-periodic (but still deterministic) signals, one can generalize the concepts above and still develop a spectral analysis: a non-periodic signal can be represented as the combination of sinusoidal components with infinitesimal amplitude and continuously varying frequency. We shall not go into detail on this point. Intuitively, we can look at a non-periodic signal as the limit of a periodic signal with period T_0, for T_0 → ∞ (equivalently, for f_0 → 0). Thus, when referring to non-periodic signals, one applies the Fourier transform instead of the Fourier series. In this case, the following equations hold:

¹ The series does not always converge; some sufficient conditions that guarantee the convergence are, for instance, the classical Dirichlet conditions.
