
Investigating the molecular basis of the shift from goal-directed to habitual behavior


Academic year: 2021


Acknowledgments

I would like to thank my supervisor, Raffaella Tonini, for giving me the opportunity to carry out my PhD, for supporting this work throughout the past years, and for giving me the chance to expand it through collaborations. I would like to thank Barbara Greco, with whom I learned how to perform operant behavior experiments.

I would like to thank all our collaborators from the laboratory of Prof. Katona for their support and passion. I thank all the past and present members of my lab, who have been colleagues and friends and have helped me improve this work in many respects, from methodology to scientific discussions. Beyond the lab, thank you for being great friends. In particular, I thank Anna, with whom I worked on this project. Thanks to Massimo, Andrea, Marta, Adriian, Stefania, Ieva.

I thank all the other friends who were there during these years for the small and big things, particularly all of you whom I met in Genova, Lisbon, and Budapest. You all added great value to the time spent in those different locations, and to my whole PhD studies.

I thank my family for their support, interest, and care during these years. Thank you for always supporting my decisions.


Contents

Acknowledgments

ABSTRACT

INTRODUCTION

What is a habit?

The importance of habits

Instrumental Training to study habit formation:

Brain loci for control of behavior strategies

The infralimbic medial prefrontal cortex:

The Orbitofrontal Cortex:

The Basolateral amygdala:

The Thalamus:

Basal ganglia system

Dopaminergic related structures

The striatum

The Albin-DeLong Model discussed

Habit formation and synaptic plasticity

The endocannabinoid (eCBs) system

Endocannabinoid-mediated synaptic plasticity

eCB-LTD

eCB-LTD via TRPV1 receptor

eCB-SynDep

The importance of mGluR1/5s signaling for eCB-LTD and its downstream targets

Other molecular determinants of striatal synaptic plasticity in iSPNs

Molecular determinants of striatal synaptic plasticity in dSPNs

Behavioral correlates of plasticity

Metaplasticity

Molecular Metaplasticity

NMDAR mediated metaplasticity

mGluR mediated metaplasticity

Behavioral Metaplasticity

Learning induced metaplasticity

Metaplasticity relevance for addiction and pathologies

AIMS

MATERIALS AND METHODS

Behavioral tests:

Animals

Animal Housing

Instrumental learning

In-Vivo MPEP manipulations:

In-vivo DHPG perfusion

Western Blotting

STORM

Sample preparation

Immunostaining

Imaging

Image processing.

Image analysis through vivid STORM

Electrophysiology

Slice Preparation

Patch-clamp recordings

Statistical analyses

EXPERIMENTAL APPROACH

RESULTS

1. The in-vivo pharmacological activation of mGluR1/5R and instrumental training sessions converge on the AKT-GSK3 pathway

2. Overtraining promotes habitual control of behavior in C57Bl/6J mice.

4. Overtraining followed by omission unveils modification of a subset of phosphorylated proteins downstream mGluR1/5s receptor in the DLS.

4.1 AKT signaling pathway is affected by over-training followed by an omission test.

4.3 Decrease of AKT pathway activity do not induce any changes in active-β-catenin levels

4.3 Decrease of AKT pathway activity occurs in parallel with a decrease of RGS4 protein but not of FMRP expression in the DLS of over-trained mice upon omission

5. Omission is needed to unveil the priming effect of training on the AKT pathway

6. MPEP administration during training is able to rescue mGluR1/5s downstream pathways and behavioral adaptations:

7. Omission induces an increase of mGluR5 downstream kinases lost upon over-training:

8. Omission induces an increase of diacylglycerol-lipase-α (DGL) and CB1R in short trained animals:

9. The omission procedure unveils training-induced cell-type specific adaptations of the eCB-dependent LTD in the DLS of over-trained compared to short-trained animals

DISCUSSION & PERSPECTIVES:

BIBLIOGRAPHY


ABSTRACT

According to current theories, actions are controlled by the balance between two dissociable processes acquired during learning: the flexible, goal-directed control of behavior, which depends on the causal relationship between the action and its outcome (A-O), and the automatic, inflexible habitual control of behavior, which is insensitive to changes in A-O contingencies. The amount of training critically determines which cognitive process controls action. At early stages behavior is goal-directed; with repeated practice it gradually becomes automatic and habitual.

Lesion studies in rodents have identified two striatal sub-regions that are at least partially responsible for the expression of goal-directed and habitual behavior: the dorsomedial striatum (DMS) and the dorsolateral striatum (DLS), respectively. Within these structures, striatal projection neurons integrate cortical glutamatergic signals and midbrain dopaminergic information through multiple post-synaptic receptors to shape synaptic plasticity and action output. Among these receptors, the group I metabotropic glutamate receptors (mGluR1/5s) are ideally placed, since they integrate glutamatergic signals into cellular responses and regulate pre-synaptic glutamate release through endocannabinoid production.

We found that in mice subjected to different training regimes of instrumental conditioning (nose poke for food reward), which promote either goal-directed (short-training) or habitual (over-training) behavior, the inability to adjust behavior when the A-O contingency was reversed after over-training was associated with a lack of activation of signalling cascades downstream of mGluR1/5s. Preventing the activation of mGluR5 in vivo during training, in the dorsolateral striatum (DLS), restored behavioral sensitivity to changes in A-O contingencies and averted the biochemical changes.

Using super-resolution microscopy (STORM) and electrophysiology on ex vivo brain slices, we also found that the ability to update behavior following changes in the A-O association is associated with a nanoscale reorganization of molecular components of mGluR5-endocannabinoid (eCB) signalling and with enhanced eCB-mediated LTD in goal-directed but not habitual mice.


INTRODUCTION

In everyday life, the repetition of routine behaviors leads to a fixed way of doing, thinking, willing, or feeling (Andrews, 1903). Examples include driving a car along the same route every day, turning on a light in a familiar environment, riding a bike, or recalling where we always park our car in the morning. After an initial learning phase, all these daily behaviors tend to be performed without thinking, allowing our attention to be focused elsewhere. In an ever-changing environment, this behavioral automatization forms our habits and is proposed to free up cognitive load during routine behaviors, giving organisms the opportunity to focus on novel conditions.

What is a habit?

Habits are characterized by repetitive and nearly automatic behavior. While goal-directed actions are controlled by their consequences, habits are influenced by antecedent stimuli. Habits and habitual strategies are learned; they usually recur over the course of days or years, becoming nearly inflexible behavioral strategies. Habits tend to involve ordered and structured action sequences that are elicited mainly by contextual stimuli present during learning. They may also include the expression of routine “thoughts” (Graybiel, 2008). Habit learning can be roughly defined as the acquisition of associations between a stimulus (e.g. a context) and a response (e.g. motor program engagement) (Hull, 1943). These stimulus-response (S–R) associations underlie one of the two systems of instrumental conditioning: a goal-directed system acquires associations between responses and the incentive value of outcomes, whereas the habit system acquires S–R associations (Adams & Dickinson, 1981; reviewed in Gasbarri et al., 2014; Hart, Leung and Balleine, 2014).

The importance of habits

Understanding how habits are formed is a real challenge, but it will lead to a better understanding of neuronal physiology, which is important because of the close relation of habits to pathophysiology. Indeed, it has been proposed that dysregulation of habits is a common substrate for pathologies such as addiction and obsessive-compulsive disorders (Berke and Hyman, 2000; Dickinson et al., 2002; Belin et al., 2013; Gerdeman et al., 2003; Graybiel, 2008).


Instrumental Training to study habit formation:

In mice, the control of instrumental actions such as nose poke for food reward is proposed to involve at least three major components.

The first is a contingency learning process during which the animal encodes the relation between performing an action and receiving an outcome (i.e. food reward). That this encoding reflects a causal relationship is suggested by the high concordance, across multiple variables, between instrumental learning in rodents and acquired causal judgments of the effectiveness of an action in humans (Dickinson and Shanks, 1995; Balleine and Dickinson, 1998).

The second is an incentive learning process that allows the animal to assign an appropriate value to a reward and to learn how this value can be adapted across different motivational states (Balleine and Dickinson, 1998). This second module of instrumental learning is engaged when animals contact and experience the reward in the relevant state (Balleine and Dickinson, 1998).

These first two stages are key constituents of goal-directed control of behavior, during which animals are sensitive both to changes in the contingency between the action and the outcome and to changes in the incentive value of the outcome.

The third and last stage of instrumental conditioning is composed of stimulus-response (S-R) associations, which are insensitive both to changes in A-O contingencies and to degradation of the incentive value of the outcome.

In instrumental reward-based learning, the amount of training critically determines which cognitive components of action control are engaged. During the early phase of training, behavior is goal-directed and sensitive to changes in the value of the outcome or in the causal relationship between the action and the outcome (reviewed in Graybiel, 2008; Yin and Knowlton, 2006; Balleine and Dickinson, 1998). After extensive training, an S-R association gradually prevails, and after overtraining the influence of the outcome value on responding is strongly decreased. Behavior becomes habitual and impervious to alterations in the A-O contingency.

To test whether behavior is controlled by goal-directed or habitual cognitive strategies, two different tasks are commonly used: the outcome-devaluation task and the omission task. The former measures behavioral sensitivity to changes in the intrinsic value of the outcome, usually by devaluing its incentive value through specific satiation or by pairing the reward with induced illness. The latter assesses sensitivity to changes in the A-O contingency, usually by increasing the probability of reward delivery when the animal does not perform any action (Balleine and Dickinson, 1998; Graybiel, 2008; Nazzaro et al., 2012). Habitual mice are less sensitive to degradation of the A-O contingency than goal-directed mice.

Brain loci for control of behavior strategies

In cognitive neuroscience and experimental psychology, habit learning has been distinguished from goal-directed behavior: the former is considered the product of a procedural learning brain domain, while the latter is proposed to be part of a declarative memory brain territory that encodes facts and episodes (Graybiel, 2008).

Declarative (explicit) memory is the conscious, intentional recollection of facts based on previous experiences and their consequences. Its counterpart, non-declarative (implicit) memory, does not require any conscious retrieval and is mainly expressed through performance. The two memory systems are proposed to support parallel processing of behavior (Squire, 2009; White & McDonald, 2002).

Such definitions are supported by experimental data pointing to distinct brain regions for the processing of declarative and procedural memory.

In humans, amnesic patients with bilateral damage to the hippocampal formation or the diencephalic midline, trained to perform a probabilistic classification task, exhibit a severe impairment in declarative memory for the training episode. However, they are still able to learn the task non-consciously. In contrast, non-demented patients with Parkinson's disease fail to learn the task despite being able to recall the training phase (Knowlton et al. 1996; Bayley et al. 2005).

Similar results were obtained via lidocaine inhibition of the hippocampus or the dorsolateral striatum (DLS) in rats tested in a cross-maze task (Packard & McGaugh 1996). Lidocaine treatment in the hippocampus produced an early shift towards win-stay S-R response strategies, while the same treatment in the DLS maintained a flexible place strategy in which animals use their environment to determine their response, a behavior typically present only at early stages of training (Packard & McGaugh 1996).

Another brain region has been proposed to be part of a cognitive domain for learning facts and expressing declarative memory. In contrast to the basal ganglia activation seen with fMRI during the learning of a feedback-based task involving the rapid learning of S-R associations, paired-associate learning implicating declarative memory is correlated with increased medial temporal lobe activity (Salat et al. 2006). However, activity in the medial temporal lobe decreases as training progresses, while basal ganglia activation increases (Foerde et al. 2006, Poldrack et al. 2001, Willingham et al. 2002). Moreover, in disease states involving dysfunction of the basal ganglia, medial temporal lobe activity can appear under conditions in which striatal activity normally would dominate (Moody et al. 2004, Rauch et al. 2006, Voermans et al. 2004).

These observations indicate both a competition between, and a combined contribution from, the declarative (goal-directed) and procedural (habitual) memory systems in the control of behavior.

These findings have been further defined using animal models of instrumental learning. In 1981, Dickinson defined habits experimentally as being performed not in relation to a current or future goal but rather in relation to a previous goal and the antecedent behavior that most successfully led to achieving that goal (Dickinson 1981, Balleine & Dickinson 1998, reviewed in Graybiel 2008). By combining the learning of a reward-based task with lesion studies, it has been shown that goal-directed and habitual behaviors are supported by the activity of different cortical and sub-cortical regions.

The infralimbic medial prefrontal cortex:

After lesions of the infralimbic medial prefrontal cortex (mPFC) in rats, animals remain sensitive to reward devaluation after both short training and over-training, indicating that they respond using goal-directed strategies despite being on an overtraining schedule. In contrast, rats lesioned in the more dorsal prelimbic region of the mPFC fail to decrease their responding following devaluation of the reward after either short training or over-training, indicating that their responding is not goal-directed. Findings from this study imply a bidirectional involvement of the mPFC: the more dorsal (i.e. prelimbic) mPFC supports goal-directed flexible strategies, while the more ventral (i.e. infralimbic) mPFC does not (Killcross & Coutureau, 2003). This view is supported by cortico-striatal connections (Fig.1). While the prelimbic mPFC sends most of its connections to the dorsomedial striatum, supporting goal-directed behavior, the infralimbic mPFC targets mainly the nucleus accumbens (NAc), a structure proposed to be implicated more in appetitive Pavlovian conditioning than in instrumental conditioning (Yin et al., 2008).


Figure 1: Schematic representation of prelimbic and infralimbic cortico-striatal connections: PL: prelimbic; IL: infralimbic; M2: medial agranular; M1: motor; SS: somatosensory

The Orbitofrontal Cortex:

Another prefrontal cortex sub-region, the orbitofrontal cortex (OFC), is proposed to contribute to goal-directed strategies and behavioral flexibility. Studies of the OFC implicate this structure in impulsivity and in the behavioral flexibility needed to assess and update the value of an outcome under changing conditions. It has been shown that OFC-lesioned rats tend to favor larger delayed rewards over smaller ones (Winstanley et al., 2004), suggesting a role of the OFC in assessing and updating outcome values under changing conditions. Because the inability to update behavior in response to changes in the value of the outcome is indicative of habitual behavior (and is tested through outcome devaluation), this suggests a role of the OFC in updating performance during devaluation (Gremel and Costa, 2013), a role supported by its connections to the ventral and dorsomedial striatum (Gremel et al., 2016) (Fig.2).

Another study coupled lesions of the OFC with the 5-choice serial reaction time task (5CSRTT). This task measures different types of performance, including aspects of attention and inhibition such as compulsive (perseverative) and impulsive (premature) responses. OFC-lesioned rats display increased omissions and increased premature and perseverative responses, suggesting that the OFC could regulate impulsivity. These observations strengthen the earlier evidence for a role of the OFC in response flexibility (Chudasama et al., 2003). One of the most frequently reported findings in humans and animals with OFC lesions is an inability to perform reversal-learning tasks (Rolls et al., 1994; Rahman et al., 1999).

OFC lesions cause a deficit in reversal learning and lead to perseverance in actions that are no longer rewarded (Chudasama and Trevor, 2003; Schoenbaum et al., 2007). At first, this result was interpreted as a failure to inhibit pre-existing responses, but several lines of evidence suggest that the OFC may actually be important for encoding the outcome value of the response (Schoenbaum et al., 2007). Thus, the reversal-learning deficit observed with OFC dysfunction could be considered a failure to encode the devaluation of the reinforcer (which is no longer presented upon the action). This could be interpreted as a decrease in goal-directed strategies and an increase in habitual behavior (reviewed by Torregrossa et al., 2008).

The Basolateral amygdala:

Downstream of the OFC, the basolateral amygdala (BLA) is proposed to be one of the most important neural contributors to reward-based learning and decision making, and is particularly implicated in adaptive, goal-directed behavior (reviewed in Wassum and Izquierdo, 2015).

Among other structures, the BLA sends excitatory projections to the NAc, the dorsomedial striatum (DMS), and the medial and orbital frontal cortex (OFC) (Fig.2), all structures critically implicated in reward-based learning, motivated behavior, and action selection (Britt et al., 2012; Lalumiere, 2014; Lüthi and Lüscher, 2014). Indeed, on one hand, the regulation of dopamine efflux within the BLA, together with its connections to the NAc, has been proposed to regulate the selection and coordination of specific sequences of behaviors appropriate to incentive stimuli present in the environment (Phillips et al., 2003). On the other hand, an intact BLA is required for selective sensitivity to a change in the reward value of the instrumental outcome, and connections from the BLA to the DMS are proposed to carry this information (Corbit and Balleine, 2005).


Figure 2: Schematic representation of orbitofrontal and basolateral amygdala striatal connections: OFC: orbitofrontal cortex; DMS: dorso medial striatum; NAc sh: nucleus accumbens shell; BLA: basolateral amygdala

Surprisingly, excitotoxic BLA lesions have no effect on many primary measures of both Pavlovian and instrumental appetitive learning. The BLA is not required for the acquisition of an instrumental action (Balleine et al., 2003), or for a reward-related instrumental discrimination (Schoenbaum et al., 2003). Nevertheless, alterations are revealed when a representation of a specific predicted reward (i.e., the outcome of an instrumental action) must be encoded in the learned association and used to guide responding, such as during outcome devaluation. Pre- and post-training BLA lesions or inactivation do not disrupt instrumental actions, but they do render them insensitive to selective devaluation of the outcome (Balleine et al., 2003; Balleine and Killcross, 2006; Hatfield et al., 1996; Johnson et al., 2009; Parkes and Balleine, 2013; Pickens et al., 2003). Insensitivity to devaluation has been reported in monkeys with whole, excitotoxic amygdala lesions (Izquierdo and Murray, 2007) and in rats with specific BLA lesions (Balleine et al., 2003; Corbit and Balleine, 2005; Coutureau et al., 2009; Hatfield et al., 1996).

The data discussed above suggest that pre-training BLA lesions disrupt the formation of instrumental goal-directed A-O associations, thereby promoting action acquisition via an alternate form of learning that does not require the BLA: inflexible, stimulus-response (S-R) driven actions (i.e. habits). In support of this, pre-training BLA lesions also render instrumental responding insensitive to changes in the A-O contingency (Balleine et al., 2003), another marker of habitual behavior.


The BLA links incentive value to the outcome representation encoded in both stimulus-driven rewarded actions and A-O associations, but it does not act alone in serving this function. Evidence from various laboratories suggests that the amygdala provides outcome-specific value information to the OFC, interacting with the latter in stimulus-guided tasks to update the outcome value information following selective satiation (Baxter et al., 2000; Saddoris et al., 2005; Zeeb and Winstanley, 2013). OFC lesions also disrupt such associative encoding in the BLA, suggesting a bidirectional functional relationship (Saddoris et al., 2005). Importantly, the OFC is necessary for devaluation sensitivity when tasks are heavily guided by reward-predictive stimuli and is proposed to be unnecessary for the sensitivity of instrumental action to devaluation (Ostlund and Balleine, 2007b). This indicates that BLA to OFC projections might encode information about reward-predictive stimuli.

In order to display appropriate responses to changes in reward value, the BLA must also interact with other structures such as the striatum. Specifically, projections from the BLA to the NAc core are proposed to mediate the sensitivity of instrumental action to devaluation (Shiflett and Balleine, 2010), while connections to the dorsomedial striatum are vital for acquiring instrumental A-O associations (Corbit et al., 2013).

The Thalamus:

The thalamus and its various nuclei are at the interface between sensory cortices and the subcortical structures responsible for the execution of actions. As a consequence, it has been seen as a hub for sensory information implicated in arousal, attention, and voluntary movement. Nevertheless, evidence of connections between thalamic nuclei and both the prefrontal cortex and the dorsal striatum has prompted various researchers to study its role in the regulation of instrumental learning and performance. The three main thalamic nuclei investigated for their role in such regulation are the anterior thalamic nuclei (ANT), the mediodorsal thalamus (MD), and the parafascicular thalamic nuclei (PF).

The ANT were among the first thalamic nuclei to be investigated for their role in mediating instrumental behavior (Gabriel et al., 1977; Gabriel et al., 1983; Gabriel et al., 1989). Even though these early studies suggested a role of these nuclei in instrumental learning, refined studies using Skinner's operant paradigm (Skinner, 1932; Corbit et al., 2003) revealed that the tasks used had confounded Pavlovian and instrumental processes, as discussed by Bradfield and colleagues (Bradfield et al., 2013). Indeed, no deficits were found in ANT-lesioned rats tested in free operant instrumental conditions, which excludes this region as a regulator of instrumental behavior (Corbit et al., 2003).


In contrast to the ANT, the mediodorsal thalamus was found to play a role in instrumental learning. Even though early investigations did not employ tasks that clearly separate Pavlovian and instrumental relations (Buchanan, 1994), MD lesions were later found to affect performance in several operant behavioral learning processes. These studies highlight the MD as an important structure for the acquisition of goal-directed behavior (Corbit et al., 2003) but not for its expression, as post-training MD lesions left goal-directed behavior intact (Ostlund and Balleine, 2008). These findings suggest that the MD might play a role similar to that of the prelimbic cortex (PL), which has been found to mediate the acquisition but not the expression of goal-directed behavior (Ostlund and Balleine, 2005).

Lastly, the role of the parafascicular thalamic nuclei was examined. As for the other thalamic nuclei, several lines of evidence suggest a role in flexible behavior. However, the behavioral tests performed in those studies make it difficult to disentangle operant from Pavlovian conditioned control of behavior. Recent research by Bradfield et al. suggests that the PF might mediate the alterations in learning that occur when A-O contingencies change, through its connections to the posterior dorsomedial striatum (Bradfield et al., 2013).

Basal ganglia system

The basal ganglia are a group of subcortical nuclei connecting the cerebral cortex with different neural systems affecting behavior. They reside in the basal forebrain and provide outputs to various behavior effector systems, such as the thalamic nuclei projecting to frontal cortical areas involved in the planning and execution of movement (reviewed by Graybiel in 1990, 1995). The recognized role of the basal ganglia in behavioral functions has expanded, with evidence indicating their involvement in procedural learning related to goal-directed and habitual behavior, action selection, and motivation. Regulation of this last process has been suggested to be the key role of the basal ganglia, possibly also driving their function in motor control.

The basal ganglia can be anatomically subdivided into four interconnected nuclei: the striatum (composed in primates of the caudate, putamen, and nucleus accumbens), the subthalamic nucleus, the globus pallidus (further subdivided into an internal and an external segment), and the substantia nigra (divided into pars compacta, pars reticulata, and ventral tegmental area) (Fig.3). For the purpose of this thesis, I will discuss, in relation to habit formation, the role of dopamine-related structures carrying motivational information and the role of the striatum, which is critically involved in movement regulation.

Figure 3: The basal ganglia (a) The human and (b) rodent basal ganglia illustrated respectively on a coronal and a sagittal diagram. A parallel color code identifies similar structures in the human and rodent brain. (c) Coronal diagram of mouse brain shows the separation between the medial and the lateral dorsal striatum, respectively corresponding to human caudate and putamen nuclei.

Dopaminergic related structures

Critical to learning processes, the early motivational signals provided by dopamine neurons have been investigated by various researchers. Phasic dopamine responses are elicited by reward and reward-associated cues (Schultz, 1998). These responses are proposed to fulfill the roles of dopamine in motivational control, including signaling reinforcement learning (Schultz, 1986; Wise, 2004), and are seen as an incentive signal that promotes immediate reward seeking (Berridge and Robinson, 1998). Dopamine is also proposed to relay negative motivational signals (Bromberg-Martin et al., 2010; Matsumoto and Hikosaka, 2009).

DA cells relevant to habit formation can be divided into two major groups: those of the VTA and those of the substantia nigra pars compacta (SNc). On one hand, although the projection from the VTA to the accumbens has been the center of attention in the field of reward-related learning, it appears to play a limited role, if any, in instrumental A-O associations. It is, however, proposed to play a role in Pavlovian conditioned responses (Pavlov, 1928; Waelti et al., 2001; Sotak et al., 2005; Wise, 2006). On the other hand, the much more massive nigrostriatal pathway has been relatively neglected, with attention focused primarily on its role in Parkinson’s disease, but it has recently emerged as playing a role in instrumental learning (Satoh et al., 2003; Berridge, 2007; Wise, 2009).

Substantia nigra pars compacta

Because of its role in the initiation of voluntary movement, studying the involvement of the substantia nigra pars compacta in instrumental learning has always been challenging. Faure and colleagues tried to overcome this issue by depleting dopaminergic projections to the dorsolateral striatum only, leaving intact the projections to the dorsomedial part, which is critical in the initial stages of learning. With this approach, they confirmed that dopaminergic nigrostriatal projections to the dorsolateral striatum are critical for habit formation (Faure et al., 2005). However, this study provided no information about the role of these projections at early stages of training. The development of newer tools such as optogenetics made it possible to further define the role of these projections at early stages of instrumental learning (Rossi et al., 2013). In this study, the authors reported that optogenetic stimulation of SNc dopaminergic neurons facilitates the acquisition of a challenging instrumental task without any reward delivery. Moreover, this approach did not alter sensitivity to the A-O contingency. This supports a role of these projections in reward salience encoding and motivational processing at the early stages of instrumental training.

The striatum

The striatum, which integrates cortical, thalamic, and midbrain inputs, is considered a hub in the cortico-basal ganglia network motif (reviewed by Graybiel in 1990).

Different striatal regions appear to participate in distinct functional networks involved in instrumental learning: the accumbens acts as a hub in the limbic network, the DMS in the associative network, and the DLS in the sensorimotor network.

Nucleus Accumbens (Ventral Striatum)

Several lines of evidence suggested that the nucleus accumbens (NAC) subdivided in two sub-regions, the core and the shell. These two sub-regions have been proposed to play an important role in learning processes implicating A-O associations (cf. Colwill and Rescorla, 1986; Dickinson and Balleine, 1994). Indeed, the simultaneous blockade of DA-D1 and NMDA receptors in nucleus accumbens core was shown to delay the acquisition of instrumental lever pressing (Smith-Roe and Kelley., 2000). In addition, post-session blockade of protein synthesis within the nucleus accumbens affected the acquisition of instrumental lever pressing through a proposed disruption of memory consolidation (Hernandez et al., 2002).

However, these views were challenged by Yin et al. in 2008. In their review of the literature, the authors propose that while the nucleus accumbens is fundamental for the acquisition of certain appetitive Pavlovian responses, the dorsal striatum is required for the acquisition and expression of instrumental actions (Yin et al., 2008). This initial picture was also questioned in a review by Belin and colleagues in 2009. The authors argued that although lesions and drug manipulations of the nucleus accumbens core can alter the acquisition of instrumental behavior, it remains unclear to what extent the accumbens participates in this initial acquisition (Belin et al., 2009).

As discussed previously, effects specific to instrumental learning are often assessed by reward devaluation or A-O contingency degradation. In this regard, cell body lesions in either the core or the shell of the accumbens did not alter sensitivity to contingency degradation (Corbit et al., 2001). Similarly, nucleus accumbens DA depletions in rats did not alter sensitivity to reinforcer devaluation, suggesting that accumbens core DA might not be crucial for encoding A-O associations (Lex and Hauber, 2010).

However, dopamine, through its proposed role in the arousing effects of conditioned stimuli, could amplify already acquired instrumental responses, and could also promote acquisition by increasing response output. This hypothesis is supported by recent research in which optogenetic stimulation of ventral tegmental DA neurons did not provide positive reinforcement of instrumental lever pressing on its own, nor did it affect food intake, but did amplify the emergence of food-reinforced lever pressing on an active lever during acquisition (Adamantidis et al., 2011).

The dorsal Striatum

Similarly, the dorsal striatum can be divided into at least two major regions, associative and sensorimotor. The associative striatum or dorsomedial striatum (DMS) contains neurons that fire in anticipation of action-contingent rewards and change their firing according to the magnitude of the expected reward value (Hikosaka et al., 1989; Hollerman et al., 1998; Kawagoe et al., 1998). These experiments in primates provided the first evidence for a possible encoding of reward value and A-O association within the DMS, both defining variables of goal-directed behavior.

In the associative network, the prefrontal and parietal associative cortices and their target in the DMS are involved in transient memory, both prospective, in the form of outcome expectancies, and retrospective, as a record of recent efference copies (Konorski, 1967) (Fig.4a).

The sensorimotor network, on the other hand, comprises the sensorimotor cortices and their targets in the basal ganglia, mainly represented by the dorsolateral striatum (DLS) (Fig.4a). The outputs of this circuit are directed at motor cortices and brain stem motor networks. Neuronal firing in the sensorimotor striatum is generally not modulated by reward expectancy, and its neurons exhibit more movement-related activity than neurons in the associative striatum (Kanazawa et al., 1993; Kimura et al., 1993; Costa et al., 2004). Costa and colleagues were among the first to suggest the involvement of the DLS in encoding automated skilled motor behavior, one of the characteristics of habitually driven actions (Costa et al., 2004).

Finally, in addition to the medial-lateral gradient, there is significant functional heterogeneity along the anterior-posterior axis of the dorsal striatum, although insufficient data are currently available to permit any detailed classification (Yin et al., 2005b).

Lesion studies further defined the roles of the DMS and DLS in instrumental learning. Indeed, while lesions of the dorsomedial striatum (DMS) abolished the sensitivity to changes in A-O association, leading to the expression of habitual behavior (Yin et al., 2005; Yin and Knowlton, 2006), lesions of the dorsolateral striatum (DLS) prevented the expression of habitual behavior (Yin and Knowlton, 2004; Yin and Knowlton, 2006) (Fig.4a).

In order to study the molecular determinants of habitual behavior, I focused on the striatum which, as discussed previously, integrates cortical and thalamic information with motivational drives coming from dopaminergic inputs, and is known to play a major role in the formation and expression of habits (Fig.4b).


Figure 4: Dorsal Striatum and habits (a) Coronal diagram of mouse brain showing the integration of cortical inputs within the medial and lateral part of the striatum and their relation to the cognitive and habit system. (b) Sagittal schematic diagram of the mouse brain showing the glutamatergic and dopaminergic inputs modulating indirect and direct pathway of the dorsal striatum.

Dorsal striatum cellular organization:

At the cellular level, fewer than 5% of striatal neurons are cholinergic and GABAergic interneurons which, together with dopaminergic fibers from the SNc, provide inputs able to modify the responsiveness of the remaining 95% of neurons, the striatal projection neurons (SPNs). SPNs are subdivided into D1R- and D2R-expressing GABAergic neurons (Reiner et al., 1998; Kreitzer, 2009; Tepper, Tecuapetla, Koos, & Ibanez-Sandoval, 2010).

Striatal interneurons

Four different types of interneurons have been described in the striatum thus far, classified by both their electrical properties and the proteins they express.

- large, aspiny cholinergic interneurons

- medium-sized aspiny fast-spiking interneurons co-containing GABA, parvalbumin (PARV), and the neurotensin-related hexapeptide, (Lys8, Asn9) NT (8-13) (LANT6)

- medium-sized aspiny interneurons expressing somatostatin (SS), neuropeptide Y (NPY) and nitric oxide synthase (NOS)

- medium-sized neurons expressing the calcium-binding protein calretinin (CALR)

Among these four classes, there is increasing interest in cholinergic interneurons. At the interface between the thalamus and the striatum, they are proposed to play a major role in motivation, action selection and goal-directed behavior (Kimura et al., 2003; Bradfield et al., 2013; Matamales et al., 2016).

Cholinergic interneurons

Cholinergic interneurons account for 1-2% of all striatal neurons. They are large (>50 μm), aspiny neurons and display an autonomous, tonic firing rate. However, excitatory and inhibitory inputs have the potential to affect their tonic activity by delaying or advancing upcoming action potentials (Goldberg et al., Handbook of basal ganglia, 2010).

Cholinergic interneurons release acetylcholine (ACh), which acts pre- and post-synaptically to affect neuronal excitability, synaptic transmission and synaptic plasticity. The effects of ACh signaling are transduced mainly by the activation of muscarinic receptors expressed on striatal neurons (Koós and Tepper, 2002; Zhou et al., 2002; Wilson, 2004). ACh can also activate nicotinic receptors expressed on axons of dopaminergic neurons and on fast-spiking GABAergic interneurons (Goldberg, 2010, Handbook of Basal Ganglia Structure and Function).


Muscarinic acetylcholine receptors (mAChRs) are a family of G protein-coupled receptors (GPCRs), and can be classified into 2 subcategories according to the subtype of G proteins they couple with.

M1-like (M1, M3, and M5) receptors are functionally linked with Gαq proteins, whose activation stimulates phospholipase C to trigger a phosphoinositide-dependent signaling pathway.

M2-like (M2 and M4) receptors are coupled to Gαi/o proteins, whose activation inhibits adenylyl cyclase, thereby decreasing cAMP production and protein kinase A (PKA) activity (Peralta et al., 1988).

In SPNs, the main muscarinic receptors expressed are M1 (expressed by all SPNs) (Bernard et al., 1992; Hersch et al., 1994; Yan et al., 1997) and M4 (expressed selectively by D1 SPNs) (Hersch et al., 1994; Ince et al., 1997; Santiago and Potter, 2001). M1 receptors promote depolarization of SPNs both by suppressing several potassium currents (reviewed in Goldberg, 2010, Handbook of Basal Ganglia Structure and Function, p. 140) and by down-regulating voltage-dependent CaV2.1 (P/Q-type) and CaV2.2 (N-type) Ca2+ currents (Howe and Surmeier, 1995). These Ca2+ channel subtypes are coupled to BK and SK channels (Vilchis et al., 2000). Thus, CaV2.1 and CaV2.2 down-regulation increases the excitability of SPNs by increasing their evoked firing rate and reducing their afterhyperpolarizations (AHPs) (Perez-Rosello et al., 2005). In addition, M4 receptors inhibit N- and P/Q-type Ca2+ channels, but their effect is negligible compared to that of M1 (Perez-Rosello et al., 2005). Since N- and P/Q-type Ca2+ currents control GABA release in SPNs, M1 agonists cause a reduction in GABA release from the collaterals of SPNs (Marchi et al., 1990; Sugita et al., 1991; Perez-Rosello et al., 2005). Altogether, this evidence indicates that ACh both increases the firing frequency of SPNs and inhibits GABAergic interconnections. The combination of these two actions has been proposed to lead to an increase in the recruitment of projection neurons (Wickens and Oorschot, 2000). Moreover, muscarinic agonists can pre-synaptically inhibit cortical and dopaminergic afferents (Barral et al., 1999; Calabresi et al., 2000; Jones et al., 2001) and potentiate post-synaptic NMDA-dependent responses (Calabresi et al., 1998). The proposed scenario would allow cholinergic signaling to select incoming afferents while simultaneously ensuring that the recipient neurons respond vigorously to these afferents (Perez-Rosello et al., 2005).

Despite the similar effect of M1 and M4 on firing frequency, the differential coupling of these two receptors to inhibitory or activating G proteins confers on them opposite effects on synaptic plasticity and the induction of protein expression.

Fast-spiking interneurons

Parvalbumin-immunoreactive (PV+) striatal interneurons are medium to large sized (16–18 μm diameter), and have 5–8 aspiny, often varicose, dendrites which branch relatively sparsely, giving rise to a restricted dendritic arborization (200–300 μm diameter). The axon branches overlap and extend beyond the limits of the dendritic field of the cell of origin, thus creating the densest arborization of all striatal neurons (Tepper 2010). PV+ neurons account for 0.7% of total striatal neurons (Rymar et al., 2004), and are more abundant in the dorsolateral striatum (Luk and Sadikot, 2001).

PV+ interneurons receive strong, multiple cortical afferents, and are contacted multiple times by each cortical fiber (Ramanathan et al., 2002). They also receive dopaminergic (Kubota et al., 1987), cholinergic (Chang and Kita, 1992), pallidal (Bevan et al., 1998) and a few thalamic afferents (Kita, 1993). Their GABAergic projections preferentially target SPNs, but also other PV+ cells and other interneurons (Kita, 1993; Bevan et al., 1998).

This class of interneurons is called fast-spiking because of their characteristic firing mode. They display a hyperpolarized resting membrane potential (~–80 mV), very brief action potentials, and a rapid, large-amplitude, brief-duration spike after-hyperpolarization. These properties allow for the fastest spiking mode among striatal interneurons (Koós and Tepper, 1999; Taverna et al., 2007). Spiking activity of PV+ interneurons induces inhibitory post-synaptic potentials (IPSPs) in contacted SPNs, which can delay the timing of depolarization-evoked spikes and, in the case of short bursts, may completely block spiking in SPNs (Koós and Tepper 1999). PV+ interneurons are also connected to each other by gap junctions (Koós and Tepper, 1999). This characteristic allows the synchronization or desynchronization (depending on the firing mode) (Russo et al., 2013) of many interneurons, thus creating an inhibitory syncytium capable of exerting powerful and synchronous inhibitory control over a large number of SPNs (Koós and Tepper, 1999).

SS/NPY/NOS interneurons

SS/NPY/NOS interneurons are the second largest cells in the striatum, with a soma diameter of 15–25 μm. Their dendritic arborization is relatively simple and unbranched, extending up to 600 μm in diameter, while their axonal arborization, with 1 or 2 main axons (Kawaguchi, 1993), is the sparsest and longest of any striatal neuron (Tepper et al, 2010, Handbook of Basal Ganglia Structure and Function).


These neurons are considered here as a unitary class, but not all three markers are always expressed by these cells. It is still unclear whether the specific expression of only some of these markers corresponds to different electrophysiological and/or morphological phenotypes (Tepper et al, 2010, Handbook of Basal Ganglia Structure and Function).

SS/NPY/NOS interneurons receive monosynaptic inputs from the cortex (Kawaguchi, 1993), as well as dopaminergic (Kubota et al., 1988), cholinergic (Tepper et al, 2010, Handbook of Basal Ganglia Structure and Function), and pallidal GABAergic afferents (Bevan et al., 1998). They display a relatively depolarized resting membrane potential (≈ -50 to -60 mV), and long-duration action potentials (Kawaguchi, 1993; Kubota and Kawaguchi, 2000; Centonze et al., 2002). Their discharge induces IPSCs in SPNs, and it has been proposed that they could be the still unidentified cellular source of the GABAergic IPSPs recorded in cholinergic interneurons after their spiking. If true, this would indicate that SS/NPY/NOS interneurons are the principal players in a recurrent synaptic network regulating the activity of ACh interneurons (Sullivan et al., 2008).

Calretinin (CR)-expressing interneurons

Calretinin (CR)-expressing interneurons are medium-sized cells (12–20 μm in diameter) and issue a small number of aspiny dendrites that branch sparingly (Bennett and Bolam, 1993). In rodents they account for ~0.8% of total striatal neurons (Rymar et al., 2004), but in primates their number seems to be 3-4-fold higher (Wu and Parent, 2000). Not much is known about these neurons, as is also the case for other striatal interneuronal classes (e.g. tyrosine hydroxylase-expressing neurons).

Striatal Projection Neurons

SPNs are the main neuronal population in the striatum, comprising around 95% of striatal neurons. They receive glutamatergic projections from pyramidal neurons residing in cortical layers 2/3 and 5 (Wall et al., 2013; Kress et al., 2013), and from neurons of the thalamic central medial complex and the parafascicular nucleus (Smith et al., 2004). Cortical and thalamic glutamatergic signals on SPNs activate mainly 3 types of glutamate receptors: α-amino-3-hydroxy-5-methyl-4-isoxazole propionate (AMPA), N-methyl-D-aspartate (NMDA), and metabotropic glutamate (mGlu) receptors.

AMPA receptors (AMPARs) are mainly responsible for fast synaptic transmission in the basal ganglia. They are tetramers composed of different combinations of four subunits (GluR1-4). AMPA receptors have a non-selective cation channel which allows the flux of Na+ and K+, and also of Ca2+ in the absence of the Ca2+-impermeable GluR2 subunit. These receptors have a desensitization mechanism that closes the channel quickly, terminating its depolarizing effect (Hammond, 2008).

NMDA receptors (NMDARs) are heteromeric ligand-gated ion channels. Each receptor is formed by 4 subunits: 2 NR1 subunits (which are critical for the formation of functional channels), and 2 other subunits (NR2 A/B/C/D, or NR3 A/B) (Emson et al., Handbook of Basal Ganglia). The NMDAR is a nonselective cation channel, allowing the flux of Na+, K+ and Ca2+. At resting potential, the channel is blocked by Mg2+, so that the binding of an agonist does not result in the opening of the channel.

Depolarization (usually via AMPA receptors) repels the Mg2+ from the channel, allowing its opening upon ligand binding. This condition is present in SPNs during the alternation of “up” and “down” states. When in an “up” state glutamate is released from the presynaptic terminal, Na+ influx through AMPA and NMDA receptors causes further depolarization, keeping the NMDA channel in the unblocked state, and allowing a strong influx of Ca2+. This is a trigger for several synaptic and cellular events, such as synaptic plasticity (Hammond, 2008). NMDA receptors are indeed necessary for the induction of both long-term depression and potentiation at cortico-striatal synapses.
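The voltage dependence of the Mg2+ block described above can be illustrated numerically. The following is a minimal sketch only: it assumes the empirical expression of Jahr and Stevens (1990), which is not part of the work presented here, and the function name, the extracellular Mg2+ concentration of 1 mM and the two example membrane potentials are illustrative choices rather than measured values.

```python
import math


def nmda_unblocked_fraction(v_mv, mg_mm=1.0):
    """Fraction of NMDA receptor conductance free of Mg2+ block at v_mv (in mV).

    Assumed form (Jahr and Stevens, 1990):
        1 / (1 + [Mg2+]/3.57 * exp(-0.062 * V))
    with [Mg2+] in mM. This is an illustrative model, not data from this thesis.
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))


# Illustrative potentials: a hyperpolarized "down" state and a depolarized "up" state.
down_state = nmda_unblocked_fraction(-80.0)
up_state = nmda_unblocked_fraction(-55.0)

print(f"unblocked fraction at -80 mV: {down_state:.3f}")
print(f"unblocked fraction at -55 mV: {up_state:.3f}")
```

Under these assumptions, the unblocked fraction is several-fold larger at the depolarized potential, which is consistent with the idea that depolarization relieves the Mg2+ block and permits Ca2+ influx through NMDA receptors.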

At most excitatory central synapses, a single presynaptic stimulus produces a large and brief glutamate transient in the synaptic cleft (Clements et al., 1992; Diamond and Jahr, 1997), which is limited by removal of the neurotransmitter by diffusion and glutamate transporters. This transient is generally sufficient to activate AMPA and NMDA receptors.

When a train of stimuli reaches the synapse, instead, much more glutamate release is induced, also because of mechanisms of presynaptic facilitation and delayed release (Carter and Regehr, 2000). The resulting large glutamate levels may overwhelm clearance mechanisms, allowing an extended glutamate signal and even glutamate spillover to nearby sites. Here mGluRs, which are often located at extrasynaptic sites and have a high affinity for glutamate, can be activated (Lujan et al., 1996; Scanziani et al., 1997).

mGluRs are classified into three groups:

- Group I (mGluR1 and 5), which are coupled to the phospholipase C-activating Gq protein (Hammond, 2008). They facilitate neural transmission, and are expressed in SPNs (Tallaksen-Greene et al., 1998).

- Group II (mGluR2 and 3) and III (mGluR4, 6, 7 and 8), which are coupled to the adenylyl cyclase-inhibiting Gi/o protein (Hammond, 2008). These receptors inhibit neural transmission, and are expressed on the terminals of cortico-striatal afferents (Testa et al., 1998).

The activation of these different glutamatergic receptors, depending on the pattern of incoming glutamatergic signaling, drives the activity of striatal SPNs.

SPNs are neurons with a medium cell body size (≈9-17 μm), and a large and extensive dendritic tree. The average SPN has 20-60 dendritic branches, and each branch has approximately 500 input spines, so that the typical SPN integrates inputs coming from 10000-30000 synaptic boutons. This impressive number of inputs reflects the intrinsic firing properties of these neurons, whose firing activity requires the coordinated activation of many different excitatory inputs. SPNs are normally silent, held in a “down” state (with their membrane potential at hyperpolarized potentials of ≈-80 mV) by a continuous shunting current mediated by rapidly activating inwardly rectifying potassium (Kir) channels (Wilson and Kawaguchi, 1996). In the down state, synchronous input to a few dendritic branches is insufficient to reach the firing threshold. Instead, when a large number of highly synchronous inputs span most of the dendritic arbor, the shunting current collapses and the neuron enters an “up” state (its membrane potential reaches a more depolarized voltage of ≈ -60 to -55 mV) in which firing activity may occur (Wilson and Kawaguchi, 1996; Millers, 2010).
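The estimate of the number of synaptic inputs per SPN follows directly from the branch and spine counts quoted above. As a simple check of the arithmetic (the variable and function names are illustrative only):

```python
SPINES_PER_BRANCH = 500    # approximate number of input spines per branch, as quoted
BRANCH_RANGE = (20, 60)    # range of dendritic branches per SPN, as quoted


def total_inputs(branches, spines_per_branch=SPINES_PER_BRANCH):
    """Approximate number of synaptic inputs integrated by one SPN."""
    return branches * spines_per_branch


low, high = (total_inputs(b) for b in BRANCH_RANGE)
print(f"estimated inputs per SPN: {low} to {high}")
```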

The electric potential of this “up” state is just below the neuron’s firing threshold, and a much smaller synchronous input is required to get the SPN to fire. This state may last for tenths of seconds, or even a few seconds.

In the “down” state, excitatory postsynaptic potentials are mainly mediated by AMPA glutamatergic receptors, while “up” states allow the recruitment of NMDA receptors by relieving their block by Mg2+. The recruitment of both receptor types leads to prolonged excitatory potentials in the “up” state, thus increasing the likelihood of temporal summation of incoming stimuli. Additionally, the activation of Ca2+-permeable NMDA receptors during the “up” state provides a further source of Ca2+ entry (in addition to the Ca2+ channels gated by the depolarization of the neuron, called L-type voltage-gated calcium channels), which has been shown to play a pivotal role in processes of synaptic plasticity (Carter and Sabatini, 2004). The resulting effect of “up” and “down” states is that only those neurons which receive highly coordinated glutamatergic activity (from the cortex and the thalamus) together with permissive monoaminergic, cholinergic and GABAergic inputs (from midbrain dopaminergic and serotoninergic neurons and from striatal interneurons) will signal to their downstream targets.

All these different inputs target specific compartments of the dendritic branches (Bolam et al., 2000). Cortical afferent terminals generally contact, in a 1:1 ratio, dendritic spines located more distally on the dendrites (Kemp and Powell, 1971). Midbrain dopaminergic neurons also project to the distal branches of SPNs in the dorsal striatum, with large spherical-shaped dendritic arbors which overlap with one another (Millers, 2010; Hersch et al., 1995). The soma and proximal dendrites of SPNs are mainly reached by recurrent collaterals of neighboring spiny neurons (Taverna et al., 2008; Chuhma et al., 2011) and by striatal interneurons (Kita et al., 1990; Bennett and Bolam, 1994).

As mentioned previously, cortical and midbrain inputs converge on these neurons and are conveyed through two different pathways to the final basal ganglia outputs. The subpopulation of SPNs projecting directly to the substantia nigra forms the striatonigral or direct pathway, while the SPNs projecting to the final outputs through other structures such as the globus pallidus pars externa (GPe) and the subthalamic nucleus (STN) belong to the indirect or striatopallidal pathway. According to the classic Albin-DeLong model, direct pathway activity promotes movement while the indirect pathway inhibits movement (Albin, Young, & Penney, 1989; DeLong, 1990) (Fig.5). At the molecular level, direct pathway neurons predominantly express D1-like dopamine receptors, substance P, dynorphin and muscarinic M4 acetylcholine receptors, while indirect pathway neurons express D2-like dopamine receptors, enkephalin and A2A adenosine receptors (Gerfen & Surmeier, 2011).

The Albin-DeLong Model discussed

As discussed previously, dopamine inputs to the striatum are of crucial importance in regulating movement initiation and reward-based instrumental learning. In the following chapter I will describe in detail current theories regarding the basal ganglia network and its motor function.

Dopamine binds to different classes of G-protein-coupled receptors, and activates multiple signaling cascades. By modulating the gating and trafficking of voltage-dependent and ligand-gated ion channels, dopamine modulates the response of SPNs to glutamatergic signals.


Dopamine exerts opposite effects on the activity of SPNs, depending on the receptor subtype expressed by the cell. In SPNs expressing mostly dopamine receptors of class 1 (D1, D5) (D1 SPNs), dopamine signaling coupled to Gαs proteins increases neuronal excitability. In SPNs expressing mainly dopamine receptors of class 2 (D2, D3, D4) (D2 SPNs), dopamine stimulates Gαi proteins, which results in inhibition of neuronal activity. According to the Albin-DeLong model (Albin et al., 1989; DeLong 1990), in the striatonigral direct pathway D1 SPNs constitute the first step in propagating cortico-thalamic inputs, and project to the internal segment of the globus pallidus (GPi) and to the substantia nigra pars reticulata (SNr). The GABAergic neurons in these two nuclei project to the thalamus, and display a relatively high level of tonic activity which results in the constant inhibition of their thalamic targets. Thus, the activation of the direct pathway is proposed to lead to disinhibition of the target thalamic nuclei, and to movement promotion (Fig.5).

The second canonical pathway is called the striatopallidal or indirect pathway, and is suggested to ultimately inhibit the thalamus through the activation of the GPi and SNr nuclei. The activation of striatal D2 SPNs causes the inhibition of the external segment of the globus pallidus (GPe). This nucleus contains mainly GABAergic neurons exerting inhibitory activity on the subthalamic nucleus (STN). The STN, in turn, contains glutamatergic excitatory neurons which activate the GABAergic GPi and SNr. The disinhibition of the STN by D2 SPN activity produces an increase in the activity of the basal ganglia outputs, which opposes the effect of the direct pathway and ultimately inhibits the thalamus (Albin et al., 1989; DeLong 1990) (Fig.5).

The integration of the signals originating from the direct and indirect pathways occurs in the thalamic nuclei, which in turn project back to the cortex, gating movement.
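The net effect of each pathway on the thalamus, as predicted by the classical model described above, can be obtained by multiplying the signs of the successive projections. The sketch below is only a bookkeeping aid under the simplifying assumption that each projection is purely excitatory (+1) or purely inhibitory (-1); the data structure and function name are illustrative and do not correspond to any analysis performed in this work.

```python
# Each step is (source, target, sign): +1 for an excitatory (glutamatergic)
# projection, -1 for an inhibitory (GABAergic) projection.
DIRECT_PATHWAY = [
    ("D1 SPN", "GPi/SNr", -1),
    ("GPi/SNr", "thalamus", -1),
]
INDIRECT_PATHWAY = [
    ("D2 SPN", "GPe", -1),
    ("GPe", "STN", -1),
    ("STN", "GPi/SNr", +1),
    ("GPi/SNr", "thalamus", -1),
]


def net_effect(pathway):
    """Return +1 if SPN activation ultimately disinhibits the thalamus, -1 if it inhibits it."""
    sign = 1
    for _source, _target, step_sign in pathway:
        sign *= step_sign
    return sign


direct = net_effect(DIRECT_PATHWAY)
indirect = net_effect(INDIRECT_PATHWAY)
print("direct pathway net effect on thalamus:", direct)
print("indirect pathway net effect on thalamus:", indirect)
```

In this simplified scheme, two successive inhibitory steps in the direct pathway yield a net facilitation of the thalamus, whereas the odd number of inhibitory steps in the indirect pathway yields a net inhibition.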

The known complexity of the basal ganglia networks, and therefore of the Albin-DeLong model, has increased over the years. In particular, multiple connectivity studies indicate that GPe GABAergic neurons project to most of the basal ganglia nuclei, including direct projections to the striatum. Indeed, neostriatal interneurons have been reported to receive GABAergic projections from the GPe (Bevan et al., 1998; Kita and Kita, 2001; Mallet et al., 2012; Sato et al., 2000), while other pallidal neurons are proposed to project to the SNr (Bevan et al., 1998; Samantaray et al., 2004) and the GPi (Mastro et al., 2014). Lastly, numerous thalamic nuclei have been reported to send glutamatergic projections to the striatum (Hunnicut et al., 2014, 2016) (Fig.5).


The Albin-DeLong model, applied to physiological conditions, originally indicated that movement is inhibited by indirect pathway activation, while direct pathway activity favors movement. Kravitz and colleagues tested this hypothesis in 2010 by selectively activating one of the two pathways through optogenetic control of D1 SPNs or D2 SPNs. With this approach, they observed hypokinesia upon indirect pathway activation. Conversely, direct pathway activation increased locomotion in naïve animals, and restored normal locomotion in a bradykinetic parkinsonian animal (Kravitz et al., 2010). However, the optogenetic strategy used does not simulate a physiological activation of the striatal pathways; instead, the SPN sub-population is activated strongly and as a whole. This implies that, while a massive activation of each pathway confirms the Albin-DeLong prediction, the interplay between the two pathways may be different under physiological conditions.

The model has recently been challenged by a study measuring Ca2+ transients in freely moving animals upon action initiation (Cui et al., 2013). The authors took advantage of genetically engineered mice which selectively express an optical indicator of changes in intracellular Ca2+ concentration in one of the two SPN populations. Variations in the optical signal of this reporter reflect the depolarization and activation of a cell. Cui and colleagues monitored the light emitted by each neuronal subpopulation in the dorsomedial striatum through fiber-optic cables implanted in vivo. After being trained in an instrumental task in which they had to press a lever to obtain a food reward, the animals were implanted with the fiber-optic cables and tested while the activity of their striatonigral and striatopallidal SPNs was monitored. The classical model would have predicted the activation of striatonigral SPNs upon behavioral entrainment of the animal and the engagement of striatopallidal neurons upon movement cessation. Instead, during instrumental learning, the activation of SPNs belonging to both pathways appeared to occur just before the initiation of movement. Similarly, both neuronal sub-classes became relatively inactive when the movement stopped. The authors suggested that both pathways are needed for fine control of motor function, with bursts of striatonigral SPNs disinhibiting specific thalamo-cortical targets to promote desired actions, while the activation of striatopallidal neurons would inhibit different thalamo-cortical targets to suppress competing motor programs. In view of their data, the authors commented on the results reported by Kravitz and co-workers (Kravitz et al., 2010). In order to promote the selection and initiation of a precise movement when required by the environmental context, the coordinated and timely activation of clusters of neurons belonging to both pathways may be required. Conversely, the simultaneous activation of the striatonigral or the striatopallidal pathway as a whole (as Kravitz and colleagues induced) can result in the activation or the inhibition, respectively, of all motor programs, and not only the desired ones. Synaptic plasticity occurring within the two striatal pathways could reliably support the learning of a motor task and of automated responses, as discussed below.


Figure 5: The Albin-DeLong model of basal ganglia (a) sagittal diagram of mouse brain and (b) flowchart illustrating the striatonigral direct and striatopallidal indirect pathways.


Habit formation and synaptic plasticity

It has been proposed that the behavioral expression of goal directed behavior and habits is dependent on experience-dependent plasticity (Graybiel 2008).

As to goal-directed behavior, a recent study from the laboratory of Bernard W. Balleine highlighted that goal-directed actions generate opposing plasticity in the direct and indirect pathways of the DMS. In this study, the authors showed that the AMPA/NMDA ratio increases within D1 SPNs, while the opposite effect was seen in D2 SPNs (Shan et al., 2014). Shan and colleagues propose that this opposing plasticity could provide the basis for rapidly re-biasing the control of task-specific actions, and that its dysregulation could underlie disorders associated with striatal function (Shan et al., 2014). It should be noted that this form of plasticity was assessed during the learning of the task, and not after a procedure assessing sensitivity to the value of the reward or sensitivity to the contingency between the action and the reward.

Within the Albin-DeLong model framework, this form of plasticity could be seen as a way to strengthen circuits promoting movements necessary for the task (D1 SPNs) while inhibiting SPNs regulating unwanted movements (D2 SPNs) that are not necessary during an established motor behavior. In the habit framework, this view is partially supported by a recent paper from the group of Nicole Calakos. In this study, the authors describe plasticity features within SPNs which strongly correlate with habitual behavior. Habitual behavior correlated with a strengthened DLS output to both pathways, but with a tendency for action-promoting direct pathway SPNs to fire before indirect pathway SPNs. In contrast, habit suppression correlated solely with a weakened direct pathway output, which could indicate a depression of previously activated SPNs recruited to perform the task automatically (i.e. habitually) (O'Hare et al., 2016).

Because different neural substrates drive goal-directed or habitual behavior, we asked in our study whether long-term modifications of synaptic and structural plasticity in the dorsal striatum may be responsible for the overtraining-induced expression of habitual behavior.

Consistent with this hypothesis, a recent study performed in our group pointed towards a role for plasticity in the DLS in promoting habit expression (Nazzaro et al., 2012). In this study, it was shown that long-term synaptic depression mediated by endocannabinoids (eCB-LTD) is impaired in SPNs of the DLS striatopallidal pathway (but not in the DMS) of mice chronically exposed to Δ9-THC, an agonist of the eCB receptor CB1. This loss of plasticity is associated with a bias towards habitual control of behavior. Interestingly, this shift from goal-directed behavior towards habits is not restricted to THC-treated mice. Unpublished observations from our laboratory indicate that, after overtraining, animals that express habitual behavior show the same loss of striatal eCB-LTD (Greco et al., unpublished observation). eCB biosynthesis depends on the co-activation of D2 dopamine receptors, group I metabotropic glutamate receptors (mGluR1/5) and L-type calcium channels (Gerdeman et al., 2003). Biosynthesized eCBs act as retrograde messengers on presynaptic cannabinoid CB1 receptors to decrease the release probability of glutamate from cortico-striatal terminals (Gerdeman et al., 2002), ultimately triggering LTD.

The endocannabinoid (eCB) system

eCBs are a family of lipid molecules that belong to an unconventional neurotransmitter/neuromodulatory system. This system comprises synthesizing and inactivating enzymes, a transport protein, and the CB receptors (Jonsson et al., 2006; Marsicano and Lutz, 2006; Katona and Freund, 2012) (Fig.6).

The two best-described eCBs are anandamide (AEA) (Devane et al., 1992) and 2-arachidonoyl glycerol (2-AG) (Mechoulam et al., 1995; Sugiura et al., 1995). Among the other eCBs, 2-arachidonyl glyceryl ether (2-AGE, or noladin ether), O-arachidonoyl-ethanolamine (virodhamine), and N-arachidonoyl-dopamine (NADA) are the most studied; however, their physiological roles remain largely unknown.

eCB production and release are usually part of an on-demand response, which occurs upon physiological and/or pathological stimuli in several brain regions (Di Marzo et al., 1994; Cadas et al., 1996). There is also increasing evidence of tonic eCB activity, although its physiological role is still debated in the literature (Gifford and Ashby, 1996; Pan et al., 1998; Zhou and Shearman, 2004; Hentges et al., 2005; Sperlágh et al., 2009; Lenkey et al., 2015).

Once released, eCBs mainly activate CB type 1 (CB1) receptors, which are located in the presynaptic compartment and primarily inhibit neurotransmitter release. Through this mechanism, eCBs specifically reduce synaptic inputs onto the releasing neuron(s), thus influencing diverse forms of synaptic plasticity. eCBs are rapidly cleared from the extracellular space by a specific uptake system (Beltramo et al., 1997; Hillard and Jarrahian, 2000), the AEA membrane transporter (AMT), which is widely distributed throughout the brain (Giuffrida et al., 2001). AEA and 2-AG are subsequently degraded mainly by two enzymes: the fatty acid amide hydrolase (FAAH) and the monoacylglycerol lipase (MAGL), respectively (Cravatt et al., 1996; Sugiura and Waku, 2000; Dinh et al., 2002) (Fig.6). Although MAGL is proposed as the main degrading enzyme of 2-AG, the activity of alpha/beta hydrolase domain 6 (ABHD6) and 12 (ABHD12) in the brain has been shown to account for a portion of 2-AG hydrolysis (Blankman et al., 2007; Marrs et al., 2010; Savinainen et al., 2012) (Fig.6).

A considerable amount of research has unraveled the biological mechanisms involved in the AEA and 2-AG metabolic pathways (Marco and Laviola, 2012; Petrosino and Di Marzo, 2010; Paradisi et al., 2006; Di Marzo, 2009; Pertwee, 2012). AEA is derived from the cleavage of N-arachidonoyl-phosphatidylethanolamine (NAPE), a precursor synthesized by the enzyme N-acyltransferase (NAT), which requires the presence of Ca2+ and is regulated by cyclic adenosine monophosphate (cAMP) (Cadas et al., 1996; Piomelli, 2003). Its release is catalyzed by a specific phospholipase D (NAPE-PLD) (Hansen et al., 2000; Okamoto et al., 2004) following depolarization and/or activation of ionotropic receptors (e.g., NMDA receptors, α7 neuronal nicotinic acetylcholine receptors) or metabotropic glutamate receptors (mGluRs) (Giuffrida et al., 1999; Stella and Piomelli, 2001; Kim et al., 2002; Varma et al., 2002; Piomelli, 2003). 2-AG originates from the metabolism of triacylglycerols, via receptor-dependent activation of phosphatidylinositol-specific phospholipase A1 (PLA1) and/or phospholipase C (PLC) (Sugiura and Waku, 2000) (Fig.6).

The canonical model proposes that 2-AG is produced upon activation of metabotropic receptors (e.g., mGluR1/5s, dopamine D2 receptors, muscarinic acetylcholine (mACh) receptors of types M1/M3) coupled to the PLC and diacylglycerol (DAG) lipase pathways (Stella et al., 1997; Piomelli, 2003) (Fig.6).

Irrespective of their different metabolic pathways, binding properties and intrinsic activities, both AEA and 2-AG activate CB receptors (Stella et al., 1997; Hillard, 2000; Howlett, 2002). The CB receptors are part of the superfamily of G protein-coupled receptors. The type 1 CB (CB1) receptor is the most abundant G protein-coupled receptor expressed in the brain (Howlett et al., 1990; Herkenham et al., 1991) (Fig.6). Additionally, increasing pharmacological evidence suggests that type 2 CB (CB2) receptors (Van Sickle et al., 2005; Ashton et al., 2006; Onaivi, 2006; Xi et al., 2011), the transient receptor potential vanilloid-1 (TRPV1) receptor (Szallasi et al., 1995; Zygmunt et al., 1999; Szabo et al., 2002; Toth et al., 2005; Cristino et al., 2006; Marinelli et al., 2006) (Fig.6), and at least two non-CB1, non-CB2 receptors (Hajos et al., 2001; Howlett et al., 2002; Kunos et al., 2002) are present in the brain.
