Introduction

Motor learning is an essential process by which the organism acquires skilled movements and the ability to associate between new sensory information and actions, thus adapting to the ever-changing environmental demands of the world1,2,3,4.

Learning, planning, and execution of movements, are carried out by a highly complex and distributed system, with the primary motor cortex (M1) being one of its main hubs, generating the output cortical motor commands to the downstream brainstem and spinal cord execution centers5. Surmounting evidence supports M1 to be crucial for motor skill learning6,7,8,9,10. Previous studies have shown that during motor learning M1 undergoes major plasticity changes, manifesting at multiple levels ranging from motor representations, dendritic computations, branch spike properties, and spine turnover and clustering6,8,11,12,13,14.

The functional role of M1 in motor control and learning is expected to be cell-type dependent13,15,16,17. Here we concentrate on layer 2-3 pyramidal neurons (PNs), which were shown to represent movement, error estimation, and outcome-related activity8,13,16,18,19. In addition, layer 2-3 PNs of M1 were shown to undergo plasticity changes during motor learning both at the activity and structural spine levels16,19,20.

Central to motor learning is a reward-based learning process1, where reward can in principle inform about the consequences of actions and drive the motor learning process via plasticity mechanisms21. The neuronal substrate carrying reward in motor learning and adaptation is the dopaminergic projection systems to both the basal ganglia and cortex22. The ventral tegmental area (VTA) was shown to be the main source of dopaminergic projections to M1, innervating dendrites of both superficial and deep layers23,24 with forelimb regions being preferentially innervated25.

Our working hypothesis is that the dopaminergic VTA projections to M1 directly drive synaptic plasticity mechanisms and the ensuing network connectivity changes in M1, which underlies motor learning. Previous studies have demonstrated the crucial role of dopaminergic signals in M1 for motor task acquisition as well as synaptic plasticity. Destroying the direct dopaminergic projections to M1 eliminated acquisition of a new skilled reaching motor behavior24. Dopaminergic system was demonstrated to mediate plasticity changes in both the structural and cellular synaptic plasticity levels of M1 including long term potentiation (LTP) of layer 2-3 horizontal connections, immediate early gene expression, and regulation of spine turnover21,26,27. In contrast to the effects of dopamine at the synaptic level, little is known regarding the effects of the dopaminergic inputs to M1 in altering the activity and functional connectivity of the M1 network during motor learning.

Here, we set out to investigate first, if and how functional activity and connectivity within layer 2-3 network of M1 are altered during learning, second, how the VTA projections to M1 participate in motor learning at the behavioral, network activity and functional connectivity levels, and third, whether VTA projections are essential for the development of outcome signaling in M1. Toward this end, we recorded the activity of layer 2-3 PNs in M1 using two-photon calcium imaging with the genetically encoded calcium indicator GCaMP6s28 during learning of a head-fixed hand reach for pellet motor task10,13 with and without inhibiting M1 dopaminergic projections from VTA using designer receptors exclusively activated by designer drugs DREADDs29. We developed a novel non-linear analysis approach to explore the correlational structure of the network throughout training. We used graph theory and Riemannian geometry30,31,32,33,34 which allow us to compare the functional connectivity of the network at different stages of training. We find that motor learning is associated with gradual and monotonic reorganization of functional connectivity of the M1 layer 2-3 network. In contrast to previous reports19,35, we find the majority of neuronal activity as well as the population connectivity gradually transforms towards a new “expert” configuration. Blockade of dopaminergic neurotransmission locally in M1 prevented motor learning at the behavioral level, and concomitantly halted plasticity changes at both the individual neuron activity level as well as at the network activity and functional connectivity levels.

Results

Dopaminergic VTA projections to M1 are essential for hand reach motor learning

To study the reorganization of network activity and functional connectivity of layer 2-3 PNs and their dependence on dopaminergic VTA activity in M1 during motor learning, we trained mice to perform a head-fixed version of the forelimb grasping task, where mice learn to reach, grab, and eat a food pellet10,13 while performing longitudinal two-photon calcium imaging of GCaMP6s28 (Fig. 1a–c). To explore the effects of the direct dopaminergic VTA projections to M1 on motor learning progression and the consequent changes in neuronal activity, we divided the animals into two groups: a manipulated group for which we inhibited the VTA dopaminergic projections to M1 for three consecutive training sessions after an initial shaping session, and a second group trained with no manipulations, serving as controls. Dopaminergic projections to M1 were inhibited by DREADDs (hM4D), which were expressed in dopaminergic neurons of the VTA in DAT-Ires-Cre mice. Inhibition was performed by locally injecting Clozapine-N-Oxide (CNO) to M1 via an access port (CNO sessions) (Fig. 1a–e). It should be noted that all mice expressed the DREADDs virus and later assigned to control or manipulated groups.

Fig. 1: Experimental and behavioral set up for investigating the effect of direct ventral tegmental area (VTA) dopaminergic projections to M1 on motor learning.
figure 1

a Schematic of the viral injection of the genetically encoded GCaMP6s to M1 and inhibitory (hM4D) DREADDs to the VTA. CNO was applied locally to M1 via an access port in the relevant experiments. b Behavioral setup. Mice were head-fixed and trained to grab a food pellet positioned on a rotating table. Trial structure (12 sec); tone (go signal) after 4 sec, plate rotates, and pellet reaches position at tone+0.4 sec. c Example of a typical field of view showing average projections of two-photon imaging of GCaMP6-positive layer 2-3 neurons (250 µm from pia) of an M1 field. d Left panel: A coronal section showing staining of hM4D DREADDs with mCherry (red) in the VTA. White dashed lines label the VTA and other surrounding nuclei based on the Paxinos and Franklin’s mouse brain atlas. White rectangle encloses the VTA and is magnified and shown in the right panel. Histology was validated for all manipulated mice. e Experimental design for learning a motor task for the control and manipulated mice groups during VTA inhibiting with local CNO in M1. Shaded points in the blue box indicate the manipulated session. f Population and average data (mean ± SEM) of success rate (fraction of successful trials) per training session during learning, n = 12 for controls and n = 10 for manipulated mice (p = 3.6·10−8 unpaired two-tailed t-test comparing success rate in the 3rd session of manipulated group n = 10 mice to control group, n = 12). Shaded data in the blue box indicate the manipulated session. g Success rate (mean ± SEM) of four groups: the Ringer group and sham group (n = 4 and n = 3 mice respectively) compared with success rate of the two main groups (control and manipulated, n = 12 and n = 10 mice respectively, same as in f). h Success rate (mean ± SEM) of four groups: the substantia nigra pars compacta (SNc) group and primary visual cortex (V1) group (n = 3 mice each) compared with the two main groups (control and manipulated, n = 12 and n = 10 mice respectively, same groups as in f). i Success rate of expert animals (n = 8, left bar) compared with an additional session where CNO was injected locally to M1 (expert+CNO, right bar) (mean ± SEM). Source data are provided as a Source Data file. Illustrations 1a, e credit to Dafna Antes. Figure 1b, e are adapted from Yara Otor et al., Dynamic compartmental computations in tuft dendrites of layer 5 neurons during motor behavior.Science376,267-275(2022).DOI:10.1126/science.abn1421 with permission.

The control animal group demonstrated a steady and gradual improvement in motor execution of the task until reaching a stable level. We quantified task performance by the success rate and by evaluating the sequence of motor events during learning. To calculate the success rate, we determined trials successful if animals succeeded in grabbing and consuming the pellet. We observed a gradual increase in success rates at the first four training sessions (R2 = 0.55, p = 2·10–9), after which success rate did not change significantly (p = 0.7, F = 0.37, for one-way ANOVA comparing success rate in sessions 5 through 7, n = 12 mice) where all animals maintained high proficiency (0.55 ± 0.03 at 7th session, n = 12 mice, Fig. 1f; Supplementary Movies 13). Learning as evaluated by success rate was impaired and remained low for the manipulated group in sessions where CNO was injected and dopaminergic afferents in M1 were inhibited (Fig. 1f; Supplementary Movies 4, 5; p = 3.6·10-8 unpaired two-tailed t-test comparing success rate in the 3rd session of manipulated group n = 10 mice to control group, n = 12). In consecutive sessions, once CNO was lifted, learning progressed gradually reaching comparable success values to expert controls only on the 8th training day (p = 0.05 and p = 0.36 t-test for comparison of the 7th and 8th training days of the CNO group to the 7th training days of control group, respectively; Supplementary Movie 6.

We performed several experiments to control for multiple parameters related to the CNO, the viral injection, the origin of dopaminergic afferents, and the spatial cortical location of the observed effect in the manipulated mice: 1. We expressed hM4D DREADDs in the VTA, but instead of CNO, we injected Ringer to M1. We did not observe a significant difference in the progression of the success rate of learning compared to the control group (Fig. 1g; p > 0.19 unpaired t-test for all training days comparing control and Ringer injections; n = 4). 2. We expressed a sham virus that contains only the fluorophore (pAAV-flex-tdTomato) to the VTA, and CNO to M1. Here as well we did not observe a significant difference in the progression of the success rate of learning compared to the control group (Fig. 1g; p > 0.15 unpaired t-test for all training days comparing control and sham viral injection; n = 3). 3. We expressed hM4D DREADDs in the VTA, but applied CNO in a seemingly unrelated cortical region; the primary visual cortex (V1). In this case as well we did not observe a significant difference in the progression of the success rate during learning compared to the control group (Fig. 1h; p > 0.16 unpaired t-test for all training days comparing CNO injections in M1 and CNO in V1; n = 3). 4. Although the majority of dopaminergic axons to the motor cortex originate from the VTA still some axons may originate from the substantia nigra pars compacta (SNc)24. To address the question of the possible contribution of the SNc projections to M1, we expressed hM4D DREADDs in the SNc (Supplementary Fig. 1, and Fig. 1h; n = 3) and applied CNO locally in M1 during the learning process. In this case, we did not observe a significant delay in the learning process and the learning did not differ significantly from the control group (p > 0.41 unpaired t-test for all training days comparing control and SNc injected group). 5. Finally, to address the question of whether inhibiting VTA dopaminergic inputs disrupts motor performance we introduced CNO locally in M1 in mice expressing hM4D DREADDs in the VTA, after the mice reached expert level (Fig. 1i). We found that inhibiting VTA dopaminergic afferents to M1 using hM4D DREADDs at the expert level did not change significantly the success rate of motor performance (p = 0.2, n = 8, comparing expert sessions with and without CNO injections to M1). Thus, inhibiting dopamine afferents to M1 did not hamper motor performance but rather disrupted the learning process itself.

Taken together these results indicate that the effect of CNO injection on motor learning progression is mediated via the hM4D DREADDs inhibition of VTA dopaminergic axons projecting to M1 and is not due to the viral injection nor to the injection of CNO per se to M1 or general inhibition of movement.

Re-organization of layer 2-3 neurons’ activity during motor learning and the role of direct M1 dopaminergic projections in the process

We next explored changes in the activity of layer 2-3 neurons during the training sessions (Fig. 1e) using consecutive two-photon calcium imaging from the same neurons. When considering the overall mean activity (cells across trials, per training session) we found it did not change significantly with training for both control and manipulated groups16,36 (Fig. 2a, p = 0.97, F = 0.2, n = 12 control group; p = 0.3, F = 1.2, n = 10 manipulated group, one way ANOVA test).

Fig. 2: Reorganization of layer 2-3 PN activity in M1 during motor learning at the single neuron and the population levels.
figure 2

a Population data showing average ±SEM activity (across cells, time, and trials) for control (n = 12) and manipulated (n = 10) mice. b Examples of individual neuron activity vs. time, averaged across consecutive trials per training session (1-7). Color encodes the percent change in fluorescence (ΔF/F). c Example traces of the mean activity along the trial time (average across cells and trials) per training session. Same mice as in B. d Population data showing histograms of correlation values between the average activity trace of each cell per session and its own average trace in session 7. Gray dashed vertical line indicates correlation = 0.5. e Fraction of cells having correlation >0.5 with their average activity trace in session 7 (corresponding to data in e). f Fraction of trials being classified as related to the 7th day in training by a binary classifier trained to separate ensemble activity of the 1st day (beginner profile) from the 7th day profile, trained using activity profile averaged across a 1 sec time window centered at t = 1 sec. g Heat map showing probability for trials to be classified as related to the 7th day (vs. the 1st) based on ensemble activity for a sliding window analysis (window size 1 sec, window hop 0.5 sec), n = 12 for controls and n = 10 for manipulated mice. Results for sessions 1 and 7 are a product of 10-fold cross-validation. For days 2-6 results are a product of the trained models. Scale bar represents fraction of trials classified as 7th session. For Fig. 2a, e, and f, dots are population data (per mice), orange and blue lines are mean ( ± SEM) across mice, n = 12 for controls and n = 10 for manipulated mice, correspondingly. Shaded data in the blue box indicate the manipulated session. Source data are provided as a Source Data file.

While these results might indicate that M1 layer 2-3 network activity remains unchanged with training, a closer look at the dynamics of the activity profile of the cells throughout the training sessions reveals that the network undergoes a gradual reorganization, with an emergence of a tri-phasic average activity pattern13. Examples of the mean activity of cells (across trials), per session, sorted by their variance at the 7th session, extracted from a control animal and for a manipulated animal are shown in Fig. 2b and their corresponding average activity traces across cells and trials per training sessions (Fig. 2c). To quantify this transformation of activity dynamics during training, we calculated the correlation coefficient between the average activity traces of each cell in the different training sessions with its trace of average activity on the 7th session (see Methods). For the control group, we observed a shift towards higher values with training, indicating that a growing number of cells adopt a dynamic pattern, similar to their individual dynamics as experts (i.e., 7th session in the control group, Fig. 2d). For the manipulated group, this process emerged only at the 5th session as training with no CNO is resumed (Fig. 2d). We quantified this process by counting the fraction of cells with a correlation coefficient higher than 0.5 (Fig. 2d, dashed line) in each training session (Fig. 2e). We found a gradual increase in the control group (R2 = 0.45, p = 1.64∙10-10, for linear fit) while for the manipulated group a gradual increase was observed only after lifting the CNO (Fig. 2e; R2 = 0.3, p = 0.0001, for linear fit).

To further quantify the reorganization of the layer 2-3 PN population towards a specific activity profile throughout the training process, we trained a linear classifier to separate between the activity profile of the 1st session and the 7th session. The classifier was based on the ensemble activity profile averaged across a short sliding time window throughout the trials (1 second time window, with 0.5 seconds window hop, see Methods). We then applied this classifier to sessions 2–6 and counted the fraction of trials in each session that were classified as related to the 7th session (Fig. 2f for a time window centered at go cue+1 sec and Fig. 2g for the entire trial time, window length = 1 sec, hop = 0.5 sec, see Methods). For the control group, we observed a monotonic increase of trials classified as related to the 7th session, indicating a gradual and steady shift of population activity towards an expert profile (days in training 1-7, R2 = 0.74, p = 5.9∙10−26, linear fit for training days 1–6, n = 12). Interestingly, the gradual progression in similarity to the expert activity is apparent already at the pre-tone period, suggesting a reorganization in network state even before the go-cue. Repeating this analysis for the manipulated group revealed that CNO injection leads to an unrecognizable activity profile, as trials were not consistently classified as either the 1st or the 7th day in training. Once CNO was lifted at the 5th session, we observed a gradual transformation of the neuronal population activity toward its configuration on the 7th day of training (Fig. 2f; R2 = 0.3, p = 0.002, linear fit for training days 4–6, n = 10).

It should be noted that for the two experimental groups controlling for the DREADDs manipulation (sham virus expression and the Ringer injection), the mean activity and the reorganization of the population activity of the layer 2-3 PN towards the expert profile on the 7th day behaved similarly to the control mice (Supplementary Fig. 2a, b). This further emphasizes the specificity of the inhibition of dopaminergic axons in M1 by hM4D DREADDs.

Together, these results highlight the crucial role the dopaminergic innervation in M1 plays in the gradual development of the learning related dynamics at the population of layer 2-3 neurons.

The correlative changes of the behavioral events and activity during motor learning

Next, we investigated the behavioral changes during the learning process in a more detailed manner, examining their correlation with the neuronal activity changes. To do so we identified specific behavioral events during the task (lift, reach, grab, supinate, and at mouth, see Methods) using a semi-supervised software37,38 as exemplified in Fig. 3a, b and Supplementary Fig. 3a, b. We segmented the events into sequences where each sequence is defined as a series of consecutive events ending with the animal bringing the food pellet to the mouth (an “at mouth” event) or with the animal returning its forelimb to the perch. We then evaluated the number of transitions between behavioral events within each sequence per trial at each training session (Fig. 3c). For the control group we observe a gradual increase in the first three training sessions (R2 = 0.3, p = 0.04, n = 5 mice for linear fit), after which the number of transitions slightly decreased and then stabilized (no significant change during sessions 4 through 7, p = 0.98, F = 0.07 for one way ANOVA, n = 5 mice). For the manipulated group, the number of transitions remain low in the first 3 sessions (p = 0.9, F = 0.07 for one-way ANOVA test comparing training days 1-3, n = 5 mice) and was significantly different compared to the control group (p = 0.001, unpaired one tail t-test comparing number of transitions in session 2–4 for the manipulated group vs. the control group). In addition to the number of transitions, we evaluated the time delay between the sensory cue (tone) and the first lift event during the training sessions (Fig. 3d). We found that while in control mice this time delay decreases during the first four training sessions and then levels off, for the manipulated group the time to lift remains high during the CNO training sessions, and decreases only later in training sessions where CNO is lifted (p = 0.01, for unpaired one tail t-test asking if delay time in sessions 2–4 is longer than delay time in the 7th session).

Fig. 3: Behavioral analysis and its correlations with activity.
figure 3

a Example ethograms in control mice annotating the behavior during training over consecutive trials presenting ‘lift’, ‘grab’ and ‘at mouth’ as a function of time on training sessions 1, 2, 4 and 7. Bottom panel, empiric probabilities for events corresponding to the ethograms. See also Supplementary Fig. 3 for additional animals from both groups. b Same as a, for a manipulated mouse (CNO at sessions 2, 4). c Population and average data (mean ± SEM), n = 5 mice for each group, presenting the number of transitions between behavioral states (lift, reach, grab, supinate, at mouth, and back-to) divided by the number of trials per training session. d Population and average data (mean ± SEM), n = 5 mice for each group, presenting time delay from go cue (tone) to lift. e The relative contribution of behavioral and sensory components averaged across neurons (mean ± SEM) for prediction of calcium transients during the peri-movement period (one second before the tone to two seconds after the tone), on the 1st, 3rd and 7th training days for the control (top) and manipulated (bottom) groups. The contribution of the motor components (including kinematics, back to perch, lift-reach, grab, supinate and at mouth) is 19 ± 4% on the 1st training day, 26 ± 7% on the 3rd and 28 ± 9% on the 7th training days for the control group. The contribution of the motor components for the manipulated group is 37 ± 4%, 25 ± 2%, 31 ± 3% for the 1st, 3rd and 7th training days, respectively. For Fig. 3c, d dots are population data (per animal), and orange and blue lines are mean ( ± SEM) across animals. Shaded data in the blue box indicate the manipulated session. See also supplementary Fig. 4 showing individual data points per neuron. Source data are provided as a Source Data file.

We next used a generalized linear model (GLM) to model the activity of layer 2-3 PNs as a function of sensory, motor, and outcome task components during learning of the task13. Similar to our previous work, we modeled the calcium transients of layer 2–3 PNs based on 4 types of predictors: time series of 3D spatial location of the hand39, time-varying orofacial features extracted from recorded videos40, time-varying discrete behavioral events (tone, lift, reach, grab, supinate, at mouth and back to perch)37,38, and outcome labeled as constant binary variables of success or failure (1 or 0, respectively) throughout the trial. We found that during learning, the full GLM (i.e., including all predictors; see Methods) captures the peri-movement activity (1 sec before until 2 sec after the tone) of an increasing fraction of layer 2–3 PNs. The fraction of neurons that were successfully modeled by the full GLM model increased from 4.5% on the first day of training, through 9.5% on the 3rd day of training, to 13% in expert mice in the control group. These results indicate the development of a sparse representation of task variables in the layer 2-3 PNs. For the manipulated group, similar fractions were observed for the 1st and expert mice sessions (3% and 10%, respectively) but remained low on the CNO injected days (2.5%). We further investigated the contribution of task components to modeling the calcium dynamics. For the control group, the relative contribution shifted from the sensory cue of the tone (51 ± 15% on the first day of training) to the outcome component of the task, which explained 58 ± 5% of the activity in expert mice (Fig. 3e and Supplementary Fig. 4). This shift from sensory to outcome was not observed in the manipulated group while CNO was present, as the contribution of the tone component was still high (43 ± 12%). After resuming training with no CNO, mice in the manipulated group exhibited the same trend as the control group where the outcome component was the most prominent (Fig. 3e and Supplementary Fig. 4; 42 ± 4%). Thus, during learning layer 2-3 PNs develop a high order outcome encoding of the task, which depends on activation of VTA dopaminergic inputs to M1.

Development of outcome-related signaling in layer 2-3 network of M1 during motor learning and the role of direct M1 dopaminergic projections in the process

Our previous study showed a specialized sub-population of layer 2-3 neurons, which reported the outcome of the hand reach trial. These outcome-related neurons developed during the learning process13 (Fig. 3e).

Here, we studied the involvement of M1 VTA projections in the development of the outcome signal. We calculated the fraction of indicative neurons that reliably report the outcome of the trial13. We defined “indicative neurons” as neurons that consistently predict success or failure trials with 99% confidence at the different training days in both the control and manipulated groups (Fig. 4a). Similar to our previous results13 we observed a gradual increase in the fraction of indicative neurons along the training sessions. In contrast, in the manipulated group we did not observe the emergence of indicative neurons during sessions where dopaminergic VTA axons in M1 were inhibited (Fig. 4a and supplementary Fig. 5). An increase was observed only after lifting the CNO in subsequent training sessions (Fig. 4a). However, both control and manipulated groups reached peak levels and stabilized the fraction of indicative neurons after three training days (control group by day 3 of training and CNO group by day 6 of training reaching a similar level, p = 0.7, unpaired t-test for comparison of day 3 in control and day 6 in manipulated group).

Fig. 4: Development of success/failure signal in M1 network during motor learning in control and manipulated mice.
figure 4

a Average fraction of indicative neurons (mean ± SEM) per training session (n = 12 for controls and n = 10 for manipulated mice). b Examples of temporal network dynamics during four representative training sessions (1,2,4,7) as captured by the 3 principal components averaged across trials according to outcome: success and failure as indicated by color. Top panel, control mouse, bottom panel manipulated mouse. c Fraction of animals where network dynamics showed a significant difference between successful and failure trials, as a function of time, per training session (see Methods) for the control group (left) and manipulated group (right). d Population data and mean ± SEM showing delta accuracy rate (accuracy minus chance level) for classification of outcome (success/failure) based on the ensemble activity of the entire population of neurons in a sliding window (window length 1 sec, window hop 0.5 sec) for four representative sessions (1,2,4,7). Top, control group (n = 12), and bottom, the manipulated group (n = 10). e Delta accuracy for classification of outcome (success/failure) using models trained on the final analysis window (go cue+7 to go cue +8 sec) of the 7th session and applied to all other sessions. Results for 7th session are the product of 10-fold cross-validation. For Fig. 4d, e, dots are population data (per animal), orange and blue lines are mean ( ± SEM) across animals, n = 12 for controls and n = 10 for manipulated animals. See also Supplementary Fig. 5 and Supplementary Fig. 6. Shaded data in the blue box indicate the manipulated session. Source data are provided as a Source Data file.

We next evaluated the outcome signals at the population level using a dynamical system analysis as we described previously13,41,42. We extracted principal components (PC) of the ensemble activity to capture 95% of variance, and then averaged the PCs across trials according to outcome to obtain two trajectories per session: one representing the average ensemble dynamics in successful trials and the other representing the average dynamics in failed trials (Fig. 4b). We quantified the distance between the success and failure trajectories as a function of time and used permutation testing to determine whether these differences are significant (see Methods). We counted the fraction of animals where success-failure trajectory distances were significant (p < 0.05, see Methods; Fig. 4c). In the first training session the average dynamics did not differentiate by outcome for both control and manipulated animals (Fig. 4b, c). With training, the trajectories of the control group became increasingly separated by outcome mostly after the movement time window. Significant differences emerged in sessions 2-3 and onward, at the post-movement time segment (5–6 seconds after the go cue) (Fig. 4b, c and Supplementary Fig. 6a), which indicates that the population activity adopted a different dynamic for success compared to failure trials. In contrast, for the manipulated group, the separation between success and failure trajectories did not emerge during training sessions with CNO injections (2nd–4th sessions). Instead, it began to emerge in training session 5 and onward, although it remained lower and did not reach the levels of separation seen in the control group by the 7th training session.

We further quantified the extent to which outcome can be decoded from the entire population activity of layer 2-3 PNs by training a set of binary classifiers (see Methods). Sliding the analysis window across trial time (1 sec time window and 0.5 sec window hop) and training a classifier per time window produced a curve of accuracy vs. time window indicating how well the outcome was decoded at the population level above chance (\(\Delta\) accuracy) in a specific session (Fig. 4d and Supplementary Fig. 6b). For the control group, the population activity gradually became predictive from the second session (p = 0.004 for a t-test comparing the 2nd training day to the 3rd, p = 0.6 for a t-test comparing the 3rd training day to the 4th, n = 12) and stabilized after three training sessions from the 4th day of training at the post-movement time window (approximately 5-6 seconds after the go cue, p = 0.9, F = 0.19 for one-way ANOVA comparing training days 4-7, n = 12). For the manipulated group, the outcome became gradually predictive from the 5th training session after CNO was lifted (p = 0.01, F = 4.95 one-way ANOVA comparing training days 5-7) reaching comparable value to control on day 7 (Fig. 4d; p = 0.47 for a t-test comparing the manipulated to control groups on day 7).

We next addressed the question of whether the population encoding of outcome gradually transforms towards a fixed configuration, or rather, the population representing outcome varies across training sessions being specific per training session. Toward this end, we used the model trained on the 7th day of training at the last analysis window (go cue +7.5 sec) and applied it to predict outcome signals from the neuronal activity of previous training sessions. Delta accuracy (Fig. 4e) as a function of the training session showed that the 7th day model was unable to decode the outcome from ensemble activity of beginner animals (1st training session). However, with training, the accuracy of the 7th day model in decoding outcome increased and became significantly above chance from the 4th session and onward (p = 8.5 ∙ 10−4, p = 1.0 ∙ 10−4, p = 6.3 ∙ 10−5, p = 4.0 ∙ 10−9 for sessions 4–7 respectively, one tail t-test, n = 12). This monotonic increase in accuracy confirmed our hypothesis that encoding of outcome by population activity progresses towards a specific profile with training. For the manipulated group the accuracy of the 7th training day model began to rise only when CNO is lifted but did not reach the same level as the control group (Fig. 4e; p = 0.01, unpaired t-test, comparing the 6th session of the manipulated group, n = 10, to the control group, n = 12).

For the two experimental groups controlling for the DREADDs manipulation (sham virus expression and the Ringer injection), we observed a monotonic increase in the decoding outcome accuracy based on 7th model day similar to the control mice (Supplementary Fig. 2c) further stressing the specificity of the inhibition of dopaminergic axons in M1 by hM4D DREADDs.

Collectively these results show that outcome representation by layer 2-3 PNs network is not inherent (as it does not exist in beginner animals) but acquired and gradually transforms towards a population activation profile through training, and that dopaminergic VTA projections to M1 play a fundamental role in facilitating this outcome network learning process.

Functional network connectivity reorganization of layer 2-3 PN during motor learning

In addition to the changes in activity dynamics, we set out to investigate the changes in the functional network connectivity configuration that evolves during the motor learning process. To get insight into the changes in the functional connectivity within the layer 2-3 network during the training sessions, we extracted a correlation matrix per trial for every training session based on pairwise Pearson’s correlation coefficients between all possible pairs of ROIs (Fig. 5a). We next calculated the centroid of all trial-based correlation matrices of a given session, creating a single matrix for the different trials of a session. As correlation matrices are Semi-Positive Definite, they are located on a curved manifold and do not obey Euclidean geometry. Therefore, instead of using a Euclidean average, we used Riemannian geometry to extract their centroid on the curved manifold43,44 (see Methods). Overall, this analysis produced a single matrix per training session, the Riemannian centroid, representing the average correlational structure of the network in a given training session (Fig. 5a).

Fig. 5: Transformation of the functional connectivity of layer 2-3 network during motor learning and the role of ventral tegmental area (VTA) dopaminergic projections to M1 in the process.
figure 5

a Schematic illustration of the analysis pipeline. Construction of correlation matrices of the activity between pairs of neurons per trial, used to evaluate the Riemannian centroid representing the functional connectivity of the network in a given training session. b Example of a two-dimensional diffusion embedding of trial-based correlation matrices (each dot represents a trial and colors indicate the training sessions) and their Riemannian centroids (black diamond) for a control (left) mouse and a manipulated mouse (right, CNO sessions 2, 3, 4). c A schematic description of subsequent analysis in three levels: Micro, Riemannian similarity accounting for pairwise relations between all ROI pairs; Mezzo, degree calculation expressing the strength of connectivity of each cell to all other cells; Macro, mean degree expressing the global connectivity of the network. d Riemannian similarity extracted from centroid matrices across different training sessions. Left, heat maps indicate similarity between pairwise sessions, averaged across animals. Right, Riemannian similarity (mean ± SEM) comparing all training sessions to the 7th session, for the control group. e Same as D for the manipulated group. f Same as D for degree vectors through training sessions for control group. g Same as F for the manipulated group. For D-G, dots are population data (per animal), orange and blue are mean ( ± SEM) across animals, n = 12 for controls and n = 10 for manipulated mice. See also supplementary fig. 7. Shaded data in the blue box indicate the manipulated session. Source data are provided as a Source Data file.

For illustration purposes, we first computed a low-dimensional representation for the trial-based correlation matrices and their centroids for a control animal and a manipulated animal using diffusion maps45,46 combined with Riemannian geometry (Fig. 5b, see Methods), ultimately obtaining a visual representation of each session, where each trial was represented by a colored dot and the whole session by its Riemannian centroid (black)30,32. In the control animal, the embedded correlations were mainly grouped by the training session, that is the trials within each training session had similar connectivity matrices (up to some intrinsic trial variability), whereas trials from different sessions had different connectivity matrices. Furthermore, with training, we observed a typical trajectory of transformation of the connectivity matrices indicating an evolution of the layer 2-3 neurons’ functional connectivity that occurs along the training process. For the manipulated group, the embedded correlation matrices did not differentiate well between the trials of the training sessions with CNO (2nd–4th) which were grouped together with the 1st session, and only in the following (5th–7th) sessions a differentiation developed. These results indicate the essential role of M1 dopaminergic projections in changing the functional connectivity profile of the network during learning.

We quantified the changes of the functional network connectivity with training, at three levels of analysis (Fig. 5c). First, the detailed micro level of individual functional connections, where we took into account the specific identity of pairwise neuronal connectivity. Second, the mezzo level of individual neurons, where we computed the degree of connectivity of each neuron with other neurons in the network regardless of the identity of the connection, thereby quantifying the strength each neuron is connected to the network. Third, the macro level of the entire network, where we evaluated the average functional connectivity of all neurons in the network. For these three levels of analyses, we evaluated the functional connectivity of the network in each session as the Riemannian centroid matrix of all related trial-based correlation matrices (Fig. 5a, see Methods).

To capture the transformation in functional network connectivity at the micro individual connection level, and to account for pairwise relationships between cells, we evaluated the Riemannian similarity between the Riemannian centroid matrices (Fig. 5a) in different training sessions (Fig. 5d, e, see Methods). In the control group, the Riemannian centroid connectivity matrices were most similar to subsequent training sessions, where the similarity to the 7th session was gradually rising (Fig. 5d; R2 = 0.55, p = 1.2·10−13 linear fit, n = 12), while for the manipulated group, this process was observed only after CNO was lifted during the training sessions (Fig. 5e; a rising trend from day 4–7, R2 = 0.45, p = 7.4∙10−5 linear fit, n = 10).

The analysis at the micro level demonstrated that the functional connectivity reorganizes during the training and that the specific pairwise connectivity footprint each neuron makes in the network is important and evolves with training. Moreover, our finding shows that this reorganization process required dopaminergic activation in M1.

Next, we investigated whether the reorganization of the individual functional connections also impacts the average functional strength each neuron is connected in the network. It is possible that individual connections will change but the average connectivity per neuron will remain constant due to a balance between strengthening and weakening of connections of the neuron. To test this, we applied the mezzo-level analysis and computed the degree connectivity of each neuron at a given training session. The degree (overall connectivity) of a neuron was calculated by summing over the rows of each Riemannian centroid connectivity matrix, resulting in a vector expressing the extent each cell is correlated with all other cells within the network in a given training session (Fig. 5c) and compared the values of degree across training sessions (Fig. 5f, g; see Methods). We found that the degree profile of the network transformed toward the configuration on 7th training session in the control group (Fig. 5f; R2 = 0.46, p = 6.2∙10−11, linear fit, n = 12). No such transformation was observed for the manipulated group in training sessions 2–4 where CNO was applied, however, once CNO was lifted during the consequent training sessions the degree similarity transformed fast (Fig. 5g; R2 = 0.6, p = 1.6∙10−7, linear fit, n = 10) reaching similar level as the control group. Repeating this analysis for the pre-tone time segment leads to similar trends yet noisier possibly due to smaller number of time points used for computing the correlation matrices (See Supplementary Fig. 7a, b), suggesting a network organization that is not solely ascribed to the motor execution itself.

Our findings indicate that the reorganization of the network during learning also involves changes in the overall strength of connectivity of individual neurons to the network and verified that dopaminergic inputs at M1 were essential for this transformation.

It should be noted that the Riemannian and degree similarity developed faster after lifting the CNO compared to the control group indicating some network reorganization occurred despite the presence of CNO.

For the two experimental groups controlling for the DREADDs manipulation (sham virus expression and the saline injection), we observed a monotonic increase in the Riemannian similarity and Degree similarity to the 7th day of training, similar to the control mice (Supplementary Fig. 2d, e). This further emphasizes the specificity of the inhibition of dopaminergic axons in M1 by hM4D DREADDs in the reorganization of network functional connectivity.

Finally, at the macro level, we tested whether the global connectivity of the network changes during the training sessions. We compared the average degree across all cells per training session and found that the mean degree of the network did not exhibit a significant change with training (p = 1, F = 0.01, n = 12, for the control group, p = 1, F = 0.15, n = 10 for the manipulated group, one way ANOVA test). This result indicates that despite the reorganization of functional connectivity at the micro and mezzo levels, the overall connectivity level of the network remained unchanged. This result may be explained by the fact that some cells increased, and some decreased their connectivity such that the overall connectivity in the network did not change with training.

Taken together, our results indicate that layer 2-3 network undergoes a reorganization towards an expert functional connectivity configuration with training, which critically depends on activation of the VTA dopaminergic projections to M1. The expert configuration depends on both the specific connectivity profile of each neuron and on the strength of connectivity of each neuron within the network.

Discussion

In this study, we examined the plasticity changes in the layer 2-3 PNs network of M1 during the learning process of a dexterous motor task, and the role that direct dopaminergic inputs to M1 play in this process. We found that while the average activity of layer 2-3 PNs did not change significantly during the training process, the activity kinetics along the trial changed and became more similar to the “expert” configuration of the 7th day of training. Moreover, while we did not observe changes in the average functional connectivity of layer 2-3 PNs, we observed a gradual reorganization of the specific functional connectivity profile of neurons and the strength each neuron is connected in the network, which became more similar to the final “expert” functional connectivity configuration. The fact the average connectivity of the network remains stable probably reflects homeostatic mechanisms that maintain the overall excitability of the network. The monotonic transformation of activity kinetics as well as network functional connectivity during learning were critically dependent on the VTA dopaminergic inputs to M1, as inhibiting these projections during the learning sessions blocked both motor learning and plasticity changes in the reorganization of activity and functional connectivity of layer 2-3 PNs network. Importantly, inhibiting dopaminergic inputs to M1 in expert mice did not affect performance, underscoring that blocking these inputs did not impair motor performance but is essential for the learning process itself. In addition, we found dopaminergic VTA projections to M1 are crucial in encoding outcome of the learnt task at the single cell and population of layer 2-3 neurons. This VTA dependent reorganization of specific functional connections during learning can serve to generate specific subnetworks (ensembles) that underlie memory.

We report that the total averaged layer 2-3 network activity did not change significantly during learning of the hand reach task. This is consistent with a previous study, that showed stability in the fraction of active layer 2-3 neurons during a lever press task16. However, despite this seemingly stable network activity a deeper look revealed learning was associated with major reorganization of both the neuronal activity dynamics and the network connectivity. During learning, the activity dynamics of neurons along the trial changed and transformed to the expert activity configuration adopting a typical three phasic kinetics13.

In addition to activity dynamics, we examined the effect of learning on functional connectivity. During most experimental paradigms, the access to real connectivity levels of neurons in networks is extremely limited. Thus, a common practice is to evaluate the functional connectivity by measurements of pairwise correlations between neuronal signals for various signals such as functional magnetic resonance imaging (fMRI)47,48, calcium imaging or single unit recordings49 where configurational structure of the network is compared across time or across subjects through centrality measures such as degree50.

In this work, we use a similar practice, where we evaluate the functional connectivity between individual neurons using the correlation matrix, calculated per trial. Here we developed a geometric approach pipeline for tracking changes in correlational structure throughout the training process. In this new approach, we calculate the Riemannian centroid across trials, to express the correlational structure of all neuronal pairs, in a given session. We found that while there were no significant changes in the overall connectivity of the network (globally averaged degree centrality), the learning process induced a significant change in the functional connectivity of the individual neurons and their total connections in the network thus keeping a homeostatic level of connectivity while introducing specific task related connectivity changes. The correlational structure expressed by all pairwise relations gradually transform and converges towards an “expert” configuration indicating that the learning influences specific connections in the network.

Taken together our findings support four processes that occur during learning. First, formation of new specific connections between neurons and thereby probably forming specific ensembles for storing the new task51,52. Second, to maintain the overall excitability of the network, weakening of other unrelated functional connections occurs. Third, shaping of a new tri-phasic dynamic of the neuronal response13. This is probably mediated both by the formation of new connections between excitatory neurons, and probably by connectivity changes with inhibitory interneurons53. The activity dynamics and connectivity of interneurons were not examined in this study, yet it is important to stress that interneurons in M1 are being innervated by the VTA54 possibly contributing to the tri-phasic dynamics of the layer 2-3 activity. Finally, learning was associated with the development of a specialized subpopulation of neurons that encoded task outcome. We hypothesize that these outcome signals are crucial for reinforcement motor learning13, providing feed-back for future adaptations.

Our results are in line with the increased temporal correlations of pairs of neurons reported for anterolateral (ALM) and posterior medial motor cortexes during learning of lick no-lick odor task, though correlations were generally low, and the network correlative configuration was not studied36. In addition, our results are also consistent with the previously reported increased stability of the temporal activity pattern of neurons and the relationship between activity and movement in pairs of trials that became more consistent with learning of a lever press task16,19.

Our results, however, are inconsistent with two previously described findings with regard to layer 2-3 neurons during motor learning. First, a previous study that reported unchanging total predictive decoding accuracy of the motor behavior by both the neuronal population or single layer 2-3 neurons of M1 during lever pull motor learning35, a property that was attributed specifically to layer 5a but not 2-335. Second, our results are also inconsistent with the notion that the single neurons are unstable in their activity during trials, tuning or predictive decoding of behavior during the learning sessions16,19,35,36. We show that increasing fraction of neurons gradually change and converge toward the “expert” activity configuration where each neuron adopted gradually its specific activity pattern during the learning process, which can potentially explain the gradual transformation towards the expert configuration seen at the network functional connectivity level.

Previous work demonstrated that the main source of dopaminergic projections to M1 originate from the meso-cortical dopaminergic system, the VTA. Further, this pathway was shown to be necessary for hand reach motor skill learning24 as eliminating the dopaminergic terminals with 6-OHDA in M1 or blocking D1 and D2 receptors affected the motor skill learning55. Consistent with previous works, behaviorally, we found that dopaminergic inputs to M1 are essential for motor learning as inhibiting these inputs locally in M1 prevented learning progression24,27,56. However, in contrast to a previous study that showed VTA lesioning did not affect learning within session rather only between sessions24, we show that inhibiting VTA terminals with DREADDs in M1 halted both within and between sessions motor learning, preventing mice from increasing the success rate in the CNO sessions. In addition, our analysis of the motor behavior during the hand reach task revealed that inhibition of the direct dopaminergic projection also decreased the number of task related movements and delay time from go cue to first lift, indicating the importance of direct dopaminergic projections on learning not only through success rates but also on fine behavioral parameters. Furthermore, our GLM analysis indicate that layer 2-3 neurons represent task variables in a small but significant portion of the neurons ( ~ 13%) suggesting a sparse representation of movement in this layer. In contrast, layer 5 pyramidal tract (PT) neurons densely encode movement13,57. Moreover, the GLM analysis shows a shift of the modeled activity from sensory to outcome representation during the learning process, a shift that was also dependent on the activation of VTA dopaminergic projections to M1 during learning. These findings are in line with recent work showing a role of dopamine in coordinating fine reaching skilled movements58 though in this case midbrain dopaminergic neurons in the SNc and not VTA dopaminergic neurons were manipulated.

It should be noted, that in our experiments, although learning in the manipulated group was gradual after the release of CNO, the rate of learning was faster compared to the control group in the functional connectivity measures. This may indicate that some motor learning occurred in the presence of CNO, potentially due to either an incomplete blockade of the dopaminergic signals to M1, allowing some plasticity in M1 during the CNO days, or the involvement of additional mechanisms beyond dopaminergic inputs to M1. For example, dopaminergic signals in the striatum, conveyed to the cortex via thalamocortical inputs, or other plasticity mechanisms in M1 independent of the dopaminergic VTA axons, may also contribute to motor learning.

Interestingly, previous studies have reported the importance of dopaminergic SNc afferents to the striatum in motor learning59,60,61,62. Worth noting that we did not observe an effect on the progression of learning when inhibiting SNc projection to M1, which is consistent with the sparser direct innervation from the SNc to M1 compared to that from the VTA24. Thus, motor learning probably requires concerted activation of both direct dopaminergic signaling to M1 via the VTA and dopaminergic signaling to the striatum. It is yet unclear whether the two different pathways carry different information and/or are responsible for different aspects of learning or whether learning requires a double gate dopaminergic mechanism. The fact M1 receives both a general reward prediction error signals1,22 and develops a specific task related outcome signal as part of the learning process may be related to this question. Further studies are required to unravel this interesting and important question.

While the role of direct dopaminergic VTA afferents to M1 on learning was described previously at the behavioral level, information regarding the role of dopaminergic VTA projections to functional connectivity and activity dynamics in M1 during learning was lacking. In this study we attempted to close this gap and show that for both single neuron dynamics and network functional connectivity reorganization to emerge, and for the concomitant motor learning to proceed, activation of direct VTA dopaminergic inputs to M1 is essential.

It is interesting to note that the dopaminergic VTA axons project to a wide range of areas in the brain63, with the majority of projections innervating subcortical regions, primarily the nucleus accumbens (medial and lateral) and amygdala64. VTA is also known to innervate other cortical regions, especially the prefrontal cortex65, however, other cortical regions including M1 are innervated to a lesser degree. Here we show that despite this scarcer innervation, dopaminergic VTA projections to M1 are vital for motor learning and for the reorganization of the M1 network during the training process.

Dopamine is thought to play a major role in reinforcement learning providing a reward prediction error signal, comparing between expected and experienced outcome which than leads to strengthening of the appropriate synapses via plasticity mechanisms64. In this study we did not address the cellular mechanisms by which dopamine participates in reorganization of the M1 network during learning. Previous studies showed that the dopaminergic innervation to M1 from the VTA is important for inducing LTP in M1 layer 2-3 synapses27 in line with the findings that Horizontal connections of layer 2-3 motor cortex undergo LTP during forelimb motor skill learning21,56. At the structural plasticity level task related spine appearance and elimination in layer 2-3 neurons during motor learning was shown16,20. Moreover, in layer 5 neurons spine turnover was shown to depend on direct mesocortical projections to M166. It is likely that one or more of these mechanisms are responsible for the effects we describe in this study. Thus, our findings together support the notion that the reorganization at the single cell and network levels of M1 are supported by dopamine released during the motor learning process from the VTA (see summary Fig. 6).

Fig. 6: Schematic highlighting the essential role of dopaminergic inputs from the VTA to M1 in motor learning.
figure 6

During training for a hand-reach task, the beginner M1 layer 2-3 network (light blue) reorganizes into an expert configuration (dark blue), while the overall network connectivity (represented by the number of edges) remains unchanged. Dopaminergic inputs from the VTA to M1 are essential for both motor learning and the transformation of the network into the expert configuration. Figure credit to Dafna Antes.

Methods

Experimental model and subject details

All animal procedures were performed in accordance with guidelines established by the NIH on the care and use of animals in research, as confirmed by the Technion Research Campus Institutional Animal Care and Use Committee. Adult male Slc6a3 (DAT-Cre) mice were used in this study. Mice (Jackson Laboratory; strain number #020080 from a mixed genetic background of C57BL/6 J and C57BL/6 N) were housed in a 12:12 hours reverse light: dark cycle, kept at controlled humidity (50%) and temperature (220C). For behavioral training and experiments, food intake was limited to 2.65–3 g/day with ad libitum water. The typical age at which animals began experiments was 8 weeks.

Experimental design

To investigate the activity and network dynamics in M1 during motor learning, we trained mice to perform a head-fixed version of the hand reach grasping motor task. After the first successful hand reach to pellet attempt, all subsequent training sessions (1–7 sessions) were performed with simultaneous two-photon calcium imaging (control group). Chronic calcium imaging was performed with the genetically encoded calcium indicator GCaMP6s, which was introduced to the cells via a viral vector injection. The resulting permanent expression of the indicator allowed us to monitor the activity for many weeks. We selectively labeled layer 2-3 PN using a specific promotor (CaMKII). We implanted a chronic window over the M1 forelimb area that allowed us to image the activity for long time periods. For inhibiting the activity of VTA dopaminergic projections to M1 we used chemogenetic inhibition using DREADDs which was introduced to the cells via a viral vector injection. We selectively labeled dopaminergic neurons in the VTA using Cre-dependent viral expression in DAT-Cre mice, the virus that encodes for the inhibitory DREADDs (hM4D) was injected in the VTA of both control and manipulated groups and assignment to the two experimental groups was done later. In the manipulated group, CNO was injected for three consecutive training sessions (2, 3 and 4). Injections were performed locally to M1 via an access port67 which allowed specific inhibition of the dopaminergic projections from VTA to M1. In some cases (as indicated in the text), we also applied CNO in expert sessions to examine the effect of the inhibiting dopaminergic projections to M1 on movement.

In addition to the control and manipulated groups, we had four additional groups, which controlled for the CNO, the viral injection, the origin of dopaminergic afferents, and the spatial cortical location:

1. The Ringer group. In this group, we used mice that underwent the same surgery, with injections of GCaMP6s in M1 and with DREADDs in the VTA. However, in the manipulated sessions (second, third, and fourth) instead of injecting CNO locally to M1, we injected Ringer solution. 2. The sham group. Mice were injected with GCaMP6s in M1, however, instead of injecting DREADDs to the VTA, we injected a sham virus (pAAV-flex-tdTomato). During the manipulated sessions, this group was injected locally with CNO to M1. 3. The V1 group. This group was injected with DREADDs in the VTA, however, during the manipulated sessions instead of injecting CNO locally to M1, CNO was injected locally in V1 (coordinates 2.7 mm posterior and 2.5 mm lateral to Bregma), which created a spatial negative control. 4. The SNc group. In the SNc group, instead of injecting hM4D DREADDs to the VTA, DREADDs were injected to the SNc bilaterally (3.4 mm posterior, 1.5 mm lateral to Bregma; depth of 4000 µm, volume of 120 nl for each side). This group was injected with CNO locally to M1 during the manipulated sessions.

Viral injections and cranial window surgery

The procedures were performed on 2-3 months old mice, under isofloren anesthesia (4% for induction and 1.5–2% for maintenance during the surgery). Mice were secured on a stereotaxic apparatus (KOPF), and a heat pad was used to maintain body temperature at 36–37 °C. Pre-operation medications were administrated subcutaneously for analgesia, Ketoprofen (5 mg/Kg) buprenorphine (0.1 mg/Kg). The scalp was shaved and cleaned with ethanol. The skull surface was exposed after a subcutaneous injection of 2% Lidocaine. Viral injection was performed using a hydraulic manipulator (M0-10 Narishige) via thinned skull at two points: 1. M1 forelimb area10 (0.6 mm anterior and 1.6 mm lateral to Bregma) at three depths (150 µm, 250 µm, 350 µm, with 70 nl each depth). For calcium imaging at M1, we injected AAV.CamKII.Gcamp6s. 2. VTA bilaterally (3.1 mm posterior, 0.3 mm lateral to Bregma; depth of 4200 µm, volume of 120 nl for each side). For VTA inhibition we used DREADDs AAV2-hSyn-DIO-hM4D(Gi)-mCherry. After injections we performed a circular craniotomy over the M1 forelimb area (0.6 mm anterior and 1.6 mm lateral to Bregma) and implanted a 3 mm diameter laser-cut glass optical window with a 1 mm access hole which we sealed with biocompatible silicone for future local injections. The window was sealed with superglue and dental cement. Next, a custom-made 3D printed plastic headpost13 was affixed to the skull with dental cement. Mice were injected with ketoprophen (5 mg/kg) and buprenorphine (0.1 mg/kg) administered subcutaneously for 2-days post operation and were left to recover from the surgery for two weeks with ad libitum food and water.

Local CNO/ ringer injections

CNO or Ringer were injected unilaterally in M1 and contralateral to the hand performing the task immediately before starting the manipulated sessions. In the Ringer group, Ringer was injected locally to M1, while in the V1 group, CNO was injected locally to V1. The injections were done via an access port in the cranial window which was sealed with biocompatible silicone67 using hydraulic micromanipulator with a glass pipette that was inserted into the cortex with an angle to reach the specific imaged area in M1 (typically 250–300 µm from the dura). CNO was dissolved in saline to reach a concentration of 100 µM. 500 nl of CNO (100 µM) or of Ringer solution was injected according to the experimental group. The experiments started 5 min after the injection of the CNO and lasted for 30–40 min.

Behavioral training

After recovering from the surgery mice were restrained to 2.65–3 gr/day and ad libitum water. Training started when mice reached 80–90% of their initial body weight. Mice were habituated to head fixation in a custom-built apparatus13 in dark and quiet conditions, monitored by a webcam. We trained to reach for a food pellet (20 mg; Test Diet; St Louis, MO) from a rotating table placed in front of the animal and driven by either a NI USB DAQ device or a Teensy microcontroller driven by custom-made LabVIEW software. An auditory tone (200 ms, 1 kHz) was used as a cue during plate rotation. Initially, the rotating table was placed directly below the mouth allowing the animal to tongue reach the pellet upon the tone. We gradually placed the table in further position until the first successful hand reach. Thereafter, training was combined with two-photon calcium imaging in 7 subsequent sessions. The first training day with imaging was called “session 1”. Each training session consisted of ~50 trials. Each trial lasted 12 seconds; the food pellet comes to position 4 sec after trail start along with a tone. The pellet is available to the mouse for 8 sec.

Two-photon calcium imaging

Images (512 × 512 pixels) were acquired using a two-photon microscope (Bruker) equipped an 8 kHz bidirectional resonant galvo scanner and a Nikon 16X CFI Plan Fluorite objective (NA 0.8, 3 mm working distance), controlled by the software package PrairieView 5.3. GCaMP6s was excited at 940 nm using a femtosecond pulsed laser (InSight X3, Spectraphysics). Emission light was detected by a GaAsP photomultiplier tube (Hamamatsu).

Each trial was 12 sec with an auditory cue that was presented in the 4th second, with an imaging frame rate of 30 Hz. The same field of view was imaged over all sessions in the same animal. Behavioral performance was monitored at 200 Hz using two cameras (side and front view; Flea3 FL3-U3-13Y3M, PointGrey). The timing of two-photon calcium imaging, behavioral task, and video recording were coordinated via a National Instruments board (PCI-6110), using custom-made software written in MATLAB.

Histology

At the end of experiments, mice were deeply anesthetized and transcardially perfusion with PBS 0.1% followed by paraformaldehyde 4%. The brains were removed and put in fixative solution (paraformaldehyde 4%) for 48 hours, and later stored in PBS 1% solution. Coronal sections (100 µm thickness) were made. The sections were mounted on slides embedded with Fluoroshield containing DAPI (Sigma). Slices were scanned using a spinning disk confocal system (Olympus IXplore-Spin), equipped with a X20 objective (UPLXAPO 20x NA 0.8 WD 0.6 mm Olympus). The presented histology images (Fig. 1d; Supplementary Fig. 1) were acquired as z-stacks (at a definition of 1024 × 1024 pixels with pixel size of 0.646 µm and Z stack interval 0.67 µm) and presented as max-projection images. Injection sites were validated via histology for all manipulated animals.

Behavioral data analysis

We labeled the behavioral events (lift, reach, grab, supinate, at mouth and Back to perch) using a semi-supervised software37,38 as follows: lift – a vertical movement of a forelimb from the perch or the table, lasting till peak height is reached; reach – a horizontal movement towards the pellet; grab – starts when animals spread their fingers, ending when fingers are wrapped into a fist. Supinate – the forepaw is rotated outward while being brought towards the mouth; at mouth – fist is close to the mouth; back to perch - forepaw returns to perch.

To characterize behavior throughout training we segmented the data into sequences of behavioral events a sequence is defined as a series of consecutive events ending with an “at mouth” event or a “back to perch” event. Behavioral transitions were counted as the number of transitions between events within each sequence per trial at each training session.

Two-photon data analysis

The fluorescence data acquired by the two-photon microscope was first registered to correct for brain motion artifacts. Our registration method was based on68, using Fourier transform based correlation between two successive images. The maximal value position in the correlation image specifies the relative shift between the two images; we designate them \({u}_{t}\) and \({v}_{t}\). This method requires a template specification and matching against an image stack. The template image \({I}_{{temp}}(x,y)\) was defined as the average of all images in the selected trial over time.

$${I}_{{temp}}\left(x,y\right)=\frac{1}{N}\sum _{t=1..N}I\left(x-{u}_{t},y-{v}_{t},t\right)$$
(1)

The set \(\left\{{u}_{t},{v}_{t}\right\},t=1..N\) is an image shifted in the XY plane after alignment. We initially start with \({u}_{t}=0,{v}_{t}=0\) and then update their values according to the registration maxima. This procedure is repeated several times, when each time we compute the new template \({I}_{{temp}}(x,y)\) using previously computed \(\left\{{u}_{t},{v}_{t}\right\}\) for each image. Typically, this procedure converges after several iterations, in our case three iterations.

To align the imaging data over many trials, we used a similar technique, utilizing the previously computed averaged templates for each trial. For each trial k, we performed a single trial registration using the template algorithm for three iterations. To align the image data over many trials, we treated the final templates \({I}_{{temp}}^{k}(x,y)\) from each trial k as unaligned image data and repeated the same registration procedure to find offsets \(\left\{{u}_{k},{v}_{k}\right\}\) for each trial. These offsets, along with previously found offsets \(\left\{{u}_{t},{v}_{t}\right\}\), account for the final image shift in the XY plane.

Regions of interest (ROIs) were detected manually using average fluorescence images and \(\varDelta F/F\) projection images, which highlighted active neurons. The pixels within each ROI were averaged for every frame. The ROI “mask” was used to detect the same neurons on multiple imaging sessions on different days.

\(\varDelta F/F\) was computed using the following formula:

$$\Delta F/{F}_{n}[t]=\frac{{F}_{n}\left[t\right]-{Min}10({F}_{n}\left[t\right])}{{Min}10\left({F}_{n}\left[t\right]\right)+{Bias}}$$
(2)

\({Min}10\left({F}_{n}\left[t\right]\right)\) is a mean value of the lowest consecutive 10% values of the fluorescence signal \({F}_{n}\left[t\right]\). This minimal fluorescence value was calculated both per trial and across the entire experimental session, with no significant differences in the results with these two calculation variants. A small bias factor in the denominator prevented zeros when the cell was completely silent.

Statistical analysis

In each experiment, the mouse repeated the hand reach task several dozen times. The data were analyzed using custom Matlab code, except for training and testing the SVM classifiers41,69, for which we used the standard LIBSVM software70.

The imaging data collected at each session \(l\) are stored in a 3-dimensional matrix (tensor) \({X}^{l}\) of size \({N}_{r}\times t\times {T}^{l}\), where \({N}_{r}\) is the number of neurons, \(t\) is the number of time samples per trial, and \({T}^{l}\) is the number of trials. \({X}^{l}\left(i,j,k\right)\) is the neuronal activity of the \(i\)-th neuron at the \(j\)-th time sample in the \(k\)-th trial related to the \(l\)-th session, where \(l=1,\ldots,7\).

Activity of cells through training

The overall average activity of the entire population of cells per training session was obtain as:

$${\bar{x}}^{l}=\frac{1}{{T}^{l}{N}_{r}}\mathop{\sum }\limits_{i=1}^{{N}_{r}}\mathop{\sum }\limits_{j=1}^{t}\mathop{\sum }\limits_{k=1}^{{T}^{l}}{X}^{l}\left(i,j,k\right),l=1,\ldots,7$$
(3)

The average dynamics of cells in each session was obtained by averaging over trials as follows,

$${\bar{X}}^{l}\left(i,j\right)=\frac{1}{{T}^{l}}\mathop{\sum }\limits_{k=1}^{{T}^{l}}{X}^{l}\left(i,j,k\right),j=1,\ldots,t;i=1,\ldots,{N}_{r};l=1,\ldots,7$$
(4)

We obtained the correlation coefficient between the average dynamics of each cell in the 7th session with its own dynamics in all other sessions:

$${\rho }^{l}\left(i\right)=\frac{\sum _{j}\left({\bar{X}}^{l}\left(i,j\right)-{\bar{x}}^{l}\left(i\right)\right)\left({\bar{X}}^{7}\left(i,j\right)-{\bar{x}}^{7}\left(i\right)\right)}{\sqrt{\sum _{j}{\left({\bar{X}}^{l}\left(i,j\right)-{\bar{x}}^{l}\left(i\right)\right)}^{2}}\cdot \sqrt{\sum _{j}{\left({\bar{X}}^{7}\left(i,j\right)-{\bar{x}}^{7}\left(i\right)\right)}^{2}}}$$
(5)

where \({\bar{x}}^{l}\left(i\right)\) is the activity of the \(i\)-th neuron in the \(l\) session, averaged across time and trials.

SVM classification of ensemble activity

To further quantify the activity profile of the cellular ensemble throughout training, we trained a linear SVM classifier to separate neuronal activity in trials related to the first session from the activity in trials related to the 7th session. We used a sliding window of 1 sec with 0.5 sec hop. In every time window, we evaluated the average activity of each neuron at each trial,

$${\bar{x}}_{b}^{l}(i,k)=\frac{1}{B}\sum _{n\in {\rm{time}}\,{\rm{bin}}\,b}{X}^{l}(i,n,k)$$
(6)

where \(b\) is the index of time window and \(B\) is the window length measured in samples. The activity of the network in each time window for each trial was represented by the following \({N}_{r}\times 1\) vector:

$${\bar{x}}_{b,k}^{l}=\left(\begin{array}{c}{\bar{x}}_{b}^{l}\left(1,k\right)\\ {\bar{x}}_{b}^{l}\left(2,k\right)\\ \vdots \\ {\bar{x}}_{b}^{l}\left({N}_{r},k\right)\end{array}\right),k=1,2,...,T$$
(7)

For each time bin, we extracted the set \({\left\{{\bar{x}}_{b,k}\right\}}_{k=1}^{{T}^{l}}\), representing the averaged activity of the ensemble in all trials related to all 7 training sessions. We paired the activity of each trial with a label indicating whether it is associated with expert activity (7th session) or with beginner activity (1st, session):

$${y}_{k}=\left\{\begin{array}{cc}1 & {\text{trial}}\,{{k}}\,{\rm{is}}\; {\rm{related}}\; {\rm{to}}\; {\rm{expert}}\\ 0 & {\text{trial}}\,{k}\,{\rm{is}}\; {\rm{related}}\; {\rm{to}}\; {\rm{beginner}}\end{array}\right.$$
(8)

We trained a linear classifier (SVM) to separate expert from beginner, per time bin, using 10-fold cross validation. We then applied the trained classifiers to estimate trials from sessions 2–6. For the 1st and 7th sessions, we used the output of the cross-validation procedure. For each session we evaluated the fraction of trials identified as expert, per time window \(b\),

$${a}_{l}(b)=\frac{1}{\#{\rm{trials\,{on}}}\,{\rm{session}}\,{l}}\sum _{k\in \,{\rm{session}}\,s}{\hat{y}}_{b,k}$$
(9)

where \({\hat{y}}_{b,k}\) is the estimated label. Figure 2f presents mean ± SEM values of \({a}_{l}\left(b\right)\), averaged across animals, for a time window centered at tone+1 sec. Figure 2g presents the average values as a function of the time bin \(b\).

Generalized linear model (GLM)

We used a GLM to measure the relative contribution of behavioral-sensory signals to modeling the dynamics of the Calcium transients. To this end, we used a similar strategy to the previously published one13,71 where a model is trained to predict the calcium signal of each neuron on a trial-by-trial basis, based on behavioral-sensory data. The predictor signals were of four types:

Hand trajectories: Time series of hand trajectories and 4 fingers were extracted using DeepLabCut software39 from videos taken with side and front view cameras. We extracted x and y locations, altogether 20 predictors.

Orofacial features: Time series of orofacial motion variability features were extracted using FaceMap software40 from recorded videos. A single ROI was placed to include the face area, where the first 20 principal components were taken as predictors.

Time-varying and binary events: tone, lift, reach, grab, supinate, at mouth, and back to perch). Movement events were extracted using the modified JAABA software37,38.

Whole trial binary event: success/failure trial status (outcome). We convolved the time-varying binary events to model the time course of single-neuron calcium signals with a set of 7-degree-of-freedom regression splines. We used a set of 4 durations: 0.25 sec, 0.5 sec, 1 s, and 2 sec, all generated using the ‘bSpline’ package in R. We had 32 spline functions in total, which resulted in 32 × 7 = 192 convolved signals used as predictors. Altogether, we used 265 predictors (224 convolved predictors + 20 hand trajectories + 20 orofacial features + 1 whole binary success/failure outcome). We performed this analysis in the peri-movement time window (from one sec before until 2 s after the tone). The data for this analysis included 850 cells for the control group (pooled from 5 animals) and 606 cells for the manipulated group (also pooled from 5 animals). We trained a GLM per cell, per training session and accounted for cells for which the explained variance was at least 5%. We then evaluated the relative contribution of each component we trained a series of 9 partial models, each excluding predictors related to one category. We computed the contribution of each category as:

$$\alpha \left(i,c\right)=1-\frac{{R}_{p}^{2}\left(i,c\right)}{{R}_{f}^{2}\left(i\right)},c=1,\ldots,9$$
(10)

where \({R}_{f}^{2}\left(i\right)\) is the explained variance of the \(i\) -th cell using the full model (i.e., including all predictors) and \({R}_{p}^{2}\left(i,c\right)\) is the explained variance of the \(i\) -th cell based on the partial model (excluding the components related to the category c. All explained variance values were obtained using a 10-fold cross-validation process, where per fold 10% of trials were set aside as a test set. Figure 3e presents the relative contribution of each category evaluated as \(\alpha \left(i,c\right)/{\sum }_{c=1}^{9}\alpha \left(i,c\right)\).

Indicative neurons

We examined whether single neurons can reliably report success or failure by using a similar analysis for the detection of “indicative neurons” as we described in ref. 13.

We trained an SVM-classifier for the prediction of trial outcome based on the average activity of each neuron in a given time bin (window length 1 sec, window hop 0.5 sec). We then evaluated the accuracy rate using a 10-fold cross-validation process. We obtained the mean and STD values of the accuracy rate, per neuron and at each time window, by averaging over folds. To determine whether a neuron is indicative, we compared its respective mean accuracy rate to prior probability, which is the maximum between the fraction of successful trials and the fraction of failure trials. In each time window, we marked a neuron as indicative if its mean accuracy rate was higher than the prior probability, with a 99%.

PCA trajectories of network dynamics

We explore the evolution of the network in encoding outcome using Principal Component Analysis (PCA). We reshaped the imaging data tensor of a given session, \({X}^{l}\), into a 2-dimensional matrix of \({N}_{r}\times [t{T}^{l}]\), where each row consists of the neuronal activity of a specific cell across all time samples and trials:

$${\widetilde{X}}^{l}=\left(\begin{array}{ccc}{X}^{l}\left(1,\cdot,1\right) & \ldots & {X}^{l}\left(1,\cdot,{T}^{l}\right)\\ \vdots & \vdots & \vdots \\ {X}^{l}\left({N}_{r},\cdot,1\right) & \ldots & {X}^{l}\left({N}_{r},\cdot,{T}^{l}\right)\end{array}\right)$$
(11)

We computed the sample covariance of \({\widetilde{X}}^{l}\) and extracted its eigenvalue decomposition to obtain \(d\) principal components explaining 95% of the variance of \({\widetilde{X}}^{l}.\) Finally, we reshaped the new representation into a \(d\times t\times {T}^{l}\) tensor of the principal components of \({X}^{l}\). Figure 4b presents the first 3 principal components, averaged according to outcome, computed per session for two animals.

Significance in separation

Let \({\boldsymbol{S}}_{{suc}}\) be a \(d\times t\times {T}^{{suc}}\) tensor of all trajectories related to successful trials in each session, and \({\boldsymbol{S}}_{{fail}}\), a \(d\times t\times {T}^{{fail}}\) be the corresponding tensor for failed trials. First, we computed the average trajectories for success and for failure, \({\boldsymbol{\mu }}_{{suc}}(n)\) and \({\boldsymbol{\mu }}_{{fail}}(n)\), and the corresponding sample covariance matrices, \({\Sigma }_{{suc}}(n),{\Sigma }_{{fail}}(n)\):

$${\boldsymbol{\mu }}_{i}\left(n\right)=\frac{1}{{T}^{i}}\mathop{\sum }\limits_{k=1}^{{T}^{i}}{S}_{i}( \cdot,n,k)$$
(12)
$${\Sigma }_{i}\left(n\right)=\frac{1}{{T}^{i}}\mathop{\sum }\limits_{k=1}^{{T}^{i}}\left({S}_{i}\left(\cdot,n,k\right)-{\mu }_{i}\left(n\right)\right){\left({S}_{i}\left(\cdot,n,k\right)-{\mu }_{i}\left(n\right)\right)}^{T}i=\left\{{suc},{fail}\right\}$$
(13)

Second, we calculated the separation between success and failure dynamics using:

$$ {d}_{{suc},{fail}}(n)=\\ {\left({\mu }_{{suc}}(n)-{\mu }_{{fail}}(n)\right)}^{T}{\left(\left({\Sigma }_{{suc}}(n)+{\Sigma }_{{fail}}(n)\right)/2\right)}^{-1}\left({\mu }_{{suc}}(n)-{\mu }_{{fail}}(n)\right)$$
(14)

Third, we shuffled the trial outcome labels and then re-computed the distances. We repeated this computation 1000 times per session and obtained the empirical probability for the randomized distance to be larger than the actual distance, denoted by \(\hat{p}\left(n\right)\). Finally, for each training session \(l\), and time point \(n\), we counted how many animals yielded a significant difference, i.e., \(\hat{p}\left(n\right)\) < 0.05, as presented in Fig. 4c.

Classification of outcome per training session

We further quantified outcome separation emerging with training based on the predictive power of the imaging data tensor \({{\boldsymbol{X}}}^{l}\) using a linear classifier. The classification accuracy was evaluated for each training session, using a sliding window of 1 second with 0.5-second hop. In every time window \(b\), we evaluated the average activity of each neuron at each trial, \({\bar{x}}_{b,k}^{l}\), as defined above. We then trained a binary SVM classifier (linear) to separate successful trials from failed trials, per time window. Training and testing were performed using a 10-fold cross-validation procedure. As success rate varies with training and across animals, we evaluated the decoding power of the trained classifiers as the difference between accuracy and chance level, calculated per training session.

Classification of outcome using expert model

We applied the SVM models trained to separate success from failure based on the activity of expert animals, at the last time window (go cue+7 seconds) to data extracted from previous sessions (last time window as well).

Riemannian centroid of correlation matrices

We computed the pairwise Pearson’s correlation coefficients based on the activity of cells at each trial:

$$\begin{array}{c}{C}_{i,j,k}^{l}=\frac{\sum _{n}\left({X}^{l}\left(i,n,k\right)-{\bar{X}}^{l}\left(i,k\right)\right)\left({X}^{l}\left(j,n,k\right)-{\bar{X}}^{l}\left(j,k\right)\right)}{\sqrt{\sum _{n}{\left({X}^{l}\left(i,n,k\right)-{\bar{X}}^{l}\left(i,k\right)\right)}^{2}}\cdot\sqrt{\sum _{n}{\left({X}^{l}\left(j,n,k\right)-{\bar{X}}^{l}\left(j,k\right)\right)}^{2}}}\\ k=1,2,...,{T}^{l}\;\;\;\;\;\;\;\;i,j=1,2,\ldots,{N}_{r}\;\;\;\;\;\;\;\;l=1,\ldots,7\end{array}$$
(15)

where \({{\rm{C}}}_{k}^{l}\) is the Pearson correlation matrix of the \(k-\) th trial in the \(l\)-th training session. The resulting Pearson correlation matrices \({\left\{{{\rm{C}}}_{k}^{l}\right\}}_{k=1}^{{T}^{l}}\) are Symmetric Positive Semi Definite (SPSD) matrices and therefore do not obey Euclidean geometry. To correctly evaluate the overall connectivity in a given session and compare connectivity across sessions, we utilized the particular geometry proposed in ref. 30 and ref. 32 as follows. Consider two Pearson correlation matrices \({C}_{k}\) and \({{\rm{C}}}_{l}\) of rank r. We decomposed each of these matrices such that:

$${{\mathbf{C}}}_{{\rm{k}}}={{\mathbf{G}}}_{{\rm{k}}}{{\mathbf{P}}}_{{\rm{k}}}{{\mathbf{G}}}_{{\rm{k}}}^{T}$$
(16)
$${{\mathbf{C}}}_{{\rm{l}}}={{\mathbf{G}}}_{{\rm{l}}}{{\mathbf{P}}}_{{\rm{l}}}{{\mathbf{G}}}_{{\rm{l}}}^{T}$$
(17)

where \({{\rm{G}}}_{{\rm{k}}}\) and \({G}_{{\rm{l}}}\) are \({N}_{r}\times r\) matrices with orthogonal columns that lie on Grassmann manifold72 and \({{\rm{P}}}_{{\rm{k}}}\) and \({{\rm{P}}}_{{\rm{l}}}\) are \(r\times r\) Symmetric Positive Definite (SPD) matrices that lie on the SPD cone manifold31. The SPSD distance proposed in30 is given by:

$${d}_{{\mathcal{S}}}\left({{\rm{C}}}_{{\rm{k}}},{{\rm{C}}}_{{\rm{l}}}\right)={d}_{{\mathcal{G}}}\left({{\rm{G}}}_{{\rm{k}}},{{\rm{G}}}_{{\rm{l}}}\right)+\alpha {d}_{{\mathcal{P}}}({{\rm{P}}}_{{\rm{k}}},{{\rm{P}}}_{{\rm{l}}})$$
(18)

where \(\alpha \, > \, 0\) is a tunable hyperparameter, \({d}_{{\mathcal{G}}}\) is the Riemannian distance on the Grassmann manifold72, and \({d}_{{\mathcal{P}}}\) is the Riemannian distance on the SPD manifold31. The centroid of the Pearson correlation matrices in each training session was calculated as their Frechet mean32:

$${\bar{C}}^{l}={{\arg }}\mathop{\min }\limits_{{\rm{C}}}{\sum }_{k}{d}_{{\mathcal{S}}}^{2}\left({\rm{C}},{{\rm{C}}}_{k}^{l}\right)$$
(19)

Diffusion embedding of SPSD matrices

We made use of this Riemannian distance to illustrate the development of the correlations of cells along the training sessions, by incorporating it into Diffusion Maps46, a non-linear dimensionality-reduction algorithm. Using Diffusion Maps, the collection of Pearson correlation matrices \({\left\{{{\boldsymbol{C}}}_{k}\right\}}_{k=1}^{T},T=\sum _{l}{T}^{l}\) corresponding to trials from all experiments were embedded into a Euclidean space. First, we constructed a \(T\times T\) affinity matrix, whose elements are given by

$${\boldsymbol{W}}({\boldsymbol{k }},{\boldsymbol{l}})=\exp \left\{-\frac{{\left({d}_{{\mathcal{S}}}\left({C}_{k},{{\rm{C}}}_{l}\right)\right)}^{2}}{{\sigma }^{2}}\right\}$$
(20)

where \(\sigma\) is a scaling tunable parameter. This affinity matrix could be viewed as the weight matrix of the edges of graph, whose nodes are the trials. By normalizing the rows of \({\boldsymbol{W}}\), we obtained a row stochastic matrix \({\boldsymbol{P}}\),

$$\begin{array}{cc}{\boldsymbol{P}}={\boldsymbol{D}}^{-1}{\boldsymbol{W}},{\boldsymbol{D}}(k,k)=\mathop{\sum }\limits_{l=1}^{T}W\end{array}(k,l)$$
(21)

Then, we applied eigenvalue decomposition to \({\boldsymbol{P}}\),

$$\begin{array}{cc}{\boldsymbol{P}{\psi }}_{\ell }={\lambda }_{\ell }{\boldsymbol{\psi }}_{\ell },& \ell=0,...,T-1\end{array}$$
(22)

where \({{\mathbf{\psi }}}_{\ell }\) and \({\lambda }_{\ell }\) are the right eigenvectors and eigenvalues of \({\boldsymbol{P}}\). Finally, we defined \({\Psi }_{k}\) as the diffusion map of the \(k\) th trial to a Euclidean space \({R}^{d}\) by

$${{\mathbf{\Psi }}}_{k}=\left(\begin{array}{c}{\lambda }_{1}{\psi }_{1}\left(k\right)\\ {\lambda }_{2}{\psi }_{2}\left(k\right)\\ \vdots \\ {\lambda }_{d}{\psi }_{d}\left(k\right)\end{array}\right)\in {{\mathbb{R}}}^{d}$$
(23)

where \(d\le T-1\). The Euclidean distance between trials in the embedded space constitutes the diffusion distance, which is related to a distance between the respective transition probabilities on the graph.

Riemannian similarity

We quantified the distance between centroids related to different training sessions using the SPSD distance, \({d}_{{\mathcal{S}}}({\bar{{\bf{C}}}}^{l},{\bar{{\boldsymbol{C}}}}^{m}),l,m=1,\ldots,7\), described above. As the number of cells greatly affect the dynamic range of these distances, we z-scored each matrix (per animal) to produce \({\widetilde{d}}_{{\mathcal{S}}}({\bar{{\bf{C}}}}^{l},{\bar{{\boldsymbol{C}}}}^{m})\). Finally, the similarity between centroids was obtained as:

$${{\rm{R}}}_{{sim}}\left(l,m\right)=\exp \left\{-{\widetilde{d}}_{{\mathcal{S}}}\left({\bar{{\bf{C}}}}^{l},{\bar{{\bf{C}}}}^{m}\right)\right\}$$
(24)

Degree similarity

We obtained the degree of each neuron per training session \(l\):

$${d}_{i}^{l}=\sum _{j\ne i}{\bar{C}}_{i,j}^{l}$$
(25)

We calculated the distances between degree vectors \({{\boldsymbol{d}}}^{l}\) related to different days as:

$${\rm{D}}(l,m)={\left\Vert {\boldsymbol{d}}^{l}-{\boldsymbol{d}}^{m}\right\Vert }_{2}^{2},\;l,m=1,\ldots,7$$
(26)

We then z-scored each matrix \({\rm{D}}\) per animal to obtain \(\widetilde{{\rm{D}}}\) and extracted the degree similarity presented in Fig. 5f, g as:

$${{\rm{Degree}}}_{{sim}}\left(l,m\right)=\exp \left\{-\widetilde{{\rm{D}}}\left(l,m\right)\right\}$$
(27)

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.