Skip to article frontmatterSkip to article content

Spiking neural network models of sound localisation via a massively collaborative process

Authors
Affiliations
Laboratoire Jean Perrin, Institut de Biologie Paris-Seine, CNRS, Sorbonne Université, Paris, France
Department of Electrical and Electronic Engineering, Imperial College London, United Kingdom
School of Psychological Science, University of Bristol, United Kingdom
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy
Department of Data Analysis, Ghent University, Ghent, Belgium
Faculty of Brain Sciences, University College London, United Kingdom
School of Engineering and Informatics, University of Sussex, Falmer, Brighton, United Kingdom
Centre for Neural Dynamics and Artificial Intelligence, University of Ottawa, Ottawa, Ontario, Canada
Department of Physics, University of Ottawa, Ottawa, Ontario, Canada
Department of Electrical and Electronic Engineering, Imperial College London, United Kingdom
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
COMBYNE lab, University of Surrey, Guildford, United Kingdom
Department of Electrical and Electronic Engineering, Imperial College London, United Kingdom
Department of Engineering, University of Cambridge, United Kingdom
Max Planck Institute for Brain Research, Frankfurt, Germany
School of Life Sciences, Technical University of Munich, Freising, Germany
Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
Department of Applied Mathematics, University of Washington, Seattle, WA, USA
Edmond and Lily Safra Center for Brain Sciences (ELSC), The Hebrew University of Jerusalem, Jerusalem 91904, Israel
Gatsby Computational Neuroscience Unit, University College London, London, UK
Laboratoire Jean Perrin, Institut de Biologie Paris-Seine, CNRS, Sorbonne Université, Paris, France
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy
Department of Electrical and Electronic Engineering, Imperial College London, United Kingdom

Abstract

Neuroscientists are increasingly initiating large-scale collaborations which bring together tens to hundreds of researchers, and make their data, methods and results publicly available. This approach not only allows for larger scale problems to be tackled, but also enables contributions from a wider community including more diverse perspectives. Inspired by projects in pure mathematics, we set out to test the feasibility of making large-scale collaborative neuroscience even more inclusive by running a grassroots, massively collaborative project in computational neuroscience. The key difference with previous approaches in neuroscience was that anyone was welcome to join at any time. We launched a public Git repository, with code for training spiking neural networks to solve a sound localisation task via surrogate gradient descent. We then invited anyone, anywhere to use this code as a springboard for exploring questions of interest to them, and encouraged participants to share their work both asynchronously through Git and synchronously at monthly online workshops. The aim was to use the diversity of perspectives to make new discoveries that a single team would have been unlikely to find. At a scientific level, our work investigated how a range of biologically-relevant parameters, from time delays to membrane time constants and levels of inhibition, could impact sound localisation in networks of spiking units. At a more macro-level, our project brought together 31 researchers from multiple countries, provided hands-on research experience to early career participants, and opportunities for supervision and teaching to later career participants. Although the scientific results were not groundbreaking in this pilot project, looking ahead, our project provides a glimpse of what open, collaborative science without borders could look like, and provides a necessary, tentative step towards it.

Introduction

Inspired by the success of endeavours like the Human Genome Project and CERN, neuroscientists are increasingly initiating large-scale collaborations. The largest efforts, such as the International Brain Laboratory Abbott et al., 2017Wool, 2020, The Blue Brain Project and Human Brain Project bring together tens to hundreds of researchers across multiple laboratories. These projects have generated scientific insights, large-scale datasets, tools and educational materials, and represent a step-change in scale. Organisationally, they follow a formal collaborative model with open outputs. That is, there are participating laboratories who collaborate together and then make their data, methods and results available. As such, to participate, individuals must join a participating laboratory, initiate a collaboration with the project, or wait for the publication of data and resources. So, is it possible to have an informal grassroots collaborative model where anyone can join at any time and all work is done in the open throughout the project Mainen et al., 2016? Such an approach, by reducing the barriers to entry, may help to identify so-called “lost Einsteins” Bell et al., 2019, people who may have made an important discovery except that they never had the opportunity to do so.

One alternative is bench marking contests, in which participants compete to obtain the best score on a specific task. Bench marking contests have driven progress in fields from computer vision Deng et al., 2009 to protein folding, and have begun to enter neuroscience. For example, in Brain-Score Schrimpf et al., 2018Schrimpf et al., 2020 participants submit models, capable of completing a visual processing task, which are then ranked according to a quantitative metric. As participants can compete both remotely and independently, these contests offer a low barrier to entry. However, defining quantifiable endpoints for neuroscientific questions remains challenging Bowers et al., 2022.

Another alternative is massively collaborative projects, in which a problem is openly stated and contributions are welcomed from anyone willing to participate. For example, in the Polymath Project unsolved mathematical problems are posed, and then participants share comments, ideas and equations online as they collectively work towards solutions. Similarly, the Busy Beaver Challenge recently announced a formal proof of a conjecture that was open for decades, based mainly on online contributions from amateur mathematicians. Inspired by this approach, we founded COMOB (Collaborative Modelling of the Brain) - to explore if and how this collaborative model could be leveraged in neuroscience.

Here, we share our experiences and results. We start by detailing how we ran the project in terms of organisation and infrastructure in Towards open collaborative science. We then briefly summarise our scientific results in Training SNNs for sound localisation. We conclude the main text with a Discussion of what went well, what went wrong, and how future projects like this could learn from our experiences. Finally, in the Appendices we provide detailed write-ups of our scientific results.

Towards open collaborative science

Workflow

Our project was launched at a tutorial at the Cosyne conference (2022) for which we provided video lectures and code online Goodman et al., 2022. Students in that tutorial were invited to take part in an ongoing research project starting from the material covered in the tutorial, and this was also opened to anyone in the world and advertised via mailing lists and social media. Participants joining the project were encouraged to review the material from the tutorial, and then to work through an introductory Jupyter Notebook Kluyver et al., 2016 containing Python code, figures and markdown text, which taught them how to train a spiking neural network to perform a sound localisation task. Participants were then directed to our website where we maintained a list of open scientific and technical questions for inspiration. For example, how does the use of different neuron models impact network performance and can we learn input delays with gradient descent? Then, with a proposed or novel question in hand, participants were free to approach their work as they wished. In practice, much like a “typical” research project, most work was conducted individually, shared at monthly online meetings and then iteratively improved upon. For example, several early career researchers tackled questions full-time as their dissertation or thesis work and benefited from external input at monthly workshops.

Our workflow. To on-board new participants we provided text, videos and code outlining our scientific starting point. This material formed a springboard for participants to pursue research either individually or in small groups. We then iteratively improved on this work through monthly online workshops and by writing this paper together through an open, collaborative process.

Figure 1:Our workflow. To on-board new participants we provided text, videos and code outlining our scientific starting point. This material formed a springboard for participants to pursue research either individually or in small groups. We then iteratively improved on this work through monthly online workshops and by writing this paper together through an open, collaborative process.

We consciously decided on this free-form structure to experiment with the feasibility of a more bottom-up approach to doing team science, with minimal top-down supervision. We discuss the advantages and disadvantages of this approach in the Discussion.

Infrastructure

Code

We provided a Starting Notebook to give participants an easy way to get started. This is a Python-based Jupyter Notebook Kluyver et al., 2016, an interactive cell-based environment which allows a mixture of text, image and code cells to be weaved together, combined with an easy user interface. Participants could either work locally using their own Python distribution, or using a cloud compute service such as Google Colab. We choose Google Colab to minimise the entry barrier for participation, as it is a free service (although blocked in certain countries unfortunately) where all the software packages needed are pre-installed, meaning it takes users only a few seconds to go from reading about the project to running the code.

Our starting notebook used a combination of NumPy Harris et al., 2020, Matplotlib Hunter, 2007, and PyTorch Paszke et al., 2019. The code for surrogate gradient descent was based on Friedemann Zenke’s SPyTorch tutorial Zenke, 2019Zenke & Ganguli, 2018.

Note that we didn’t require participants to use our starting notebook, and indeed in Contralateral glycinergic inhibition as a key factor in creating ITD sensitivity, De Santis and Antonietti implemented a very different sound localization model from scratch.

GitHub

Like many open-source efforts, our public GitHub repository was the heart of our project. This provided us with three main benefits. First, it made joining the project as simple as cloning and committing to the repository. Second, it allowed us to collaborate asynchronously. That is, we could easily complete work in our own time, and then share it with the group later. Third, it allowed us to track contributions to the project. Measured in this way, 28 individuals contributed to the project. However, interpreting this number is challenging, as these contributions vary significantly in size, and participants who worked in pairs or small groups, often contributed under a single username. We return to this point in the Discussion.

Website via MyST Markdown

For those interested in pursuing a similar project our repository can easily be used as a template. It consists of a collection of documents written in Markdown and executable Jupyter Notebooks Kluyver et al., 2016 containing all the code for the project. Each time the repository is updated, GitHub automatically builds these documents and notebooks into a website so that the current state of the project can be seen by simply navigating to the project website. We used MyST Markdown to automate this process with minimal effort. This paper itself was written using these tools and was publicly visible throughout the project write-up (with the exception of the response to reviewers’ comments).

Teaching with this framework

This project emerged from a tutorial, and, although the project was primarily research-oriented, the code remains well suited for teaching several concepts from across neuroscience. As such, we integrated our project into a Physics MSc course on Biophysics and Neural Circuits. Working individually or in pairs, students actively engaged by adjusting network parameters and modifying the provided code to test their own hypotheses. Later, brief progress report presentations stimulated dynamic discussions in class, as all students, while working on the same project and code, pursued different hypotheses. We found that this setup naturally piqued interest in their peers’ presentations, enhanced their understanding of various project applications, and facilitated collaborative learning. Moreover, it allowed for engagement from students at a range of skill levels, and helped bridge the gap between teaching and research. For those interested in teaching with this framework, introductory slides and a dedicated Python notebook are available on our GitHub repository.

Training SNNs for sound localisation

Introduction

In the Cosyne tutorial Goodman et al., 2022 on spiking neural networks (SNNs) that launched this project, we used a sound localisation task. Reasoning that sound localisation requires the precise temporal processing of spikes at which these networks would excel.

Animals localise sounds by detecting location- or direction-specific cues in the signals that arrive at their ears. Some of the most important sources of cues (although not the only ones) come from differences in the signals between two ears, including both level and timing differences. Respectively, termed interaural level difference (ILD) and interaural timing difference (ITD). In some cases humans are able to detect arrival time differences as small as 20 μs.

The classic model of ITD sensitivity is the delay line model of Jeffress (1948) in which an array of binaural coincidence detector neurons receive inputs from the two ears with different delays. When a neuron’s delays exactly match the acoustic delays induced by the sound location, it will be maximally active. Therefore, the identity of the most active neuron indicates the direction of the sound. This model is widely accepted, though was shown to be inefficient with respect to neural noise by McAlpine & Grothe (2003), who proposed an alternative model based on the two binaural hemispheres average firing rates - which is optimally robust to neural noise. However, Goodman et al. (2013) showed that these models perform too poorly to account for behavioural data, especially in situations where sounds had complex and unknown spectral properties, or in the presence of background noise, and proposed an alternative based on a perceptron-like neural network - which is both robust to neural noise and performed well across a range of conditions.

Building on this literature, and our Cosyne tutorial, the starting point of our project was to ask: what solutions would you find if you directly optimised a spiking neural network to localise sounds? Two things are worth noting here. Firstly, this may be very different to the optimal solutions you would find in the unconstrained space of computational solutions, or even of neural network-based solutions. This literature is reviewed in Grumiaux et al. (2022), including classical engineering approaches such as beamforming, and deep learning approaches including convolutional neural networks, recurrent neural networks and increasingly attention-based networks as used in large language models. Secondly, our setup is very limited in terms of the available cues and network structure: we only use pure tones we have no spectral or cross-frequency cues, we fix the level so we have no interaural level differences, etc. We would not necessarily expect this approach to explain a wide range of observed phenomena, but it may still throw light on some fundamental aspects of interaural time or phase difference circuits. With these restrictions in place, we can ask several questions about the optimal spike-based solutions we find. How would those solutions depend on the available neural mechanisms and statistics of the sound? Could we understand the solutions found? What properties would the solution have in terms of robustness to noise, generalisation, and so forth? And could the solutions found by optimisation throw light on features found in the auditory systems of different animals?

A simple spiking neural network model

We started with a simple, spiking neural network model (described in more detail in Methods) trained to solve a highly abstracted sound localisation task.

Overall model architecture. Sinusoidal audio signals are passed through two populations of units, with a range of preassigned phase delays, representing the left and right ears. These units generate Poisson spike trains which pass forward to a layer of leaky-integrate and fire (LIF) units, then a layer of leaky-integrator output units from which we readout the networks estimate of the interaural phase difference (IPD).

Figure 2:Overall model architecture. Sinusoidal audio signals are passed through two populations of units, with a range of preassigned phase delays, representing the left and right ears. These units generate Poisson spike trains which pass forward to a layer of leaky-integrate and fire (LIF) units, then a layer of leaky-integrator output units from which we readout the networks estimate of the interaural phase difference (IPD).

The task is to estimate the interaural phase difference (IPD) of a pure tone (sine wave) at a fixed frequency. This is equivalent to estimating the ITD, as the ITD is ambiguous for a sine wave. The model consists of three layers. First, a layer of spiking input neurons which fire spikes according to a Poisson process with a time-varying rate determined by the input stimulus, a simple model of the auditory nerve. This layer is divided into two subpopulations corresponding to the two ears, with signals to one ear delayed with respect to the other. Each neuron within a subpopulation has a different phase delay. Next, comes a single hidden layer of leaky integrate-and-fire (LIF) neurons, and an output layer of leaky, non-spiking neurons. Each output neuron is associated to a particular IPD, and the estimated IPD of the model is the identity of the most active output neuron.

The input neurons are all-to-all connected to the layer of hidden neurons via a trainable weight matrix. In this way, during training the model is free to select the neurons with the appropriate phase delays to generate the desired properties for the hidden layer neurons. This lets the model learn to make use of delays without having to directly implement trainable delays, as this is a challenging problem (which we tackled later in Learning delays).

Using this setup, we successfully trained SNNs on this task, and found that accuracy increased as we reduced the membrane time constant of the units in the hidden layer (Improving Performance: Optimizing the membrane time constant). This initially suggested that coincidence detection played an important role. However, further analysis in (described in more detail in A minimal trainable model of IPD processing) showed that in fact, the network was not using a coincidence detection strategy, or indeed a spike timing strategy. Rather, it appears to be using an approach similar to the equalisation-cancellation theory Durlach, 1963Culling, 2020 by subtracting various pairs of signals to find the point where they approximately cancel. Careful analysis of the trained model showed that it could be extremely well approximated by a 6-parameter model that is quite easy to describe, but does not obviously correspond to any known features of the auditory system.

As an alternative approach, we also used Tensor Component Analysis (TCA) Williams et al., 2018 to explore the spiking activity of this model, and to compare it across multiple trained networks (Tensor component analysis).

Building on this base model, we explored two main questions: how changing the neuron model alters the network’s behaviour and how the phase delays (within each ear) can be learned.

Alternative neuron models

Dale’s principle

In biological networks most neurons release the same set of transmitters from all of their synapses, and so can be broadly be considered to be excitatory or inhibitory to their post-synaptic partners; a phenomenon known as Dale’s principle Dale, 1935Strata & Harvey, 1999. In contrast, most neural network models, including our base model, allow single units to have both positive and negative output weights.

To test the impact of restricting units to being either excitatory or inhibitory, we trained our base model across a range of inhibitory:excitatory unit ratios, and tested it’s performance on unseen, test data (Sound localisation following Dale’ law). We found that networks which balanced excitation and inhibition performed significantly better than both inhibition-only networks - which perform at chance level as no spikes propagate forward, and excitation-only networks - which were roughly 30% less accurate than balanced networks.

To understand where in the network inhibition is required, we then trained a second set of networks in which we forced either the input or hidden units to be all excitatory, and set the remaining units to be half inhibitory and half excitatory. Networks with all excitatory hidden units performed as well as networks with balanced units, while networks with purely excitatory inputs performed significantly worse, demonstrating a role for inhibition in the input-hidden connections / delay lines.

Inhibition therefore plays an important role in this model, in line with experimental data that shows that blocking inhibition eliminates ITD-sensitivity in the medial superior olive Brand et al., 2002Pecka et al., 2008.

Filter-and-fire

Unlike most point neuron models, in which pairs are connected by a single weight, many biological neurons make multiple contacts with their post-synaptic partners at different points along their dendritic tree. These contacts evoke post-synaptic potentials (PSPs) with distinct temporal dynamics, depending on their distance from the soma, with distal/proximal contacts inducing prolonged/brief PSPs. These features are captured by the filter-and-fire neuron model (F&F) Beniaguev et al., 2024, in which units make multiple contacts with their partners and each input is convolved with a distance-from-soma dependent synaptic filter. While networks of F&F units outperform networks of LIF units on a temporal version of MNIST, we hypothesised that this difference would be magnified in our sound localisation task, given it’s natural temporal structure. We found that while training performance was increased using the F&F model, test performance was much worse, suggesting overfitting.

Learning delays

As in our base model, many studies incorporate axonal and/or dendritic delays as non-learnable parameters. Here, we explore how these phase delays, as well as synaptic delays, can be learned through two approaches.

The first method was to develop a differentiable delay layer (DDL). This method uses a combination of translation and interpolation, where the interpolation allows the delays to be differentiable even though time steps are discrete. This can be placed between any two layers in a spiking neural network, and is capable of solving the task without weight training. This work is described in more detail in Learning delays.

While we were developing our DDL-based method, a paper introducing synaptic delays using dilated convolutions with learnable spacings (DCLS) was published Khalfaoui-Hassani et al., 2023Hammouamri et al., 2024, prompting us to explore this approach as well. This method also relies on interpolation and is very similar to the DDL method, serving as a generalization for synaptic delays. It uses a 1D convolution through time to simulate delays between consecutive layers. The kernels include a single non-zero weight per synapse, which corresponds to the desired delay. This method co-trains weights and delays.

We found that both methods performed well and eliminated the artificial phase delays introduced in the basic model, although the DCLS reached a lower level of error overall.

Detailed inhibition-based model

Finally, we developed a more detailed model in which we used over 170,000 units, with conductance-based synapses, to approximate the structure of the mammalian brainstem circuit (see more details in Contralateral glycinergic inhibition as a key factor in creating ITD sensitivity).

In short, input spectrograms representing sounds at azimuth angles from -90° to +90° were converted into spikes, then passed forward to populations representing the globular and spherical bushy cells, and subsequently the lateral and medial superior olivary nuclei, from which we readout sound source angle predictions. Note that, unlike the work with our base model, we used no learnable parameters in this model, and instead based parameters on neurophysiological data. For example, the MSO units had excitatory inputs from both the ipsi and contralateral SBCs and dominant inhibition from contralateral GBCs.

This model generated realistic tuning curves for lateral and medial superior olive (LSO and MSO) neurons. Moreover, removing inhibition shifted ITD sensitivity to the midline, as in Brand et al., 2002Pecka et al., 2008.

Discussion

What went well

The decision to start from the code base of the Cosyne tutorial Goodman et al., 2022 was very helpful. It meant that users had a clear entry path for the project without needing prior expertise, and a code base that was designed to be easy to understand. In addition, the popularity of the tutorial (over 38k views on YouTube at the time of writing) meant that many people heard about this project and were interested in participating. In addition, the GitHub-based infrastructure allowed for asynchronous work our website, that was automatically updated each time anyone made a change to their code or to the text of the paper, allowed for easy sharing of results.

By providing models which used spiking neurons to transform sensory inputs into behavioural outputs, participants were free to explore in virtually any direction they wished, much like an open-world or sandbox video game. Indeed over the course of the project we explored the full sensory-motor transformation from manipulating the nature of the input signals to perturbing unit activity and assessing network behaviour. Consequently, in addition to it’s role in research, our code forms an excellent basis for teaching, as concepts from across neuroscience can be introduced and then implemented in class. In this direction, we integrated our project into two university courses and provide slides and a highly annotated python notebook, for those interested in teaching with these models.

Beyond providing teaching and hands-on research experience, the project also offered many opportunities for participants to improve their “soft” scientific skills. For early career researchers (undergraduate and master’s students) these included learning how to work with Git, collaborate with researchers from diverse countries and career stages, and contribute to a scientific publication. For later career researchers (PhD, Postdoc) the project provided many supervision and leadership opportunities. For example, during online workshops, later career participants were encouraged to lead small groups focussed on tackling specific questions.

What went wrong

While our sandbox design offered several advantages (discussed above), the open nature of the process did present three challenges. Our first challenge was standardising work across participants; for example, ensuring that everyone used the same code and hyperparameters. Along these lines, future projects would benefit from having participants dedicated to maintaining the code base and standardising participants work.

Our second challenge, was the project’s exploratory nature. While this appealed to many participants, the lack of a clear goal or end-point may have been off-putting to others. For future efforts, one alternative would be to define clear goals a priori, however if handled carelessly, this runs the risk of reducing to a to-do list passed from more senior to more junior researchers. A more appealing alternative could be to structure the project in clearly defined phases. For example, early months reviewing literature could be followed by a period of proposals and question refinement, before a final stretch of research. Another alternative, would be to begin by collecting project proposals and allowing participants to vote on a project to pursue. This could be done by having each participant check a box for each project they would choose to work on if it were selected, and then selecting the project with the most checks (thereby maximising the expected number of participants). Multiple projects could even be launched simultaneously if this approach proved popular.

A third challenge, which arose towards the end of the project, was how to fairly assign credit. We had initially - and perhaps somewhat idealistically - stated that anyone who contributed to the project, either by writing code or participating in one of the workshops, would be included on the author list. To the extent that it was possible, we have followed through with this, though we were simply unable to contact several of the participants and so could not include them as authors. Another issue with this system is that participants with unequal contributions, e.g. attending a workshop vs contributing an entire section of the paper, would be assigned similar credit, i.e. authorship. To resolve this, we experimented with using the number or size of GitHub commits to order authors, however we found that these metrics did not accurately reflect contributions. For example, it may be quicker to commit a large-amount of low quality text than a concise well written section, and similarly there is no good reason to distinguish between two authors who submit the same amount of work through a different number of commits. We attempted to address this challenge by providing a contributions table (Contributors) and agreeing an author order. This order was agreed on unanimously, though could easily cause issues in other projects. Consequently, we recommend that a strategy for credit assignment be determined collaboratively at the start of the project, and made explicit so that participants can clearly understand how their contribution will translate to credit. Alternatively, such projects could publish under a pseudonym, e.g. COMOB.

Ultimately, while the project explored many interesting directions, which will form the basis for future work, we did not reach a point where we could draw strong scientific conclusions about sound localization. From group discussions we concluded that this is likely due to the free-form nature of our project, which would have benefited from a more coordinated approach. The question is, how to do this without compromising the ideals of a grass-roots project? Extending the voting idea above, one approach would be to make the proposer of the democratically selected project responsible for making sure that results are comparable and generally keeping the project on the right track. A role similar to a traditional supervisor, but with the critical difference that they are elected by their peers and only on a project by project basis.

Conclusions

This paper does not present a scientific breakthrough. However, it does demonstrate the feasibility of open research projects which bring together large number of participants across countries and career stages to work together collaboratively on scientific projects. Looking ahead, we hope that by lowering the barrier to entry, these projects will welcome a wider and more diverse set of expertise and perspectives, generating new ideas and leading to discoveries beyond what any single group could realise.

Contributors

Table 1:Contributors, ordered by GitHub commits as of 2024-07-16.

NameGitHubContribution
Tomas Fiers@tfiersBuilt the website infrastructure, created Figure 2 based on Dan’s sketch of the model architecture.
Dan Goodman@thesamovarConceived the project, wrote the paper, wrote and recorded the Cosyne tutorial. Conducted research (Starting Notebook, Analysing performance and solutions as time constants change).
Marcus Ghosh@ghoshmManaged the project, wrote the paper, conducted research (Quick Start Notebook, Sound localisation following Dale’ law), gave the Cosyne tutorial.
Francesco De Santis@francescodesantis(Conducted research (Inhibition Model Notebook) and wrote the paper (Contralateral glycinergic inhibition as a key factor in creating ITD sensitivity))
Dilay Fidan Erçelik@dilayercelikConducted research (Quick Start Notebook, Quick Start Notebook with 250 Hz input).
Pietro Monticone@pitmonticoneCleaned paper and notebooks
Karim Habashy@KarimHabashyConducted research (Learning delays, Learning delays (v2), Vanilla sound localization problem with a single delay layer (non-spiking)), wrote the paper (Learning delays), project management (Quick Start Notebook)
Balázs Mészáros@mbalazs98Wrote the paper (DCLS based delay learning in the appendix). Conducted research (Noise offsets in every iteration, Dilated Convolution with Learnable Spacings).
Mingxuan Hong@mxhongConducted research (Altering Output Neurons, Dynamic threshold).
Rory Byrne@rorybyrneOrganised the source code structure, conducted research (Improving Performance: Optimizing the membrane time constant).
Sara Evers@saraeversConducted research (Analysing Dale’s law and distribution of excitatory and inhibitory neurons).
Zach Friedenberger@ZachFriedenbergerConducted research (Improving Performance: Optimizing the membrane time constant).
Helena Yuhan Liu@Helena-Yuhan-LiuConducted research (Analysis: thresholding W1W2 plot).
Jose Gomes (Portugal, PhD)@JoseGomesJPGConducted research (Sound localisation following Dale’ law).
(Unknown)@a-dtkConducted research (Robustness to Noise and Dropout).
Ido Aizenbud@ido4848Conducted research (Filter-and-Fire Neuron Model).
Sebastian Schmitt@schmittsConducted research (background on neuromorphic hardware in Background).
Rowan Cockett@rowanc1MyST technical support
Alberto Antonietti@alberto-antoniettiSupervised Francesco De Santis, wrote the paper (Contralateral glycinergic inhibition as a key factor in creating ITD sensitivity).
Juan Luis Riquelme@luis-rrConducted research (Sound localisation with excitatory-only inputs surrogate gradient descent)
Adam Haber@adamhaberConducted research (Compute hessians (jax version))
Gabriel Béna@GabrielBenaConducted research (Analysing trained networks - workshop edition, Sound localisation following Dale’ law).
Peter Crowe@pfcroweConducted research (Improving Performance: Optimizing the membrane time constant).
Umar Abubacar@UmarAbubacarConducted research (TCA Analysis) and wrote the paper (Tensor component analysis).
Gabryel Mason-WilliamsNone/unknownConducted research (Analysing trained networks - workshop edition).
Josh BourneNone/unknownConducted research (Analysing trained networks - workshop edition).
Zekai XuNone/unknownConducted research (Analysing trained networks - workshop edition).
Leonidas RichterNone/unknownConducted research (Learning delays).
Chen LiNone/unknownConducted research (Improving Performance: Optimizing the membrane time constant).
Brendan BicknellNone/unknownSupervised Dilay Fidan Erçelik.
Volker BormuthNone/unknownDeveloped teaching materials and used the project to teach two university courses. Supervised Marcus Ghosh & students at Sorbonne University.

Notebook map

The following lists the notebooks, authors, summary and related notebooks in this project.

Introductory notebooks

Background
Explanation of the background. (Author: Dan Goodman.)
Questions & challenges
List of research questions and challenges. (Author: everyone.)

Templates / starting points

Starting Notebook
The template notebook suggested as a starting point, based on the Cosyne tutorial that kicked off this project. (Author: Dan Goodman.)
Quick Start Notebook
Condensed version of Starting Notebook using the shorter membrane time constants from Improving Performance: Optimizing the membrane time constant and Dale’s law from Sound localisation following Dale’ law. (Author: Dilay Fidan Erçelik, Karim Habashy, Marcus Ghosh.)

Individual notebooks

Filter-and-Fire Neuron Model
Using an alternative neuron model. (Author: Ido Aizenbud based on work from Dilay Fidan Erçelik.)
Altering Output Neurons
Comparison of three different ways of reading out the network’s decision (average membrane potential, maximum membrane potential, spiking outputs) with short and long time constants. (Author: Mingxuan Hong.)
Analysing trained networks - workshop edition
Group project from an early workshop looking at hidden unit spiking activity and single unit ablations. Found that some hidden neurons don’t spike, and ablating those does not harm performance. Builds on (WIP) Analysing trained networks. (Author: Gabriel Béna, Josh Bourne, Tomas Fiers, Tanushri Kabra, Zekai Xu.)
Sound localisation following Dale’ law
Investigation into the results of imposing Dale’s law. Incorporated into Quick Start Notebook. Uses a fix from Analysing Dale’s law and distribution of excitatory and inhibitory neurons. (Author: Marcus Ghosh, Gabriel Béna, Jose Gomes.)
Dynamic threshold
Adds an adaptive threshold to the neuron model and compares results. Conclusion is that the dynamic threshold does not help in this case. (Author: Mingxuan Hong.)
Sound localisation with excitatory-only inputs surrogate gradient descent
Results of imposing an excitatory only constraint on the neurons. Appears to find solutions that are more like what would be expected from the Jeffress model. (Author: Juan Luis Riquelme.)
Learning delays, Learning delays (v2) and Vanilla sound localization problem with a single delay layer (non-spiking)
Delay learning using differentiable delay layer, written up in Learning delays (Author: Karim Habashy.)
Dilated Convolution with Learnable Spacings
Delay learning using Dilated Convolution with Learnable Spacings, written up in Learning delays. (Author: Balázs Mészáros.)
Robustness to Noise and Dropout
Test effects of adding Gaussian noise and/or dropout during training phase. Conclusion is that dropout does not help and adding noise decreases performance. (Author: Unknown (@a-dtk).)
Version with 250 Hz input, Quick Start Notebook with 250 Hz input
Analysis of results with a higher frequency input stimulus and different membrane time constants for hidden and output layers. Conclusion is that smaller time constant matters for hidden layer but not for output layer. (Author: Dilay Fidan Erçelik.)
Analysing performance and solutions as time constants change
Deeper analysis of strategies found by trained networks as time constants vary. Added firing rate regularisation. Extends Improving Performance: Optimizing the membrane time constant. Written up in more detail in A minimal trainable model of IPD processing. (Author: Dan Goodman.)
Workshop 1 Write-up
Write-up of what happened at the first workshop. (Author: Marcus Ghosh.)

Inconclusive

The following notebooks did not reach a solid conclusion.

Compute hessians (jax version)
An unfinished attempt to perform sensitivity analysis using Hessian matrices computed via autodifferentiation with the Jax library. (Author: Adam Haber.)
Noise offsets in every iteration
Analysis of an alternative way of handling noise. (Author: Balázs Mészáros.)
Analysis: thresholding W1W2 plot
Unfinished attempt to improve analysis code. (Author: Helena Yuhan Liu.)

Historical

This subsection includes notebooks whose content got merged into an updated notebook later.

(WIP) Analysing trained networks
Early work on analysing the strategies learned by trained networks. Folded into Analysing trained networks - workshop edition. (Author: Dan Goodman.)
Improving Performance: Optimizing the membrane time constant
Analyses how performance depends on membrane time constant. Folded into Analysing performance and solutions as time constants change. (Author: Zach Friedenberger, Chen Li, Peter Crowe.)
Analysing Dale’s law and distribution of excitatory and inhibitory neurons
Fixed a mistake in an earlier version of Sound localisation following Dale’ law. (Author: Sara Evers.)

Appendices

In this section, we provide detailed write-ups of our scientific results.

A minimal trainable model of IPD processing

Section authorsDan Goodman
NotebooksAnalysing performance and solutions as time constants change

This section describes the initial model developed for the Cosyne tutorial that served as the starting point for this project Goodman et al., 2022. It also describes some small variants of this basic model produced in the course of the project, which can be seen in the notebook Analysing performance and solutions as time constants change. The background and aims of this model are described in Introduction.

Methods

The model consists of the following pathway, illustrated in Figure 2: IPD \rightarrow stimulus \rightarrow input neurons \rightarrow hidden layer neurons \rightarrow readout neurons.

The IPD is an angle uniformly randomly selected in α[π/2,π/2]\alpha\in[-\pi/2,\pi/2] (frontal plane only).

The stimulus is a sine wave, and we model interaural phase differences (IPDs) by adding the IPD α as a phase delay to the right ear. Enumerating the ears with index i{0,1}i\in\{0,1\} so that i=0i=0 is the left ear, we get that the stimulus in ear ii is:

Si(t)=sin(2πft+iα)S_i(t)=\sin(2\pi f t+i\alpha)

To model a population of spiking neurons with different lags, we associate each input layer neuron with an ear and an additional phase delay ψ uniformly spaced in the range (0,π/2)(0, \pi/2). This ensures that by comparing a left and right ear you can generate any phase difference in the range (π/2,π/2)(-\pi/2,\pi/2) as required. Specifically, we set neuron jj connected to ear ii to receive the stimulus:

Sij(t)=sin(θij)S_{ij}(t)=\sin(\theta_{ij})

where

θij(t)=2πft+iα+ψj\theta_{ij}(t)=2\pi f t+i\alpha+\psi_j
ψj=jπ/2(Nψ1)\psi_j = j\pi/2(N_\psi-1)

and NψN_\psi is the number of input neurons per ear.

Next, we make these neurons spiking by giving them a time varying firing rate:

Rij(t)=Rmax((1+sinθij(t))/2)2R_{ij}(t)=R_\mathrm{max}((1+\sin\theta_{ij}(t))/2)^2

where RmaxR_\mathrm{max} is the maximum firing rate, which we set as 600 sp/s. Spikes are then generated via an inhomogeneous Poisson process with intensity Rij(t)R_{ij}(t). Some example raster plots are shown in Figure 3.

Examples of generated input spike trains. Each plot shows a raster plot of input spikes for a different sample. The subtitle of each plot gives the true interaural time difference (ITD). The x-axis is time in steps for the whole input duration, and the y-axis is input neuron index. Input neurons are in two groups, with each “ear” consisting of 100 neurons. Within each group, the spikes are delayed by an increasing amount from a minimum to maximum delay. Spikes are produced by a Poisson process with a time-varying firing rate given by a sinusoid.

Figure 3:Examples of generated input spike trains. Each plot shows a raster plot of input spikes for a different sample. The subtitle of each plot gives the true interaural time difference (ITD). The x-axis is time in steps for the whole input duration, and the y-axis is input neuron index. Input neurons are in two groups, with each “ear” consisting of 100 neurons. Within each group, the spikes are delayed by an increasing amount from a minimum to maximum delay. Spikes are produced by a Poisson process with a time-varying firing rate given by a sinusoid.

The 2Nψ2N_\psi input neurons are connected all-to-all to a “hidden layer” of NhN_h spiking neurons. These are standard leaky integrate-and-fire neurons with a membrane potential vv that in the absence of spikes evolves over time according to the differential equation:

τdvdt=v\tau \frac{\mathrm{d}v}{\mathrm{d}t}=-v

where τ is the membrane time constant. An incoming spike on a synapse with weight ww causes an instantaneous increase vv+wv\leftarrow v+w. These weights are stored in a matrix WihW_{ih} of size (2Nψ,Nh)(2N_\psi,N_h). If vv crosses the spike threshold of 1, the neuron emits a spike and instantaneously resets to v0v\leftarrow 0.

The hidden layer is all-to-all connected to a readout layer of NcN_c neurons via a weight matrix WhoW_{ho}. The aim is to divide the set of possible IPDs into NcN_c intervals Ik=[π/2+kπ/Nc,π/2+(k+1)π/Nc]I_k=[-\pi/2+k\pi/N_c,-\pi/2+(k+1)\pi/N_c] and then, if neuron kk is the most active, guess that the IPD must be in interval IkI_k. These hidden layer neurons follow the same differential equation but do not spike. Instead, to guess the IPD we compute their mean membrane potential over the duration of the stimulus, vˉk\bar v_k, and then compute an output vector that is the log softmax function of these mean membrane potentials:

xk=logexpvˉkexpvˉx_k = \log \frac{\exp\bar v_k}{\sum_\ell \exp \bar v_\ell}

We then interpret xkx_k as the estimated probability that αIk\alpha\in I_k. Our estimate of the IPD α^\hat\alpha will be the midpoint of the interval corresponding to the most active neuron k^=argmaxkxk\hat k=\argmax_k x_k. Note that the softmax function and probability interpretation are important for training the network, but once the network is trained you can equally well compute k^=argmaxkvˉk\hat k=\argmax_k \bar v_k.

The network is trained by defining a loss function that increases the further away the network behaviour is from what we would like (defined in detail below), and then using the surrogate gradient descent method Zenke & Ganguli, 2018Neftci et al., 2019. Full details on training parameters can be found in the notebook Starting Notebook.

The loss function we use is composed of two terms. The first is the cross entropy or negative log likelihood loss that measures how far our predicted probability distribution xkx_k is from the true probability distribution (which has value 1 for the correct kk and 0 for all other kk). The second term, which is not used in all the notebooks in this project, is an optional regularisation term. In Analysing performance and solutions as time constants change we regularise based on the firing rates of the hidden layer neurons. We compute the firing rate for each hidden neuron rmr_m. If this is below a mimimum threshold rr_- it contributes nothing to the loss, otherwise we compute Lm=((rmr)/(r+r))2L_m=((r_m-r_-)/(r_+-r_-))^2 for each neuron for a constant r+r_+ explained below. We now compute the average and multiply a constant L=cmLm/NhL=c\sum_m L_m/N_h. The constant r+r_+ is the maximum firing rate we would like to see in the network, so that Lm=1L_m=1 if rm=r+r_m=r_+. The constant cc is chosen to be the expected initial cross-entropy loss of the network before training. This makes sure that a firing rate of rm=r+r_m=r_+ is heavily penalised relative to the cross-entropy loss, but that any firing rate below rr_- is fine. We chose r=100r_-=100 sp/s and r+=200r_+=200 sp/s.

For the results in this section, the model is trained on 1282=16,384128^2=16,384 samples in batches of 128, 100 epochs using the Adam optimiser Kingma & Ba, 2017 with a learning rate of 0.001. The network needs to be retrained for each frequency, in this section we only use f=50f=50 Hz. Test results are shown for a fresh draw of 4,096 samples.

Results

This approach is able to train a network that can perform the task well, using very few neurons (Nψ=100N_\psi=100 input neurons per ear, Nh=8N_h=8 hidden neurons and Nc=12N_c=12 output neurons). Mean absolute error in IPD is 2.6\sim 2.6 deg (Figure 4). Hidden neuron firing rates are between 110 and 150 sp/s (Figure 5).

Confusion matrix. True interaural phase difference (IPD) is shown on the x-axis, and estimated IPD on the y-axis. Colour scale is yellow-blue, so a perfect result would be indicated by a yellow diagonal on a blue background. This plot shows the results of training the network with tone frequency f=50 Hz, membrane time constant \tau=2 ms, number of input neurons per neuron N_\psi=100, number of hidden layer units N_h=8, and number of discrete IPD categories N_c=12. Mean absolute IPD errors are \sim 2.6 deg.

Figure 4:Confusion matrix. True interaural phase difference (IPD) is shown on the x-axis, and estimated IPD on the y-axis. Colour scale is yellow-blue, so a perfect result would be indicated by a yellow diagonal on a blue background. This plot shows the results of training the network with tone frequency f=50f=50 Hz, membrane time constant τ=2\tau=2 ms, number of input neurons per neuron Nψ=100N_\psi=100, number of hidden layer units Nh=8N_h=8, and number of discrete IPD categories Nc=12N_c=12. Mean absolute IPD errors are 2.6\sim 2.6 deg.

Hidden neuron firing rates, with the same setup as in .

Figure 5:Hidden neuron firing rates, with the same setup as in Figure 4.

Analysis of the trained networks show that it uses an unexpected strategy. Firstly, the hidden layer neurons might have been expected to behave like the encoded neurons in Jeffress’ place theory, and like recordings of neurons in the auditory system, with a low baseline response and an increase for a preferred phase difference (best phase). However, very reliably they find an inverse strategy of having a high baseline response with a reduced response at a least preferred phase difference (Figure 6). Note that the hidden layer neurons have been reordered in order of their least preferred delay to highlight this structure. These shapes are consistently learned, but the ordering is random. By contrast, the output neurons have the expected shape (Figure 7). Interestingly, the tuning curves are much flatter at the extremes close to an IPD of ±π/2\pm \pi/2. We can get further insight into the strategy found by plotting the weight matrices WihW_{ih} from input to hidden layer, and WhoW_{ho} from hidden layer to output, as well as the product Wio=WihWhoW_{io}=W_{ih}\cdot W_{ho} which would give the input-output matrix for a linearised version of the network (Figure 8).

Tuning curves of hidden neurons. Each plot shows the interaural phase difference (IPD) tuning curve of one of the eight hidden layer neurons in the model. The x-axis shows the IPD and the y-axis the normalised firing rate. The black curves show the results for the trained spiking neural network. The orange curves show the best fit by a translated and scaled Gaussian curve. The blue curves show the fit for a rate-based approximation where spike times are ignored. Parameters are as in : f=50 Hz, \tau=2 ms, N_\psi=100, N_h=8, N_c=12.

Figure 6:Tuning curves of hidden neurons. Each plot shows the interaural phase difference (IPD) tuning curve of one of the eight hidden layer neurons in the model. The x-axis shows the IPD and the y-axis the normalised firing rate. The black curves show the results for the trained spiking neural network. The orange curves show the best fit by a translated and scaled Gaussian curve. The blue curves show the fit for a rate-based approximation where spike times are ignored. Parameters are as in Figure 4: f=50f=50 Hz, τ=2\tau=2 ms, Nψ=100N_\psi=100, Nh=8N_h=8, Nc=12N_c=12.

Tuning curves of output neurons. Each plot shows the interaural phase difference (IPD) tuning curve of one of the eight hidden layer neurons in the model. The x-axis shows the IPD and the y-axis the normalised firing rate. The black curves show the results for the trained spiking neural network. The blue lines show the fit with a rate-based approximation that ignores spike times. The orange lines show the results if we ignore the trained weight matrix and use the Ricker wavelet approximation described in the text. The green curve shows the tuning curves if we use both approximations, and additionally use the idealised Gaussian fits for the hidden neurons. The dashed red lines indicate the estimated IPD if that neuron is the most active. Parameters are as in : f=50 Hz, \tau=2 ms, N_\psi=100, N_h=8, N_c=12.

Figure 7:Tuning curves of output neurons. Each plot shows the interaural phase difference (IPD) tuning curve of one of the eight hidden layer neurons in the model. The x-axis shows the IPD and the y-axis the normalised firing rate. The black curves show the results for the trained spiking neural network. The blue lines show the fit with a rate-based approximation that ignores spike times. The orange lines show the results if we ignore the trained weight matrix and use the Ricker wavelet approximation described in the text. The green curve shows the tuning curves if we use both approximations, and additionally use the idealised Gaussian fits for the hidden neurons. The dashed red lines indicate the estimated IPD if that neuron is the most active. Parameters are as in Figure 4: f=50f=50 Hz, τ=2\tau=2 ms, Nψ=100N_\psi=100, Nh=8N_h=8, Nc=12N_c=12.

Weight matrices, with hidden neurons reordered by their worst delays. The left image shows the weight matrix from the input layer to the hidden layer. The middle image from the hidden layer to output layer. The right image shows the product of these two, which would be the equivalent weight matrix from the input to output layers if there were no nonlinearity in the system. Colours are on a hot cold scale, with hot colours corresponding to negative weights, cold colours to positive weights, and white corresponding to zero weight. Note that the colour scale is different for each image. Parameters are as in : f=50 Hz, \tau=2 ms, N_\psi=100, N_h=8, N_c=12.

Figure 8:Weight matrices, with hidden neurons reordered by their worst delays. The left image shows the weight matrix from the input layer to the hidden layer. The middle image from the hidden layer to output layer. The right image shows the product of these two, which would be the equivalent weight matrix from the input to output layers if there were no nonlinearity in the system. Colours are on a hot cold scale, with hot colours corresponding to negative weights, cold colours to positive weights, and white corresponding to zero weight. Note that the colour scale is different for each image. Parameters are as in Figure 4: f=50f=50 Hz, τ=2\tau=2 ms, Nψ=100N_\psi=100, Nh=8N_h=8, Nc=12N_c=12.

A number of features emerge from this analysis. The first is that the tuning curves of the hidden neurons have a very regular structure of having a high baseline firing rate with a dip around a “least preferred” delay that varies uniformly in the range π/2-\pi/2 to π/2\pi/2. Indeed, the tuning curves ii can be very well fit with the function a+be(ααi)2/2σα2a+be^{-(\alpha-\alpha_i)^2/2\sigma_\alpha^2} where α is the IPD, αi=π/2+iπ/Nh\alpha_i=-\pi/2+i\pi/N_h is the “least preferred” IPD, and a,b,σαa, b, \sigma_\alpha are parameters to fit (Figure 6, orange lines). This would look likely to be consistent with some form of optimal coding theory that minimises the effect of the Poisson noise in the spike counts, although we did not pursue this explanation.

The second feature is that spike timing does not appear to play a significant role in this network. This may initially appear suprising but in fact it is inevitable because of the coding scheme we have used where the initial layer of neurons fire Poisson spikes, and there is only one spiking layer, meaning there is no scope for spike times to be used (a limitation of this model realised late in the process). Indeed, if we predict the output of the network purely using the firing rates of the input stimulus passed through the weight matrices WihW_{ih} and WhoW_{ho} plus a static nonlinearity for the input to hidden layer, we get an excellent approximation for the hidden neurons (Figure 6, blue lines) and a qualitatively good fit for the output neurons (Figure 7, blue lines). Specifically, if ri(t)r_i(t) is the instantaneous time-varying firing rate of input neuron ii we approximate the instantaneous hidden units firing rates by the function:

rh(t)={0if ri(t)11trefrac+τlogri(t)ri(t)1if ri(t)>1r_h(t)=\begin{cases} 0 & \mbox{if } r_i(t)\leq 1 \\ \frac{1}{t_\mathrm{refrac}+\tau\log\frac{r_i(t)}{r_i(t)-1}} & \mbox{if } r_i(t)>1 \\ \end{cases}

This function is the firing rate of a leaky integrate-and-fire neuron τv=ri(t)v\tau v^\prime=r_i(t)-v with a refractory period trefract_\mathrm{refrac} (which is dt\mathrm{d}t in our case because of the way it is simulated) if the function ri(t)r_i(t) were constant over time, but it fits well even with a time-varying ri(t)r_i(t). From this we can take an average over time to get the mean firing rate. The output units do not spike so their activity is simply approximated by ro(t)=hWhorh(t)r_o(t)=\sum_h W_{ho}r_h(t).

Finally, we note that the weight matrix WhoW_{ho} visible in Figure 8 seems to have a very regular structure of weights that have a broad excitation and a narrowly tuned inhibition (an unusual pattern). Indeed, we can fit this well with a Ricker wavelet (or “Mexican hat”) function:

Whoa(1(δ/σδ)2)eδ2/2σδ2+bW_{ho}\approx a(1-(\delta/\sigma_\delta)^2) e^{-\delta^2/2\sigma_\delta^2}+b

where δ=oNch/Nh\delta=o-N_c h / N_h, hh and oo are the indices of the hidden and output neurons, NhN_h is the number of hidden neurons, NcN_c the number of output neurons, and aa, bb and σδ\sigma_\delta are parameters to estimate. Using this approximation and the rate-based approximation from before, we get the orange curves in Figure 7. If we use both the Ricker wavelet approximation of WhoW_{ho} and the idealised tuning curves, we get the green curves. All in all, this gives us a 6 parameter model that fits the data extremely well, a significant reduction on the 896 parameters for the full model (NψNh+NhNcN_\psi N_h+N_h N_c).

Discussion

This subproject was an extension of the original notebook Starting Notebook with the aim of understanding the solutions found in more detail. We successfully found a 6-parameter reduced model that behaves extremely similarly to the full model, and we can therefore say that we have largely understood the nature of this solution. We did not look in detail for a deep mathematical reason why this is the solution that is found, and this would make for an interesting follow-up. Are these tuning curves and weights Bayes optimal to reduce the effect of the Poisson spiking noise, for example?

The solution that was found gives tuning curves that are unlike those found in natural auditory systems, with an inverted “least preferred phase” structure instead of the typical “preferred phase”. In addition, the weight matrix from the hidden to output layer has broadly tuned excitation and narrowly tuned inhibition, which is an unusual pattern. However, it is worth noting that the model here of detecting IPDs of sinusoids with a fixed amplitude is very simplistic compared to the real conditions faced by the auditory system.

The solution found does not appear to use coincidence detection properties of spiking neurons, and indeed can be well approximated by a purely rate-based approximation. It appears to find a solution similar to that suggested by the equalisation-cancellation theory Durlach, 1963Culling, 2020. This seems initially surprising, but in fact because of the nature of the input stimulus (Poisson spike trains) and the fact that there is only one spiking layer of neurons, there is no temporal structure for coincidence detection to make use of, so it was inevitable that it would not find a solution that uses this strategy. An interesting follow-up would be to use a more detailed model of neuronal firing in the cochlear nucleus for example, or a multi-layer structure, and see if different solutions are found.

Tensor component analysis

Section authorsUmar Abubacar
NotebooksTCA Analysis

We used tensor component analysis (TCA) to explore the representation and learning dynamics of multiple models by examining their neuron, trial, and temporal factors Williams et al., 2018. TCA helps decompose multidimensional data into core components that capture the essential structure and relationships inherent in the neural activities across different dimensions. This analysis provides a clear insight into how models encode and process information over multiple trials, revealing the underlying mechanisms that drive their performance and behaviour. By employing hierarchical clustering with Ward’s method, we focused on assessing the similarity in neuron factor representations across models, which has elucidated the consistency and diversity of neural encoding across different training instances.

Results

We use nonnegative tensor component analysis (TCA) to examine spiking activity from the hidden layer of a neural network. Gaussian smoothing is applied to the spike events, and trials are stacked into a single tensor. The rank-1 decomposition in Figure 9 during training illustrates the learning process of the model, characterised by increased activity across all IPD bins through a subset of hidden neurons. The analysis highlights predominant firing within two critical intervals in the time domain: 0-25ms and 50-75ms, which aligns with periods of active input signals.

Rank 1 decomposition of spiking during training of the simple model.

Figure 9:Rank 1 decomposition of spiking during training of the simple model.

To capture the categorical dynamics within the model, the optimal number of components was determined by minimising reconstruction error and maximising component similarity. An optimal count of six ranks was identified for the model and are shown in Figure 10. The factors derived from tensor component analysis reveal distinct patterns of neural activation across time, highlighting the dynamic nature of responses to stimuli. Specifically when looking at trail factors, Factor 5 is responsive to the extreme IPDs of π/2\pi/2 and π/2-\pi/2, while Factors 1 and 4 focus on more central IPDs. Factors 2 and 3 indicate neuron assemblies responsive to IPDs towards π/2-\pi/2 as is Factor 6 to π/2\pi/2.

Rank 6 decomposition of spiking during training of the simple model.

Figure 10:Rank 6 decomposition of spiking during training of the simple model.

Model variability

To test variability between different instances of the simple model we trained 50 models on the same training data and compared the neuron and temporal factors of the rank 1 decomposition. Neuron factors were sorted by activation strength to align the most significant neurons across models and factors were normalised. Performing clustering analysis of the neuron factors between models helps to elucidate the ensembles responsible for learning the task. The dendrogram from the clustering provided a visual representation of these relationships, with models grouped together showcasing more similar model dynamics compared to those further apart. Here we can see grouping of models into 2 major categories in Figure 11.

Sampling models from each cluster shows that clusters with a higher number of active neurons tend to have better training accuracy but do not consistently achieve better test accuracy. In fact, clusters with fewer active neurons sometimes demonstrate better generalisation which had a slightly higher test accuracy despite fewer active neurons. This suggests that models with a moderate number of active neurons can achieve a balance between training effectiveness and test generalisation, potentially leading to more robust solutions. This discrepency in accuracies can be seen in Figure 12 where cluster 1-4 are increase in average active neurons.

Comparing the activation of neural assemblies with more ranks as in Figure 11 shows similar tuning across trails and models suggesting the models learn the same task but are able to do so with a range of subsets of neurons at the cost of accuracy.

Hierarchical clustering of neuron factors across models.

Figure 11:Hierarchical clustering of neuron factors across models.

Training and test accuracy of models in each cluster.

Figure 12:Training and test accuracy of models in each cluster.

Temporal factors across models.

Figure 13:Temporal factors across models.

Visualising the mean and standard deviation of temporal factors across models shows a common pattern of activation within the hidden layer neurons. The representation in Figure 13 highlights periods of peak neuronal activity. The shifts in peak timings across different models suggest variability in how these models prioritise processing delays related to inputs. Such differences likely stem from the weights learned by each model, reflecting the phase differences prioritised.

Overall the analysis employing TCA and hierarchical clustering provided insights into the learning dynamics and neuron behaviour of the simple model trained under identical conditions. Our study identified the variability and consistency in neuron activation across different models, showcasing how the architecture can learn identical tasks using different neuronal assemblies. This research highlights the balance between neuron activity and model accuracy, suggesting that optimal learning may be achieved with a moderate number of highly active neurons, balancing robust training performance with effective generalisation. These findings allow for further explorations into model optimisation, adjustments and further analysis of similar spiking models.

Learning delays

Section authorsKarim G. Habashy, Balázs Mészáros
NotebooksLearning delays, Learning delays (v2), Vanilla sound localization problem with a single delay layer (non-spiking), Dilated Convolution with Learnable Spacings

We introduce a simple method to solve the sound localization problem with only learnable delays, and discuss a method that learns both weights and delays, introduced in Hammouamri et al., 2024.

Introduction

Following the classic work by Jeffress (1948) on axonal delays, Kempter et al. (2001) investigated the question of how ITD computation maps can arise ontogenetically in the laminar nucleus of the barn owl. They showed that interaural time differences (ITD) computational maps emerge from the combined effect of a Hebbian spike-based learning rule and its transmission along the presynaptic axon. In other words, axonal delays shape network structure when coupled with temporal learning rules. Based on this insight, several studies investigated the combined effect of spike timing-dependent plasticity (STDP), axonal conduction delays and oscillatory activity on recurrent connections in spiking networks. Kerr et al. (2013) demonstrated the selective potentiation of recurrent connections in this scenario, while Kato & Ikeguchi (2016) showed that neural selection for memory formation depends on neural competition and, in turn, for neural competition to emerge in recurrent networks, spontaneously induced neural oscillation coupled with STDP and axonal delays are a prerequisite.

We can use this to develop an intuition behind coupling neuronal delays with STDP. The sign of the STDP rule depends on the order of post- and pre-synaptic spiking, and axonal delays can effectively reverse this. For example, the arrival of presynaptic spikes at the synapse before the backpropagated action potential leads to a synaptic depression. However, reducing the axonal transmission speed would lead to potentiation. In this line of thought, Asl et al. (2017) studied the combined role of delays and STDP on the emergent synaptic structure in neural networks. It was shown that, qualitatively different connectivity patterns arise due to the interplay between axonal and dendritic delays, as the synapse and cell body can have different temporal spike order.

Aside from their role in modeling cortical functions or shaping a network’s synaptic structure, another line of research emerged from the seminal work by Izhikevich (2006). They showed that when including conduction delays and STDP into their simulation of realistic neural models, polychronous groups of neurons emerge (although Pauli et al. (2018) suggests that some of these results may be implementation-specific artifacts). These groups show time-locked spiking patterns with millisecond precision. Subsequent studies investigated the properties and functions of such neuronal groups. For example, Szatmáry & Izhikevich (2010) demonstrated the natural emergence of large memory content and working memory when the neuronal model exploits temporal codes. Specifically, short term plasticity can briefly strengthen the synapses of specific polychronous neuronal groups (PNG) resulting in an enhancement of their spontaneous reactivation rates. In a qualitatively different study, Eguchi et al. (2018) showed that networks that exhibit PNG possess potential capabilities that might solve the dynamic binding problem. These networks respond with stable spatio-temporal spike trains when presented with input images in the form of randomized Poisson spike trains. The functionality of these kind of networks emerged due to the interplay of various factors including: i) random distribution of axonal delays ,ii) STDP and iii) lateral, bottom-up and top-down synaptic connections.

Finally, it should be noted that most of the studies that incorporate axonal and/or dendritic delays, include them as a non-learnable parameter. Few studies investigated the possibility of training transmission delays in order to enhance the computational capabilities of spiking neural networks (SNN). Matsubara (2017) proposed an algorithm that modifies the axonal delays and synaptic efficacy in both supervised and unsupervised approaches. The learning method involved approximates the Expectation-Maximization (EM) algorithm and, after training, the network learns to map spatio-temporal input-output spike patterns. Thus, EM is one way to train SNNs that are cast as probabilistic models. Another approach that exploits recent developments in deep learning is Hammouamri et al. (2024). In this work, delays are represented as 1D convolutions through time, where the kernels include a single per-synapse non-zero weight. The temporal position of these non-zero weights corresponds to the desired delays. The proposed method co-trains weights and delays and is based on the Dilated Convolution with Learnable Spacings (DCLS) algorithm Khalfaoui-Hassani et al., 2023.

In this work we propose a delay learning algorithm that is simple and efficient. The delay learning is mediated by a differentiable delay layer (DDL). This layer can be inserted between any two layers in an SNN in order to learn the appropriate delay to solve a machine learning task. This DDL is architecture agnostic. Also, the method is designed to learn delays separately from synaptic weights.

Methods

The DDL is, mainly, based on a 1D version of the spatial transformer (STN) network Jaderberg et al., 2015. The STN is a differentiable module that can be added into convolutional neural networks (CNNs) architectures to empower them with the ability to spatially transform feature maps in a differentiable way. This addition leads to CNN models that are invariant to various spatial transformations like translation, scaling and rotation. Image manipulations are inherently not differentiable, because pixels are discrete. However, this problem is overcome by the application of an interpolation (for example bi-linear) after the spatial transformation.

The DDL is a 1D version of the spatial transformer where the only transformation used is translation. Translation of a spike along the time dimension can be thought of as a translation of a pixel along the spatial coordinates. The general affine transformation matrix for the 2D case takes the form in the following equation:

[sr1sr2txsr3sr4ty001][xtyt1]=[xsys1]\begin{bmatrix} sr_1 & sr_2 & t_x\\ sr_3 & sr_4 & t_y\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_t\\ y_t\\ 1 \end{bmatrix} = \begin{bmatrix} x_s\\ y_s\\ 1 \end{bmatrix}

In the above equation, sr1, sr2, sr3, sr4{sr_1,~sr_2,~sr_3,~sr_4} are the elements responsible for the linear transformations of scaling and rotation. txt_x and tyt_y are the translations in the x-axis and y-axis respectively. xtx_t and yty_t are the location of a spike/pixel (in case of spikes y=0y = 0) in the target/output grid, while xsx_s and ysy_s are the location of the source grid. A grid can be an image or a 2D array of spike trains. For the case of only translation along the x-axis, the affine transformation matrix becomes:

[10tx010001]\begin{bmatrix} 1 & 0 & t_x\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}

Conventionally, for the spatial transformer, after the projection of the target grid onto the source grid comes the process of interpolation as the transformed pixel might not coincide with a representative one in the source grid/image. Hence, interpolation is performed to estimate the value of the transformed pixel from the surrounding ones. However, applying this process to spike trains can lead to a distortion of the spikes as the allowed values are only 1s and 0s. To avoid this, the translation element tx{t_x} should be multiples of the minimum delay. Thus, any transformed spike location from the target grid will find a matching spike location in the source grid. Interpolation (for example bi-linear) will pick the coincident source spike with a weighting of one and provide zero weighting for any other nearby spikes. This process is summarized visually in Figure 14. The input spike trains to the DDL should be padded by zeros so as to not lose information after translation.

Structure of the differentiable delay layer (DDL). The DDL shifts an input spike train by applying translation then interpolation.

Figure 14:Structure of the differentiable delay layer (DDL). The DDL shifts an input spike train by applying translation then interpolation.

We will see below that only the DDL is needed to solve the sound localization problem, where the output classes are the target IPD. This network architecture is shown in Figure 15, with the DDL inserted between the input and output nodes. In this cases, the weights are set to one and the biases to zero.

Differentiable delay layer (DDL) model architecture. The DDL is inserted between the input and output nodes. The output nodes are IPD classes spanning the range [-90^\circ, +85^\circ] in 5^\circ steps.

Figure 15:Differentiable delay layer (DDL) model architecture. The DDL is inserted between the input and output nodes. The output nodes are IPD classes spanning the range [90,+85][-90^\circ, +85^\circ] in 55^\circ steps.

The DDL shifts the spike trains in a differentiable way, which modifies the output membrane potential via a multiplicative synapse, described below. Also, to facilitate learning, a non-spiking output was utilized (i.e. the output membrane potential is used and the output neuron does not fire spikes, as in Methods).

vi=t=0T(u1i(t)u2i(t))2τmduxidt=uxi+Sx(t)Sx(t)=z=1nδ(ttxz)\begin{aligned} v_i &= -\sum_{t=0}^{T}(u_{1i}(t) - u_{2i}(t))^2 \\ \tau_m\frac{\ud u_{xi}}{\ud t} &= -u_{xi} + S_x(t) \\ S_x(t) &= \sum_{z=1}^{n} \delta(t-t_x^z) \end{aligned}

Here vi{v_i} is the voltage at output node ii, uxiu_{xi} is the dendritic potential from input x{1,2}x\in\{1,2\} and output node ii, τm\tau_m is the time constant of a dendritic branch, Sx(t)S_x(t) is the spike train of input xx. The form (u1i(t)u2i(t))2(u_{1i}(t) - u_{2i}(t))^2 looks like a squared error and might, at first glance, seem biologically implausible, but by expanding it as follows we see it is a form of multi-synaptic multiplicative-additive interaction:

(u1i(t)u2i(t))2=u1i(t)22u1i(t)u2i(t)+u2i(t)2(u_{1i}(t) - u_{2i}(t))^2 = u_{1i}(t)^2 -2u_{1i}(t)u_{2i}(t) + u_{2i}(t)^2

In addition to the DDL, we also use the dilated convolutions with learnable spacings Hammouamri et al., 2024 algorithm, which operates by delaying spike trains through a 1D convolution featuring a single non-zero element, equivalent to the synaptic weight, positioned at the appropriate delay value. This method also uses interpolation to identify the optimal delay, facilitating the learning of delays with weights through backpropagation through time in arbitrarily deep SNNs (see Hammouamri et al. (2024) for details).

Results and discussion

For the DDL we discretised IPDs into 36 classes from 90-90^\circ to +85circ+85^circ in 55^\circ increments. To simplify learning, we fix delays from one ear and only allow delays from the other ear to be learned. This can be seen in switch from the vertical bands of blue spikes in Figure 16A before learning, to the diagonal bands in Figure 16B after learning.

Spike histograms for IPDs before (A) and after (B) training. Each panel shows the delayed input spikes for a sample sound at a particular inteaural phase difference (IPD). Note that the first half of the spikes (coloured in red) have fixed delays, while the second half (blue) are learnable. The shift from a fixed delay to a set of delays can be seen in the transition from vertical to diagonal blue bands of spikes after training.

Figure 16:Spike histograms for IPDs before (A) and after (B) training. Each panel shows the delayed input spikes for a sample sound at a particular inteaural phase difference (IPD). Note that the first half of the spikes (coloured in red) have fixed delays, while the second half (blue) are learnable. The shift from a fixed delay to a set of delays can be seen in the transition from vertical to diagonal blue bands of spikes after training.

Loss curves and the histogram of errors after training are shown in Figure 17. Overall errors correspond to a classification accuracy of 11.8% (compared to a chance level of 2.8%), or a mean absolute error of 19.119.1^\circ. Confusion matrices are shown in Figure 18.

Left: Evolution of training loss of the differential delay layer model as a function of the number of training epochs. Right: Histogram of absolute errors (in degrees) after training.

Figure 17:Left: Evolution of training loss of the differential delay layer model as a function of the number of training epochs. Right: Histogram of absolute errors (in degrees) after training.

Analysis of classifications by the trained differential delay layer model. Date is shown for errors made on the training data set (A) and test data set (B). Left shows a histogram of the true IPDs (blue) and estimated IPDs (orange). Right shows the confusion matrices on a blue-yellow colour scale (so perfect prediction would correspond to a blue image with a yellow diagonal).

Figure 18:Analysis of classifications by the trained differential delay layer model. Date is shown for errors made on the training data set (A) and test data set (B). Left shows a histogram of the true IPDs (blue) and estimated IPDs (orange). Right shows the confusion matrices on a blue-yellow colour scale (so perfect prediction would correspond to a blue image with a yellow diagonal).

Next we show results for the dilated convolutions with learnable spacings (DCLS) algorithm, in this case using 12 IPD classes instead of 36, in Figure 19. Performance of this algorithm for this task was better, with a mean absolute error on the test dataset of 4.24.2^\circ.

Analysis of classifications by the trained dilated convolutions with learnable spacings (DCLS) model. Date is shown for errors made on the training data set (A) and test data set (B). Left shows a histogram of the true IPDs (blue) and estimated IPDs (orange). Right shows the confusion matrices on a blue-yellow colour scale (so perfect prediction would correspond to a blue image with a yellow diagonal).

Figure 19:Analysis of classifications by the trained dilated convolutions with learnable spacings (DCLS) model. Date is shown for errors made on the training data set (A) and test data set (B). Left shows a histogram of the true IPDs (blue) and estimated IPDs (orange). Right shows the confusion matrices on a blue-yellow colour scale (so perfect prediction would correspond to a blue image with a yellow diagonal).

Learning synaptic delays with weights enables the visualization of the ‘receptive field’ of postsynaptic neurons, as illustrated in Figure 20. Five randomly chosen neurons from the hidden layer are plotted, revealing clear spatiotemporal separation of excitation and inhibition.

Receptive fields of 5 randomly chosen neurons in the hidden layer of the differentiable delay layer model. The x-axis represents the presynaptic neuron index, while the y-axis displays the learned delay value. Colors indicate the sign of the weight (blue=excitation, red=inhibition), with transparency denoting magnitude. Excitation and inhibition appear to be spatiotemporally separated.

Figure 20:Receptive fields of 5 randomly chosen neurons in the hidden layer of the differentiable delay layer model. The x-axis represents the presynaptic neuron index, while the y-axis displays the learned delay value. Colors indicate the sign of the weight (blue=excitation, red=inhibition), with transparency denoting magnitude. Excitation and inhibition appear to be spatiotemporally separated.

Contralateral glycinergic inhibition as a key factor in creating ITD sensitivity

Section authorsFrancesco De Santis, Alberto Antonietti
NotebooksInhibition Model Notebook

Highlights

Here we use a more biologically inspired model to overcome some limitations that have been highlighted in the classic Jeffress model, whose existence, in mammals, is still debated. In fact, axonal delay lines have not been found in the mammalian MSO. We focused then our attention on the inhibitory inputs to the MSO, which were found both anatomically (there are inhibitory ipsilateral and contralateral pathways) and computationally (in our model) to have a central role for coding ITDs. Experiments with inhibition blocked (by a glycine antagonist: strychnine) shown indeed a loss of ITD-coding in the MSO.

Introduction

Once immersed in the neuroanatomy literature devoted to human sound localization circuits, we realized how to fully grasp the development of ITD sensitivity in the MSO it was necessary to parallel outline the strategy by which ILDs are represented in the LSO (Lateral Superior Olive). From an evolutionary point of view, specific to the mammalian phylogenetic tree, the MSO can in fact be considered a refined version of the older LSO Grothe & Pecka, 2014.

LSO principal cells have a bipolar dendritic tree that receives excitatory input from the ipsilateral ear and inhibitory input from the contralateral one. All mammals appear to apply one common neural strategy for processing ILDs, which consists of subtraction between these two inputs. For instance, neurons in the right LSO will be more excited when sound arrives from the right and more inhibited when sound arrives from the left. The function that describes LSO neurons’ firing rate according to different azimuth angles is usually a sigmoid, presenting the highest values for sound coming from the hemispace ipsilateral to the nucleus, with a steep slope centered on frontal angles (i.e., close to 0°), suggesting a rate-coding strategy for identifying different ILD values with high sensitivity for frontal angles, as suggested by psychoacoustic evidence.

Although the on-off nature of this subtraction strategy, for which exquisite timing of inhibitory influences appears not to be a key prerequisite, the two major inputs to the LSO are specialized for high-fidelity temporal transmission. An explanation is found in the fact that the subtraction process happening in LSO principal cells is realized in a phase-locked way with respect to the stimulus. This implies a purely suppressive coincidence mechanism happening at each period of the phase-locked inputs to the LSO (spiking occurs unless binaural coincidence is happening).

The MSO, on the other hand, receives two additional inputs compared to the LSO: a contralateral excitation and an ipsilateral inhibition. As a result of experimental observations, the main hypothesis is that the combination of these four inputs has converted the suppressive coincidence mechanism present in the LSO into an excitatory coincidence one for the detection of ITDs in the MSO (spiking occurs only if binaural coincidence is happening) Grothe & Pecka, 2014.

Through the model described in this appendix, we therefore maintained the idea that coincidence mechanisms are present in mammalian MSO neurons, as described in the Jeffress Model, already amply depicted in the main text. Nevertheless, we also pointed out how the neuro-anatomical strategies adopted to implement this coincidence are different from those assumed by this gold standard model, since the presence of axonal delay lines in the mammalian MSO has never been experimentally demonstrated.

We evaluated the activity of these four nuclei (left and right LSO and MSO), using the firing rates of our artificial LSOs as a validation platform for our network, while exploring the impact of certain parameters in the processing of ITDs in the in silico MSO. In order to define a firing rate activity target during this second phase of the work, we relied on the results obtained in these two past experimental studies:

  • in Brand et al. (2002), the analysis of in vivo recordings from the MSO of the Mongolian gerbil showed how all the 20 neurons tested responded maximally to sounds leading in time at the contralateral ear. Peaks in the firing rate of these neurons were also found for (artificial) ITD values higher than the highest possible ones generated by the gerbil head, which correspond to almost 120 μs\mu s for a sound coming at 90° from the contralateral hemispace.

  • in Pecka et al. (2008), the physical mechanisms underlying the shift in peaks’ activity towards contralateral ITD values was explored. The results showed how the two inhibitory inputs to the MSO, which are not considered in the Jeffres model, have instead a central role in this process. By blocking the glycinergic inhibition to the MSO neurons by means of the application of its antagonist, strychnine, the authors observed the loss of the peak shift in all their response activity. Having now a symmetry in all the neuron responses, with a peak in the activity corresponding to null ITD values (i.e., 0° azimuth angle), all the information present in the MSO responses had been lost. Inhibition was thus shown to have a central role in identifying the ITD values in the MSO. Consequently, in our model we tried to block these inhibitory inputs to explore the effect that they have on the MSO activity

Finally, we wanted to understand how the firing rate activity of different neurons in the MSO could be differentiated (i.e. different peak positions in the contralateral space) in order to code for different ITD values and consequently different sound source azimuth angles.

Some additional experimental studies reported how inhibitory inputs, especially the contralateral one, can arrive slightly in advance of the excitatory inputs from the corresponding sides, thus influencing the way ipsilateral and contralateral inputs add up in the MSO neurons and thereby generating a range of maximum responses for contralateral ITD values Myoga et al., 2014Roberts et al., 2013. Nevertheless, other work has characterized the hypothesis of anticipation of inhibitory inputs as implausible vanderHeijden et al., 2013.

Since the absence of a clear and shared hypothesis in the literature, we decided to test our own: neurons in the MSO could present different activity given to the different shape of post-synaptic potentials (PSPs), both excitatory (EPSPs) and inhibitory (IPSPs).

Methods

Inspired by the neurophysiological data, we implemented a complex spiking neural network in Python using the NEST Simulator framework Spreizer et al., 2022. The different neuronal populations composing the brainstem circuit and their interconnections are depicted in Figure 21.

The end-to-end brainstem model with the network diagram. The number of neurons for each population is written in the respective block

Figure 21:The end-to-end brainstem model with the network diagram. The number of neurons for each population is written in the respective block

The principal inputs to the network are the spectrogram of a sound stimulus arriving at both ears and the azimuth angle from which univocal values of ITD and ILD are computed.

While the ITD value is obtained realistically considering the interaural path difference, which is the difference between the paths traveled by the sound starting from the source and arriving at the two ears and dividing it by the speed of sound in air (i.e., 330 m/s), the ILD value is more artificial, since it is computed as a difference in the number of spikes between left and right auditory nerve fibers (ANFs). This strategy, although very efficient in modeling the effect of ILD on various neural populations, does not take into account the natural variability of the ILD value in relation to the frequency of the input sound and its distance from the subject. Nevertheless, these aspects were not a problem with regard to the results presented in this appendix, as only one frequency of input sound was used and a fixed source distance was kept.

With regards to the spectrogram, we covered the whole human audible range of sound between a minimum frequency of 20 Hz and a maximum of 20 kHz. We subdivided it into 3500 channels since this value is the most likely estimate of the number of inner hair cells (IHCs) present in the human cochlea. With the intent to mimic the physiological distribution of IHCs along the basilar membrane, the width of each frequency channel was not constant throughout the range but grew exponentially Fettiplace, 2023.

For modeling of ANFs, we used the pulse_packet_generator, a built-in NEST device that produces a spike train containing Gaussian spike clusters centered around given times. These devices mimic the actual behaviour of ANFs subjected to a pure tonal stimulus, which, like all other neuronal populations involved in sound processing, manifest a phase-locked response. The firing of these neurons tends to occur only at a certain restricted phase of the incoming sine wave of sound, due to the working principle of the inner hair cells placed in the cochlea Yin et al., 2019. pulse_packet_generator allowed us to define the number of spikes in each packet (varying accordingly to the ILD), with spike times that are normally distributed with respect to the central time of the pulse. A source of noise was introduced by setting the standard deviation of the random displacement from the center of the pulse equal to 0.1 ms.

All other cell populations were implemented through iaf_cond_alpha models (a simple implementation of a spiking neuron in NEST using integrate-and-fire dynamics with conductance-based synapses and a postsynaptic change of conductance modeled by an alpha function). With this model we implemented in a manner faithful to their biological counterparts the bushy cells (spherical and globular, SBCs and GBCs) located in the anteroventral part of the cochlear nuclei, the glycinergic neurons located in the medial and lateral trapezoidal body (MNTB and LNTB), and finally the main cells of the lateral superior olives (LSOs). All the default parameters of this NEST model were kept unchanged (see Table 6.) apart from the membrane capacitance ‘C_m’, which was lowered to 1 pF to ensure sufficiently quick membrane time constants as seen experimentally in these neurons Cao et al. (2007).

Table 6:Key parameters for the iaf_cond_alpha neural model, including membrane properties, spike-related parameters, and synaptic properties.

ParameterValueDescription
C_m (pF)250 pFMembrane capacitance
g_L (nS)16.6667 nSLeak conductance
E_L (mV)-70 mVLeak reversal potential (resting potential)
refr_T (ms)2 msDuration of refractory period
V_th (mV)-55 mVSpike threshold potential
V_reset (mV)-60 mVReset potential
E_exc (mV)0 mVExcitatory reversal potential
E_inh (mV)-85 mVInhibitory reversal potential
tau_syn,exc (ms)0.2 msSynaptic time constant of excitatory synapse
tau_syn,inh (ms)2 msSynaptic time constant of inhibitory synapse

The MSO principal cells were instead implemented through the iaf_cond_beta model. The use of a beta function to replicate the postsynaptic change of conductance allowed us to change independently the time constants of rise and fall of the conductance change and thus modify both the excitatory and inhibitory post-synaptic potential shapes in the MSO. In this way, we could explore different sets of values and attempt to validate our hypothesis about how inhibitory inputs can code for different ITD values in MSO cells, see Introduction.

Table 7:Brainstem network data about different populations cell types, NEST models used, convergence, numerosity, and mean number of cells within a single frequency channel.

Cell TypeModelConvergenceNumerosityNumber - Frequency Channels ratio
ANFspoisson generator (devices)4 : 1 to SBCs
20 : 1 to GBCs
3500010
SBCsiaf_cond_alpha5 : 1 to LSO PCs,
5 : 1 to MSO PCs
87502.5
GBCsiaf_cond_alpha1 : 1 to MNTB PCs17500.5
MNTB PCsiaf_cond_alpha1 : 1 to LSO PCs,
1 : 1 to MSO PCs
17500.5
LSO PCsiaf_cond_alpha-17500.5
MSO PCsiaf_cond_beta-70003.5

For the validation of the complete brainstem network, including both LSO and MSO of both sides, sound stimuli with frequencies of 100 Hz and 1 s duration from different spatial positions were tested. Specifically, we gave the model azimuth angles ranging from -90° to +90° with an interval of 15°. The MSO response was tested both in physiological conditions and with blocked inhibitory inputs, as in the experiments of Brand et al. and Pecka et al.

Results

As for the LSO response in Figure 22, in our results we obtained the desired subtraction process described in Introduction: considering for the sake of simplicity the left LSO, when the sound arrives from a source placed at 90° (i.e., right), the right ear receives sounds earlier and more intensely than the left. As the azimuth angle proceeds toward 0° (frontal position), the firing rate of the left LSO increases while maintaining a constant slope. This steep segment is the most informative part of the left LSO response curve, as a high sensitivity to changing input azimuth angles is guaranteed. Once past 0°, the firing rate ceases to increase steadily, and the response flattens out to high-rate values. Here, the ipsilateral (left) excitation dominates due to louder sounds.

Mean firing-rate responses of the left LSO (on the left) and the left MSO (on the right) after stimulation with a 100 Hz pure tone sound for 1 second at different azimuth angles. For the MSO, four neurons with different time constants are shown. Neuron 1 responds maximally for input at +15°, Neuron 2 at +30°, Neuron 3 at +45°, and Neuron 4 at +90°.

Figure 22:Mean firing-rate responses of the left LSO (on the left) and the left MSO (on the right) after stimulation with a 100 Hz pure tone sound for 1 second at different azimuth angles. For the MSO, four neurons with different time constants are shown. Neuron 1 responds maximally for input at +15°, Neuron 2 at +30°, Neuron 3 at +45°, and Neuron 4 at +90°.

Regarding the MSO instead (Figure 22), the different curves represent the activity of four neurons in the left MSO for which we varied only the value of the time constants of the decay in conductance generated by input spikes for both excitatory (tau_decay_exc) and inhibitory (tau_decay_inh) inputs. We observed that different combinations of these two values provided coding for a specific angle achieved by a peak at different angles (and thus different ITD values) in that cell’s activity. As observed experimentally, all peaks were present in the contralateral sound space, thus confirming the hypothesis that, in contrast to the coding strategy applied in the LSO, higher activity is present in the MSO for sounds from the contralateral space.

Regarding simulations in which the MSO receives only excitatory inputs (Figure 23), a loss in the coding of different contralateral angles is evidenced by a symmetric firing rate curve, with all the peak values being higher and shifted towards 0° angles with respect to the physiological activity, as measured experimentally in Pecka et al. (2008).

Loss of contralateral ITD peak-coding for the four different neurons in the left MSO: the control condition (with physiological inhibitory inputs) is shown in blue, whereas the curves in red depict the condition with blocked inhibition to the MSO; in the latter scenario, firing-rate values are higher with respect to the former and peaks are shifted to null ITD values, so that the coding of each neuron for a specific azimuth angle is now lost.

Figure 23:Loss of contralateral ITD peak-coding for the four different neurons in the left MSO: the control condition (with physiological inhibitory inputs) is shown in blue, whereas the curves in red depict the condition with blocked inhibition to the MSO; in the latter scenario, firing-rate values are higher with respect to the former and peaks are shifted to null ITD values, so that the coding of each neuron for a specific azimuth angle is now lost.

The achievements of this work are to be considered significant in the investigation of the mechanisms used by the mammalian brainstem to perform sound localization. The computational model developed is a good validation platform for the most recent theories concerning the processing of the ITDs in the MSO. This model has shown, for pure tones of 100 Hz frequency, the physiological functioning of LSO and MSO. The peak-coding strategy applied for the identification of contralateral angles in each MSO can be considered a refinement of the rate-based localization of sounds happening in the LSO. As described in the Introduction, this type of redundancy is also justified by the evolutionary history of spatial hearing mechanisms in mammals. The differentiation in the activity of MSO neurons due to the different time constants of the decay in conductance generated by input spikes was found to be a valid hypothesis for implementing the coding of ITDs. In the end, this work leads to the possible conclusion that strategies similar to those used for the processing of ILDs have also been readapted for the processing of ITDs and that in the brainstem of modern mammals, these two processes occur in parallel, merging at a higher level, and thus providing a more refined and complete spatial map.

Funding and Acknowledgements

  • MG is supported by Schmidt Sciences, LLC. Previously, MG was a Fellow of Paris Region Fellowship Program - supported by the Paris Region, and funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 945298-ParisRegionFP.
  • FS and AA work is fully funded by the project “EBRAINS-Italy (European Brain ReseArch INfrastructureS-Italy),” granted by the Italian National Recovery and Resilience Plan (NRRP), M4C2, funded by the European Union –NextGenerationEU (Project IR0000011, CUP B51E22000150006, “EBRAINS-Italy”).
  • DFE is funded by the SURF programme (undergraduate research fellowship) from the Simons Foundation (SCGB).
  • DFE and BAB acknowledge the use of the UCL Myriad High Performance Computing Facility (Myriad@UCL), and associated support services, in the completion of this work.
  • ZF is funded by a NSERC PGS-D Scholarship.
  • BM is funded by the be.AI Leverhulme Doctoral Scholarships (Leverhulme Trust).
  • UA is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) DTP Studentship 2753922 for the University of Surrey.
  • JLR was funded by the Max Plank Society.
  • YHL is supported by NSERC PGS-D, FRQNT B2X, and Pearson Fellowship.
  • VB is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research innovation program, grant agreement number 715980.
References
  1. Abbott, L. F., Angelaki, D. E., Carandini, M., Churchland, A. K., Dan, Y., Dayan, P., Deneve, S., Fiete, I., Ganguli, S., Harris, K. D., Häusser, M., Hofer, S., Latham, P. E., Mainen, Z. F., Mrsic-Flogel, T., Paninski, L., Pillow, J. W., Pouget, A., Svoboda, K., … Zador, A. M. (2017). An International Laboratory for Systems and Computational Neuroscience. Neuron, 96(6), 1213–1218. 10.1016/j.neuron.2017.12.013
  2. Wool, L. E. (2020). Knowledge across networks: how to build a global neuroscience collaboration. Current Opinion in Neurobiology, 65, 100–107. 10.1016/j.conb.2020.10.020
  3. Mainen, Z. F., Häusser, M., & Pouget, A. (2016). A better way to crack the brain. Nature, 539(7628), 159–161. 10.1038/539159a
  4. Bell, A., Chetty, R., Jaravel, X., Petkova, N., & Van Reenen, J. (2019). Who Becomes an Inventor in America? The Importance of Exposure to Innovation*. The Quarterly Journal of Economics, 134(2), 647–713. 10.1093/qje/qjy028
  5. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 10.1109/cvpr.2009.5206848
  6. Schrimpf, M., Kubilius, J., Hong, H., Majaj, N. J., Rajalingham, R., Issa, E. B., Kar, K., Bashivan, P., Prescott-Roy, J., Geiger, F., Schmidt, K., Yamins, D. L. K., & DiCarlo, J. J. (2018). Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like? 10.1101/407007
  7. Schrimpf, M., Kubilius, J., Lee, M. J., Ratan Murty, N. A., Ajemian, R., & DiCarlo, J. J. (2020). Integrative Benchmarking to Advance Neurally Mechanistic Models of Human Intelligence. Neuron, 108(3), 413–423. 10.1016/j.neuron.2020.07.040
  8. Bowers, J. S., Malhotra, G., Dujmović, M., Llera Montero, M., Tsvetkov, C., Biscione, V., Puebla, G., Adolfi, F., Hummel, J. E., Heaton, R. F., Evans, B. D., Mitchell, J., & Blything, R. (2022). Deep problems with neural network models of human vision. Behavioral and Brain Sciences, 46. 10.1017/s0140525x22002813
  9. Goodman, D., Fiers, T., Gao, R., Ghosh, M., & Perez, N. (2022). Spiking Neural Network Models in Neuroscience - Cosyne Tutorial 2022. Zenodo. 10.5281/ZENODO.7044500
  10. Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., & Willing, C. (2016). Jupyter Notebooks – a publishing format for reproducible computational workflows. In F. Loizides & B. Schmidt (Eds.), Positioning and Power in Academic Publishing: Players, Agents and Agendas (pp. 87–90). IOS Press. 10.3233/978-1-61499-649-1-87
  11. Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., … Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. 10.1038/s41586-020-2649-2
  12. Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. 10.1109/MCSE.2007.55
  13. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., … Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv. 10.48550/arXiv.1912.01703
  14. Zenke, F. (2019). SpyTorch. Zenodo. 10.5281/zenodo.3724018
  15. Zenke, F., & Ganguli, S. (2018). SuperSpike: Supervised learning in multilayer spiking neural networks. Neural Computation. 10.1162/neco_a_01086