TY - GEN
T1 - Instrumental Conditioning with Neuromodulated Plasticity on SpiNNaker
AU - Enuganti, Pavan Kumar
AU - Bhattacharya, Basabdatta Sen
AU - Gait, Andrew
AU - Rowley, Andrew
AU - Brenninkmeijer, Christian
AU - Fellows, Donal K.
AU - Furber, Stephen B.
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - We present a work-in-progress on implementing reinforcement learning by instrumental conditioning on SpiNNaker. Animals learn to behave by exploring the changing environment around them such that, over a period of time, their behaviour gives a good outcome (reward), i.e. a perception of ‘satisfaction’. While inspired by animal learning, reinforcement learning adopts a goal-directed strategy of maximising rewards in a dynamic environment. Instrumental conditioning is a strategy to strengthen the association between an action and the environmental state when the state-action pair is rewarded, i.e. the reward is instrumental in forming the association. However, in the real world, the delivery of a reward is often delayed in time, known as the distal reward problem. Using the concept of eligibility traces and spike-timing-dependent plasticity (STDP), Izhikevich (2007) simulated both classical and instrumental conditioning in a spiking neural network with dopamine (DA)-modulated STDP. The current implementation of DA-modulated plasticity on SpiNNaker using trace-based STDP is reported by Mikaitis et al. (2018), who demonstrated classical conditioning with an experimental setup similar to Izhikevich's. Our results show that, using delayed DA-modulation of STDP on SpiNNaker, we can condition a neural population to maximise its reward over a period of time by firing at a higher rate than a competing population. Ongoing work is looking into a dynamic conditioning scenario where different actions can be selected within the same run, as is the case in real-world scenarios.
AB - We present a work-in-progress on implementing reinforcement learning by instrumental conditioning on SpiNNaker. Animals learn to behave by exploring the changing environment around them such that, over a period of time, their behaviour gives a good outcome (reward), i.e. a perception of ‘satisfaction’. While inspired by animal learning, reinforcement learning adopts a goal-directed strategy of maximising rewards in a dynamic environment. Instrumental conditioning is a strategy to strengthen the association between an action and the environmental state when the state-action pair is rewarded, i.e. the reward is instrumental in forming the association. However, in the real world, the delivery of a reward is often delayed in time, known as the distal reward problem. Using the concept of eligibility traces and spike-timing-dependent plasticity (STDP), Izhikevich (2007) simulated both classical and instrumental conditioning in a spiking neural network with dopamine (DA)-modulated STDP. The current implementation of DA-modulated plasticity on SpiNNaker using trace-based STDP is reported by Mikaitis et al. (2018), who demonstrated classical conditioning with an experimental setup similar to Izhikevich's. Our results show that, using delayed DA-modulation of STDP on SpiNNaker, we can condition a neural population to maximise its reward over a period of time by firing at a higher rate than a competing population. Ongoing work is looking into a dynamic conditioning scenario where different actions can be selected within the same run, as is the case in real-world scenarios.
KW - Balanced random network
KW - Delayed reward
KW - Dopamine-modulated STDP
KW - Instrumental conditioning
KW - Neuromodulated plasticity
KW - SpiNNaker
UR - http://www.scopus.com/inward/record.url?scp=85161633378&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-30108-7_13
DO - 10.1007/978-3-031-30108-7_13
M3 - Conference contribution
SN - 9783031301070
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 148
EP - 159
BT - Neural Information Processing - 29th International Conference, ICONIP 2022, Proceedings
A2 - Tanveer, Mohammad
A2 - Agarwal, Sonali
A2 - Ozawa, Seiichi
A2 - Ekbal, Asif
A2 - Jatowt, Adam
ER -