After every shot, the bot updates that cell using a simplified form of the Bellman (Q-learning) update, with no future-reward term:
Q(s,a) ← Q(s,a) + α × [reward − Q(s,a)]
α = 0.2 (learning rate)
reward = +1 for a basket
reward = −1 for any miss
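To make the arithmetic concrete, here is a minimal Python sketch of the update applied to a single cell. The starting value of 0 and the three-shot sequence (basket, basket, miss) are assumptions chosen for illustration, and `update_q` is a hypothetical helper name, not something taken from the bot's own code.

```python
ALPHA = 0.2  # learning rate, as stated in the text


def update_q(q, reward, alpha=ALPHA):
    """Move q a fraction alpha of the way toward the observed reward."""
    return q + alpha * (reward - q)


q = 0.0  # assumed starting value for the cell
history = []
for reward in (+1, +1, -1):  # basket, basket, miss (illustrative sequence)
    q = update_q(q, reward)
    history.append(round(q, 3))

print(history)  # [0.2, 0.36, 0.088]
```

Each shot pulls the cell only 20% of the way toward the latest reward, so a single miss dents the value without erasing what the earlier baskets taught it.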
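The more the bot trains, the stronger its opinions become: a cell's value drifts toward +1 for shots that usually go in and toward −1 for shots that usually miss. Watch the cell pulse and change color after each shot!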