Reinforcement Learning Playground

Choose an exercise to understand Prediction Error and Q-learning.

Exercise

Trial 0 / 20

Learning Rate (LR): 0.1

Beta (β): 1.0

Q-Value (A): 0.500

Choice Probability (A): 50.0%

Q-Value (B): 0.500

Choice Probability (B): 50.0%

Softmax input: β * (Q_A - Q_B) = 0.00
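
To make the link between the softmax input and the choice probabilities concrete, here is a minimal sketch. It assumes the standard two-option softmax, which reduces to a logistic function of β * (Q_A - Q_B); the page does not spell this formula out, and the function name `choice_probability_a` is hypothetical.

```python
import math


def choice_probability_a(q_a: float, q_b: float, beta: float) -> float:
    """Probability of choosing option A under a two-option softmax.

    Assumed form: P(A) = 1 / (1 + exp(-beta * (Q_A - Q_B))).
    """
    softmax_input = beta * (q_a - q_b)
    return 1.0 / (1.0 + math.exp(-softmax_input))


# Starting values shown on the page: both Q-values at 0.5, beta = 1.0.
p_a = choice_probability_a(0.5, 0.5, beta=1.0)
print(f"P(A) = {p_a:.1%}, P(B) = {1 - p_a:.1%}")  # P(A) = 50.0%, P(B) = 50.0%
```

With equal Q-values the softmax input is 0, so both options are chosen with probability 50%, whatever the value of β. A larger β makes the choice more sensitive to any difference between Q_A and Q_B.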

Previous Q-Value:

Prediction Error (PE) = Reward - Previous Q =

New Q = Previous Q + LR * PE

New Q =
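
The update rule above can be traced with one worked step. This is a sketch only: the reward of 1.0 is a hypothetical outcome chosen for illustration, and `update_q` is an illustrative name, not something defined by the page.

```python
def update_q(previous_q: float, reward: float, learning_rate: float):
    """Apply one update: PE = Reward - Previous Q, New Q = Previous Q + LR * PE."""
    prediction_error = reward - previous_q
    new_q = previous_q + learning_rate * prediction_error
    return prediction_error, new_q


# Starting Q-value 0.5 and LR 0.1 from the page; reward 1.0 is assumed.
pe, new_q = update_q(previous_q=0.5, reward=1.0, learning_rate=0.1)
print(f"PE = {pe:.3f}, New Q = {new_q:.3f}")  # PE = 0.500, New Q = 0.550
```

A positive prediction error (reward better than expected) nudges the Q-value up, and a negative one nudges it down. The learning rate sets how large each step is.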

You've completed 20 trials. Below is a graph of the Q-values over time.
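
For readers who want to reproduce a Q-value trajectory like the one in the graph, the following sketch simulates 20 trials and records both Q-values after each one. Several things here are assumptions rather than the page's own setup: the reward probabilities (0.8 for A, 0.2 for B), the binary 0/1 rewards, the fixed random seed, and the name `simulate`.

```python
import math
import random


def simulate(n_trials=20, learning_rate=0.1, beta=1.0,
             reward_prob_a=0.8, reward_prob_b=0.2, seed=0):
    """Run a two-option Q-learning task and return the Q-values per trial."""
    rng = random.Random(seed)
    q = {"A": 0.5, "B": 0.5}  # starting Q-values shown on the page
    reward_prob = {"A": reward_prob_a, "B": reward_prob_b}  # assumed values
    history = [dict(q)]  # entry 0 holds the values before any trial

    for _ in range(n_trials):
        # Two-option softmax on beta * (Q_A - Q_B).
        p_a = 1.0 / (1.0 + math.exp(-beta * (q["A"] - q["B"])))
        choice = "A" if rng.random() < p_a else "B"
        # Assumed reward scheme: 1 with the option's probability, else 0.
        reward = 1.0 if rng.random() < reward_prob[choice] else 0.0
        # Only the chosen option is updated: New Q = Previous Q + LR * PE.
        q[choice] += learning_rate * (reward - q[choice])
        history.append(dict(q))

    return history


history = simulate()
for trial, q in enumerate(history):
    print(f"trial {trial:2d}: Q_A = {q['A']:.3f}, Q_B = {q['B']:.3f}")
```

Plotting the two columns against the trial number gives curves of the same kind as the graph. Changing `learning_rate` or `beta` shows how quickly the Q-values move and how strongly choices follow them.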