Write short note on Temporal Difference Learning.

**Solution:**

**Temporal-Difference (TD) Learning:**

- A combination of dynamic programming (DP) and Monte Carlo (MC) methods.
- Like DP methods, it updates estimates based on other learned estimates, i.e., it bootstraps.
- Like MC methods, it does not require a model of the environment; it learns directly from raw experience.
- It constitutes a basis for reinforcement learning.
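The bootstrapping idea above is usually written as the tabular TD(0) update $V(S_t) \leftarrow V(S_t) + \alpha\,[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$, where the bracketed term is the TD error. The sketch below applies that update on a small 5-state random walk. The task, the function names (`td0_update`, `td0_random_walk`) and the values of $\alpha$, $\gamma$ and the episode count are illustrative assumptions, not part of the original answer. The agent never uses a model of the transitions; it only sees sampled steps.

```python
import random

# Hypothetical illustration task: a 5-state random walk (states A..E -> 0..4).
# Every episode starts in the centre state and moves left or right with
# probability 0.5. Leaving on the right gives reward +1, on the left reward 0.
N_STATES = 5
START = 2


def td0_update(V, s, r, s_next, alpha, gamma, terminal):
    """Apply one tabular TD(0) update to V in place and return the TD error."""
    # Bootstrapped target: reward plus discounted estimate of the next state
    # (a terminal next state has value 0, so the target is just the reward).
    target = r if terminal else r + gamma * V[s_next]
    delta = target - V[s]      # TD error
    V[s] += alpha * delta      # move the estimate toward the target
    return delta


def td0_random_walk(episodes, alpha=0.01, gamma=1.0, seed=0):
    """Estimate V^pi for the random-walk policy from sampled experience only."""
    rng = random.Random(seed)
    V = [0.5] * N_STATES       # arbitrary initial estimates
    for _ in range(episodes):
        s = START
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:                      # left exit, reward 0
                td0_update(V, s, 0.0, None, alpha, gamma, terminal=True)
                break
            if s_next >= N_STATES:              # right exit, reward +1
                td0_update(V, s, 1.0, None, alpha, gamma, terminal=True)
                break
            td0_update(V, s, 0.0, s_next, alpha, gamma, terminal=False)
            s = s_next
    return V


if __name__ == "__main__":
    V = td0_random_walk(episodes=20000)
    true_V = [(i + 1) / 6 for i in range(N_STATES)]  # exact values for this task
    for name, est, true in zip("ABCDE", V, true_V):
        print(f"{name}: estimate={est:.3f}, true={true:.3f}")
```

With $\gamma = 1$ the exact values for this walk are $1/6, 2/6, \dots, 5/6$, so the printed estimates can be compared directly; because $\alpha$ is held constant here, they keep fluctuating slightly around those values rather than settling exactly.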

For a fixed policy $\pi$, convergence to $V^\pi$ is guaranteed asymptotically (as in MC methods):

- in the mean, for a constant learning rate $\alpha$, provided $\alpha$ is sufficiently small;
- with probability 1, if $\alpha$ decreases in accordance with the usual stochastic approximation conditions.
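The "usual stochastic approximation conditions" are, as commonly stated, the Robbins-Monro conditions on the sequence of step sizes. Writing $\alpha_n$ for the step size used on the $n$-th update of a given state (notation assumed here for illustration):

$$
\sum_{n=1}^{\infty} \alpha_n = \infty, \qquad \sum_{n=1}^{\infty} \alpha_n^2 < \infty .
$$

For example, $\alpha_n = 1/n$ satisfies both conditions. A constant $\alpha$ satisfies the first but not the second, which is consistent with the weaker "in the mean" guarantee for a fixed step size: the estimates keep reacting to recent samples instead of settling exactly.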

