[RL basics] Week 1. RL Intro

This page presents the basics of reinforcement learning (RL). It may be useful for refreshing your RL knowledge or for explaining it to newcomers.

RL framework

An agent learns from the environment by interacting with it through trial and error, receiving rewards as feedback.
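The interaction loop can be sketched as follows. This is a minimal illustration, not a learning algorithm: `CoinFlipEnv`, its `reset`/`step` methods, and `run_episode` are hypothetical names invented for this example, and the policy is fixed rather than learned.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.7, episode_length=10, seed=0):
        self.p_heads = p_heads
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.t = 0
        return 0  # this toy environment has a single dummy state

    def step(self, action):
        """Apply an action (0 = guess tails, 1 = guess heads).

        Returns (next_state, reward, done).
        """
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        return 0, reward, done


def run_episode(env, policy):
    """Run one episode: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                  # agent picks an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward                  # reward is the feedback signal
    return total_reward


# A fixed policy that always guesses heads.
total = run_episode(CoinFlipEnv(), policy=lambda state: 1)
print(total)
```

A learning agent would use the rewards collected in this loop to adjust its policy between steps or episodes.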

The best behavior is the one that maximizes the expected cumulative reward.
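One common way to write this objective is shown below. The notation is an assumption rather than something taken from this page: R is the reward, G_t the cumulative reward (return) from step t, π a policy, and γ a discount factor, which the page does not mention. With γ = 1 and finite episodes, G_t reduces to the plain sum of rewards.

```latex
\[
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1
\]
\[
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[ G_t \right]
\]
```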

The agent makes each decision using only the current state (no memory of earlier states).
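In symbols, this memoryless decision rule can be sketched as below. The notation (S_t for the state, A_t for the action, R_t for the reward, π for the policy) is an assumption. The second line is the Markov property, which is the usual justification for ignoring history, although this page does not state it explicitly.

```latex
\[
\pi(a \mid s) = \Pr(A_t = a \mid S_t = s)
\]
\[
\Pr(S_{t+1}, R_{t+1} \mid S_t, A_t)
  = \Pr(S_{t+1}, R_{t+1} \mid S_0, A_0, \dots, S_t, A_t)
\]
```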

Exploration/exploitation tradeoff: the agent must balance exploiting actions it already knows to be rewarding against exploring new, unknown ones.

👌 Use a stochastic policy (a probability distribution over actions)
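A stochastic policy can be sketched as sampling from a probability distribution over actions. The example below is a minimal sketch that assumes a softmax over hypothetical action preferences. The page does not say how the distribution is built, so `softmax`, `sample_action`, and the preference values are illustrative choices only.

```python
import math
import random


def softmax(preferences):
    """Turn arbitrary action preferences into probabilities that sum to 1."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probabilities, rng):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


rng = random.Random(0)
probs = softmax([2.0, 1.0, 0.0])  # hypothetical preferences for 3 actions

# The preferred action is chosen most often, but the others are still tried.
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_action(probs, rng)] += 1
print(probs, counts)
```

Because every action keeps a non-zero probability, the agent still explores occasionally while mostly choosing the action it currently prefers.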

Which action to take given the current state?

Which state has the highest value?

