RL sandbox

First steps in reinforcement learning.