Markov Decision Processes (MDPs) and reinforcement learning (RL) are two widely used paradigms in artificial intelligence for designing autonomous agents that solve sequential decision problems under uncertainty. The decision problem is formalized as a tuple <S, A, T, R>, where S is the set of system states, A is the set of possible actions, T is the transition function, and R is the reward function.
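
As an illustration, the tuple <S, A, T, R> can be sketched as a small data structure. The signatures used here are assumptions, since the text does not fix them: T is taken to map a state-action pair to a probability distribution over next states, and R to map a state-action pair to a scalar reward. The two-state example (`toy`, with states "low" and "high") is hypothetical and serves only to show the shape of each component.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class MDP:
    """Container for the tuple <S, A, T, R>."""
    states: List[State]                                         # S
    actions: List[Action]                                       # A
    # Assumed form of T: T(s, a) -> {s': P(s' | s, a)}
    transition: Callable[[State, Action], Dict[State, float]]
    # Assumed form of R: R(s, a) -> scalar reward
    reward: Callable[[State, Action], float]


def is_well_formed(mdp: MDP, tol: float = 1e-9) -> bool:
    """Check that T(s, a) is a probability distribution over S for every (s, a)."""
    for s in mdp.states:
        for a in mdp.actions:
            dist = mdp.transition(s, a)
            if any(s_next not in mdp.states for s_next in dist):
                return False
            if any(p < 0.0 for p in dist.values()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
    return True


# Hypothetical two-state example, not taken from the text.
def toy_transition(s: State, a: Action) -> Dict[State, float]:
    if a == "charge":
        # Charging usually leads to "high", regardless of the current state.
        return {"high": 0.9, "low": 0.1}
    # "wait" leaves the state unchanged.
    return {s: 1.0}


def toy_reward(s: State, a: Action) -> float:
    # Reward is earned only by waiting in the "high" state.
    return 1.0 if (s == "high" and a == "wait") else 0.0


toy = MDP(
    states=["low", "high"],
    actions=["wait", "charge"],
    transition=toy_transition,
    reward=toy_reward,
)
```

Representing T and R as functions rather than tables keeps the sketch agnostic about whether S and A are small enough to enumerate; for a finite problem, dictionaries or arrays indexed by state and action would serve equally well.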