# Markov Decision Processes

Martin Wickham, Paul Cantrell, Mike Precup

Markov Decision Processes, or MDPs, are forms of decision making in which the outcome is partially under the control of the one making the decision and partially random.  They are named after Andrey Markov, a Russian mathematician. It is based on the transition between states through actions. The subject chooses its action based on the expected reward, and that causes it to move on to the next state.

Markov Decision Processes were first created in 1950. The area was opened up further due to work by Ronald A. Howard in Dynamic Programming and Markov Processes (“Markov Decision Process”). These techniques continue to be useful today. MDPs are commonly used in machine learning and optimization techniques. Andrey Markov was a Russian mathemetician well known for these statistical models such as the Markov Decision Processes. Markov was already creating a “new method to solve linear ordinary differential equations” (“Andrey Markov”) at age 17 and was later excommunicated from the Russian Orthodox Church due to his protest of Leo Tolstoy’s excommunication.

Markov Decision Processes are easily utilized in many applications. One example is what you spoke about today, queueing theory. Whether people queue at all and what line they queue in (if by choice) can be modeled with Markov Decision Processes. There are many other applications simply because it models rational behavior quite well. In many situations in life, a person will weigh multiple possibilities against each other based on effort and reward, and the time frame of the reward. The Markov processes are a simplified process of this. That means that they apply to most human actions.

Sources:

http://en.wikipedia.org/wiki/Markov_decision_process

http://en.wikipedia.org/wiki/Andrey_Markov

### One response to “Markov Decision Processes”

1. Ron Howard is a professor here at Stanford! I learned about MDPs from him! MDPs are useful because they can give a basic model of causality that many machine learning algorithms can’t, since they assume independence of parameters.