
Hopping Robots and Reinforcement Learning






All too often, when we deal with data, the outcome we need is a strategy or an algorithm itself. To arrive at that strategy we may have historic data, or some model of how entities in the system respond to various situations. In this write-up, I'll go over the method of reinforcement learning. The general idea behind reinforcement learning is to come up with a strategy that maximizes some measurable goal. For example, if you are modelling a robot that learns to navigate around obstacles, you want the learning process to come back with a strategy that minimizes collisions (say) with other entities in the environment.


For the sake of simplicity, let's assume the following scenario. A robot is placed (at random) on a flat plank of wood which has some sticky glue in the center. To its left there is a hole which damages the robot a bit, and to its right is a reward which is its destination, as shown in the figure below.


The robot has the following capabilities:
  • It can sense which state \(\{S_1, S_2, \ldots, S_7\}\) it is in
  • It can detect a reward or damage that happens to it while in a particular state
  • It can move one space left or right
  • It can hop over the space immediately to its right (i.e., land two spaces to the right)
If the robot ends up in either of the terminal states \(S_1\) or \(S_7\), it is picked up and placed in another state at random. State \(S_1\) does damage to the robot, while state \(S_7\) rewards it. \(S_4\) (the sticky glue) is not exactly a damage-causing state, but it is an avoidable one, and we want the robot to learn that over time. All other states are neither harmful nor rewarding. The robot's goal is to "learn" to get to state \(S_7\) in as few steps as possible.

In order to get reinforcement learning to work, you need to know what the reward value is for each of the states the robot can be in. In this particular example, we will assume the following reward structure for each of the states.
Note that the exact numbers are fairly arbitrary. In addition to this, we need a function, or a table, mapping state/action pairs to the new states they lead to. Given the robot's movement description above, we can use a table as follows.
Zero is used to designate the scenario where the robot is reset to a random state. With these two pieces of information, we are ready to start the learning process.
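One way to write these two pieces of information down in R might look like the sketch below; the specific reward numbers, and the convention that a hop lands two spaces to the right, are illustrative assumptions rather than the post's exact values.

```r
# Per-state reward: damage at S1, sticky glue at S4, reward at S7
# (illustrative numbers, not the exact values used in the post)
state_reward <- c(S1 = -1, S2 = 0, S3 = 0, S4 = -0.5, S5 = 0, S6 = 0, S7 = 1)

# Transition table: rows are current states, columns are actions,
# entries are the resulting state; 0 marks "robot is reset at random"
transitions <- matrix(
  c(0, 0, 0,   # S1: terminal, robot gets reset
    1, 3, 4,   # S2: left -> S1, right -> S3, hop -> S4
    2, 4, 5,   # S3
    3, 5, 6,   # S4
    4, 6, 7,   # S5
    5, 7, 7,   # S6: a hop past the end simply lands on S7
    0, 0, 0),  # S7: terminal, robot gets reset
  nrow = 7, byrow = TRUE,
  dimnames = list(paste0("S", 1:7), c("left", "right", "hop"))
)
```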

Reinforcement learning works in the following way. You maintain a matrix of values for each state/action pair, representing the reward that can be attained by arriving at a particular state through a particular action. The trick is to also account for the possible future states you can reach once you arrive at a given state. For example, it may be beneficial to be in a certain state \(X\), but if all downstream states from \(X\) are bad to be in, you want to avoid \(X\). In this way, the algorithm tries to ensure that favourable and achievable future states are taken into account. The algorithm also does not immediately overwrite its estimates with whatever it learns; it preserves old values and updates them gradually. The above methodology can be stated as follows.

If \(Q^{*}(s_t,a_t)\) represents the payoff received by being in state \(s_t\) and taking action \(a_t\), and \(R(s_{t+1},s_t)\) is the immediate reward for the resulting move into state \(s_{t+1}\), then

$$
Q^{*}(s_t,a_t) = R(s_{t+1},s_t) + \alpha \max_{a_{t+1}} Q^{*}(s_{t+1},a_{t+1})
$$

To slow down learning a bit, we retain a fraction \(\beta\) of whatever prior estimate of \(Q^{*}(s_t,a_t)\) we already have, as shown below

$$
Q^{*}_{new}(s_t,a_t) = \beta Q^{*}_{prior}(s_t,a_t) + (1 - \beta)Q^{*}(s_t,a_t)
$$
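As a purely illustrative one-step example (the numbers are made up, not taken from the reward table): suppose the robot is in \(S_6\) and moves right into \(S_7\), with \(R(s_{t+1},s_t) = 1\), \(\alpha = 0.8\), \(\max_{a_{t+1}} Q^{*}(s_{t+1},a_{t+1}) = 0\) because \(S_7\) is terminal, a prior estimate \(Q^{*}_{prior}(s_t,a_t) = 0.5\), and \(\beta = 0.9\). Then

$$
Q^{*}(s_t,a_t) = 1 + 0.8 \times 0 = 1, \qquad Q^{*}_{new}(s_t,a_t) = 0.9 \times 0.5 + 0.1 \times 1 = 0.55
$$

so the estimate creeps toward the freshly observed payoff instead of jumping to it.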
That's it! We now let the system take on various initial states and let the device move around between states while we constantly update our \(Q\) matrix. After several iterations, the \(Q\) matrix ends up with values that reflect which strategy to take.

To give it a whirl, here is R code that walks through the entire process for this particular example.
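What follows is a minimal sketch rather than the original listing: it reuses the illustrative reward vector and transition table from the earlier snippet, and the learning constants and the size of the backward-movement penalty are assumed values.

```r
# Assumes state_reward and transitions from the earlier snippet are defined
alpha <- 0.8   # weight on the best achievable future value
beta  <- 0.9   # how much of the prior Q estimate is retained

actions <- colnames(transitions)
Q <- matrix(0, nrow = 7, ncol = length(actions),
            dimnames = list(rownames(transitions), actions))

# Reward for landing in s_next, with a small penalty for moving backwards (left)
reward <- function(s, s_next) {
  r <- state_reward[[s_next]]
  if (s_next < s) r <- r - 0.1
  r
}

set.seed(1)
for (episode in 1:5000) {
  s <- sample(2:6, 1)               # drop the robot on a random non-terminal state
  repeat {
    a      <- sample(actions, 1)    # explore with a random action
    s_next <- transitions[s, a]
    if (s_next == 0) break          # 0 means the robot was reset; end this episode
    target  <- reward(s, s_next) + alpha * max(Q[s_next, ])
    Q[s, a] <- beta * Q[s, a] + (1 - beta) * target
    s <- s_next
  }
}

round(Q, 2)   # rows: states S1..S7, columns: move-left, move-right, hop-right
```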



A thing to note in the code is how the reward function is encoded. There is a penalty imposed on moving from a higher state to a lower state. This is a simple way to ensure that whatever strategy the robot comes up with does not involve giving back the rightward positional gains it has already made. If you try it without the penalty, you will see cases where the strategy is indifferent between going left or right in some states. The above code, when run, creates the following output (screen grab below).


The rows represent the states, and the columns represent the three possible actions (move left, move right, and hop right) that the robot can take. Notice what it's trying to say:
  • State 1, do nothing 
  • State 2, move right, but don't hop
  • State 3, don't move right, hop
  • State 4, move right or hop
  • State 5, move right (small chance), hop (definitely)
  • State 6, move right, move left (small chance)
This is exactly what you would expect and want! To read the learned strategy straight off the \(Q\) matrix, you can simply pick the best action in each row, as in the short snippet below.
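For instance, with the \(Q\) matrix and actions vector from the sketch above, the learned strategy per state can be read off like this:

```r
# Best action in each row of Q; the terminal rows S1 and S7 never get updated,
# so their entries stay at zero and can be ignored
apply(Q, 1, function(q) actions[which.max(q)])
```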
More on this subject can be read in the following books: Reinforcement Learning: State-of-the-Art (Adaptation, Learning, and Optimization) and Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning).
If you are looking to buy some books on probability theory, here is a good list of books to own.
