Reinforcement Learning

Richard S. Sutton, Andrew G. Barto

Mentioned 4

An account of key ideas and algorithms in reinforcement learning. The discussion ranges from the history of the field's intellectual foundations to recent developments and applications. Areas studied include reinforcement learning problems in terms of Markov decision problems and solution methods.

Mentioned in questions and answers.

Assume you know a student who wants to study Machine Learning and Natural Language Processing.

What introductory subjects would you recommend?

Example: I'm guessing that knowing Prolog and Matlab might help him. He also might want to study Discrete Structures*, Calculus, and Statistics.

*Graphs and trees. Functions: properties, recursive definitions, solving recurrences. Relations: properties, equivalence, partial order. Proof techniques, inductive proof. Counting techniques and discrete probability. Logic: propositional calculus, first-order predicate calculus. Formal reasoning: natural deduction, resolution. Applications to program correctness and automatic reasoning. Introduction to algebraic structures in computing.

This related stackoverflow question has some nice answers: What are good starting points for someone interested in natural language processing?

This is a very big field. The prerequisites mostly consist of probability/statistics, linear algebra, and basic computer science, although Natural Language Processing requires a more intensive computer science background to start with (frequently covering some basic AI). Regarding specific langauges: Lisp was created "as an afterthought" for doing AI research, while Prolog (with it's roots in formal logic) is especially aimed at Natural Language Processing, and many courses will use Prolog, Scheme, Matlab, R, or another functional language (e.g. OCaml is used for this course at Cornell) as they are very suited to this kind of analysis.

Here are some more specific pointers:

For Machine Learning, Stanford CS 229: Machine Learning is great: it includes everything, including full videos of the lectures (also up on iTunes), course notes, problem sets, etc., and it was very well taught by Andrew Ng.

Note the prerequisites:

Students are expected to have the following background: Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program. Familiarity with the basic probability theory. Familiarity with the basic linear algebra.

The course uses Matlab and/or Octave. It also recommends the following readings (although the course notes themselves are very complete):

For Natural Language Processing, the NLP group at Stanford provides many good resources. The introductory course Stanford CS 224: Natural Language Processing includes all the lectures online and has the following prerequisites:

Adequate experience with programming and formal structures. Programming projects will be written in Java 1.5, so knowledge of Java (or a willingness to learn on your own) is required. Knowledge of standard concepts in artificial intelligence and/or computational linguistics. Basic familiarity with logic, vector spaces, and probability.

Some recommended texts are:

The prerequisite computational linguistics course requires basic computer programming and data structures knowledge, and uses the same text books. The required articificial intelligence course is also available online along with all the lecture notes and uses:

This is the standard Artificial Intelligence text and is also worth reading.

I use R for machine learning myself and really recommend it. For this, I would suggest looking at The Elements of Statistical Learning, for which the full text is available online for free. You may want to refer to the Machine Learning and Natural Language Processing views on CRAN for specific functionality.

Jurafsky and Martin's Speech and Language Processing http://www.amazon.com/Speech-Language-Processing-Daniel-Jurafsky/dp/0131873210/ is very good. Unfortunately the draft second edition chapters are no longer free online now that it's been published :(

Also, if you're a decent programmer it's never too early to toy around with NLP programs. NLTK comes to mind (Python). It has a book you can read free online that was published (by OReilly I think).

My toy project to learn & apply Reinforcement Learning is:
- An agent tries to reach a goal state "safely" & "quickly"....
- But there are projectiles and rockets that are launched upon the agent in the way.
- The agent can determine rockets position -with some noise- only if they are "near"
- The agent then must learn to avoid crashing into these rockets..
- The agent has -rechargable with time- fuel which is consumed in agent motion
- Continuous Actions: Accelerating forward - Turning with angle

I need some hints and names of RL algorithms that suit that case..
- I think it is POMDP , but can I model it as MDP and just ignore noise?
- In case POMDP, What is the recommended way for evaluating probability?
- Which is better to use in this case: Value functions or Policy Iterations?
- Can I use NN to model environment dynamics instead of using explicit equations?
- If yes, Is there a specific type/model of NN to be recommended?
- I think Actions must be discretized, right?

I know it will take time and effort to learn such a topic, but I am eager to..
You may answer some of the questions if you can not answer all...
Thanks

If this is your first experiment with reinforcement learning I would recommend starting with something much simpler than this. You can start simple to get the hang of things and then move to a more complicated project like this one. I have trouble with POMDPs and I have been working in RL for quite a while now. Now I'll try to answer what questions I can.

I think it is POMDP , but can I model it as MDP and just ignore noise?

Yes. POMDP stands for Partially Observable Markov Decision Process. The partially observable part refers to the fact that the agent can't know it's state perfectly, but can estimate it based on observations. In your case, you would have the location of the rocket as an observation that can have some noise, and based on the agents previous knowledge you can update it's belief of where the missiles are. That adds a lot of complexity. It would be much easier to use the missile locations as absolutes and not have to deal with uncertainty. Then you would not have to use POMDPs.

In case POMDP, What is the recommended way for evaluating probability?

I don't understand your question. You would use some form of Bayes rule. That is, you would have some sort of distribution that is your belief state (probabilities of being in any given state), that would be your prior distribution and based on observation you would adjust this and get a posterior distribution. Look into Bayes rule if you need more info.

Which is better to use in this case: Value functions or Policy Iterations?

Most of my experience has been using value functions and find them relatively easy to use/understand. But I don't know what else to tell you. I think this is probably your choice, I would have to spend time working on the project to make a better choice.

Can I use NN to model environment dynamics instead of using explicit equations? If yes, Is there a specific type/model of NN to be recommended?

I don't know anything about using NN to model environments, sorry.

I think Actions must be discretized, right?

Yes. You would have to have a discrete list of actions, and a discrete list of states. Generally the algorithm will choose the best action for any given state, and for the simplest algorithms (something like QLearning) you just keep track of a value for every given state-action pair.

If you are just learning all of this stuff I would recommend the Sutton and Barto text. Also if you want to see a simple example of a RL algorithm I have a very simple base class and an example using it up at github (written in Python). The abstract_rl class is meant to be extended for RL tasks, but is very simple. simple_rl.py is an example of a simple task (it is a simple grid with one position being the goal and it uses QLearning as the algorithm) using base_rl that can be run and will print some graphs showing reward over time. Neither are very complex, but if you are just getting started may help to give you some ideas. I hope this helped. Let me know if you have any more or more specific questions.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Computer Science or Cross Validated instead. Otherwise you're probably off-topic.

Reinforcement learning is learning what to do--how to map situations to actions--so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics--trial-and-error search and delayed reward--are the two most important distinguishing features of reinforcement learning.

Significant Literature