|
Textbooks and References |
-
Udacity, “Deep Reinforcement Learning,” GitHub
-
Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction, 2nd Edition,” The MIT Press, 2018
-
David Silver's Reinforcement Learning Course
-
Lex Fridman, Deep Reinforcement Learning, MIT
-
Maxim Lapan, “Deep Reinforcement Learning Hands-on,” Packt, 2018
|
|
GitHub |
|
|
|
Course Requirements |
|
pdf |
|
|
2020/3/5 |
|
Lab 0 |
|
|
|
|
2020/4/10 |
1 |
Introduction to Deep Reinforcement Learning |
- What is Reinforcement Learning?
- State, Action, Reward and Policy
- Value-based vs. Policy-based Learning
- On-policy vs. Off-policy
|
pdf |
|
|
2020/3/20 |
2 |
Multi-armed Bandit Problem |
- ε-greedy Formula
- Gradient Bandit Problem
|
pdf |
|
|
2020/3/22 |
3 |
Markov Decision Processes (MDP) |
- Finite Markov Decision Processes
|
pdf |
|
|
2020/3/20 |
4 |
OpenAI Gym |
|
pdf |
|
|
2020/3/27 |
5 |
Dynamic Programming |
- Value Iteration
- Policy Iteration
|
pdf |
FrozenLake |
|
2020/4/18 |
|
Lab 1 |
|
|
code |
|
2020/4/10 |
6 |
Monte Carlo Methods |
- First-visit Monte Carlo
- Importance Sampling for Off-policy
|
pdf |
Blackjack |
|
2020/5/1 |
7 |
Temporal Difference Learning |
- Temporal Difference vs. Monte Carlo
- SARSA
- Q-learning
- n-step TD
|
pdf |
CliffWalking |
|
2020/5/22 |
8 |
Function Approximation |
- Approximate Large State Spaces via Parameterized Functions
|
pdf |
DQN |
|
2020/5/15 |
|
Lab 2 |
|
|
code |
|
2020/4/10 |
9 |
Policy Gradient |
- Policy Gradient Theorem
- REINFORCE
- Actor-Critic
|
pdf |
REINFORCE |
|
2020/5/22 |
10 |
Model-based Reinforcement Learning |
- Table Lookup Model
- Dyna-Q
- Monte Carlo Tree Search
|
pdf |
|
|
2020/5/29 |
|
Lab 3 |
|
pdf |
|
|
2020/4/10 |
|
Final Project |
|
|
|
|
2020/7/5 |