
Introduction to Reinforcement Learning


Academic year: 2022



Dr. Juan José Alcaraz Espín

Technical University of Cartagena, Spain. Email: [email protected]

Timetable: see https://phd.dei.unipd.it/course-catalogues/

Important note: course to be confirmed based on the result of the Visiting Scientist Call.

Enrollment: students must enroll in the course using the Enrollment Form on the PhD Program eLearning platform (requires SSO authentication).

Course requirements: Basics of linear algebra, probability theory, Python scripting

Examination and grading: The grading will be based on the students' solutions to the proposed assignments.

SSD: Information Engineering

Aim: The course will provide an introduction to the field of reinforcement learning, covering its mathematical foundations and the description of the most relevant algorithms. The main concepts and techniques will be illustrated with Python code and application examples in telecommunications and other related areas. The students will acquire hands-on experience with the proposed assignments, in which they will have to implement Python code for solving several challenges and exercises. The course will start with the basic concepts of learning in sequential decision problems, formalized in the multi-armed bandit (MAB) problem and its variants.

Then, Markov decision processes (MDPs), which generalize the MAB problem, will be introduced. The objective of reinforcement learning (RL) is to find approximate solutions to MDPs. The main RL approaches will be presented incrementally: 1) tabular methods, which can address relatively small problems; 2) value function approximation, which allows scaling the previous algorithms up to larger problems; and 3) policy gradient algorithms, which follow a different scaling approach and can be combined with value function approximation (Actor-Critic methods).
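To give a flavor of the starting point of the course, the MAB exploration-exploitation trade-off can be sketched with a few lines of Python. This is an illustrative sketch only, not course material: the function name, arm means, and parameters are my own choices, and the code implements the standard epsilon-greedy strategy with incremental sample-mean estimates.

```python
import numpy as np

def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Gaussian stochastic bandit (unit reward noise)."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)       # number of pulls per arm
    estimates = np.zeros(k)    # sample-mean reward estimate per arm
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))       # explore: random arm
        else:
            arm = int(np.argmax(estimates))  # exploit: greedy arm
        reward = rng.normal(true_means[arm], 1.0)
        counts[arm] += 1
        # incremental update of the sample mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# With enough steps, the best arm (index 2) attracts most of the pulls.
est, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

The same loop structure carries over to the UCB and Thompson Sampling strategies covered in Unit 2; only the arm-selection rule changes.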

Course contents:

Unit 1. Introduction to Reinforcement Learning

Unit 2. Multi-Armed Bandits: Stochastic Bandits, Boltzmann Exploration, UCB algorithms, Thompson Sampling, Contextual Bandits.

Unit 3. Markov Decision Processes: Stochastic Shortest Path problems. Policy Iteration. Value Iteration. MDPs with discount.

Unit 4. Tabular Methods: Monte Carlo Method, Temporal Difference, Off-policy algorithms, Planning at decision time.

Unit 5. Value Function Approximation (VFA) Methods: Linear VFA, Monte Carlo with VFA, TD methods with VFA.

Unit 6. Policy Gradient Algorithms: Score functions, Policy Gradient Theorem, Monte Carlo Policy Gradient, Actor-Critic Policy Gradient.

Unit 7 (Optional). Evolutionary Algorithms.
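As an illustration of the dynamic-programming side of the syllabus (Unit 3), value iteration on a discounted MDP fits in a few lines of NumPy. This is a minimal sketch under my own conventions, not course code: transitions are given as one matrix `P[a]` per action, expected rewards as one vector `R[a]` per action, and the toy two-state MDP at the bottom is invented for the example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a discounted MDP.

    P: list of (n_states, n_states) transition matrices, one per action.
    R: list of (n_states,) expected-reward vectors, one per action.
    Returns the near-optimal value function and a greedy policy.
    """
    n_states = P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R[a][s] + gamma * sum_s' P[a][s, s'] * V[s']
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy MDP: action 0 = "stay", action 1 = "switch state".
# Staying in state 1 yields reward 1; everything else yields 0.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
R = [np.array([0.0, 1.0]), np.array([0.0, 0.0])]
V, policy = value_iteration(P, R)
# Optimal behavior: switch from state 0, then stay in state 1 forever.
```

The same backup structure reappears in Unit 4, where the tabular methods estimate these quantities from samples instead of from a known model.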



References:

1. Reinforcement Learning: An Introduction, Second Edition, Richard S. Sutton and Andrew G. Barto, MIT Press, Cambridge, MA, 2018.

2. Approximate Dynamic Programming: Solving the Curses of Dimensionality, Second Edition, Warren B. Powell, Wiley, 2011.

3. Dynamic Programming and Optimal Control, Vol. I and Vol. II, 4th Edition, Dimitri P. Bertsekas, Athena Scientific, 2012.

4. Algorithms for Reinforcement Learning, Csaba Szepesvári, Morgan & Claypool, 2010.

5. Reinforcement Learning and Optimal Control, Dimitri P. Bertsekas, Athena Scientific, 2019.

6. Markov Decision Processes: Discrete Stochastic Dynamic Programming, Martin L. Puterman, Wiley, 2006.

