Academic year: 2021

RL Book: Sutton, R. S., & Barto, A. G., Reinforcement Learning: An Introduction. http://incompleteideas.net/book/RLbook2018.pdf

Selected readings:

Kohl, N., & Stone, P. (2004). Policy gradient reinforcement learning for fast quadrupedal locomotion. In IEEE International Conference on Robotics and Automation (ICRA '04) (Vol. 3). IEEE.

Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning. ACM.

Abbeel, P., Coates, A., & Ng, A. Y. (2010). Autonomous helicopter aerobatics through apprenticeship learning. The International Journal of Robotics Research, 29(13), 1608-1639.

Levine, S., & Koltun, V. (2013, February). Guided policy search. In International Conference on Machine Learning (pp. 1-9).

Gu, S., Lillicrap, T., Sutskever, I., & Levine, S. (2016, June). Continuous deep Q-learning with model-based acceleration. In International Conference on Machine Learning (pp. 2829-2838).

Konidaris, G., Osentoski, S., & Thomas, P. (2011, August). Value function approximation in reinforcement learning using the Fourier basis. In Twenty-Fifth AAAI Conference on Artificial Intelligence.

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... & Dieleman, S. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484.

Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015, June). Trust region policy optimization. In International Conference on Machine Learning (pp. 1889-1897).

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290.

Rusu, A. A., Vecerik, M., Rothörl, T., Heess, N., Pascanu, R., & Hadsell, R. (2016). Sim-to-real robot learning from pixels with progressive nets. arXiv preprint arXiv:1610.04286.
