====== M1R Internship 2017 ======

===== Pointers =====

=== RL ===

  * M1 courses: {{m1r2017:cm1m22016-17.pdf|MDP and planning}}, {{m1r2017:cm4a2016rl.pdf|RL}}
  * David Silver's course: [[http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html]]
  * Sutton's book, updated draft: [[https://webdocs.cs.ualberta.ca/~sutton/book/bookdraft2016sep.pdf]]
  * Multi-Agent RL:
    * first, read chapter 4 of [[https://tel.archives-ouvertes.fr/file/index/docid/362529/filename/these_matignon.pdf]]
    * then read [[http://liris.cnrs.fr/laetitia.matignon/index/matignon2012KER.pdf]]
    * Work by De Hauwere: Learning multi-agent state space representations
      * [[http://www.aamas-conference.org/Proceedings/aamas2010/pdf/01%20Full%20Papers/15_02_FP_0421.pdf]]
      * [[https://ai.vub.ac.be/ALA2012/downloads/paper5.pdf]]

=== Constructivist Learning ===

  * S. Mazac's thesis: [[https://tel.archives-ouvertes.fr/tel-01310583/file/TH2015MazacSebastien.pdf]]

=== RL and Constructivist Inspirations ===

  * Intrinsically Motivated RL [Singh2005] [[https://web.eecs.umich.edu/~baveja/Papers/FinalNIPSIMRL.pdf]]

===== Summaries =====

==== Constructivist Learning ====

  * [[compte-rendu-etat-art-these | State of the art (S. Mazac's thesis)]]

==== RL ====

=== Multi-agent ===

  * [[memento-Learning-multi-agent-state-space-representations | Learning multi-agent state space representations (CQLearning)]]
  * [[memento-Processus-décisionnels-de-Markov-et-systèmes-multiagents | Processus décisionnels de Markov et systèmes multiagents (L. Matignon's thesis)]]
  * [[memento-Independent-reinforcement-learners-cooperative-Markov-games:-a-survey-regarding-coordination-problems | Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems (to be finished)]]
  * [[memento-Context-Sensitive-Reward-Shaping-for-Sparse-Inter-action-Multi-Agent-Systems | Context-Sensitive Reward Shaping for Sparse Inter-action Multi-Agent Systems]]

=== Constructivist Inspirations ===

  * [[memento-Intrinsically-Motivated-RL | Intrinsically Motivated RL [Singh2005]]]

==== Value function approximation ====

  * [[memento-Value-function-approximation | Some notes]]

==== Temporal Difference - Growing Neural Gas ====

  * [[memento-td-gng | TD-GNG]]

===== Reflections =====

  * [[reflexion-gng-qc | CQ-Learning and TD-GNG]]

===== Meeting minutes =====

Folder containing all the slides presented at the meetings: [[https://drive.google.com/drive/folders/0B7dh6En0bP-KakRNYllvOVN3N2c | slides]]

  * [[ reu02-03-17 |02/03/17]]
  * 14/03/17