| Field | Value |
|---|---|
| DOI | 10.1007/S10514-018-9786-6 |
| Year | 2019 |
| Type | Research article |
Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop, which guides the learning process. In this work we propose two hybrid strategies of Policy Search Reinforcement Learning and Interactive Machine Learning that benefit from both sources of information, the cost function and the human corrective feedback, to accelerate convergence and improve the final performance of the learning process. Experiments with simulated and real systems of balancing tasks and a 3 DoF robot arm validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process by a factor of 3 to 30, saving considerable time during the agent adaptation, and (ii) they allow including non-expert feedback because they have low sensitivity to erroneous human advice.
| No. | Author | Gender | Institution - Country |
|---|---|---|---|
| 1 | Celemin, Carlos | Male | Delft Univ Technol - Netherlands; Universidad de Chile - Chile; Department of Cognitive Robotics, TU Delft - Netherlands; Advanced Mining Technology Center - Chile; Centro Avanzado de Tecnologia para la Mineria - Chile |
| 2 | RUIZ DEL SOLAR-SAN MARTIN, JAVIER | Male | Universidad de Chile - Chile; Advanced Mining Technology Center - Chile; Centro Avanzado de Tecnologia para la Mineria - Chile |
| 3 | Kober, Jens | Male | Delft Univ Technol - Netherlands; Department of Cognitive Robotics, TU Delft - Netherlands |
| Funding source |
|---|
| FONDECYT |
| Fondo Nacional de Desarrollo Científico y Tecnológico |
| CONICYTPCHA |
| DOCTORADO |
| CONICYTPCHA/Doctorado |
| Acknowledgement |
|---|
| This work was partially funded by FONDECYT project 1161500 and CONICYTPCHA/Doctorado Nacional/2015-21151488. |
| Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |