Abstract
This work investigates the applicability of a multi-agent approach to deep reinforcement learning. The Trust Region Policy Optimization (TRPO) algorithm was implemented, and ways of parallelizing it were studied. A new training architecture is proposed in which multiple agents continuously generate training batches for the optimizer in a distributed environment. The proposed architecture was implemented and tested on classic reinforcement learning tasks.
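The agent/optimizer split described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes threads in a single process rather than a distributed environment, a toy reward in place of a real reinforcement learning task, and a placeholder where the TRPO policy update would go. The names `collect_batch`, `agent_loop`, and `run` are hypothetical.

```python
import queue
import random
import threading


def collect_batch(agent_id, steps, seed):
    """Hypothetical rollout: `steps` (state, action, reward) tuples from a toy task."""
    rng = random.Random(seed)
    samples = []
    for _ in range(steps):
        state = rng.random()
        action = rng.random()
        reward = -abs(state - action)  # toy reward: best when action matches state
        samples.append((state, action, reward))
    return {"agent": agent_id, "samples": samples}


def agent_loop(agent_id, batches, stop, steps):
    """Agent: keep generating batches until the optimizer signals it to stop."""
    seed = agent_id * 1000
    while not stop.is_set():
        batch = collect_batch(agent_id, steps, seed)
        seed += 1
        try:
            # The timeout lets a blocked agent re-check the stop flag.
            batches.put(batch, timeout=0.1)
        except queue.Full:
            continue


def run(num_agents=4, num_updates=5, steps=32):
    """Optimizer: consume `num_updates` batches produced by the agents."""
    batches = queue.Queue(maxsize=2 * num_agents)
    stop = threading.Event()
    workers = [
        threading.Thread(target=agent_loop, args=(i, batches, stop, steps))
        for i in range(num_agents)
    ]
    for worker in workers:
        worker.start()

    mean_rewards = []
    for _ in range(num_updates):
        batch = batches.get()
        rewards = [reward for _, _, reward in batch["samples"]]
        mean_rewards.append(sum(rewards) / len(rewards))
        # Assumption: a real optimizer would perform a TRPO policy update here.

    stop.set()
    for worker in workers:
        worker.join()
    return mean_rewards
```

Calling `run()` returns one mean reward per consumed batch. The bounded queue is a design choice of this sketch: it keeps agents from running arbitrarily far ahead of the optimizer.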