Enhanced Experience Prioritization: A Novel Upper Confidence Bound Approach

Kővári, Bálint and Pelenczei, Bálint and Bécsi, Tamás (2023) Enhanced Experience Prioritization: A Novel Upper Confidence Bound Approach. IEEE Access, 11. pp. 138488-138501. ISSN 2169-3536

Enhanced_Experience_Prioritization_A_Novel_Upper_Confidence_Bound_Approach.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: http://doi.org/10.1109/ACCESS.2023.3339248

Abstract

Value-based Reinforcement Learning algorithms achieve superior performance by using experiences gathered in the past to update their so-called value function. In most cases, this is accomplished by applying a sampling strategy to an experience buffer in which state transitions are stored during training. However, the design of such sampling methods is not intuitive. General theoretical approaches aim to estimate the expected learning progress from each experience, on the basis of which the neural network updates can be carried out efficiently. A proper choice of method can not only accelerate but also significantly stabilize training by increasing sampling efficiency, which indirectly reduces time and computing requirements. Because one of the most critical obstacles to applying Machine Learning (ML)-based techniques is the lack of adequate computing power, finding optimal solutions has long been an active research topic in Reinforcement Learning. The main focus of this research has therefore been to develop an experience prioritization method that achieves competitive performance while considerably lowering the overall cost of training. In this paper, we propose a novel priority value assignment concept for experience prioritization in Reinforcement Learning, based on the Upper Confidence Bound algorithm. Furthermore, we present empirical findings showing that our solution outperforms the current state of the art in sampling efficiency, while enabling faster and more cost-efficient training.
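The abstract does not state the paper's priority formula. As a rough illustration of the general idea only, the following minimal sketch scores each stored transition with a UCB1-style value (the mean observed absolute TD error plus an exploration bonus that shrinks as the transition is sampled more often) and samples transitions in proportion to that score. The class name UCBReplayBuffer, the coefficient c, the default initial error of 1.0 and the FIFO eviction policy are all hypothetical choices made for this sketch, not the authors' method.

```python
import math
import random


class UCBReplayBuffer:
    """Hypothetical sketch: replay buffer with UCB1-style sampling priorities.

    priority(i) = mean |TD error| observed for i
                  + c * sqrt(ln(total updates + 1) / observations of i)
    This illustrates the general idea only; it is NOT the paper's formula.
    """

    def __init__(self, capacity, c=1.0, seed=0):
        self.capacity = capacity
        self.c = c                # exploration coefficient (assumed hyperparameter)
        self.transitions = []     # stored transitions, e.g. (s, a, r, s', done)
        self.err_sum = []         # running sum of |TD error| per transition
        self.counts = []          # number of error observations per transition
        self.total = 0            # total number of recorded updates so far
        self.rng = random.Random(seed)

    def add(self, transition, initial_error=1.0):
        # Evict the oldest transition when full (simple FIFO, assumed policy).
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.err_sum.pop(0)
            self.counts.pop(0)
        self.transitions.append(transition)
        # The initial error counts as one observation.
        self.err_sum.append(abs(initial_error))
        self.counts.append(1)

    def priority(self, i):
        mean_err = self.err_sum[i] / self.counts[i]
        bonus = self.c * math.sqrt(math.log(self.total + 1) / self.counts[i])
        return mean_err + bonus

    def sample(self, batch_size):
        # Draw indices with probability proportional to the UCB score;
        # a small epsilon keeps every weight strictly positive.
        n = len(self.transitions)
        weights = [self.priority(i) + 1e-6 for i in range(n)]
        return self.rng.choices(range(n), weights=weights, k=batch_size)

    def update(self, indices, td_errors):
        # Record the new TD errors of the sampled transitions.
        for i, err in zip(indices, td_errors):
            self.err_sum[i] += abs(err)
            self.counts[i] += 1
            self.total += 1


# Usage: fill a small buffer, sample a batch, then feed back TD errors.
buf = UCBReplayBuffer(capacity=3)
for t in range(5):
    buf.add(("state", t))
batch = buf.sample(4)
buf.update(batch, [0.5] * len(batch))
```

The bonus term gives rarely sampled transitions a chance to be revisited even when their recorded error is small; how the paper balances error and exploration is not described in the abstract.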

Item Type:	Article
Uncontrolled Keywords:	Deep learning, experience prioritization, experience replay, machine learning, Q-learning, reinforcement learning, sampling
Subjects:	T Technology / applied, technical sciences > TA Engineering (General). Civil engineering (General) / general engineering sciences
Depositing User:	Dr. Tamás Bécsi
Date Deposited:	25 Sep 2024 06:27
Last Modified:	25 Sep 2024 06:27
URI:	https://real.mtak.hu/id/eprint/205725
