D2D Resource Allocation Based on Reinforcement Learning and QoS

Fang Chang Kuo, Hwang Cheng Wang, Chih Cheng Tseng, Jung Shyr Wu, Jia Hao Xu, Jieh Ren Chang

Research output: Contribution to journal › Article › peer-review

Abstract

Device-to-device (D2D) communication is designed to improve the overall performance of fifth-generation (5G) wireless networks, including latency, data rates, and system capacity. System capacity can be improved further by letting D2D user equipments (DUEs) reuse the resources of cellular user equipments (CUEs), provided the reuse does not cause harmful interference to the CUEs. A D2D resource allocation scheme is expected to allow one CUE to be allocated a variable number of resource blocks (RBs), and the RBs to be reused by more than one DUE. In this study, the Multi-Player Multi-Armed Bandit (MPMAB) reinforcement learning scheme is employed to model this problem by establishing a preference matrix that facilitates greedy resource allocation. A fair resource allocation scheme is then proposed and shown to achieve fairness, prevent waste of resources, and alleviate starvation. Moreover, this scheme performs better when there are not too many D2D pairs.
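The idea of a preference matrix driving greedy allocation can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes that each D2D pair is a bandit player, each RB is an arm, preferences are plain sample averages of observed rewards, and at most `max_reuse` D2D pairs may share one RB. The function names and the `max_reuse` parameter are hypothetical and chosen only for illustration.

```python
def update_preference(pref, counts, player, arm, reward):
    """Incremental sample-average update of one preference-matrix entry.

    pref[player][arm] holds the running mean reward (an assumed reward
    signal, e.g. achieved rate) seen when this player used this arm.
    """
    counts[player][arm] += 1
    pref[player][arm] += (reward - pref[player][arm]) / counts[player][arm]


def greedy_allocate(pref, max_reuse):
    """Greedily map players (D2D pairs) to arms (RBs) from a preference matrix.

    Entries are visited from highest to lowest preference; a player takes
    the first arm that still has spare reuse capacity. Players left
    without an arm are simply absent from the returned dict.
    """
    n_players, n_arms = len(pref), len(pref[0])
    load = [0] * n_arms
    allocation = {}
    entries = sorted(
        ((pref[p][a], p, a) for p in range(n_players) for a in range(n_arms)),
        key=lambda t: (-t[0], t[1], t[2]),  # ties broken by player, then arm
    )
    for _, p, a in entries:
        if p not in allocation and load[a] < max_reuse:
            allocation[p] = a
            load[a] += 1
    return allocation


if __name__ == "__main__":
    # Hypothetical preferences: 3 D2D pairs (rows) over 2 RBs (columns).
    pref = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.6]]
    print(greedy_allocate(pref, max_reuse=1))
    print(greedy_allocate(pref, max_reuse=2))
```

With `max_reuse=1` in this toy input, one D2D pair is left without an RB, which is the kind of starvation that pure greedy selection can produce and that a fairness-aware scheme would need to address.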

Original language: English
Pages (from-to): 1076-1095
Number of pages: 20
Journal: Mobile Networks and Applications
Volume: 28
Issue number: 3
State: Published - Jun 2023

Keywords

  • Device-to-device (D2D)
  • Dynamic resource allocation
  • Multi-Player Multi-Armed Bandit (MPMAB)
  • Reinforcement learning
  • Resource allocation
