Abstract
Device-to-device (D2D) communication is designed to improve the overall performance of fifth-generation (5G) wireless networks, including latency, data rates, and system capacity. System capacity can be improved further by letting D2D user equipments (DUEs) reuse the resources of cellular user equipments (CUEs), provided the reuse does not cause harmful interference to the CUEs. A D2D resource allocation scheme should allow one CUE to be allocated a variable number of resource blocks (RBs) and allow those RBs to be reused by more than one DUE. In this study, the Multi-Player Multi-Armed Bandit (MPMAB) reinforcement learning scheme is employed to model this problem by establishing a preference matrix that facilitates greedy resource allocation. A fair resource allocation scheme is then proposed and shown to achieve fairness, prevent waste of resources, and alleviate starvation. Moreover, the scheme performs better when the number of D2D pairs is not too large.
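The idea of a preference matrix driving greedy allocation can be illustrated with a minimal sketch. This is not the scheme from the paper: the function names, the sample-mean update, the one-RB-per-DUE rule, and the `max_reuse` cap are all assumptions made for illustration. Each DUE is treated as a player and each RB as an arm; `pref[d][r]` holds the running average reward DUE `d` has observed on RB `r`.

```python
from typing import Dict, List


def update_preference(pref: List[List[float]], counts: List[List[int]],
                      due: int, rb: int, reward: float) -> None:
    """Incremental sample-mean update of one preference-matrix entry.

    Hypothetical helper: the reward could be, e.g., an observed throughput.
    """
    counts[due][rb] += 1
    pref[due][rb] += (reward - pref[due][rb]) / counts[due][rb]


def greedy_allocate(pref: List[List[float]], max_reuse: int) -> Dict[int, int]:
    """Greedily map DUEs to RBs, highest preference first.

    Assumptions (not from the paper): each DUE receives at most one RB,
    and each RB is reused by at most `max_reuse` DUEs.
    """
    n_dues = len(pref)
    n_rbs = len(pref[0]) if pref else 0
    # Sort all (DUE, RB) pairs by descending preference; ties break by index.
    pairs = sorted(
        ((pref[d][r], d, r) for d in range(n_dues) for r in range(n_rbs)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    load = [0] * n_rbs              # number of DUEs currently sharing each RB
    assignment: Dict[int, int] = {}  # DUE index -> RB index
    for _, d, r in pairs:
        if d not in assignment and load[r] < max_reuse:
            assignment[d] = r
            load[r] += 1
    return assignment
```

In this sketch, raising `max_reuse` lets more DUEs share a popular RB, while a low cap can leave some DUEs unassigned, which is the kind of starvation a fairness-aware scheme would need to address.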
Original language | English |
---|---|
Pages (from-to) | 1076-1095 |
Number of pages | 20 |
Journal | Mobile Networks and Applications |
Volume | 28 |
Issue number | 3 |
State | Published - Jun 2023 |
Keywords
- Device-to-device (D2D)
- Dynamic resource allocation
- Multi-Player Multi-Armed Bandit (MPMAB)
- Reinforcement learning
- Resource allocation