Deep Reinforcement Learning based Wireless Resource Management

Next generation cellular networks are expected to support a massive data traffic volume and satisfy a vast number of users with latency-critical quality-of-service expectations. Towards serving this demand, the interference management problem is envisaged to be the main bottleneck, owing to the likelihood of a heavily interfering wireless environment caused by the much denser deployment of base stations and mobile devices. Because the shared frequency-band resources are inherently scarce while traffic and multi-channel conditions vary over time, scalable and practical fast-timescale resource management is an absolute necessity for next generation cellular networks. Conventional optimization based resource management schemes rely on model-driven techniques and are practically infeasible, computationally challenging, or intractable. Therefore, in the past few years, there has been extensive research on model-free reinforcement learning based resource management. Reinforcement learning is purely data-driven, and its multi-agent adaptation is promising for scalability to larger networks, where agents collaboratively work towards a shared objective.

We first show the potential of deep reinforcement learning for transmit power control in wireless networks. Existing power control techniques typically find near-optimal power allocations by solving a challenging optimization problem. Most of these algorithms do not scale to large networks in real-world scenarios because of their computational complexity and their requirement for instantaneous cross-cell channel state information (CSI). The proposed method is a distributively executed dynamic power allocation scheme that maximizes a weighted sum-rate objective, which can be particularized to achieve maximum sum-rate or proportionally fair scheduling. Each transmitter collects delayed channel measurements of a time-varying channel from its neighbors and adapts its own transmit power accordingly. Both random variations and delays in the CSI are inherently addressed using deep Q-learning. For a typical network architecture with a single subband and full-buffer traffic, the proposed algorithm is shown to achieve near-optimal power allocation in real time based on the delayed CSI measurements available to the agents. The proposed scheme is especially suitable for practical scenarios where the system model is inaccurate and the CSI delay is non-negligible.

Next, we extend the proposed power control algorithm to mobile devices, whose channel conditions change not only due to fast fading but also due to device movement. We further enable continuous power control by replacing deep Q-learning, which applies only to discrete action spaces and requires the transmit power to be quantized, with the deep deterministic policy gradient (DDPG) algorithm, an actor-critic learning method that also applies to continuous action spaces. Additionally, for the case of multiple frequency bands, we propose a novel two-layer approach to the joint subband selection and power allocation problem, where the bottom layer performs continuous power allocation at the physical layer using DDPG, and the top layer performs discrete subband selection with deep Q-learning.
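
Below is a minimal sketch of the two-layer idea described above, assuming PyTorch: a deep Q-network makes the discrete subband choice, and a deterministic actor (the DDPG policy) outputs a continuous transmit power on the chosen subband. The class names, layer sizes, and state layout are illustrative assumptions rather than the dissertation's exact design, and the training machinery (replay buffers, target networks, the DDPG critic) is omitted.

# Minimal sketch of the two-layer subband selection / power allocation scheme.
# Class names, layer sizes, and the state layout are illustrative assumptions,
# not the dissertation's exact design; DQN/DDPG training is omitted.
import torch
import torch.nn as nn


class SubbandDQN(nn.Module):
    """Top layer: deep Q-network giving one Q-value per discrete subband."""

    def __init__(self, state_dim: int, num_subbands: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_subbands),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class PowerActor(nn.Module):
    """Bottom layer: DDPG actor mapping (state, chosen subband) to a power in [0, p_max]."""

    def __init__(self, state_dim: int, num_subbands: int, p_max: float):
        super().__init__()
        self.p_max = p_max
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_subbands, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor, subband_onehot: torch.Tensor) -> torch.Tensor:
        return self.p_max * self.net(torch.cat([state, subband_onehot], dim=-1))


def joint_action(state: torch.Tensor, dqn: SubbandDQN, actor: PowerActor, num_subbands: int):
    """Greedy joint action for one agent: pick a subband, then a continuous power on it."""
    with torch.no_grad():
        subband = int(dqn(state).argmax())
        onehot = torch.zeros(num_subbands)
        onehot[subband] = 1.0
        power = float(actor(state, onehot))
    return subband, power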
Finally, we propose a multi-agent deep reinforcement learning based resource management scheme that responds instantaneously to changes in both traffic and channel dynamics. With the help of a novel reward function design, each learning agent adapts its resources within each time slot to stabilize its own and its neighbors' queue lengths, and the agents collaboratively maximize the long-term quality of service over their local environment by minimizing the average packet delay. The local state consists of user priorities and channel measurements. User priorities connect physical layer resource management with the network layer; they can represent the link weights of a proportionally fair scheme, queue lengths, or anything else specified by the network layer according to the type of user service and the quality of service requirements. We also consider several practical constraints on channel measurements, so the local state is built from aggregated interference rather than individual channel gain measurements. Using simulations, we demonstrate the effectiveness of the proposed approach compared to an optimization based resource allocation scheme that follows proportional fairness but lacks instantaneous interaction with the queue states.
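
To make the reward design concrete, the following is a hedged sketch of one plausible per-agent reward: the agent is penalized for its own end-of-slot queue backlog together with its neighbors' backlogs, so that maximizing the reward pushes agents to collaboratively stabilize queues and, in turn, reduce average packet delay. The weighting and exact functional form are assumptions and may differ from the dissertation's design.

# Hedged sketch of one plausible per-agent reward for the scheme above:
# penalize the agent's own end-of-slot backlog and its neighbors' backlogs,
# so agents collaboratively keep queues stable and packet delay low.
# The weighting and functional form are assumptions, not the exact design.
def local_reward(own_queue_len: float,
                 neighbor_queue_lens: list[float],
                 neighbor_weight: float = 1.0) -> float:
    """Less negative reward when both the agent's and its neighbors' backlogs are small."""
    return -(own_queue_len + neighbor_weight * sum(neighbor_queue_lens))


# Example: an agent with 3 queued packets and two neighbors with 1 and 2 queued packets.
print(local_reward(3.0, [1.0, 2.0]))  # -6.0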
