Wireless networks are on the cusp of a major transformation with the advent of smart surfaces that can manipulate signals to boost coverage and reliability. However, these systems face two critical challenges: energy constraints that limit their performance and security threats that can undermine their learning processes. A new study introduces a dynamic hybrid reconfigurable intelligent surface (RIS) that adapts its operation based on harvested energy, coupled with a lightweight defense mechanism against reward poisoning attacks in deep reinforcement learning (DRL). This approach enhances the data rates for secondary users in cognitive radio networks and keeps performance robust under adversarial conditions, marking a significant step toward practical and secure next-generation wireless systems.
The researchers developed a dynamic hybrid RIS that switches between passive and active modes in real time, depending on the energy harvested from a dedicated power beacon. In passive mode, the RIS reflects signals without amplification, consuming minimal energy, while in active mode, it amplifies signals to overcome fading and noise, at the cost of higher energy use. The system uses a threshold, denoted as τ, to decide the mode: if the total harvested energy (Etotal) is below τ, it operates passively; otherwise, it activates amplification with a gain scaled by the available energy. This energy-aware design allows the RIS to balance performance and power consumption, addressing a key limitation in previous static or fully active RIS architectures that lack adaptability to energy fluctuations.
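The threshold rule described above can be illustrated with a minimal Python sketch. The article only states that the gain is "scaled by the available energy"; the linear scaling, the `max_gain` cap, and the `e_ref` saturation point used here are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RISMode:
    active: bool  # True when the surface amplifies, False when it only reflects
    gain: float   # amplitude gain per element; 1.0 means pure reflection


def select_mode(e_total: float, tau: float,
                max_gain: float = 4.0, e_ref: float = 100.0) -> RISMode:
    """Pick passive or active operation from the harvested energy.

    e_total : total harvested energy (Joules)
    tau     : switching threshold (Joules)
    max_gain, e_ref : hypothetical parameters; the gain grows linearly from
        1.0 at e_total == tau up to max_gain at e_total >= e_ref.
    """
    if e_ref <= tau:
        raise ValueError("e_ref must be larger than tau")
    if e_total < tau:
        # Not enough energy: reflect without amplification.
        return RISMode(active=False, gain=1.0)
    # Enough energy: amplify, with a gain that scales with the surplus.
    frac = min((e_total - tau) / (e_ref - tau), 1.0)
    return RISMode(active=True, gain=1.0 + frac * (max_gain - 1.0))


if __name__ == "__main__":
    for energy in (5.0, 10.0, 55.0, 200.0):
        print(energy, select_mode(energy, tau=10.0))
```

Raising `tau` in this sketch keeps the surface in passive mode more often, which is the trade-off between throughput and energy use that the threshold is meant to control.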
To optimize the network, the team employed a soft actor-critic (SAC) deep reinforcement learning framework, which jointly adjusts the transmit beamforming at the secondary user transmitter and the phase shifts at the RIS. SAC was chosen for its robustness in continuous and dynamic environments, and it outperformed other DRL baselines such as deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3). The optimization problem maximizes the sum data rate of secondary users under constraints such as interference limits for primary users and energy availability for the RIS. The state space includes parameters like transmission power, channel conditions, and previous actions, while the action space covers the beamforming and RIS configuration matrices. Rewards are based on the achieved data rates, with penalties for energy shortfalls.
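As a rough illustration of a reward built from data rates and an energy-shortfall penalty, the sketch below computes a Shannon-style sum rate and subtracts a penalty when the RIS would need more energy than it has. The function names, the linear penalty, and `penalty_weight` are hypothetical; the paper's exact reward and its handling of the primary-user interference constraint are not reproduced here.

```python
import math
from typing import Sequence


def sum_rate(sinrs: Sequence[float]) -> float:
    """Sum of per-user spectral efficiencies, log2(1 + SINR), in bit/s/Hz."""
    return sum(math.log2(1.0 + s) for s in sinrs)


def reward(sinrs: Sequence[float], e_required: float, e_available: float,
           penalty_weight: float = 1.0) -> float:
    """Sum rate minus a penalty proportional to the energy shortfall.

    The linear penalty form is an assumption made for illustration.
    """
    shortfall = max(0.0, e_required - e_available)
    return sum_rate(sinrs) - penalty_weight * shortfall


if __name__ == "__main__":
    # Two secondary users with linear-scale SINRs of 1 and 3.
    print(reward([1.0, 3.0], e_required=5.0, e_available=10.0))
    print(reward([1.0, 3.0], e_required=12.0, e_available=10.0,
                 penalty_weight=0.5))
```

A scalar of this kind is what an agent such as SAC would receive after each joint choice of beamforming and RIS configuration.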
Numerical results demonstrated the effectiveness of this approach. For instance, increasing the number of RIS elements (R) from 2 to 16 improved the secondary users' sum rate, as shown in Figure 2, due to better signal control and multipath enhancement. However, higher cascade levels (κ) in the channels, which model increased fading from multiple reflections, degraded performance, as seen in Figure 3, where κ=4 led to lower rates than κ=2. The dynamic hybrid RIS struck a superior trade-off between throughput and energy consumption: at τ=10 Joules it achieved performance close to that of a fully active RIS while saving energy, and the savings reached a 74.2% reduction in consumption at τ=50 Joules, as detailed in Table IV. Figure 7 illustrated that the hybrid mode outperformed passive RIS and matched active RIS at lower thresholds, while Figure 8 showed how mode-switching frequencies varied with τ, highlighting the energy-aware behavior.
The study also addressed security by investigating reward poisoning attacks on DRL agents, in which adversaries manipulate reward signals to degrade learning. The researchers proposed a defense combining reward clipping and statistical anomaly filtering, which maintained secondary user performance even under attacks such as inversion or scaling of rewards. As shown in Figure 12, this defense mechanism, with a filtering threshold χ=1, preserved stable average rates despite attacks, whereas unprotected systems suffered significant degradation. The defense is lightweight, with minimal computational cost, which makes it suitable for real-time deployment in resource-constrained networks and fills a gap left by existing defenses that are often computationally intensive.
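One way to combine reward clipping with statistical anomaly filtering is sketched below. This is not the authors' implementation: the `RewardSanitizer` class, the sliding window, the warm-up period, and the reading of χ as a multiplier on the standard deviation of recent rewards are all assumptions made for illustration.

```python
import statistics
from collections import deque


class RewardSanitizer:
    """Clip each reward, then replace statistical outliers with the recent mean."""

    def __init__(self, r_min: float, r_max: float, chi: float = 1.0,
                 window: int = 100, warmup: int = 10):
        self.r_min, self.r_max = r_min, r_max
        self.chi = chi          # assumed meaning: allowed deviation in std units
        self.warmup = warmup    # rewards accepted before filtering starts
        self.history = deque(maxlen=window)  # recently accepted rewards

    def __call__(self, r: float) -> float:
        # Step 1: clipping bounds the damage of scaled or inverted rewards.
        r = min(max(r, self.r_min), self.r_max)
        # Step 2: anomaly filtering against statistics of accepted rewards.
        if len(self.history) >= self.warmup:
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history)
            if std > 0 and abs(r - mean) > self.chi * std:
                return mean  # suspicious reward: substitute, do not store
        self.history.append(r)
        return r


if __name__ == "__main__":
    clean = RewardSanitizer(r_min=0.0, r_max=10.0, chi=1.0, warmup=4)
    for raw in (4.0, 5.0, 6.0, 5.0, -50.0, 5.5, 1000.0):
        print(raw, "->", clean(raw))
```

Each call costs a mean and a standard deviation over a short window, which is in line with the low overhead the article attributes to the defense. A small χ also rejects many legitimate rewards, so the threshold would need tuning in practice.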
Despite these advances, the work has limitations. Scalability to larger RIS surfaces or massive MIMO arrays requires further study, and the defense mechanism, while effective against reward poisoning, may not cover other attack types such as policy or observation poisoning. Additionally, the energy harvesting model assumes ideal conditions, and practical implementations may face challenges like variable energy availability or hardware imperfections. Future research could explore hierarchical control for scalability and extend the defense to broader adversarial scenarios, but this study lays a crucial foundation for energy-efficient and secure wireless networks in the 6G era.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.