Intelligent Handover Optimization in 5G TN-NTN Heterogeneous Networks via Deep Reinforcement Learning
A real-time control system that learns to route mobile sessions across a three-tier ground/air/space network — engineered like an exchange backend: low-latency decisions, a synchronous agent protocol, and reproducible, audited evaluation.
Five contributions — read as backend engineering
Each research contribution maps onto a competency that crypto backend job descriptions ask for. Under each heading, the first paragraph describes the dissertation work; the second describes how it transfers.
Reproducible hybrid simulator
32 source-level modifications to a C/C++ network simulator (NetSim 14.4), grouped into 6 functional clusters, promoting ground / HAPS / LEO nodes to peers with full standards-compliant signalling.
Systems programming in C/C++ under a binary-compatibility constraint imposed by pre-compiled libraries: the same memory-layout discipline as performance-critical exchange code.
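The binary-compatibility constraint can be illustrated with a small layout check. This is a minimal sketch using Python's `ctypes` to mirror C structs; the struct names and fields (`NodeV1`, `node_id`, `rsrp_dbm`, `tier`) are hypothetical and are not taken from NetSim. It shows why a field appended at the end keeps existing offsets stable, while a field inserted at the front silently shifts them under code that was compiled against the old layout.

```python
import ctypes


class NodeV1(ctypes.Structure):
    """Hypothetical layout that an already-compiled library was built against."""
    _fields_ = [("node_id", ctypes.c_int32), ("rsrp_dbm", ctypes.c_float)]


class NodeAppended(ctypes.Structure):
    """New field added at the end: existing offsets are preserved."""
    _fields_ = [("node_id", ctypes.c_int32),
                ("rsrp_dbm", ctypes.c_float),
                ("tier", ctypes.c_int32)]


class NodeInserted(ctypes.Structure):
    """New field added at the front: every existing offset moves."""
    _fields_ = [("tier", ctypes.c_int32),
                ("node_id", ctypes.c_int32),
                ("rsrp_dbm", ctypes.c_float)]


def offsets(struct_type):
    """Map each field name to its byte offset inside the struct."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}


def layout_compatible(old, new):
    """True if every field of `old` sits at the same offset in `new`."""
    old_off, new_off = offsets(old), offsets(new)
    return all(new_off.get(name) == off for name, off in old_off.items())


if __name__ == "__main__":
    print(layout_compatible(NodeV1, NodeAppended))
    print(layout_compatible(NodeV1, NodeInserted))
```

Offset stability is only part of the picture: appending a field still changes `sizeof`, which matters wherever the pre-compiled side allocates or indexes arrays of the struct.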
Synchronous external-agent interface
A synchronous TCP interface delivering a 12-feature state vector at every decision event and returning a per-session control action — an online inference loop bridging the C-side engine and Python policies.
Low-latency client/server protocol design, request/response framing, and a hot decision path evaluated thousands of times per run.
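The request/response loop described above can be sketched as follows. The wire format is an assumption made for illustration (twelve little-endian float32 values per state message, one int32 action index per reply); the dissertation's actual framing is not given here. `placeholder_policy` is a hypothetical stand-in for a trained agent, and a local socket pair substitutes for the TCP link between the simulator and the Python side.

```python
import socket
import struct

N_FEATURES = 12                    # state-vector width stated in the text
STATE_FMT = f"<{N_FEATURES}f"      # assumed: little-endian float32 features
ACTION_FMT = "<i"                  # assumed: one int32 action index per reply


def recv_exact(sock, n):
    """Read exactly n bytes; a stream socket may return fewer per recv call."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def send_state(sock, features):
    sock.sendall(struct.pack(STATE_FMT, *features))


def recv_state(sock):
    return struct.unpack(STATE_FMT, recv_exact(sock, struct.calcsize(STATE_FMT)))


def send_action(sock, action):
    sock.sendall(struct.pack(ACTION_FMT, action))


def recv_action(sock):
    return struct.unpack(ACTION_FMT, recv_exact(sock, struct.calcsize(ACTION_FMT)))[0]


def placeholder_policy(state):
    """Hypothetical policy: index of the largest feature, folded into 3 actions."""
    return max(range(len(state)), key=lambda i: state[i]) % 3


def demo_round_trip():
    """One synchronous decision event: state out, action back."""
    sim_side, agent_side = socket.socketpair()
    try:
        send_state(sim_side, [float(i) for i in range(N_FEATURES)])
        received = recv_state(agent_side)
        send_action(agent_side, placeholder_policy(received))
        return received, recv_action(sim_side)
    finally:
        sim_side.close()
        agent_side.close()


if __name__ == "__main__":
    print(demo_round_trip())
```

Fixed-size messages keep the hot path free of parsing and allocation beyond a single buffer. The simulator blocks on `recv_action` until the reply arrives, which is what makes the protocol synchronous.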
Four-family KPI framework
Augmented the native 12-metric set with radio-tail indicators and organised them into mobility / radio / QoS / fairness families to expose a multi-objective trade-off the literature usually hides.
Observability and SLO design: choosing the metrics that actually expose tail risk, not just the convenient averages.
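To see why a tail indicator can expose what an average hides, consider the small sketch below. It uses a nearest-rank percentile; the choice of the 5th percentile and the sample values are illustrative assumptions, not figures or definitions from the dissertation.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    if not samples:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarise(samples):
    """Report the average next to a lower-tail indicator."""
    return {
        "mean": sum(samples) / len(samples),
        "p5": percentile(samples, 5),
    }


if __name__ == "__main__":
    # Made-up signal-quality samples in dB: same mean, very different tails.
    steady = [10.0] * 20
    spiky = [11.0] * 19 + [-9.0]
    print(summarise(steady))
    print(summarise(spiky))
```

Both runs average 10 dB, so a mean-only dashboard would rate them as equal; the lower-tail value separates them immediately.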
Off-curve DRL operating points
Double DQN + Prioritised Experience Replay (discrete) and TD3 (continuous) agents reach Pareto operating points unreachable by any static configuration: 78.5% and 71.8% fewer handovers than the baseline.
Multi-objective optimization under hard constraints — minimise transaction churn while holding QoS thresholds, exactly the matching-engine trade-off.
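The idea of an operating point that no static configuration can reach has a compact definition in terms of Pareto dominance. The sketch below assumes two objectives that are both minimised (handover count and a QoS-violation rate); the numbers are invented for illustration and are not results from the dissertation.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


def beyond_static_front(candidate, static_points):
    """True if no static configuration dominates the candidate."""
    return not any(dominates(s, candidate) for s in static_points)


if __name__ == "__main__":
    # (handovers, qos_violation_rate) for hypothetical static settings.
    static = [(100, 0.02), (60, 0.05), (40, 0.10), (80, 0.06)]
    learned = (25, 0.04)  # hypothetical learned-policy operating point
    print(pareto_front(static))
    print(beyond_static_front(learned, static))
```

A hard constraint such as a QoS threshold can be layered on top by discarding any point whose second coordinate exceeds the threshold before computing the front.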
Two robustness signatures
Cross-seed retraining distinguishes outcome-robustness from policy-robustness: TD3 is point-reproducible, DDQN reaches the same region via a different action mixture — a deployment-auditability distinction.
Reproducibility & auditability: the difference between 'the numbers match' and 'the system behaves identically', critical for production sign-off.
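One way to make the outcome-versus-policy distinction measurable is to compare, across two seeds, both the headline KPI and the distribution of actions taken. The sketch below uses relative KPI gap and total variation distance between action mixtures; these specific measures and the sample data are assumptions for illustration, not the dissertation's stated method.

```python
from collections import Counter


def action_mixture(actions):
    """Empirical probability of each action in a rollout."""
    counts = Counter(actions)
    total = len(actions)
    return {action: count / total for action, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def relative_gap(x, y):
    """Relative difference between two non-zero KPI values."""
    return abs(x - y) / max(abs(x), abs(y))


if __name__ == "__main__":
    # Hypothetical rollouts from two training seeds of the same agent.
    seed_a = {"handovers": 40, "actions": [0, 0, 0, 1]}
    seed_b = {"handovers": 41, "actions": [0, 1, 1, 1]}
    kpi_gap = relative_gap(seed_a["handovers"], seed_b["handovers"])
    policy_gap = total_variation(action_mixture(seed_a["actions"]),
                                 action_mixture(seed_b["actions"]))
    print(kpi_gap, policy_gap)
```

A small KPI gap paired with a large policy gap corresponds to the "same region via a different action mixture" case; small values on both axes correspond to point-reproducibility.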
The headline result
Two deep-RL agents reach operating points no static configuration can — minimising transaction churn while holding quality-of-service thresholds. The same multi-objective trade-off a matching engine faces every microsecond.
Let's build the backend that runs the on-chain economy.
Looking for Web3 / crypto backend engineering roles — exchanges, DeFi infrastructure, settlement and low-latency systems.