
Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)

Transforming Enterprise Control with AI

This paper presents a reinforcement learning (RL) framework for controlling the Twin Rotor Aerodynamic System (TRAS) using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The TRAS, a non-linear, cross-coupled system, is difficult to control with traditional methods. TD3, a model-free actor-critic method suited to continuous state and action spaces, achieved both stabilization at specified pitch and azimuth angles and trajectory tracking, and demonstrated superior robustness to external wind disturbances compared with a conventional PID controller. Experimental validation on a laboratory setup confirmed its real-world effectiveness, marking a significant advance over prior control methodologies for multi-rotor systems.


Deep Analysis & Enterprise Applications

The modules below explore specific findings from the research in more depth, reframed for enterprise applications.

TD3 Algorithm Fundamentals

The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is a model-free, deterministic, and off-policy actor-critic method, specifically chosen for environments with continuous state and action spaces like the Twin Rotor Aerodynamic System (TRAS). It uses neural networks for both its actor and critic components, allowing the agent to learn complex mappings from environment states to optimal actions.

Key features include delayed policy updates, twin Q-networks that mitigate overestimation bias, and clipped noise added to target actions (target policy smoothing), all of which contribute to more stable and reliable learning in continuous control tasks. Exploration noise is added separately to the actions applied to the environment during training.

Enterprise Process Flow

Observe State
Select Action (with noise)
Execute Action & Observe s', r, d
Store Transition in Replay Buffer
Sample Batch from Buffer
Compute Target Actions (with noise)
Compute Targets (min of two Q-networks)
Update Q-functions
Update Policy (delayed)
Update Target Networks (soft)
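
The flow above maps directly onto code. The sketch below shows one TD3 update step in PyTorch, assuming transitions have already been collected into a replay buffer; the network sizes, hyperparameters (discount, soft-update rate, smoothing noise, policy delay), and the 4-state / 2-voltage dimensions are illustrative assumptions rather than the paper's exact configuration.

```python
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, act_limit = 4, 2, 1.0   # TRAS: 4 states, 2 rotor voltages (assumed normalized)

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

actor = nn.Sequential(mlp(obs_dim, act_dim), nn.Tanh())          # deterministic policy
q1, q2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)    # twin critics
actor_t, q1_t, q2_t = copy.deepcopy(actor), copy.deepcopy(q1), copy.deepcopy(q2)
pi_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
q_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=1e-3)
gamma, tau, policy_noise, noise_clip, policy_delay = 0.99, 0.005, 0.2, 0.5, 2

def td3_update(batch, step):
    s, a, r, s2, d = batch                      # tensors; r and d have shape (batch, 1)
    with torch.no_grad():
        # Target-action smoothing: clipped Gaussian noise on the target policy's action.
        noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
        a2 = (actor_t(s2) * act_limit + noise).clamp(-act_limit, act_limit)
        # Clipped double-Q target: minimum of the two target critics.
        sa2 = torch.cat([s2, a2], dim=1)
        y = r + gamma * (1 - d) * torch.min(q1_t(sa2), q2_t(sa2))
    # Update both Q-functions toward the shared target.
    sa = torch.cat([s, a], dim=1)
    q_loss = ((q1(sa) - y) ** 2).mean() + ((q2(sa) - y) ** 2).mean()
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    # Delayed policy update and soft (Polyak) target-network updates.
    if step % policy_delay == 0:
        pi_loss = -q1(torch.cat([s, actor(s) * act_limit], dim=1)).mean()
        pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
        for net, targ in ((actor, actor_t), (q1, q1_t), (q2, q2_t)):
            for p, p_t in zip(net.parameters(), targ.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)
```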

Optimized Reward Function Design

The effectiveness of the TD3 agent relies heavily on a carefully designed reward function, combining both dense and sparse rewards. The dense reward penalizes deviations from the reference pitch and azimuth angles (r_dense = -c · (Δθ² + Δψ²)), encouraging gradual error reduction. A positive sparse reward is given when the deviation errors fall below a threshold (e.g., 0.01 radians), incentivizing precise stabilization.

Additionally, a large sparse penalty is applied if the TRAS angles exceed predefined boundaries (e.g., 60°), preventing the system from entering unstable or unsafe regions. This multi-component reward structure guides the agent towards stable and accurate control while avoiding undesirable states.
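
As a concrete illustration, a minimal sketch of this multi-component reward is shown below. The 0.01 rad tolerance and the 60° boundary come from the text; the scaling coefficient c, the sparse bonus, and the penalty magnitude are illustrative assumptions.

```python
import numpy as np

def tras_reward(theta, psi, theta_ref, psi_ref,
                c=1.0, tol=0.01, bound=np.deg2rad(60.0),
                bonus=1.0, penalty=-100.0):
    d_theta = theta - theta_ref
    d_psi = psi - psi_ref
    # Dense term: quadratic penalty on pitch and azimuth tracking errors.
    r = -c * (d_theta**2 + d_psi**2)
    # Sparse bonus when both errors are within the stabilization tolerance.
    if abs(d_theta) < tol and abs(d_psi) < tol:
        r += bonus
    # Large sparse penalty (typically with episode termination) outside the safe envelope.
    if abs(theta) > bound or abs(psi) > bound:
        r += penalty
    return r
```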

Twin Rotor Aerodynamic System (TRAS) Characteristics

The TRAS is a laboratory setup simulating helicopter dynamics, featuring two perpendicular rotors enabling vertical and horizontal plane rotations. It is characterized by non-linear and cross-coupled dynamics, making traditional control challenging. The system's state is defined by pitch angle (θ), pitch angular velocity (ωθ), azimuth angle (ψ), and azimuth angular velocity (ωψ). The control inputs are the voltages applied to the vertical and horizontal rotors.

This complexity necessitates advanced control strategies like TD3, which can learn and adapt to these non-linear behaviors without requiring an explicit mathematical model.
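
The paper implements the TRAS environment in MATLAB/Simulink; the sketch below merely illustrates the same state and action definitions in a Python (gymnasium-style) interface. The normalized voltage limits and the toy Euler-step dynamics are placeholders, not the identified TRAS model.

```python
import numpy as np
import gymnasium as gym

class TRASEnv(gym.Env):
    def __init__(self, dt=0.01):
        self.dt = dt
        # State: pitch angle, pitch rate, azimuth angle, azimuth rate.
        high = np.array([np.pi, np.inf, np.pi, np.inf], dtype=np.float32)
        self.observation_space = gym.spaces.Box(-high, high, dtype=np.float32)
        # Actions: voltages applied to the vertical and horizontal rotors (normalized).
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.state = np.zeros(4, dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.zeros(4, dtype=np.float32)
        return self.state.copy(), {}

    def step(self, action):
        theta, w_theta, psi, w_psi = self.state
        # Placeholder dynamics: the real non-linear, cross-coupled TRAS equations
        # (or the Simulink model) would go here.
        w_theta += self.dt * action[0]
        w_psi += self.dt * action[1]
        theta += self.dt * w_theta
        psi += self.dt * w_psi
        self.state = np.array([theta, w_theta, psi, w_psi], dtype=np.float32)
        reward = 0.0          # see the reward-shaping section above
        terminated = bool(abs(theta) > np.deg2rad(60) or abs(psi) > np.deg2rad(60))
        return self.state.copy(), reward, terminated, False, {}
```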

Enhanced Robustness to Disturbances

2.5x more robust to wind disturbances than PID

RL vs. PID: Disturbance Handling

Feature | TD3 (RL Agent) | Conventional PID
System Model | Model-free; learns from interaction | Requires a system model (tuned gains)
Adaptability to Non-linearities | High; learns complex dynamics | Limited; fixed gains struggle with non-linearity
Robustness to External Disturbances | Significantly higher; adapts its control strategy | Lower; fixed gains prone to oscillations
Training vs. Tuning | Requires extensive training time | Requires heuristic tuning (e.g., Ziegler-Nichols)
Overshoot & Settling Time (Disturbed) | Lower overshoot, stable settling | Higher overshoot, oscillations, unstable settling
Real-world Application | Validated on the laboratory setup; learns effectively | Can struggle with unmodeled real-world dynamics

Advanced ROI Calculator

Estimate the potential financial impact of integrating AI into your operations. Adjust the parameters to see a personalized projection.


Your AI Implementation Roadmap

A structured approach ensures successful integration and measurable results. Our proven methodology guides you through each critical phase.

Data Collection & Environment Setup

Defining the TRAS environment in MATLAB/Simulink and setting up the state and action spaces for RL agent interaction.

TD3 Agent Training (Stabilization)

Initial training of the TD3 agent to stabilize the TRAS at various pitch and azimuth angles using a combined dense and sparse reward function.
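
A minimal sketch of this training phase is shown below. The `agent` and `buffer` interfaces (`act`, `update`, `add`, `ready`, `sample`) are hypothetical stand-ins for the TD3 components sketched earlier, and the episode count, horizon, and exploration-noise scale are assumptions.

```python
import numpy as np

def train_stabilization(env, agent, buffer, episodes=500, max_steps=1000, expl_noise=0.1):
    for ep in range(episodes):
        obs, _ = env.reset()
        for t in range(max_steps):
            # Deterministic action from the actor plus Gaussian exploration noise.
            noise = expl_noise * np.random.randn(env.action_space.shape[0])
            action = np.clip(agent.act(obs) + noise,
                             env.action_space.low, env.action_space.high)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            buffer.add(obs, action, reward, next_obs, float(terminated))
            obs = next_obs
            if buffer.ready():
                agent.update(buffer.sample(), step=t)   # TD3 update as sketched earlier
            if terminated or truncated:
                break
```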

TD3 Agent Training (Trajectory Tracking)

Further training of the agent to track complex trajectories, ensuring adaptability and accuracy across different reference signals.

Disturbance Integration & Robustness Testing

Introduction of the Dryden Wind Turbulence Model to simulate external wind disturbances and evaluate the RL agent's robustness against them, benchmarked against the PID controller.
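
The paper uses the Dryden Wind Turbulence Model (available as a Simulink block) for disturbance injection. As a rough stand-in for quick robustness tests in a Python simulation, the sketch below generates low-pass-filtered Gaussian noise; the time constant and intensity are illustrative, not Dryden parameters.

```python
import numpy as np

def wind_disturbance(n_steps, dt=0.01, tau=1.0, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    gust = np.zeros(n_steps)
    alpha = dt / (tau + dt)                  # first-order low-pass filter coefficient
    for k in range(1, n_steps):
        gust[k] = (1 - alpha) * gust[k - 1] + alpha * sigma * rng.standard_normal()
    return gust                              # added to the rotor inputs during evaluation
```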

Hardware Implementation & Validation

Deploying the trained RL controller on a physical TRAS laboratory setup to validate its performance and effectiveness in real-world conditions.

Ready to Transform Your Enterprise with AI?

Unlock unparalleled efficiency, innovation, and strategic advantage. Our experts are ready to guide you.

Book Your Free Consultation