
Posted 2024-05-20 | Updated 2024-12-29 | Project | 3 minutes read (About 439 words)

Reinforcement Learning for Inverted Pendulum Control

Overview

This project applied reinforcement learning (RL) to the control of single and double inverted pendulum systems. Using Q-Learning, DQN, A2C, DDPG, and PPO, we trained controllers for the swing-up and stabilization tasks. The project explored the nonlinear dynamics of inverted pendulum systems and showed where RL techniques work well for non-linear control problems and where they fall short.

Results

Single Inverted Pendulum:
- Achieved a 100% success rate for swing-up and stabilization tasks under ideal conditions.
- Maintained a 90% success rate under noisy conditions with a simulation time of 30 seconds.
Double Inverted Pendulum:
- Successfully stabilized the pendulum but encountered challenges in achieving swing-up with model-free RL methods.
Performance Metrics:
- Trained RL models for swing-up and stabilization tasks in under 50,000 episodes.
- Demonstrated the effectiveness of custom reward functions for dynamic control tasks.

Report PDF | GitHub (Chinese README)

Stabilization of the double inverted pendulum

Technical Details

Algorithms Applied:
- Q-Learning and DQN: Explored discrete action spaces for initial experiments.
- A2C and PPO: Achieved robust performance for stabilization tasks in continuous action spaces.
- DDPG: Provided smooth control for swing-up tasks with deterministic policy gradients.
Custom Toolkit:
- Developed RL agents from scratch using PyTorch, including functions for initialization, model updates, and action sampling.
- Designed visualization tools to monitor reward curves and training metrics.
Reward Design:
- Swing-up Task: Rewarded higher pendulum angles while penalizing velocity at the peak.
- Stabilization Task: Encouraged minimal deviation from the vertical position and low angular velocity.
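
The two reward ideas above can be sketched in a few lines of Python. This is only an illustration, not the project's actual reward code: the function names, the weights (`vel_weight`, `angle_weight`), and the convention that `theta = 0` means upright are all assumptions made for the example.

```python
import math


def wrap_angle(theta: float) -> float:
    """Wrap an angle to [-pi, pi). Assumed convention: 0 is upright."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def swing_up_reward(theta: float, theta_dot: float, vel_weight: float = 0.1) -> float:
    """Hypothetical swing-up reward: higher pendulum is better,
    and angular velocity is penalized only in the upper half-plane."""
    theta = wrap_angle(theta)
    height = math.cos(theta)          # +1 upright, -1 hanging down
    near_top = max(height, 0.0)       # 0 below horizontal, up to 1 at the peak
    return height - vel_weight * near_top * theta_dot ** 2


def stabilization_reward(
    theta: float,
    theta_dot: float,
    angle_weight: float = 1.0,
    vel_weight: float = 0.1,
) -> float:
    """Hypothetical stabilization reward: quadratic penalty on deviation
    from vertical and on angular velocity (maximum is 0 at rest, upright)."""
    theta = wrap_angle(theta)
    return -(angle_weight * theta ** 2 + vel_weight * theta_dot ** 2)
```

Scaling the velocity penalty by `near_top` is one way to let the pendulum move fast at the bottom of the swing while still discouraging it from overshooting the peak. The weights would need tuning for any real setup.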

Challenges

Swing-Up Task:
- Coordinating motion during the throw-and-catch process was challenging, especially under noisy conditions.
- Solution: Implemented collaborative agents for swing-up and stabilization, with separate reward functions for each sub-task.
Double Inverted Pendulum:
- Model-free RL struggled with the system’s chaotic behavior.
- Solution: Transitioned to model-based approaches such as PILCO, which learns a probabilistic model of the system dynamics and uses its predictions of future states and rewards to improve the policy.
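
The hand-off between the swing-up ("throw") agent and the stabilization ("catch") agent described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: the class name `ThrowCatchController`, the state layout `(theta, theta_dot)` with `theta = 0` upright, and all threshold values are hypothetical.

```python
import math
from typing import Callable, Sequence

State = Sequence[float]            # assumed layout: (theta, theta_dot)
Policy = Callable[[State], float]  # maps a state to a control action


class ThrowCatchController:
    """Switches between two separately trained policies (hypothetical sketch)."""

    def __init__(
        self,
        swing_up: Policy,
        stabilizer: Policy,
        catch_angle: float = 0.3,    # rad: hand over when |theta| is below this
        release_angle: float = 0.6,  # rad: hand back when |theta| exceeds this
        catch_speed: float = 2.0,    # rad/s: only catch if slow enough
    ) -> None:
        self.swing_up = swing_up
        self.stabilizer = stabilizer
        self.catch_angle = catch_angle
        self.release_angle = release_angle
        self.catch_speed = catch_speed
        self.catching = False

    def act(self, state: State) -> float:
        theta, theta_dot = state[0], state[1]
        # Wrap to [-pi, pi) so the distance from upright is abs(theta).
        theta = (theta + math.pi) % (2.0 * math.pi) - math.pi
        if self.catching:
            # Hysteresis: keep stabilizing until the pendulum clearly falls.
            if abs(theta) > self.release_angle:
                self.catching = False
        elif abs(theta) < self.catch_angle and abs(theta_dot) < self.catch_speed:
            self.catching = True
        policy = self.stabilizer if self.catching else self.swing_up
        return policy(state)
```

Using a wider release threshold than catch threshold keeps the controller from flipping back and forth between the two agents when noise pushes the angle across a single boundary.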

Reflection and Insights

This project deepened my understanding of reinforcement learning and its application to real-world control problems. It highlighted the importance of tailored reward functions and robust algorithm selection for dynamic systems. The challenges in handling chaotic behaviors inspired further exploration into model-based strategies to enhance RL performance.

Team and Role

Team: Worked collaboratively with two teammates on RL model implementation and evaluation.
My Role:
- Focused on the single inverted pendulum tasks, including algorithm selection and reward function design.
- Developed custom RL agents using PyTorch, optimizing hyperparameters for efficient training.
- Led the implementation of the collaborative “throw-catch” process for swing-up tasks.
