Reinforcement Learning for Inverted Pendulum Control

Overview

This project applied Reinforcement Learning (RL) to the control of single and double inverted pendulum systems. Using Q-Learning, DQN, A2C, DDPG, and PPO, we implemented controllers for the swing-up and stabilization tasks. The project explored the dynamic complexities of inverted pendulum systems and highlighted the effectiveness of RL techniques for non-linear control problems.

Results

  • Single Inverted Pendulum:
    • Achieved a 100% success rate for swing-up and stabilization tasks under ideal conditions.
    • Maintained a 90% success rate under noisy conditions over 30-second simulations.
  • Double Inverted Pendulum:
    • Successfully stabilized the pendulum but encountered challenges in achieving swing-up with model-free RL methods.
  • Performance Metrics:
    • Trained RL models for swing-up and stabilization tasks in under 50,000 episodes.
    • Demonstrated the effectiveness of custom reward functions for dynamic control tasks.

Report PDF | GitHub (Chinese README)

Swing-up from a stationary state
Swing-up under noisy conditions
Stabilization of the double inverted pendulum

Technical Details

  • Algorithms Applied:
    • Q-Learning and DQN: Explored discrete action spaces for initial experiments.
    • A2C and PPO: Achieved robust performance for stabilization tasks in continuous action spaces.
    • DDPG: Provided smooth control for swing-up tasks with deterministic policy gradients.
  • Custom Toolkit:
    • Developed RL agents from scratch using PyTorch, including functions for initialization, model updates, and action sampling.
    • Designed visualization tools to monitor reward curves and training metrics.
  • Reward Design:
    • Swing-up Task: Rewarded the pendulum for rising toward the upright position while penalizing angular velocity near the top.
    • Stabilization Task: Encouraged minimal deviation from the vertical position and low angular velocity.
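
The two reward ideas above might be written roughly as follows. This is a minimal sketch, not the project's actual reward code: it assumes the angle is measured from the upright position (0 = upright, π = hanging down), and the function names, the cosine height term, and the weights are all illustrative choices.

```python
import math


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def swing_up_reward(theta: float, theta_dot: float,
                    velocity_weight: float = 0.1) -> float:
    """Reward height and penalize angular velocity only near the top.

    Assumption: theta is measured from upright (0 = upright, pi = hanging).
    """
    theta = wrap_angle(theta)
    height = math.cos(theta)      # +1 when upright, -1 when hanging down
    near_top = max(height, 0.0)   # 0 below horizontal, grows to 1 at the top
    return height - velocity_weight * near_top * theta_dot ** 2


def stabilization_reward(theta: float, theta_dot: float,
                         angle_weight: float = 1.0,
                         velocity_weight: float = 0.1) -> float:
    """Quadratic penalty on deviation from upright and on angular velocity."""
    theta = wrap_angle(theta)
    return -(angle_weight * theta ** 2 + velocity_weight * theta_dot ** 2)
```

Scaling the velocity penalty by `near_top` leaves the pendulum free to build speed low in the swing and only discourages arriving at the top too fast, which is one way to read "penalizing velocity at the peak".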

Challenges

  • Swing-Up Task:
    • Coordinating motion during the throw-and-catch process was challenging, especially under noisy conditions.
    • Solution: Implemented collaborative agents for swing-up and stabilization, with separate reward functions for each sub-task.
  • Double Inverted Pendulum:
    • Model-free RL struggled with the system’s chaotic behavior.
    • Solution: Transitioned to model-based approaches such as PILCO, which learns a probabilistic model of the system dynamics and uses its predictions to improve the policy.
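
The hand-off between the swing-up and stabilization agents could look roughly like the sketch below. The source does not state how control was passed between the two agents, so everything here is an assumption: `ThrowCatchSupervisor`, the threshold values, and the hysteresis rule are hypothetical, and the two policies are plain callables standing in for trained agents.

```python
from typing import Callable, Tuple

State = Tuple[float, float]        # (theta, theta_dot), theta = 0 when upright
Policy = Callable[[State], float]  # maps a state to a control output


class ThrowCatchSupervisor:
    """Switches between a swing-up ("throw") and a stabilizing ("catch") policy."""

    def __init__(self, swing_up: Policy, stabilize: Policy,
                 catch_angle: float = 0.3, catch_speed: float = 2.0,
                 release_angle: float = 0.6) -> None:
        self.swing_up = swing_up
        self.stabilize = stabilize
        self.catch_angle = catch_angle      # hypothetical threshold (rad)
        self.catch_speed = catch_speed      # hypothetical threshold (rad/s)
        self.release_angle = release_angle  # hypothetical threshold (rad)
        self.mode = "swing_up"

    def act(self, state: State) -> float:
        theta, theta_dot = state
        if self.mode == "swing_up":
            # Catch only when the pendulum is near upright and slow enough.
            if abs(theta) < self.catch_angle and abs(theta_dot) < self.catch_speed:
                self.mode = "stabilize"
        elif abs(theta) > self.release_angle:
            # The stabilizer lost the pendulum: hand back to swing-up.
            self.mode = "swing_up"
        policy = self.stabilize if self.mode == "stabilize" else self.swing_up
        return policy(state)
```

Using a wider release angle than catch angle keeps the supervisor from flipping back and forth between the two agents when noise pushes the state across a single threshold.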

Reflection and Insights

This project deepened my understanding of reinforcement learning and its application to real-world control problems. It highlighted the importance of tailored reward functions and robust algorithm selection for dynamic systems. The challenges in handling chaotic behaviors inspired further exploration into model-based strategies to enhance RL performance.

Team and Role

  • Team: Worked collaboratively with two teammates on RL model implementation and evaluation.
  • My Role:
    • Focused on the single inverted pendulum tasks, including algorithm selection and reward function design.
    • Developed custom RL agents using PyTorch, optimizing hyperparameters for efficient training.
    • Led the implementation of the collaborative “throw-catch” process for swing-up tasks.

eMeritBox

Overview

The eMeritBox project combines traditional Buddhist cultural elements with modern technology, creating an interactive gravity-sensing electronic donation box. Built with a Raspberry Pi, the system integrates PWM control, motion sensing, and a web-based interface to modernize the concept of a traditional donation box. This innovative design bridges traditional practices with digital solutions, offering a seamless and meaningful user experience.

Results

  • System Features:
    • Automatic wooden fish strikes with real-time donation ball accumulation.
    • Gravity-sensing motion control for dynamic donation ball movement.
    • Two operating modes, manual and automatic donation, with switching between them.
  • Achievements:
    • Successfully implemented a complete hardware-software system using Raspberry Pi and Flask.
    • Developed reusable classes for matrix display and gravity sensing, enabling future adaptations.

GitHub (Chinese README) | Presentation PDF (Chinese version)

eMeritBox system overview

eMeritBox functional demonstration

Technical Details

  • System Architecture:
    • Controller: Raspberry Pi handles signal processing, PWM control, and web server operations.
    • Modules:
      • MG-90 servo for wooden fish strikes.
      • GY-25 gyroscope for motion sensing.
      • MAX7219 matrix display for donation ball visualization.
  • Key Functionalities:
    • Gravity-Sensing Donation: Balls dynamically move based on the box’s tilt angle.
    • Flask Web Server: Supports browser-based remote operation of wooden fish strikes.
    • Matrix Display: Visualizes donation balls in real-time, reflecting their position and state.
  • Software Implementation:
    • Developed Python classes for modular control:
      • GY25Ctrl for gyroscope data processing.
      • MatrixCtrl for donation ball display updates.
      • BGMPlayer for background music playback.
    • Solved hardware conflicts by reconfiguring UART ports and enabling additional I²C channels.
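
The gravity-sensing behavior can be illustrated with a small sketch: tilt angles are turned into one-cell steps, and ball positions are packed into the eight row bytes an 8x8 matrix expects. This is not the project's `GY25Ctrl` or `MatrixCtrl` code. The function names, the 5° dead zone, the mapping of roll to columns and pitch to rows, and the bit order are all assumptions that would depend on the actual wiring and mounting.

```python
from typing import List, Set, Tuple

SIZE = 8                # assumed 8x8 LED matrix
Cell = Tuple[int, int]  # (row, col)


def tilt_to_step(angle_deg: float, dead_zone_deg: float = 5.0) -> int:
    """Map a tilt angle to a one-cell step: -1, 0, or +1."""
    if angle_deg > dead_zone_deg:
        return 1
    if angle_deg < -dead_zone_deg:
        return -1
    return 0


def move_balls(balls: List[Cell], roll_deg: float, pitch_deg: float) -> List[Cell]:
    """Move every ball one cell "downhill", blocked by walls and other balls."""
    d_col = tilt_to_step(roll_deg)    # assumption: roll moves balls along columns
    d_row = tilt_to_step(pitch_deg)   # assumption: pitch moves balls along rows
    # Handle the balls furthest along the tilt direction first so they clear the way.
    order = sorted(balls, key=lambda b: -(b[0] * d_row + b[1] * d_col))
    occupied: Set[Cell] = set(balls)
    moved: List[Cell] = []
    for row, col in order:
        target = (min(max(row + d_row, 0), SIZE - 1),
                  min(max(col + d_col, 0), SIZE - 1))
        occupied.discard((row, col))
        if target in occupied:
            target = (row, col)       # blocked by another ball: stay put
        occupied.add(target)
        moved.append(target)
    return moved


def to_row_bytes(balls: List[Cell]) -> List[int]:
    """Pack ball positions into 8 row bytes (assumed: MSB = column 0)."""
    rows = [0] * SIZE
    for row, col in balls:
        rows[row] |= 1 << (SIZE - 1 - col)
    return rows
```

Calling `move_balls` once per display refresh with the latest angles makes the balls drift toward the low side of the box and pile up against the edge instead of overlapping.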

Challenges

  • UART hardware resource conflicts: Raspberry Pi’s default UART settings caused resource contention.
    • Solution: Swapped the hardware (PL011) UART and the mini UART (ttyAMA0 ↔ ttyS0) and enabled additional UART ports (ttyAMA1, 2, …) so they could operate simultaneously.
  • I²C channel conflicts: Dual I²C channels on Raspberry Pi conflicted with camera usage.
    • Solution: Disabled the camera function and enabled additional I²C channels with dtparam=i2c_vc=on.
  • SPI and I²C competing with UART ports: Enabling SPI and I²C modules on the Raspberry Pi caused UART port contention.
    • Solution: Adjusted hardware configurations to optimize resource allocation.
  • Synchronization of Multiple Modules: PWM control, the matrix display, and motion sensing all had to run at the same time.
    • Solution: Used multi-threading to keep the system responsive in real time and stable.
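
One common way to structure this kind of multi-threaded coordination is sketched below: each module runs its own periodic loop on a thread, and the latest sensor reading is shared through a lock-protected object. This is an illustration only; `SharedTilt`, `run_periodically`, the periods, and the fake sensor and display callbacks are hypothetical stand-ins, not the project's actual classes.

```python
import threading
import time
from typing import Callable, List, Tuple


class SharedTilt:
    """Latest (roll, pitch) reading, guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Tuple[float, float] = (0.0, 0.0)

    def set(self, roll: float, pitch: float) -> None:
        with self._lock:
            self._value = (roll, pitch)

    def get(self) -> Tuple[float, float]:
        with self._lock:
            return self._value


def run_periodically(task: Callable[[], None], period_s: float,
                     stop: threading.Event) -> threading.Thread:
    """Run task() every period_s seconds on a daemon thread until stop is set."""
    def loop() -> None:
        while not stop.is_set():
            task()
            stop.wait(period_s)  # sleeps, but wakes immediately on shutdown
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def demo() -> List[Tuple[float, float]]:
    """Wire a fake sensor and a fake display together for a short run."""
    tilt = SharedTilt()
    stop = threading.Event()
    frames: List[Tuple[float, float]] = []
    # Stand-ins for the real gyroscope read and matrix refresh.
    sensor = run_periodically(lambda: tilt.set(12.0, -3.0), 0.005, stop)
    display = run_periodically(lambda: frames.append(tilt.get()), 0.01, stop)
    time.sleep(0.1)
    stop.set()
    sensor.join()
    display.join()
    return frames
```

Waiting on the stop event rather than calling `time.sleep` inside the loop lets every thread exit promptly at shutdown, and the lock ensures the display never reads a half-updated reading.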

Reflection and Insights

The eMeritBox represents a modernized approach to traditional Buddhist donation practices, seamlessly integrating spiritual elements with advanced technology. By reimagining the donation process with dynamic visuals and interactive controls, this project demonstrates the potential of technology to preserve and innovate cultural traditions. The challenges in hardware-software integration further highlighted the importance of modular design and multi-threaded programming in building robust embedded systems.