In this paper, we treat optimization problems as a kind of reinforcement learning problems regarding an optimization procedure for searching an optimal solution as a reinforcement learning procedure for finding the best policy to maximize the expected rewards. This viewpoint motivated us to propose a Q-learning-based swarm optimization (QSO) algorithm. The proposed QSO algorithm is a population-based optimization algorithm which integrates the essential properties of Q-learning and particle swarm optimization. The optimization procedure of the QSO algorithm proceeds as each individual imitates the behavior of the global best one in the swarm. The best individual is chosen based on its accumulated performance instead of its momentary performance at each evaluation. Two data sets including a set of benchmark functions and a real-world problem—the economic dispatch (ED) problem for power systems—were used to test the performance of the proposed QSO algorithm. The simulation results on the benchmark functions show that the proposed QSO algorithm is comparable to or even outperforms several existing optimization algorithms. As for the ED problem, the proposed QSO algorithm has found solutions better than all previously found solutions.
- Particle swarm optimization
- Swarm intelligence