Various global and local optimization algorithms, together with many practical cases. This repository also uses plain language and analogies to explain the thinking behind each algorithm. —— Still updating!!
Contents: a collection of common local optimization and global optimization algorithms, plus summaries of the ideas behind them.
Tips: - Gradient descent is slow, takes many iterations, and the final result is only an approximation; - Newton's method uses the second-order Taylor expansion of the function, so it converges very fast and gives high-precision results, but it requires the Hessian matrix, i.e. the second derivative of the function! - The conjugate gradient method is a compromise between gradient descent and Newton's method: it has the convergence rate of Newton's method but doesn't need the second derivative! —— Recommend!!
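As a sketch of the tips above (a hypothetical toy example, not code from this repository): on a simple 1-D quadratic, gradient descent needs many small steps, while a Newton step using the second derivative lands on the minimum immediately.

```python
# Compare gradient descent and Newton's method on f(x) = (x - 3)^2.
# The learning rate and step counts below are illustrative, not tuned.

def f_grad(x):
    return 2.0 * (x - 3.0)   # first derivative of f

def f_hess(x):
    return 2.0               # second derivative (constant for a quadratic)

def gradient_descent(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * f_grad(x)  # many small steps toward the minimum
    return x

def newton(x0, steps=1):
    x = x0
    for _ in range(steps):
        x -= f_grad(x) / f_hess(x)  # second-order step; exact for quadratics
    return x

print(gradient_descent(0.0))  # close to 3 only after many iterations
print(newton(0.0))            # lands exactly on 3 in a single step
```

For non-quadratic functions Newton's method needs several iterations, but each one uses curvature information, which is why it converges so much faster.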
Monte-Carlo (MC): Choose the best one based on a large number of samples. For example, there are 1000 apples in a basket, and you take one out at random. Then you take another one out, compare it with the previous one, and keep the bigger one. Repeat this loop: the more times you sample, the more likely you are to pick the largest apple! But unless you take out all 1000 apples, you can never be sure that the one you are keeping is the biggest; even if you have checked 999 of them, the last one may still be the biggest apple! Advantage: with a sufficient number of samples we are certain to get a good result, which means the algorithm must converge. Disadvantage: it needs a lot of sampling!
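The apple story can be sketched in a few lines (names here are illustrative): sample at random, always keep the best candidate seen so far.

```python
import random

def monte_carlo_max(values, n_samples, rng=random):
    """Keep the best of n_samples random draws from values."""
    best = None
    for _ in range(n_samples):
        candidate = rng.choice(values)   # take one "apple" at random
        if best is None or candidate > best:
            best = candidate             # keep the bigger one
    return best

random.seed(0)
apples = [random.random() for _ in range(1000)]
print(monte_carlo_max(apples, 50))    # a good apple, maybe not the biggest
print(monte_carlo_max(apples, 5000))  # more samples: very likely the biggest
```

Note that the result is never guaranteed to be the true maximum for any finite sample count; it only becomes ever more probable.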
Simulated-annealing (SA): This algorithm gradually learns what its goal is! It keeps moving toward a clearer goal, less and less distracted by temptation. For example, to find the highest mountain on Earth, a rabbit has no good strategy at the beginning, so it jumps around at random for a long time. While exploring, it may reach high places, flat ground, or gullies. But as time passes, the rabbit "wakes up", jumps ever more directly toward the highest destination, and finally reaches Mount Everest. Advantage: the global optimal solution can be found within a finite number of iterations. Disadvantage: for very complex functions with many local extremum points, if the initial value is not chosen well, it is quite likely that the global optimal solution will never be found.
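A minimal SA sketch of the rabbit story (the test function and cooling schedule are illustrative assumptions, not this repository's code): worse moves are accepted with probability exp(-Δ/T), and as the temperature T cools the search is less and less willing to wander.

```python
import math
import random

def simulated_annealing(f, x0, temp=10.0, cooling=0.99, steps=2000, rng=random):
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for _ in range(steps):
        cand = x + rng.uniform(-1.0, 1.0)          # a random jump nearby
        fc = f(cand)
        # Always accept downhill moves; accept uphill ones with prob e^(-d/T).
        if fc < fx or rng.random() < math.exp((fx - fc) / temp):
            x, fx = cand, fc
        if fx < best_f:                            # remember the best so far
            best_x, best_f = x, fx
        temp *= cooling                            # cool down over time
    return best_x, best_f

random.seed(0)
bumpy = lambda x: x * x + 5.0 * math.sin(3.0 * x)  # many local minima
print(simulated_annealing(bumpy, x0=8.0))
```

Early on (high T) the rabbit jumps almost anywhere; late on (low T) uphill moves are almost never accepted, so it settles into the best valley it has found.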
Particle-swarm optimization (PSO): Information is shared! Search as a team! Team members share information and make progress together, which keeps individuals from being shortsighted and losing the global picture. It is like the game of Go: focusing only on one corner of the fight does not necessarily lead to final victory; you should look at the whole board and treat all the stones as a team. Advantage: better than SA, and can be computed in parallel. —— Recommend!!
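The "shared information" is concrete in PSO's velocity update: each particle blends its own best position with the swarm's shared best. A 1-D sketch with the usual illustrative coefficients (not tuned values from this repository):

```python
import random

def pso(f, n_particles=20, steps=200, w=0.7, c1=1.5, c2=1.5, rng=random):
    xs = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]                                # each particle's own best
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g], pbest_f[g]        # the team's shared best
    for _ in range(steps):
        for i in range(n_particles):
            vs[i] = (w * vs[i]
                     + c1 * rng.random() * (pbest[i] - xs[i])  # own memory
                     + c2 * rng.random() * (gbest - xs[i]))    # team memory
            xs[i] += vs[i]
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i], fx   # share the discovery
    return gbest, gbest_f

random.seed(0)
print(pso(lambda x: (x - 2.0) ** 2))
```

The inner per-particle loop is what parallelizes well: each particle's update only reads the shared `gbest`.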
Tips: - On complex functions (many local-minimum traps), such as the Schaffer function, MC and PSO give good results, but SA is easily caught in the traps. - PSO computes faster than MC and parallelizes well; it is easy to turn it into a parallel program. - If the actual function is not too complicated (unlike Schaffer), PSO is a good and economical choice! - MC works for any problem! It can find the global optimum of arbitrarily complex functions as long as you do enough sampling (this follows directly from the idea of the method)!
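For reference, here is the Schaffer function mentioned above (the common N.2 variant; the repository may use a different variant). Its global minimum is 0 at the origin, surrounded by rings of near-optimal values that trap local searches.

```python
import math

def schaffer_n2(x, y):
    """Schaffer N.2 test function: global minimum 0 at (0, 0)."""
    num = math.sin(x * x - y * y) ** 2 - 0.5
    den = (1.0 + 0.001 * (x * x + y * y)) ** 2
    return 0.5 + num / den

print(schaffer_n2(0.0, 0.0))     # 0.0 at the global optimum
print(schaffer_n2(1.2533, 0.0))  # a nearby high ridge of the landscape
```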
Ant-colony optimization (ACO): Like PSO, ACO relies on the strength of teamwork to find the final target! What is special about ACO is its 'pheromone evaporation'; this concept is unique to it!
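The evaporation idea in isolation (a sketch with illustrative values and the common Q/length deposit rule, not this repository's implementation): every iteration, all trails fade by a rate rho, so stale paths disappear unless ants keep reinforcing them, and shorter paths accumulate more pheromone.

```python
def update_pheromones(tau, ant_paths, rho=0.5, q=1.0):
    """One ACO pheromone update: evaporate all edges, then deposit."""
    # Evaporation: every edge loses a fraction rho of its pheromone.
    for edge in tau:
        tau[edge] *= (1.0 - rho)
    # Deposit: each ant adds q / path_length to every edge it walked,
    # so shorter paths leave a stronger trail.
    for path, length in ant_paths:
        for edge in path:
            tau[edge] += q / length
    return tau

tau = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 1.0}
ants = [([("A", "B"), ("B", "C")], 2.0),  # detour of total length 2
        ([("A", "C")], 1.0)]              # direct path of length 1
tau = update_pheromones(tau, ants)
print(tau)  # the direct edge ("A", "C") now holds the most pheromone
```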
Tips: - When a problem/function is more complex (many local-extremum traps), the local minimum points that lie very close to the global optimum are the most deceptive! SA, PSO, and ACO are all likely to fall into these traps, and this cannot be solved simply by tuning hyperparameters. Under the current model, the only way out of such a trap is to go back and run the program again. Therefore, the choice of the initial value/location/point is very critical!! As the proverb goes, "in a hundred-li journey, ninety li is only the halfway point" — the last stretch is the hardest.
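"Run the program again" can itself be automated as a multi-start wrapper (a sketch; `local_search` below is a simple greedy stand-in for SA/PSO/ACO, and all names are illustrative): rerun the optimizer from several random initial points and keep the best run.

```python
import math
import random

def local_search(f, x0, steps=200, rng=random):
    """Greedy random search: only accepts improvements, so it can get trapped."""
    x, fx = x0, f(x0)
    for _ in range(steps):
        cand = x + rng.uniform(-0.5, 0.5)
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
    return x, fx

def multi_start(f, n_starts=10, rng=random):
    """Restart the optimizer from fresh initial points; keep the best result."""
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(-10.0, 10.0)   # a new initial value each run
        run = local_search(f, x0, rng=rng)
        if best is None or run[1] < best[1]:
            best = run
    return best

random.seed(0)
bumpy = lambda x: x * x + 5.0 * math.sin(3.0 * x)  # many local-minimum traps
print(multi_start(bumpy))
```

Each restart is a fresh chance to land in the global basin, which is exactly why the choice of initial point matters so much.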