기울기하강,gradient_descent (rev. 1.4)

Very roughly,
grad f is the direction of steepest ascent of f.
So, with its 부호,sign reversed,
-grad f is the direction of steepest descent of f.
see also 방향도함수,directional_derivative 델,del,나블라,nabla

Similar to 근사,approximation methods: an iterative method (반복,iteration) that moves to the next step repeatedly.
If the function to be optimized (whose minimum point we want to find) is $f$, then

$x_{i+1} = x_i - \alpha \frac{df}{dx}(x_i)$

Here $\alpha > 0$ is the step size (learning rate). Generalizing (extending) to multivariable functions:

$x_{i+1} = x_i - \alpha \nabla f(x_i)$

(angeloyeo)
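
A minimal sketch of the update rule above, in plain Python. The function names (`gradient_descent`, `grad_f`), the test function, and the values of `alpha` and `steps` are assumptions chosen for illustration, not taken from the sources linked here.

```python
def gradient_descent(grad, x0, alpha=0.1, steps=100):
    """Repeat x_{i+1} = x_i - alpha * grad(x_i) for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # Move each coordinate against its partial derivative.
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical example: f(x, y) = (x - 3)^2 + (y + 1)^2, minimum at (3, -1).
def grad_f(p):
    x, y = p
    return [2 * (x - 3), 2 * (y + 1)]


x_min = gradient_descent(grad_f, [0.0, 0.0], alpha=0.1, steps=200)
print(x_min)  # close to [3, -1]
```

With a one-element list the same function covers the single-variable formula. If alpha is too large the iteration can overshoot and diverge, so the value usually needs tuning per problem.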

1. tmp bmks; read and absorb
2. tmp bmks; del
3. Sub: 확률적기울기하강 stochastic gradient descent (SGD)
4. Sub: adaptive gradient, adagrad - adaptive_gradient

KWs:
기울기,gradient
가중값,weight
비용함수,cost_function
손실함수,loss_function
신경망,neural_network
단변량선형회귀,univariate_linear_regression - up: 선형회귀,linear_regression

https://wikidocs.net/4213

Video
3Blue1Brown https://youtu.be/IHZwWFHWa-w Gradient descent, how neural networks learn
tmp: Gradient Descent, Step-by-Step https://www.youtube.com/watch?v=sDv4f4s2SB8

tmp bmks ko
https://bskyvision.com/411 (easy!)
https://darkpgmr.tistory.com/133

tmp bmks en
https://calculus.subwiki.org/wiki/Gradient_descent

Links ko
http://www.aistudy.com/math/gradient_descent.htm shows how this relates to 역전파,backpropagation.
https://angeloyeo.github.io/2020/08/16/gradient_descent.html

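aka : steepest_descent
Roughly, a method that gropes its way toward the point where the function is smallest.
Like descending a mountain in fog so thick you cannot see around you, feeling your way toward lower ground.
One might ask: why not just find the point where the derivative is 0? In many cases that root is hard or impossible to compute, which is why this method exists.
However, it can get stuck in a local_minimum that is not the global_minimum and never escape, i.e. it may fail to reach the optimum (최적해,optimal_solution < 해,solution, rel. 최적화,optimization). A weakness similar to that of a greedy_algorithm.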
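
A small sketch of the local-minimum problem described above. The function f(x) = x^4 - 3x^2 + x, the two starting points, and the step size are assumptions picked only because this f has one local and one global minimum.

```python
def f(x):
    return x**4 - 3 * x**2 + x


def df(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1


def descend(x, alpha=0.01, steps=1000):
    """Plain gradient descent in one variable, fixed step size."""
    for _ in range(steps):
        x = x - alpha * df(x)
    return x


x_from_right = descend(2.0)   # starts on the right-hand slope
x_from_left = descend(-2.0)   # starts on the left-hand slope
print(x_from_right, f(x_from_right))
print(x_from_left, f(x_from_left))
```

Starting from 2.0 the iteration settles near x ≈ 1.13 (local minimum, f ≈ -1.07), while starting from -2.0 it reaches x ≈ -1.30 (global minimum, f ≈ -3.51). The result depends entirely on the starting point; the method itself has no way to tell which of the two it found.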

One of the ways to solve 최적화,optimization problems?

Another is 라그랑주_곱셈자,Lagrange_multiplier... compare?

1. tmp bmks; read and absorb

The following is from the search results for http://google.com/search?q=gradient descent 번역
https://hyunw.kim/blog/2017/11/01/Optimization.html
{
In order:
batch_gradient_descent BGD
stochastic_gradient_descent SGD
mini-batch_gradient_descent
momentum (maybe create a 모멘텀,momentum page separate from 운동량,momentum?)
RMSProp
Adam ... roughly RMSProp + Momentum
The above only lists, briefly, the keywords covered in that post.
}

2. tmp bmks; del

3Blue1Brown series S3 E2
Gradient descent, how neural networks learn | Deep learning, chapter 2
https://youtu.be/IHZwWFHWa-w

3. Sub: 확률적기울기하강 stochastic gradient descent (SGD)

확률적기울기하강 stochastic_gradient_descent SGD
확률적경사하강, stochastic gradient descent (SGD)
tmp bmks ko
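See the "Stochastic Gradient Descent" paragraph about 50% of the way down http://sanghyukchun.github.io/74/.
Roughly: when updating the weight parameters (weight_parameter_update) of a 신경망,neural_network, computing over all of the data (full batch) is too inefficient, so a stochastic method called SGD is used instead. For this, a 'mini batch' is formed.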
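
A rough sketch of the mini-batch idea, using a tiny linear regression instead of a neural network to keep it short. Everything here (the name `sgd_linear_regression`, the synthetic data y = 2x + 1, `alpha`, `epochs`, `batch_size`, the seed) is an assumption made for illustration and does not come from the linked post.

```python
import random


def sgd_linear_regression(xs, ys, alpha=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient estimated from this mini batch only, not the full data.
            gw = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            gb = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= alpha * gw
            b -= alpha * gb
    return w, b


# Hypothetical noise-free data on the line y = 2x + 1.
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(w, b)  # close to 2 and 1
```

Setting batch_size=len(xs) gives full-batch gradient descent and batch_size=1 gives the one-sample-at-a-time (on-line) variant, so the same loop covers all three cases.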

4. Sub: adaptive gradient, adagrad - adaptive_gradient

etc. See: https://twinw.tistory.com/247
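
A sketch of Adagrad in its commonly described form: each coordinate keeps a running sum of its squared gradients and divides its step by the square root of that sum. This page does not spell the rule out, so the formula, the names (`adagrad`, `grad_f`), and the values of `alpha` and `eps` are assumptions, not a summary of the linked post.

```python
import math


def adagrad(grad, x0, alpha=1.0, eps=1e-8, steps=500):
    """Adagrad: per-coordinate step sizes that shrink as squared gradients accumulate."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients, per coordinate
    for _ in range(steps):
        g = grad(x)
        for j in range(len(x)):
            accum[j] += g[j] ** 2
            x[j] -= alpha * g[j] / (math.sqrt(accum[j]) + eps)
    return x


# Hypothetical badly scaled bowl: f(x, y) = (x - 3)^2 + 10 * (y + 1)^2.
def grad_f(p):
    x, y = p
    return [2 * (x - 3), 20 * (y + 1)]


x_ada = adagrad(grad_f, [0.0, 0.0])
print(x_ada)  # close to [3, -1]
```

Because each coordinate is normalized by its own gradient history, the steep y direction and the shallow x direction are handled with the same alpha, which plain gradient descent with one fixed step size does less gracefully.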

Twins:
https://developers.google.com/machine-learning/glossary?hl=ko#gradient-descent

http://www.aistudy.com/math/gradient_descent.htm

The gradient descent commonly used in ML comes in two main forms, batch and on-line, with mini-batches as a compromise between the two.
batch_gradient_descent
on-line_gradient_descent

경사하강법

Compare: 뉴턴_방법,Newton_method
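
For the comparison, a small sketch that runs both methods on the same single-variable function and counts steps. The function f(x) = x^2 + e^x, the starting point, the tolerance, and the step size 0.1 are assumptions chosen so that both methods converge; Newton's method here means the optimization form x - f'(x)/f''(x).

```python
import math


def fprime(x):
    # First derivative of f(x) = x^2 + e^x.
    return 2 * x + math.exp(x)


def fsecond(x):
    # Second derivative, always positive, so f is convex.
    return 2 + math.exp(x)


def run(update, x0, tol=1e-10, max_steps=10_000):
    """Apply `update` until |f'(x)| <= tol; return the point and the step count."""
    x, steps = x0, 0
    while abs(fprime(x)) > tol and steps < max_steps:
        x = update(x)
        steps += 1
    return x, steps


x_gd, n_gd = run(lambda x: x - 0.1 * fprime(x), 1.0)           # gradient descent
x_nt, n_nt = run(lambda x: x - fprime(x) / fsecond(x), 1.0)    # Newton's method
print(x_gd, n_gd)
print(x_nt, n_nt)
```

Both end near x ≈ -0.3517, but Newton's method needs far fewer steps on this function because it uses second-derivative information. The trade-off is that it needs f'' (a Hessian in several variables), while gradient descent needs only first derivatives.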

On the name
In the single-variable case the 기울기,gradient cannot be defined, so it might be better to call it derivative descent.
"Especially, for the single-variable cases, probably I would say 'derivative descent algorithm' because we cannot define the gradient in this single-variable case." (KU정태수 ㄷㄱㄱ 14-2 18:00)

AKA 경사강하, 경사하강 (gradient descent)
AKA 경사하강법(gradient descent method/algorithm)
(기울기|경사)(하강|강하)(법)

Opp: 기울기상승,gradient_ascent

Up: 기계학습,machine_learning 기울기,gradient? 하강/강하 descent(writing)
