Very roughly:
grad f is the direction of steepest ascent of f.
So, with its sign reversed,
-grad f is the direction of steepest descent of f.
See also: directional_derivative, del / nabla

An iterative method (iteration, iterative method) that, like approximation methods, repeatedly moves on to a next step.
If the function to be optimized (i.e. whose minimum point is sought) is $f$, then

$x_{i+1} = x_i - \alpha \frac{df}{dx}(x_i)$

Generalizing (extending) this to a multivariable function gives

$x_{i+1} = x_i - \alpha \nabla f(x_i)$

(angeloyeo)
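Replacing the sign $-$ with $+$ there gives gradient ascent.
There, $\alpha$ is the parameter that controls the speed of the algorithm: the step size, or learning rate (learning_rate).
One problem with this method is getting stuck in a local_minimum.
Another problem is that convergence slows down near the solution, because $|\nabla f|$ approaches 0 there and the step $\alpha \nabla f$ shrinks with it.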
(darkpgmr)
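
A minimal Python sketch of the update rule $x_{i+1} = x_i - \alpha \nabla f(x_i)$ above. The test function `grad_f`, the starting point, and the values of `alpha` and `steps` are assumptions chosen for illustration, not taken from the sources cited here.

```python
def gradient_descent(grad, x0, alpha=0.1, steps=100):
    """Iterate x_{i+1} = x_i - alpha * grad(x_i) on a list of coordinates."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # Move every coordinate a small step against the gradient.
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test function: f(x, y) = (x - 1)^2 + 2 * (y + 3)^2,
# whose only minimum is at (1, -3).
def grad_f(p):
    x, y = p
    return [2 * (x - 1), 4 * (y + 3)]


x_min = gradient_descent(grad_f, [0.0, 0.0], alpha=0.1, steps=200)
print(x_min)  # close to [1.0, -3.0]
```

Flipping the `-` to `+` in the update line would turn this into gradient ascent. A larger `alpha` takes bigger steps but can overshoot and diverge; a smaller one is safer but slower.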

1. Batch in gradient descent
2. CLEANUP
3. tmp bmks; read and absorb
4. tmp bmks; del
5. Sub: stochastic gradient descent (SGD)
6. Sub: adaptive gradient, adagrad - adaptive_gradient
7. subgradient descent? - no, subgradient method.
8. proximal gradient descent
9. natural gradient descent
10. rel. mirror descent
11. tmp video en

1. Batch in gradient descent

batch
// from https://velog.io/@arittung/DeepLearningStudyDay8#04-배치-사이즈-batch-size
{
batch: the entire dataset.
batch size: the total number of data items used to compute the gradient in a single iteration.
mini batch: a chunk of the dataset corresponding to one batch size.

If the batch size is too

large - slow, and risk of running out of memory
small - weight updates too frequent - training becomes unstable

(batch) gradient descent uses the entire dataset and is therefore slow, so there is an alternative that selects samples at random:
stochastic gradient descent (SGD).
}
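
A rough Python sketch of how batch size changes the loop, assuming a one-parameter model y ≈ w * x with mean squared error. The dataset, `grad_mse`, `train`, and all hyperparameter values are hypothetical and serve only to contrast full batch, mini batch, and SGD (batch size 1).

```python
import random


def grad_mse(w, batch):
    """Gradient with respect to w of the mean squared error of y ≈ w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, alpha=0.05, epochs=100, seed=0):
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a random order each epoch
        for start in range(0, len(data), batch_size):
            mini_batch = data[start:start + batch_size]
            w -= alpha * grad_mse(w, mini_batch)  # one weight update per mini batch
    return w


# Hypothetical noise-free dataset with true slope 3.
data = [(k / 10, 3 * k / 10) for k in range(1, 21)]

w_batch = train(data, batch_size=len(data))  # batch gradient descent: 1 update per epoch
w_mini = train(data, batch_size=4)           # mini-batch: 5 updates per epoch
w_sgd = train(data, batch_size=1)            # SGD: 20 updates per epoch
print(w_batch, w_mini, w_sgd)                # all close to 3
```

Because this toy data has no noise, all three variants settle on the same slope; with noisy data the small-batch runs would jitter around the solution, which is the instability mentioned above.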

2. CLEANUP

KWs:
gradient
weight
cost_function
loss_function
neural_network
univariate_linear_regression - up: linear_regression

https://wikidocs.net/4213

Video
3Blue1Brown https://youtu.be/IHZwWFHWa-w Gradient descent, how neural networks learn
tmp: Gradient Descent, Step-by-Step https://www.youtube.com/watch?v=sDv4f4s2SB8

tmp bmks ko
https://bskyvision.com/411 (easy!)
https://darkpgmr.tistory.com/133

tmp bmks en
https://calculus.subwiki.org/wiki/Gradient_descent

Links ko
http://www.aistudy.com/math/gradient_descent.htm shows the relation to backpropagation.
https://angeloyeo.github.io/2020/08/16/gradient_descent.html

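aka : steepest_descent
Roughly, a method of feeling one's way toward the point where the function is smallest.
Like groping one's way down to lower ground on a mountain where fog hides the surroundings..
One might ask: why not simply find the point where the derivative is 0? In many cases those roots are hard or impossible to compute, which is why this method exists.
However, it may get stuck in a local_minimum that is not the global_minimum and fail to escape, i.e. it may never reach the optimum (optimal_solution < solution, rel. optimization).. - a weakness similar to that of a greedy_algorithm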
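
To illustrate the local_minimum problem, a small sketch: the same descent loop started from two different points on a function with two valleys. The function `f`, the starting points, and the step size are assumptions picked for this illustration.

```python
def descend(df, x0, alpha=0.01, steps=2000):
    """Plain single-variable gradient descent: x <- x - alpha * f'(x)."""
    x = x0
    for _ in range(steps):
        x -= alpha * df(x)
    return x


# Hypothetical function with two valleys:
# a global minimum near x = -1.30 and a local minimum near x = 1.13.
def f(x):
    return x**4 - 3 * x**2 + x


def df(x):
    return 4 * x**3 - 6 * x + 1


left = descend(df, -2.0)   # ends in the deeper valley
right = descend(df, 2.0)   # ends in the shallower valley and stays there
print(left, f(left))
print(right, f(right))
```

Both runs stop where the derivative is (almost) 0, but only the run started at -2.0 ends at the global_minimum; the method itself has no way of telling the two apart.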

One of the ways of solving optimization problems?

Another is the Lagrange_multiplier... compare?

3. tmp bmks; read and absorb

From the search results for http://google.com/search?q=gradient descent 번역 :
https://hyunw.kim/blog/2017/11/01/Optimization.html
{
In order:
batch_gradient_descent BGD
stochastic_gradient_descent SGD
mini-batch_gradient_descent
momentum (make a separate page for this optimizer momentum, distinct from the physics momentum page?)
RMSProp
Adam ... roughly RMSProp + Momentum
Only the keywords covered in the post above are briefly listed here.
}

4. tmp bmks; del

3Blue1Brown series S3 E2
Gradient descent, how neural networks learn | Deep learning, chapter 2
https://youtu.be/IHZwWFHWa-w

5. Sub: stochastic gradient descent (SGD)

stochastic_gradient_descent SGD (Korean: 확률적기울기하강)
stochastic gradient descent (SGD) (Korean: 확률적경사하강)
tmp bmks ko
See the "Stochastic Gradient Descent" paragraph, about 50% of the way through http://sanghyukchun.github.io/74/
Roughly: when updating the weight parameters of a neural_network, computing over all the data (full batch) is too inefficient, so the stochastic method SGD is used. A 'mini batch' is formed for this.

확률적 경사 하강법 (stochastic gradient descent method)

6. Sub: adaptive gradient, adagrad - adaptive_gradient

etc. See: https://twinw.tistory.com/247

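7. subgradient descent? - no, subgradient method.

subgradient_method

One of the methods for solving convex optimization problems. An iterative method.

According to https://convex-optimization-for-all.github.io/contents/chapter08/ , unlike gradient descent, which always descends (given a suitable step size), this method does not always descend, so it is not called subgradient descent.

Translation?
subgradient not in kms ... https://www.kms.or.kr/mathdict/list.html?key=kname&keyword=subgradient
as of 2023-04-22.
wpko translated subderivative as '하방미분'; would this then be '하방기울기', and the method '하방기울기법'?
sub- is usually translated as 부-; would this then be '부미분', and the method '부미분법'? (??)

Its formula is similar to that of gradient descent, and it is likewise an iterative (iteration) method.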
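
A small sketch of why "descent" is avoided in the name, assuming the convex but non-differentiable function f(x) = |x - 2| and a diminishing step size 1/k. The function, the step-size schedule, and the choice of 0 as the subgradient at the kink are all assumptions made for this illustration.

```python
def f(x):
    return abs(x - 2.0)


def subgrad(x):
    """One valid subgradient of f at x (at the kink x = 2, any value in [-1, 1] works; 0 is used)."""
    if x > 2.0:
        return 1.0
    if x < 2.0:
        return -1.0
    return 0.0


def subgradient_method(x0, steps=200):
    x = x0
    best_x, best_f = x, f(x)
    increases = 0  # how many steps made f worse
    for k in range(1, steps + 1):
        prev_f = f(x)
        x -= (1.0 / k) * subgrad(x)  # same shape as the gradient descent update
        if f(x) > prev_f:
            increases += 1
        if f(x) < best_f:
            best_x, best_f = x, f(x)  # keep the best point seen so far
    return best_x, best_f, increases


best_x, best_f, increases = subgradient_method(0.0)
print(best_x, best_f, increases)
```

Since individual steps can overshoot the kink and increase f, the usual practice is to report the best point seen so far rather than the last iterate, as done here.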

projected_subgradient_method
{
projected subgradient method
Translation?

...

projected.subgradient.method
}

rel.
subgradient
{
Translation?

not in kms ... https://www.kms.or.kr/mathdict/list.html?key=ename&keyword=subgradient
as of 2023-04-22

subgradient

rel.
subderivative and subdifferential
{
Translation?
'subd' not in kms ... https://www.kms.or.kr/mathdict/list.html?key=kname&keyword=subd
as of 2023-04-22

sub- is usually translated as 부-,
but wpko translated it as '하방미분'.

하방미분

Subderivative

In convex_analysis / convex_function theory, this appears to be a concept that generalizes/extends
the derivative and the differential. (qqq in order to also handle a convex_function that is not differentiable, i.e. lacks differentiability??)

rel. convex_optimization

}

Subgradient_method

subgradient.method

8. proximal gradient descent

proximal_gradient_descent
{
proximal gradient descent

...

proximal_gradient_descent
}

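9. natural gradient descent

natural gradient descent
natural_gradient_descent

via
"Natural gradient descent" (자연스러운 경사하강법), near the beginning of "Information geometry and machine learning (3): Mirror descent" (정보기하학과 머신러닝 (3): 거울 하강법) https://horizon.kias.re.kr/24237/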

...

natural gradient descent

10. rel. mirror descent

mirror_descent
Korean: 거울하강 ?

A generalization (?) of gradient_descent and
multiplicative_weight ... multiplicative_weight_update method ??
{
not in kms (2023-04-28) ... https://www.kms.or.kr/mathdict/list.html?key=ename&keyword=multiplicative weight
If so, then https://www.kms.or.kr/mathdict/list.html?key=ename&keyword=multiplicative

Which Korean rendering to use: 곱셈, 곱, 곱셈적, ...?

rel. multiplication?

weight

...

multiplicative_weight

}

Updates that use the dual_space (as opposed to the original primal_space)
An iterative optimization method (iteration, optimization)
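
A hedged sketch of one standard instance: mirror descent on the probability simplex with the negative-entropy mirror map, where the gradient step is taken in the dual (log) coordinates and mapping back reduces to a multiplicative-weights style update followed by normalization. The cost vector, step size, and function name below are assumptions for illustration only.

```python
import math


def mirror_descent_simplex(grad, p0, alpha=0.5, steps=200):
    """Entropic mirror descent: p_i <- p_i * exp(-alpha * g_i), then renormalize."""
    p = list(p0)
    for _ in range(steps):
        g = grad(p)
        # Adding -alpha * g to log(p) in the dual space equals this multiplicative update.
        w = [pi * math.exp(-alpha * gi) for pi, gi in zip(p, g)]
        total = sum(w)
        p = [wi / total for wi in w]  # map back onto the simplex
    return p


# Hypothetical linear objective f(p) = sum(c_i * p_i); its gradient is the constant vector c.
costs = [0.9, 0.2, 0.5]
p = mirror_descent_simplex(lambda _p: costs, [1 / 3, 1 / 3, 1 / 3])
print(p)  # nearly all mass on index 1, the cheapest option
```

With the squared Euclidean norm as the mirror map instead, the same scheme would reduce to ordinary gradient descent, which is the sense in which mirror descent generalizes both.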

rel.
Bregman_divergence

via
"Information geometry and machine learning: Mirror descent" (정보기하학과 머신러닝: 거울 하강법) https://horizon.kias.re.kr/24237/ about 50% of the way through

Mirror_descent = https://en.wikipedia.org/wiki/Mirror_descent

...

mirror descent

11. tmp video en

Solve any equation using gradient descent - YouTube
https://www.youtube.com/watch?v=0kFydRfswU8

....

gradient descent

Twins:
https://developers.google.com/machine-learning/glossary?hl=ko#gradient-descent

http://www.aistudy.com/math/gradient_descent.htm

The gradient descent commonly used in ML comes in two main forms, batch and on-line; mini-batches are a compromise between the two.
batch_gradient_descent
on-line_gradient_descent

According to IBM, the types of GD are

batch
stochastic
mini-batch

https://www.ibm.com/topics/gradient-descent

경사하강법 (gradient descent method)

https://calculus.subwiki.org/wiki/Gradient_descent

Compare: Newton_method

On the name
In the single-variable case the gradient cannot be defined, so 'derivative descent' might be the better name.
"Especially, for the single-variable cases, probably I would say 'derivative descent algorithm' because we cannot define the gradient in this single-variable case." (KU정태수 ㄷㄱㄱ 14-2 18:00)

AKA (Korean) 경사강하, 경사하강 (gradient descent)
AKA (Korean) 경사하강법 (gradient descent method/algorithm)
(기울기|경사)(하강|강하)(법)

Opp: gradient_ascent

Up: machine_learning gradient? descent (하강/강하) (writing)

gradient_descent (Korean: 기울기하강)
