#noindex
Data can be displayed as a scatterplot; regression analysis then finds a [[곡선,curve]] that captures the trend in the data, called the regression curve. The usual method is [[최소제곱,least_square|least squares]]: minimizing the sum of the squares of the vertical distances between the data points and the curve, which makes it an [[최적화,optimization|optimization]] problem.
## from Thomas Calculus Early Trans. 13e p. 33
----
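Regression is the task of choosing a model function and fitting it to the [[자료,data]]. In regression analysis the function is adjusted so that the [[잔차,residual]]s are minimized; the most common method is the [[최소제곱법,least_square_method,LSM]] (curr see [[최소제곱,least_square]]).
Let e be the sum of the squared residuals; then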
 $e=\sum_{i=1}^{n}\left(y_i-f(x_i)\right)^2$
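
A minimal sketch of the sum of squared residuals $e$ defined above, in plain Python. The data and the candidate function `candidate` are made up for illustration and are not from the source.

```python
def sum_squared_residuals(f, xs, ys):
    """Return e = sum over i of (y_i - f(x_i))^2."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]

def candidate(x):
    # Candidate model f(x) = 2x + 1; it hits the first three points exactly.
    return 2.0 * x + 1.0

e = sum_squared_residuals(candidate, xs, ys)
print(e)  # 1.0 (only the last point is off, by 1)
```

Fitting by least squares means searching over the parameters of `f` for the smallest such `e`.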

[[회귀분석,regression_analysis]] makes [[예측,prediction]] possible.
----
[[Date(2022-09-26T18:45:52)]]
Rough notes taken from https://process-mining.tistory.com/. Refer to it later when classifying.

[[선형회귀,linear_regression]]
{
(Roughly) finding the values of $W,b$ of the linear function that best fits / best explains the scatterplot (?):
> ''y=Wx+b''

https://process-mining.tistory.com/125
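least_square - least_squares, the most basic linear regression, simply minimizes the RSS (residual sum of squares) and is sensitive to outliers.
robust_linear_regression - not sensitive to outliers. curr see https://process-mining.tistory.com/130
 Google:Laplace_regression - uses the Laplace_distribution as the [[가능도,likelihood]].
 Google:Huber_regression - minimizes the Google:Huber_loss_function, which takes the L2_error form when the absolute error is at most some threshold and the L1_error form when it exceeds that threshold. That is, it avoids the drawback of the L1 error when the error is small and the drawback of the L2 error when it is large.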
}
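
The notes in the block above can be sketched as follows: a closed-form least-squares fit of ''y=Wx+b'', rerun after one outlier is introduced, plus a Huber loss. This is only an illustration under stated assumptions: the data, the helper names `fit_line` and `huber_loss`, and the threshold `delta` are hypothetical, and the factor 1/2 on the quadratic branch is the common textbook convention rather than something stated in the linked posts.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = W*x + b; returns (W, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = y_mean - w * x_mean
    return w, b

def huber_loss(r, delta=1.0):
    """Quadratic (L2-like) for |r| <= delta, linear (L1-like) beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

# Hypothetical data, made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [1.0, 3.0, 5.0, 7.0, 9.0]          # exactly y = 2x + 1
with_outlier = [1.0, 3.0, 5.0, 7.0, 29.0]  # last point is an outlier

print(fit_line(xs, clean))               # (2.0, 1.0)
print(fit_line(xs, with_outlier))        # (6.0, -3.0)
print(huber_loss(0.5), huber_loss(3.0))  # 0.125 2.5
```

A single outlier moves the least-squares slope from 2 to 6, while the Huber loss charges a residual of 3 only 2.5 instead of the 4.5 that half the squared error would give.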

ridge_regression - minimizes the RSS plus a penalty on the L2_norm of the coefficients. curr see https://process-mining.tistory.com/129
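
A minimal sketch of ridge regression for a single predictor, assuming the penalty is `lam * W**2` and that the intercept `b` is left unpenalized (a common convention, not something the linked post is confirmed to use). The name `fit_ridge_line` and the data are hypothetical.

```python
def fit_ridge_line(xs, ys, lam):
    """Minimize sum((y - W*x - b)^2) + lam * W^2; returns (W, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)    # the penalty shrinks the slope toward 0
    b = y_mean - w * x_mean  # intercept is not penalized (assumption)
    return w, b

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

print(fit_ridge_line(xs, ys, 0.0))   # (2.0, 1.0), plain least squares
print(fit_ridge_line(xs, ys, 10.0))  # (1.0, 3.0), slope shrunk by the penalty
```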

<<tableofcontents>>

= Regression analysis =
[[회귀분석,regression_analysis]]

''TBD. Whether to write regression analysis on this page ([[회귀,regression]]) or whether it needs a separate page.''

http://blog.naver.com/mykepzzang/220933439872
{
* identifies the dependency between [[변수,variable]]s
* identifies the tendency
and thereby makes [[예측,prediction]] possible.
}

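Simple regression analysis (one independent variable)
 $y=\alpha+\beta x+\epsilon$
Multiple regression analysis (two or more independent variables)
 $y=\alpha+\beta x_1+\gamma x_2+\epsilon$
According to Tada (p. 98), because multiple regression analysis has several independent variables, it is hard to visualize in a two-dimensional graph the way simple regression analysis is. In that case [[주성분분석,principal_component_analysis,PCA]] is used so that the points can be plotted on a two-dimensional plane.
When the independent variables far outnumber the data points, [[주성분회귀,principal_component_regression,PCR]], which uses the dimensionality reduction of principal component analysis, and [[Partial_least_squares,PLS]] regression, which improves on it, can be used.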
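
The two-predictor model above can be fitted by least squares through the normal equations $(X^TX)\,\theta=X^Ty$ with $\theta=(\alpha,\beta,\gamma)$. The following is a rough sketch in plain Python; the helper names and the noise-free data (generated from $\alpha=1,\beta=2,\gamma=3$) are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multiple_regression(x1, x2, y):
    """Return [alpha, beta, gamma] for y = alpha + beta*x1 + gamma*x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix X
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical noise-free data: y = 1 + 2*x1 + 3*x2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [4.0, 3.0, 11.0, 10.0, 18.0]

alpha, beta, gamma = fit_multiple_regression(x1, x2, y)
print(alpha, beta, gamma)  # approximately 1.0 2.0 3.0
```

If one predictor were an exact linear function of the other, $X^TX$ would be singular and the solve step would fail with a division by zero.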

Regression analysis uses the [[편차,deviation]].

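Regression analysis assumes that the [[독립변수,independent_variable]]s are mutually linearly independent ([[선형독립,linear_independence]]), but as the number of independent variables grows, correlations among them creep in and affect the result. This is called the [[다중공선성,multicollinearity]] problem.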
{
Related:
[[회귀,regression]]
[[회귀분석,regression_analysis]]

Twins:
[[https://terms.naver.com/entry.nhn?docId=3404410&cid=40942&categoryId=32211 두산백과]]
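WpKo:다중공선성
 "Some distinguish '''perfect collinearity''', where an exact linear relationship exists among the independent variables, from '''multicollinearity''', where a high degree of linear relationship exists among them."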
WpEn:Multicollinearity
}
Remedies include PLS regression and L1 regularization (Lasso).
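
One rough way to detect multicollinearity between two predictors is to compute their Pearson correlation $r$ and the variance inflation factor $1/(1-r^2)$, which is the usual VIF formula specialized to two predictors. The sketch below uses made-up data; the names `pearson_r` and `vif_two_predictors` are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    saa = sum((u - a_mean) ** 2 for u in a)
    sbb = sum((v - b_mean) ** 2 for v in b)
    sab = sum((u - a_mean) * (v - b_mean) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2); undefined (division by zero) if r is exactly +-1."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors: x2 is almost 2*x1, x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
x3 = [1.0, -1.0, 0.0, -1.0, 1.0]

print(pearson_r(x1, x2), vif_two_predictors(x1, x2))  # r near 1, very large VIF
print(pearson_r(x1, x3), vif_two_predictors(x1, x3))  # r = 0, VIF = 1
```

A VIF far above 1 signals that the coefficient estimates for those predictors are unstable.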

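For polynomial regression, raising the degree as far as possible is not necessarily better. The residuals on the data already in hand may approach 0, yet data collected later may deviate greatly from the fitted curve. This is called [[과적합,overfitting]].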
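
The effect can be illustrated with a small, made-up data set. A polynomial of degree n-1 through n points with distinct x values has zero residuals (its least-squares fit coincides with the interpolating polynomial, evaluated here in Lagrange form), yet it predicts a new point far worse than a straight line. The data and function names are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = W*x + b; returns (W, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, y_mean - w * x_mean

def interpolating_poly(xs, ys, x):
    """Evaluate the degree n-1 polynomial through all n points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Hypothetical noisy training data following a roughly linear trend.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.5, 1.5, 3.5, 3.5]

# Residuals of the degree-4 polynomial on the training data are all zero.
train_rss = sum((y - interpolating_poly(train_x, train_y, x)) ** 2
                for x, y in zip(train_x, train_y))

w, b = fit_line(train_x, train_y)
poly_at_5 = interpolating_poly(train_x, train_y, 5.0)  # about -10
line_at_5 = w * 5.0 + b                                # about 4.7
print(train_rss, poly_at_5, line_at_5)
```

If the next observation at x = 5 lands near the trend (around 5), the straight line misses by a fraction of a unit while the degree-4 polynomial misses by about 15.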

[[WpKo:회귀_분석]]

[[WpEn:Regression_analysis]]
 = https://en.wikipedia.org/wiki/Regression_analysis

= Linear regression =
[[선형회귀,linear_regression]]
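Simple linear regression, multiple linear regression, ... will probably be added too, but rather than here (the section interface is poor) -> later put them at the very top as a sub tree.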

= [[로지스틱회귀,logistic_regression]] =
General form of the logistic model
 $y=\frac{e^x}{1+e^x}$
The logit function maps $(0,1)\mapsto(-\infty,\infty)$ and is the inverse of the logistic function.
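
A short sketch of the two functions and their inverse relationship. The form $1/(1+e^{-x})$ used here is algebraically equal to $e^x/(1+e^x)$; the function names are illustrative, and the code is not hardened against overflow for inputs of very large magnitude.

```python
import math

def logistic(x):
    """Logistic function: maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))  # same as e^x / (1 + e^x)

def logit(p):
    """Logit function: maps p in (0, 1) to the real line (log-odds)."""
    return math.log(p / (1.0 - p))

print(logistic(0.0))         # 0.5
print(logit(0.5))            # 0.0
print(logit(logistic(2.0)))  # approximately 2.0 (round trip)
```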

The [[손실함수,loss_function]] used in logistic regression: log loss (log_loss)
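
A minimal sketch of the log loss, assuming binary labels in {0, 1}, predicted probabilities strictly between 0 and 1, and averaging over the samples (the averaging is a common convention, assumed here). The example values are made up.

```python
import math

def log_loss(y_true, p_pred):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
uninformative = [0.5, 0.5, 0.5, 0.5]  # loss equals log(2)
confident_ok = [0.9, 0.1, 0.8, 0.2]   # mostly right: smaller loss
confident_bad = [0.1, 0.9, 0.2, 0.8]  # mostly wrong: larger loss

print(log_loss(labels, uninformative))
print(log_loss(labels, confident_ok))
print(log_loss(labels, confident_bad))
```

Confident wrong predictions are penalized much more heavily than hesitant ones, which is what makes this loss suitable for fitting probabilities.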

links ko
https://ratsgo.github.io/machine%20learning/2017/04/02/logistic/
~~[[https://developers.google.com/machine-learning/glossary?hl=ko#%EB%A1%9C%EC%A7%80%EC%8A%A4%ED%8B%B1-%ED%9A%8C%EA%B7%80logistic-regression 머신러닝 용어집: 로지스틱 회귀(logistic regression)]]~~
https://developers.google.com/machine-learning/glossary?hl=ko#logistic-regression

rel. [[시그모이드함수,sigmoid_function]] - curr at [[함수,function#s-14]]

----
[[WpKo:로지스틱_회귀]]
[[WpSimple:Logistic_regression]]
[[WpEn:Logistic_regression]]
not in mathworld([[Date(2022-01-11T08:51:55)]]); search results: https://mathworld.wolfram.com/search/?query=logistic+regression&x=0&y=0

http://mlwiki.org/index.php/Logistic_Regression

= Weighted regression analysis =
p.105
The least squares method has the weakness of being vulnerable to outliers.

LOWESS,locally_weighted_scatterplot_smoothing (see [[WpEn:Local_regression]]) analysis derives a regression equation by applying smoothing ( WpEn:Smoothing ) with a weighted regression function.
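
The core step of such a locally weighted fit can be sketched as below: for a query point `x0`, weight the nearby points with a tricube kernel and fit a weighted straight line. This is a simplified illustration, not the full LOWESS procedure (it omits the robustness iterations), and the bandwidth rule (distance to the `k`-th nearest point), the function names, and the data are assumptions.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside |u| < 1, zero outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_estimate(xs, ys, x0, k):
    """Weighted least-squares line around x0, evaluated at x0.

    Assumes k >= 3 and distinct x values, so that at least two
    points receive a positive weight.
    """
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # bandwidth (assumed rule)
    ws = [tricube((x - x0) / h) for x in xs]
    sw = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / sw
    y_mean = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_mean + slope * (x0 - x_mean)

# Hypothetical data lying exactly on y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

estimate = local_linear_estimate(xs, ys, 4.5, k=5)
print(estimate)  # approximately 10.0, since the data are exactly linear
```

Repeating the estimate over a grid of `x0` values traces out the smoothed curve.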

= LOESS, LOWESS =
Being written at [[local_regression]].

= [[스플라인,spline]] =
regression with polynomial basis

= Other meanings of the English word regression (outside statistics) =
== software development ==
regression_testing
https://everything2.com/title/regression+testing

= Books consulted =
Tada Satoshi (다다 사토시), 처음 배우는 인공지능, Hanbit Media (한빛미디어)

----
MKLINK
[[수치해석,numerical_analysis]]

----
https://everything2.com/title/Regression
{
OLS regression = ordinary_least_squares (being written at [[최소제곱,least_square]]) regression ... 
}
https://mathworld.wolfram.com/Regression.html (short)