Sub:
분류정리,classification_theorem - 정리,theorem - 수학,math얘기

통계적분류,statistical_classification - writing - 통계,statistics적

Statistical_classification

Sim:
범주,category

1. Cmp clustering

2. writing, mklink

3. model이 얼마나 잘 작동하는지, 즉 분류기가 분류한 것과 실제 정답이 얼마나 일치하는지, 에 대한 통계적 수치(? 측도? 척도?)들 <- 적절한 제목으로.

3.1. accuracy
3.2. precision
3.3. ...tbw

4. Cover's theorem

[edit]

1. Cmp clustering ¶

not sure chk:

	tag/label/class/category ...가(즉 그 집합이)	문서,document에 대해
분류 classification	미리 정해져 있음?	문서 분류 : kNN, NB, SVM, random_forest 등
무리짓기? 덩이짓기? ?? 클러스터링 .... clustering	미리 정해져 있지 않음?	문서 클러스터링 : k-Means, DBSCAN

clustering =,clustering {

}

분류 클러스터링 차이

분류 클러스터링 비교

분류 클러스터링 차이 비교

classification vs clustering

classification vs clustering difference

[edit]

2. writing, mklink ¶

Sub:
이진분류,binary_classification
multiclass_classification (multi-class_classification ?)

https://developers.google.com/machine-learning/glossary?hl=ko#multi-class-classification

multi-label_classification and multi-output_classification
선형분류,linear_classification - 선형성,linearity
이진선형분류,binary_linear_classification ???
이미지분류,image_classification
linear_classification 선형분류

선형_분류 = https://ko.wikipedia.org/wiki/선형_분류

linear_classifier 선형분류기

Linear_classifier = https://en.wikipedia.org/wiki/Linear_classifier

quadratic_classification

quadratic_classifier

Quadratic_classifier = https://en.wikipedia.org/wiki/Quadratic_classifier

classification algorithms

서포트벡터머신,support_vector_machine,SVM
결정트리,decision_tree
...

Terms: / rel.
class
label
ground truth
confidence score
top-1 error, top-5 error - 오류,error 오차,error 중에 뭐에 가까운가? or both?
IU, IoU - intersection over union
bounding_box
분류기,classifier
{
(단어)

classifier syn.

classificator

Classifier

(기계학습의 분류기)

Classifier_(machine_learning)
}
clustering
threshold
{
TBD: 이게 분류에만 쓰이는 개념은 아닌데 어떤 pagename is best?

Classifiers typically employ some kind of a threshold.
Changing the threshold will affect the performance of the classifier.
Receiver Operating Characteristic (ROC) curves allow us to evaluate the performance of a classifier using different thresholds.
(Kwak, Slide 1, p76)
}
패턴인식,pattern_recognition

compare:

자료,data
자료집합,dataset
data_labeling or labeling

[edit]

3. model이 얼마나 잘 작동하는지, 즉 분류기가 분류한 것과 실제 정답이 얼마나 일치하는지, 에 대한 통계적 수치(? 측도? 척도?)들 <- 적절한 제목으로. ¶

https://sumniya.tistory.com/26 에 따르면 '분류성능평가지표'
...

evaluation metric이 해당 영어표현인듯... ex.

Classification Evaluation Metrics (분류성능지표)
model evaluation metric
and?

evaluation_metric { 분류,classification 모델(classification_model) 성능의 척도/측도(측도,measure)? ... Up: 평가,evaluation metric(계량,metric 또는 거리,metric) }
https://blog.naver.com/jgyy4775/222640240966 에 따르면 'classification metric'(s) classification_metric
암튼 pagename TBD. 근데 먼저 metric의 pagename이 결정되어야 하는데 거리 계량 중 .. 그냥 계량이 최선인 듯 한데, 거리는 거리,distance와 겹치니.. 계량,metric

일단 이것들에는 다음과 같은 것이 있으며 아래 나누어 서술

먼저 용어정리 - true, false, positive, negative 정리필요.
번역들도. 보통 positive=양성 negative=음성

먼저 confusion_matrix 서술 필요. <- 앞문단으로 만들까? prerequisite으로

(실제 = 정답 = ...) vs (예측 = 예상 = ...)

정확도,accuracy - 전체 중에서 얼마나 맞추었는지, 올바르게 예측된 수 / 전체 데이터 수,
정밀도,precision - model이 true로 예측한 것 중에서 실제로 얼마나 true인지, 예측을 positive로 한 것 중에서 true AND positive인 것의 비율,
재현율,recall -
민감도,sensitivity - recall과 syn? 항상? chk
F1_score = F1-measure - curr at 측도,measure

accuracy = (TN + TP) / (TN + FP + FN + TP)
precision = TP / (FP + TP)
recall = TP / (FN + TP) <-- chk

precision과 recall 이 둘이 서로 trade-off관계가 있다고 했었나?
- yes. false_positive true_negative 그 개념 그거임...

recall이 상대적으로 더 중요한 지표인 경우 : 실제 positive인 데이터 예측을 negative로 잘못 판단하게 되면 업무상 큰 영향이 발생하는 경우
precision이 상대적으로 더 중요한 지표인 경우 : 실제 negative인 데이터 예측을 positive로 잘못 판단하게 되면 업무상 큰 영향이 발생하는 경우

그래서 둘 중에 하나를 더 만족스럽게(? 향상? 더 좋은 표현) 하려면 다른 하나가 덜 정확해지는 그런...
...

정밀도 재현율 트레이드오프

precision recall tradeoff

그 둘을 조화평균,harmonic_mean하여 통합(?) 종합? 한 것이 F1 score.

[edit]

3.1. accuracy ¶

[edit]

3.2. precision ¶

[edit]

3.3. ...tbw ¶

4. Cover's theorem ¶

Cover_theorem
{
Cover's theorem

“pattern-classification problem, cast in a high dimensional space non-linearly,
is more likely to be linearly separable than in a low-dimensional space”

(Kwak)

Cover's_theorem
...

cover's theorem
}

MKLINK
class - 클래스,class, 부류/류(부류,class maybe? 동치류,equivalence_class etc), 계급, 등급, ...
label - 레이블, ... 레이블,label { rr w }
labeled_data - del ok
dataset
거리,distance
neighbor - 거리에 따라서

nearest_neighbor and kNN and...
clustering - 이것도 거리와 관계 깊은

초평면,hyperplane (저차원의 경우: 2d에선 직선,line, 3d에선 평면,plane)
결정경계,decision_boundary ... 이 둘은 분류에선 같은건가? 초평면은 결정경계 역할을 하는 것?
유사도,similarity .. via backlink. 유사한 대상(data point)을 같은 class로 분류하는 경우가 많으므로, related.
taxonomy - 역시 분류라고 번역됨. 그렇다면 classification과의 차이는? 혹시 taxonomy가 n-level 트리,tree구조라면 classification은 1-level tree 구조인가? 아님 항상 그렇진 않지만 그런 쪽의 뉘앙스가 있는건가? - (그냥 생각, not sure)

택소노미 분류학 ....
2024-05-08 지금 보면 classification도 당연히 n-level일 수 있음(

dendrogram을 생각)
둘 다 cluster(~=group)로 묶는 clustering을 포함
taxonomy는 classification을 하는 과학science/과정process/기술technique/체계system 정도의 뜻
그리고 n-tree는 지금 생각해보면 hierarchy에도 가까운. 왜 그때 생각이 안났지... 암튼. 위계,hierarchy는 분류과정에서 만들어지는 트리,tree 구조,structure i.e. tree_structure. 그리고 hierarchical clustering의 결과를 시각적으로 보여주는 그림,diagram이 dendrogram.

taxonomy

연관 표현: 구분,

Up: 기계학습,machine_learning > 지도학습,supervised_learning

VeryGoodWiki

This wiki is very good.
Minus sign: −

분류,classification

Contents

1. Cmp clustering ¶

2. writing, mklink ¶

3. model이 얼마나 잘 작동하는지, 즉 분류기가 분류한 것과 실제 정답이 얼마나 일치하는지, 에 대한 통계적 수치(? 측도? 척도?)들 <- 적절한 제목으로. ¶

3.1. accuracy ¶

3.2. precision ¶

3.3. ...tbw ¶

4. Cover's theorem ¶