#noindex
Sub:
[[분류정리,classification_theorem]] - [[정리,theorem]] - a [[수학,math]] topic
 WpEn:Classification_theorem
[[통계적분류,statistical_classification]] - writing - the [[통계,statistics]] sense
 WpEn:Statistical_classification

Sim: [[범주,category]]

<>

= Cmp clustering =
not sure, chk:
|| ||The tags/labels/classes/categories ... (i.e. the set of them) are ||For a [[문서,document]] ||
||'''분류 classification''' ||Fixed in advance? ||Ggl:"문서 분류" : kNN, NB, SVM, random_forest, etc. ||
||[[clustering]] (Korean rendering undecided: 무리짓기? 덩이짓기? 클러스터링?) ||Not fixed in advance? ||Ggl:"문서 클러스터링" : k-Means, DBSCAN ||

[[clustering]] =,clustering { }

Naver:"분류+클러스터링+차이"
Naver:"분류+클러스터링+비교"
Ggl:"분류+클러스터링+차이+비교"
Ggl:"classification+vs+clustering"
Ggl:"classification+vs+clustering+difference"

= writing, mklink =
Sub:
[[이진분류,binary_classification]]
multiclass_classification (multi-class_classification ?)
 https://developers.google.com/machine-learning/glossary?hl=ko#multi-class-classification
multi-label_classification and multi-output_classification
[[이미지분류,image_classification]]
[[linear_classification]] 선형분류
 WpKo:선형_분류 = https://ko.wikipedia.org/wiki/선형_분류
[[linear_classifier]] 선형분류기
 WpEn:Linear_classifier = https://en.wikipedia.org/wiki/Linear_classifier
[[quadratic_classification]]
[[quadratic_classifier]]
 WpEn:Quadratic_classifier = https://en.wikipedia.org/wiki/Quadratic_classifier

classification algorithms
 [[서포트벡터머신,support_vector_machine,SVM]]
 [[결정트리,decision_tree]]
 ...
----
Terms: / rel.
class label
ground truth
confidence score
top-1 error, top-5 error - is "error" here closer to [[오류,error]] or to [[오차,error]]? or both?
IU, IoU - intersection over union
bounding_box

[[분류기,classifier]]
{
(the word)
WtEn:classifier
syn. WtEn:classificator
WpEn:Classifier

(classifier in machine learning)
[[WpEn:Classifier_(machine_learning)]]
}

[[clustering]]

[[threshold]]
{
TBD: this concept is not used only in classification, so which pagename is best?

Classifiers typically employ some kind of threshold. Changing the threshold will affect the performance of the classifier.
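[[ROC곡선,ROC_curve|Receiver Operating Characteristic (ROC) curves]] allow us to evaluate the performance of a classifier using different thresholds.
(Kwak, Slide 1, p76)
}

[[패턴인식,pattern_recognition]]

compare:
[[자료,data]]
[[자료집합,dataset]]
[[data_labeling]] or [[labeling]]

= Statistical figures (measures? scales?) of how well a model works, i.e. how closely the classifier's output agrees with the actual answers <- replace with a proper title. =
According to https://sumniya.tistory.com/26 the Korean term is '분류성능평가지표'
... Google:evaluation+metric seems to be the corresponding English expression... ex. Classification Evaluation Metrics (분류성능지표)
model evaluation metric
and?
[[evaluation_metric]]
{
A measure ([[측도,measure]]; 척도 or 측도?) of the performance of a [[분류,classification]] model (classification_model)?
...
Up: [[평가,evaluation]] [[metric]] ([[계량,metric]] or [[거리,metric]])
}

According to https://blog.naver.com/jgyy4775/222640240966 the term is 'classification metric'(s)
[[classification_metric]]

Anyway, pagename TBD. The pagename of [[metric]] has to be settled first, though. Between 거리 and 계량, plain 계량 seems best, since 거리 overlaps with [[거리,distance]]. So: [[계량,metric]]

For now, these metrics include the following; each is to be described separately below.

First, terminology: true, false, positive, negative need to be sorted out, including their translations. Usually positive = 양성, negative = 음성.
[[confusion_matrix]] needs to be described first. <- make it the preceding paragraph, as a prerequisite?

(actual = ground truth = 실제 = 정답 = ...) vs (predicted = 예측 = 예상 = ...)

[[정확도,accuracy]] - how much of the whole was predicted correctly: number of correct predictions / total number of data points,
[[정밀도,precision]] - of the cases the model predicted as positive, the fraction that are actually positive, i.e. true positives / all predicted positives,
[[재현율,recall]] - of the cases that are actually positive, the fraction the model predicted as positive, i.e. true positives / all actual positives,
[[민감도,sensitivity]] - the same quantity as recall (the true positive rate) in binary classification; whether the two are always used as synonyms: chk
F1_score = F1-measure - curr at [[측도,measure]]

accuracy = (TN + TP) / (TN + FP + FN + TP)
precision = TP / (FP + TP)
recall = TP / (FN + TP)

Are precision and recall in a trade-off relationship with each other? - yes.
false_positive
true_negative
It comes down to those concepts...
 * Cases where recall is the relatively more important metric: predicting an actually positive data point as negative (a false negative) has a large practical impact.
 * Cases where precision is the relatively more important metric: predicting an actually negative data point as positive (a false positive) has a large practical impact.
So improving one of the two tends to make the other worse...
...
Google:정밀도+재현율+트레이드오프
Google:precision+recall+tradeoff

The F1 score combines the two into a single number by taking their [[조화평균,harmonic_mean]]:
F1 = 2 * precision * recall / (precision + recall)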
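A minimal sketch of the metrics above, to make the formulas and the precision/recall trade-off concrete. Everything in it is an assumption for illustration: the function names (confusion_counts, metrics, predict), the convention 1 = positive / 0 = negative, the choice to return 0.0 when a denominator is zero, and the toy labels and scores, which are made up and not from any dataset.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels, assuming 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    # Assumption of this sketch: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (fp + tp) if (fp + tp) else 0.0
    recall = tp / (fn + tp) if (fn + tp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def predict(scores: Sequence[float], threshold: float) -> List[int]:
    """Turn confidence scores into labels: positive when score >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Made-up toy data: actual labels and hypothetical classifier confidence scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.7):
    m = metrics(y_true, predict(scores, threshold))
    print(threshold, {name: round(value, 3) for name, value in m.items()})
```

On this toy data, raising the threshold makes the classifier more reluctant to say "positive": precision goes up while recall goes down, which is the trade-off noted above. Real data will not always move this cleanly.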
== accuracy ==
== precision ==
== ...tbw ==
see also [[혼동행렬,confusion_matrix]]

= Cover's theorem =
[[Cover_theorem]]
{
'''Cover's theorem'''

“pattern-classification problem, cast in a high dimensional space non-linearly, is more likely to be linearly separable than in a low-dimensional space”

https://i.imgur.com/19517pbm.png
https://i.imgur.com/3vt88wPm.png
https://i.imgur.com/sTnkXxRm.png
(Kwak)

[[WpEn:Cover's_theorem]]
...
[[Google:cover's+theorem]]
}

----
MKLINK
[[class]] - 클래스, 부류, ...
[[label]] - 레이블, ...
[[labeled_data]] - del ok
[[dataset]]
[[거리,distance]]
[[neighbor]] - by distance: [[nearest_neighbor]] and [[kNN]] and...
[[clustering]] - this, too, is closely tied to distance
[[초평면,hyperplane]] (low-dimensional cases: a [[직선,line]] in 2d, a [[평면,plane]] in 3d)
[[결정경계,decision_boundary]]
 ... are these two the same thing in '''classification'''? Does the hyperplane play the role of the decision boundary? (For a linear classifier the decision boundary is a hyperplane; in general it need not be.)
[[유사도,similarity]] .. via backlink. Related, since similar objects (data points) are often classified into the same class.
[[taxonomy]] - also translated as 분류. So how does it differ from classification? If a taxonomy is an n-level [[트리,tree]] structure, is a classification a 1-level tree structure? Or is that not always so, but the word leans that way? - (just a thought, not sure)
 택소노미 분류학 ....
 [[Date(2024-05-08T14:47:05)]] Looking at it now, a classification can of course be n-level too (think of NN:dendrogram)
 Both involve clustering, i.e. grouping into clusters (~= groups)
 taxonomy means roughly the science/process/technique/system of doing classification
 And the n-level tree, come to think of it, is also close to a hierarchy. Why didn't that occur to me back then... anyway. A [[위계,hierarchy]] is the [[트리,tree]] [[구조,structure]], i.e. tree_structure, produced in the course of classification. And the [[그림,diagram]] that visualizes the result of hierarchical clustering is the dendrogram.
 NN:taxonomy
 WtEn:taxonomy

Related expressions: 구분,
----
Up: [[기계학습,machine_learning]] > [[지도학습,supervised_learning]]