From e7f01968bf0aa8da38614ffe028cbd7ba1581d37 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn Date: Fri, 26 Oct 2018 11:23:23 +0900 Subject: [PATCH 01/18] Update ko/cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 340 +++++++++++++++++++++++++ 1 file changed, 340 insertions(+) create mode 100644 ko/cheatsheet-unsupervised-learning.md diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md new file mode 100644 index 000000000..827d815a3 --- /dev/null +++ b/ko/cheatsheet-unsupervised-learning.md @@ -0,0 +1,340 @@ +**1. Unsupervised Learning cheatsheet** + +⟶ + +
+ +**2. Introduction to Unsupervised Learning** + +⟶ + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ + +
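Jensen's inequality can be checked numerically in a few lines. The sketch below is not part of the original cheatsheet; it uses the convex function f(x) = x^2 and a Monte-Carlo sample, so the estimate of E[f(X)] should dominate f(E[X]):

```python
import numpy as np

# Jensen's inequality with the convex function f(x) = x^2: E[f(X)] >= f(E[X]).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)   # sample of the random variable X

lhs = np.mean(x ** 2)         # Monte-Carlo estimate of E[f(X)]
rhs = np.mean(x) ** 2         # f applied to the estimate of E[X]
print(lhs, rhs, lhs >= rhs)   # the first value should be the larger one
```

<br>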
+ +**5. Clustering** + +⟶ + +
+ +**6. Expectation-Maximization** + +⟶ + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ + +
+ +**8. [Setting, Latent variable z, Comments]** + +⟶ + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶ + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ + +
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ + +
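To make the E-step/M-step loop above concrete, here is a minimal NumPy sketch of EM for a one-dimensional mixture of k Gaussians. The initialization and the fixed iteration count are assumptions made for this sketch, not prescriptions from the cheatsheet:

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100, seed=0):
    """Minimal EM for a 1-D Gaussian mixture: returns weights, means, variances."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    phi = np.full(k, 1.0 / k)                  # mixture weights P(z = j)
    mu = rng.choice(x, size=k, replace=False)  # crude initialization of the means
    var = np.full(k, x.var())

    for _ in range(n_iter):
        # E-step: posterior probability Qi(z(i) = j) for every point and cluster
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        q = dens * phi
        q /= q.sum(axis=1, keepdims=True)

        # M-step: re-estimate each cluster using the posteriors as weights
        nk = q.sum(axis=0)
        phi = nk / n
        mu = (q * x[:, None]).sum(axis=0) / nk
        var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return phi, mu, var
```

On data drawn from two well-separated Gaussians, the recovered weights, means and variances should approach the true ones.

<br>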
+ +**14. k-means clustering** + +⟶ + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ + +
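The cluster-assignment and means-update steps, together with the distortion function J, fit in a short NumPy sketch. Euclidean distance, a random initialization and a fixed iteration budget are assumptions of this sketch:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means; returns centroids, assignments and the distortion J."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # random centroid initialization

    for _ in range(n_iter):
        # cluster assignment: c(i) = argmin_j ||x(i) - mu_j||^2
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        c = d2.argmin(axis=1)
        # means update: mu_j = average of the points assigned to cluster j
        # (assumes no cluster ends up empty)
        mu = np.array([X[c == j].mean(axis=0) for j in range(k)])

    distortion = ((X - mu[c]) ** 2).sum()   # J(c, mu), should decrease as the loop runs
    return mu, c, distortion
```

<br>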
+ +**19. Hierarchical clustering** + +⟶ + +
+ +**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** + +⟶ + +
+ +**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** + +⟶ + +
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ + +
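Assuming scikit-learn is available, the three objectives in the table map directly onto the `linkage` argument of `AgglomerativeClustering`; the snippet below is only a usage sketch on toy data:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.default_rng(0).normal(size=(60, 2))    # toy data

# One model per objective in the table: Ward, average and complete linkage.
for linkage in ("ward", "average", "complete"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)                    # nested clusters built bottom-up
    print(linkage, np.bincount(labels))
```

<br>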
+ +**24. Clustering assessment metrics** + +⟶ + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ + +
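Both assessment metrics are implemented in scikit-learn; note that recent releases spell the second one `calinski_harabasz_score` (older versions used `calinski_harabaz_score`, matching the spelling in the text). A hedged usage sketch:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

X = np.random.default_rng(0).normal(size=(200, 2))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Silhouette coefficient: mean of (b - a) / max(a, b) over all samples.
print("silhouette:", silhouette_score(X, labels))
# Calinski-Harabasz index: higher means denser, better separated clusters.
print("calinski-harabasz:", calinski_harabasz_score(X, labels))
```

<br>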
+ +**29. Dimension reduction** + +⟶ + +
+ +**30. Principal component analysis** + +⟶ + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**34. diagonal** + +⟶ + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ + +
+ +**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ + +
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
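The four steps can be written directly with NumPy; the sketch below is one possible implementation, using `np.linalg.eigh` (which returns eigenvalues in ascending order) to obtain the principal eigenvectors:

```python
import numpy as np

def pca(X, k):
    """Project X (m x n) onto its top-k principal components."""
    # Step 1: normalize each feature to mean 0 and standard deviation 1
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: Sigma = (1/m) * sum_i x(i) x(i)^T, symmetric with real eigenvalues
    sigma = Xs.T @ Xs / len(Xs)
    # Step 3: orthogonal eigenvectors of the k largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(sigma)     # eigenvalues in ascending order
    U = eigvecs[:, -k:][:, ::-1]                 # top-k principal eigenvectors
    # Step 4: project the data on span(u1, ..., uk)
    return Xs @ U
```

<br>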
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
+ +**43. Independent component analysis** + +⟶ + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ + +
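Putting entries 47 to 50 together, here is a minimal NumPy sketch of the Bell and Sejnowski stochastic gradient ascent update for the unmixing matrix W; the learning rate and epoch count are assumptions of the sketch:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def bell_sejnowski_ica(X, alpha=0.01, n_epochs=20, seed=0):
    """X has shape (m, n): m training examples of the n mixed signals."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = np.eye(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):
            x = X[i][:, None]                    # column vector x(i)
            g = sigmoid(W @ x)                   # sigmoid applied element-wise
            # gradient ascent step on the log-likelihood
            W += alpha * ((1.0 - 2.0 * g) @ x.T + np.linalg.inv(W.T))
    return W                                     # estimate of the unmixing matrix
```

<br>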
+ +**51. The Machine Learning cheatsheets are now available in Japanese.** + +⟶ + +
+ +**52. Original authors** + +⟶ + +
+ +**53. Translated by X, Y and Z** + +⟶ + +
+ +**54. Reviewed by X, Y and Z** + +⟶ + +
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶ + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ From 80fa7fdbede7297e7ed2a3ddc3e04e8a54fc08e7 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 26 Oct 2018 11:26:15 +0900 Subject: [PATCH 02/18] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index 827d815a3..3f9df3dd0 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -1,6 +1,6 @@ **1. Unsupervised Learning cheatsheet** -⟶ +⟶ 하하
From b660e6a3b92f57bdafc00cbee89845ca7f3784a0 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 26 Oct 2018 17:58:53 +0900 Subject: [PATCH 03/18] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 92 +++++++++++++------------- 1 file changed, 46 insertions(+), 46 deletions(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index 3f9df3dd0..c76e2d3af 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -1,204 +1,204 @@ **1. Unsupervised Learning cheatsheet** -⟶ 하하 +⟶ 비지도 학습 cheatsheet
**2. Introduction to Unsupervised Learning** -⟶ +⟶ 비지도 학습 소개
**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶ +⟶ 동기부여 - 비지도학습의 목표는 {x(1),...,x(m)}와 같이 라벨링이 되어있지 않은 데이터 내의 숨겨진 패턴을 찾는것이다.
**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** -⟶ +⟶ 옌센 부등식 - f를 볼록함수로 하며 X는 확률변수로 두고 아래와 같은 부등식을 따르도록 하자.
**5. Clustering** -⟶ +⟶ 군집화
**6. Expectation-Maximization** -⟶ +⟶ 기댓값 최대화
**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** -⟶ +⟶ 잠재변수 - 잠재변수들은 숨겨져있거나 관측되지 않는 변수들을 말하며, 이러한 변수들은 추정문제의 어려움을 가져온다. 그리고 잠재변수는 종종 z로 표기되어진다. 일반적인 잠재변수로 구성되어져있는 형태들을 살펴보자
**8. [Setting, Latent variable z, Comments]** -⟶ +⟶ 표기형태, 잠재변수 z, 주석
**9. [Mixture of k Gaussians, Factor analysis]** -⟶ +⟶ 가우시안 혼합모델, 요인분석
-**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** -⟶ +⟶ 알고리즘 - 기댓값 최대화 (EM) 알고리즘은 모수 θ를 추정하는 효율적인 방법을 제공해준다. 모수 θ의 추정은 아래와 같이 우도의 아래 경계지점을 구성하는(E-step)과 그 우도의 아래 경계지점을 최적화하는(M-step)들의 반복적인 최대우도측정을 통해 추정된다.
**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** -⟶ +⟶ E-step : 각각의 데이터 포인트 x(i)은 특정 클러스터 z(i)로 부터 발생한 후 사후확률Qi(z(i))를 평가한다. 아래의 식 참조
**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ +⟶ M-step : 데이터 포인트 x(i)에 대한 클러스트의 특정 가중치로 사후확률 Qi(z(i))을 사용, 각 클러스트 모델을 개별적으로 재평가한다. 아래의 식 참조
**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** -⟶ +⟶ Gaussians 초기값, 기대 단계, 최대화 단계, 수렴
**14. k-means clustering** -⟶ +⟶ k-평균 군집화
**15. We note c(i) the cluster of data point i and μj the center of cluster j.** -⟶ +⟶ c(i)는 데이터 포인트 i가 속한 군집을, μj는 군집 j의 중심을 나타낸다. <br>
**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ +⟶ 알고리즘 - 군집 중앙에 μ1,μ2,...,μk∈Rn 와 같이 무작위로 초기값을 잡은 후, k-평균 알고리즘이 수렴될때 까지 아래와 같은 단계를 반복한다.
**17. [Means initialization, Cluster assignment, Means update, Convergence]** -⟶ +⟶ 평균 초기값, 군집분할, 평균 재조정, 수렴
**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ +⟶ 왜곡 함수 - 알고리즘이 수렴하는지를 확인하기 위해서는 아래와 같은 왜곡함수를 정의해야 합니다.
**19. Hierarchical clustering** -⟶ +⟶ 계층적 군집분석
**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** -⟶ +⟶ 알고리즘 - 연속적 방식으로 중첩된 클러스트를 구축하는 결합형 계층적 접근방식을 사용하는 군집 알고리즘이다.
**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ +⟶ 종류 - 다양한 목적함수의 최적화를 목표로하는 다양한 종류의 계층적 군집분석 알고리즘들이 있으며, 아래 표와 같이 요약되어 있다.
**22. [Ward linkage, Average linkage, Complete linkage]** -⟶ +⟶ Ward 연결법, 평균 연결법, 완전 연결법
**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** -⟶ +⟶ 군집 거리 내에서의 최소화, 한쌍의 군집간 평균거리의 최소화, 한쌍의 군집간 최대거리의 최소화
**24. Clustering assessment metrics** -⟶ +⟶ 군집화 평가 metrics
**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ +⟶ 비지도학습 환경에서는, 지도학습 환경과는 다르게 실측자료에 라벨링이 없기 때문에 종종 모델에 대한 성능평가가 어렵다.
**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** -⟶ +⟶ 실루엣 계수 - a와 b를 같은 클래스의 다른 모든점과 샘플 사이의 평균거리와 다음 가장 가까운 군집의 다른 모든 점과 샘플사이의 평균거리로 표기하면 단일 샘플에 대한 실루엣 계수 s는 다음과 같이 정의할 수 있다.
**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** -⟶ +⟶ Calinski-Harabaz 색인 - k를 군집의 개수, Bk와 Wk를 각각 아래와 같이 정의되는 군집 간 분산행렬과 군집 내 분산행렬이라고 하면, <br>
**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ +⟶ Calinski-Harabaz 색인 s(k)는 군집모델이 군집화를 얼마나 잘 정의하는지를 나타낸다. 가령 높은 점수일수록 군집이 더욱 밀도있으며 잘 분리되는 형태이다. 아래와 같은 정의를 따른다.
**29. Dimension reduction** -⟶ +⟶ 차원 축소
**30. Principal component analysis** -⟶ +⟶ 주성분 분석
**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ +⟶ 차원축소 기술은 데이터를 반영하는 최대 분산방향을 찾는 기술입니다.
**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ +⟶ 고유값, 고유벡터 - 행렬 A∈Rn×n가 주어졌을 때, 다음을 만족하는 벡터 z∈Rn∖{0}(이를 고유벡터라고 한다)가 존재하면 λ를 A의 고유값이라고 한다. <br>
**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +⟶ 스펠트럼 정리 - A∈Rn×n 이라고 하자 만약 A가 대칭이라면, A는 실수 직교 행렬 U∈Rn×n에 의해 대각행렬로 만들 수 있다.
-**34. diagonal** +**34. diagonal** -⟶ +⟶ 대각선
@@ -211,7 +211,7 @@ **36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ +⟶ 알고리즘 - 주성분 분석
@@ -253,7 +253,7 @@ dimensions by maximizing the variance of the data as follows:** **43. Independent component analysis** -⟶ +⟶ 독립성분분석
@@ -265,13 +265,13 @@ dimensions by maximizing the variance of the data as follows:** **45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ +⟶ 가정 - 우리는 data x가 n차원의 source vector s=(s1,...,sn)에서부터 생성되었음을 가정한다. 이때 si는 독립적인 확률변수에서 나왔으며,
**46. The goal is to find the unmixing matrix W=A−1.** -⟶ +⟶ 목표는
@@ -295,19 +295,19 @@ dimensions by maximizing the variance of the data as follows:** **50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ +⟶
**51. The Machine Learning cheatsheets are now available in Japanese.** -⟶ +⟶ 머신러닝 cheatsheet들은 일본어로도 이용 가능하다
**52. Original authors** -⟶ +⟶ 원작자
@@ -325,16 +325,16 @@ dimensions by maximizing the variance of the data as follows:** **55. [Introduction, Motivation, Jensen's inequality]** -⟶ +⟶ 소개, 동기부여, 얀센 부등식
**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ +⟶ 군집화, 기댓값-최대화, k-means,
**57. [Dimension reduction, PCA, ICA]** -⟶ +⟶ 차원축소, 주성분분석(PCA), From cf4827a1359c65aa73b8521c1e60af1274702457 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Mon, 29 Oct 2018 09:57:09 +0900 Subject: [PATCH 04/18] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index c76e2d3af..76a0d5361 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -192,7 +192,7 @@ **33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ 스펠트럼 정리 - A∈Rn×n 이라고 하자 만약 A가 대칭이라면, A는 실수 직교 행렬 U∈Rn×n에 의해 대각행렬로 만들 수 있다. +⟶ 스펙트럼 정리 - A∈Rn×n 이라고 하자 만약 A가 대칭이라면, A는 실수 직교 행렬 U∈Rn×n에 의해 대각행렬로 만들 수 있다.
@@ -204,7 +204,7 @@ **35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +⟶
From f2439f278140f68e2364a600c39d850d8778a358 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 1 Nov 2018 15:55:04 +0900 Subject: [PATCH 05/18] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 44 +++++++++++++------------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index 76a0d5361..aab39f85f 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -102,7 +102,7 @@ **18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ 왜곡 함수 - 알고리즘이 수렴하는지를 확인하기 위해서는 아래와 같은 왜곡함수를 정의해야 합니다. +⟶ 왜곡 함수 - 알고리즘이 수렴하는지를 확인하기 위해서는 아래와 같은 왜곡함수를 정의해야 한다.
@@ -120,7 +120,7 @@ **21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ 종류 - 다양한 목적함수의 최적화를 목표로하는 다양한 종류의 계층적 군집분석 알고리즘들이 있으며, 아래 표와 같이 요약되어 있다. +⟶ 종류 - 다양한 목적함수의 최적화를 목표로하는 다양한 종류의 계층적 군집분석 알고리즘들이 있으며, 아래 표와 같이 요약되어있다.
@@ -180,7 +180,7 @@ **31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ 차원축소 기술은 데이터를 반영하는 최대 분산방향을 찾는 기술입니다. +⟶ 차원축소 기술은 데이터를 반영하는 최대 분산방향을 찾는 기술이다.
@@ -204,50 +204,50 @@ **35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +⟶ 참조: 가장 큰 고유값과 연관된 고유 벡터를 행렬 A의 주요 고유벡터라고 부른다
**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ 알고리즘 - 주성분 분석 +⟶ 알고리즘 - 주성분 분석(PCA) 절차는 데이터 분산을 최대화하여 k 차원의 데이터를 투영하는 차원 축소 기술로 다음과 같이 따른다.
**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** -⟶ +⟶ 1단계: 평균을 0으로 표준편차가 1이되도록 데이터를 표준화한다.
**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ +⟶ 2단계: 실수 고유값을 가지는 대칭행렬 Σ=1mm∑i=1x(i)x(i)T∈Rn×n를 계산한다. <br>
**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** -⟶ +⟶ 3단계: k 직교 고유벡터의 합을 u1,...,uk∈Rn와 같이 계산한다. 다시말하면, 가장 큰 고유값 k의 직교 고유벡터이다.
**40. Step 4: Project the data on spanR(u1,...,uk).** -⟶ +⟶ 4단계: R(u1,...,uk) 범위에 데이터를 투영하자.
**41. This procedure maximizes the variance among all k-dimensional spaces.** -⟶ +⟶ 해당 절차는 모든 k-차원의 공간들 사이에 분산을 최대화 하는것이다.
**42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ +⟶ 변수공간의 데이터, 주요성분들 찾기, 주요성분공간의 데이터
@@ -259,67 +259,67 @@ dimensions by maximizing the variance of the data as follows:** **44. It is a technique meant to find the underlying generating sources.** -⟶ +⟶ 근원적인 생성원을 찾기위한 기술을 의미한다.
**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ 가정 - 우리는 data x가 n차원의 source vector s=(s1,...,sn)에서부터 생성되었음을 가정한다. 이때 si는 독립적인 확률변수에서 나왔으며, +⟶ 가정 - 다음과 같이 우리는 데이터 x가 n차원의 소스벡터 s=(s1,...,sn)에서부터 생성되었음을 가정한다. 이때 si는 독립적인 확률변수에서 나왔으며, 혼합 및 비특이 행렬 A를 통해 생성된다고 가정한다.
**46. The goal is to find the unmixing matrix W=A−1.** -⟶ 목표는 +⟶ 비혼합 행렬 W=A−1를 찾는 것을 목표로 한다.
**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** -⟶ +⟶ Bell과 Sejnowski 독립성분분석(ICA) 알고리즘 - 다음의 단계들을 따르는 비혼합 행렬 W를 찾는 알고리즘이다.
**48. Write the probability of x=As=W−1s as:** -⟶ +⟶ x=As=W−1s의 확률을 다음과 같이 기술한다.
**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** -⟶ +⟶ 주어진 학습데이터 {x(i),i∈[[1,m]]}에 로그우도를 기술하고 시그모이드 함수 g를 다음과 같이 표기한다.
**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ +⟶ 그러므로, 확률적 경사상승 학습 규칙은 각 학습예제 x(i)에 대해서 다음과 같이 W를 업데이트하는 것과 같다.
**51. The Machine Learning cheatsheets are now available in Japanese.** -⟶ 머신러닝 cheatsheet들은 일본어로도 이용 가능하다 +⟶ 머신러닝 cheatsheets는 현재 일본어로 제공된다.
**52. Original authors** -⟶ 원작자 +⟶ 원저자
**53. Translated by X, Y and Z** -⟶ +⟶ X,Y,Z에 의해 번역되다.
**54. Reviewed by X, Y and Z** -⟶ +⟶ X,Y,Z에 의해 검토되다.
From 638941e55b72a919d841b75620c862541aa3288d Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 1 Nov 2018 16:00:15 +0900 Subject: [PATCH 06/18] Update CONTRIBUTORS --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 27d30d4fc..5bc3ef12f 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -65,6 +65,9 @@ --hi +--ko + Kwang Hyeok Ahn (translation of Unsupervised Learning) + --ja --pt From 07ebeb3b6b08eab1de89d438434369e6ecd26b6a Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 1 Nov 2018 16:19:13 +0900 Subject: [PATCH 07/18] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index aab39f85f..b31611788 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -331,10 +331,10 @@ dimensions by maximizing the variance of the data as follows:** **56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ 군집화, 기댓값-최대화, k-means, +⟶ 군집화, 기댓값-최대화, k-means, 계층 군집화, 측정지표
**57. [Dimension reduction, PCA, ICA]** -⟶ 차원축소, 주성분분석(PCA), +⟶ 차원축소, 주성분분석(PCA), 독립성분분석(ICA) From 19ae7cc1cb4316833a97dc04c167477d74dd1b86 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 1 Nov 2018 16:19:56 +0900 Subject: [PATCH 08/18] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index b31611788..acf881d5d 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -331,7 +331,7 @@ dimensions by maximizing the variance of the data as follows:** **56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ 군집화, 기댓값-최대화, k-means, 계층 군집화, 측정지표 +⟶ 군집화, 기댓값-최대화, k-means, 계층적 군집화, 측정지표
From 23a0005b8f3406db269cd3fcbef515fa9cd3f81e Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 1 Nov 2018 15:20:58 -0700 Subject: [PATCH 09/18] Fix language name on template --- ko/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index acf881d5d..47f231a98 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -299,7 +299,7 @@ dimensions by maximizing the variance of the data as follows:**
-**51. The Machine Learning cheatsheets are now available in Japanese.** +**51. The Machine Learning cheatsheets are now available in Korean.** ⟶ 머신러닝 cheatsheets는 현재 일본어로 제공된다. From bbc6889a11ad8aa939d18c112400192606e99ff1 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 2 Nov 2018 10:39:35 +0900 Subject: [PATCH 10/18] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index 47f231a98..e961a88cc 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -301,7 +301,7 @@ dimensions by maximizing the variance of the data as follows:** **51. The Machine Learning cheatsheets are now available in Korean.** -⟶ 머신러닝 cheatsheets는 현재 일본어로 제공된다. +⟶ 머신러닝 cheatsheets는 현재 한국어로 제공된다.
From 1a80dd722738d0ebe10f225f63814a034ec4bf47 Mon Sep 17 00:00:00 2001 From: KwangHyeokAhn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Mon, 31 Dec 2018 09:48:43 +0900 Subject: [PATCH 11/18] Create deep-learning-tips-and-tricks.md --- ko/deep-learning-tips-and-tricks.md | 457 ++++++++++++++++++++++++++++ 1 file changed, 457 insertions(+) create mode 100644 ko/deep-learning-tips-and-tricks.md diff --git a/ko/deep-learning-tips-and-tricks.md b/ko/deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..347234ec2 --- /dev/null +++ b/ko/deep-learning-tips-and-tricks.md @@ -0,0 +1,457 @@ +**Deep Learning Tips and Tricks translation** + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. Tips and tricks** + +⟶ + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶ + +
+ + +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +⟶ + +
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶ + +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +⟶ + +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ + +
+ + +**9. View PDF version on GitHub** + +⟶ + +
+ + +**10. Data processing** + +⟶ + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶ + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶ + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶ + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶ + +
+ + +**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** + +⟶ + +
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶ + +
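A few of the transformations in the table can be mimicked with plain NumPy on an image stored as an H x W x 3 float array; this is only a sketch of the idea, since real pipelines usually rely on a library and apply these on the fly during training:

```python
import numpy as np

rng = np.random.default_rng(0)

def flip(img):                      # mirror along the vertical axis
    return img[:, ::-1]

def random_crop(img, size):         # random focus on one part of the image
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size]

def color_shift(img, scale=0.1):    # slightly change the nuances of RGB
    shift = rng.uniform(-scale, scale, size=(1, 1, 3))
    return np.clip(img + shift, 0.0, 1.0)

def add_noise(img, std=0.05):       # noise that could occur with light exposure
    return np.clip(img + rng.normal(0.0, std, img.shape), 0.0, 1.0)

image = rng.uniform(size=(64, 64, 3))    # stand-in for a real input image
augmented = add_noise(color_shift(random_crop(flip(image), 48)))
```

<br>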
+ + +**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶ + +
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ + +
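The normalization itself is short; here is a NumPy sketch of the training-time forward pass (running averages for inference are omitted):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (batch, features); gamma, beta: learnable scale/shift of shape (features,)."""
    mu = x.mean(axis=0)                     # batch mean, mu_B
    var = x.var(axis=0)                     # batch variance, sigma^2_B
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized activations
    return gamma * x_hat + beta             # scaled and shifted output
```

<br>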
+ + +**19. Training a neural network** + +⟶ + +
+ + +**20. Definitions** + +⟶ + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶ + +
+ + +**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** + +⟶ + +
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶ + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ + +
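For reference, the binary cross-entropy loss L(z,y) = -[y log(z) + (1-y) log(1-z)] in NumPy, with a small epsilon added as a guard against log(0):

```python
import numpy as np

def cross_entropy_loss(z, y, eps=1e-12):
    """z: predicted probabilities in (0, 1); y: true labels in {0, 1}."""
    z = np.clip(z, eps, 1.0 - eps)
    return -np.mean(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))
```

<br>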
+ + +**25. Finding optimal weights** + +⟶ + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ + +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ + +
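The three steps above, including mini-batches and the chain rule, for a tiny one-hidden-layer network; the architecture, toy data and learning rate are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))                        # toy inputs
y = (X[:, :1] * X[:, 1:] > 0).astype(float)          # toy binary targets

W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lr, batch_size = 0.5, 32

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

for epoch in range(200):                             # one epoch = one pass over the data
    for start in range(0, len(X), batch_size):       # mini-batches
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        # Step 1: forward propagation and the cross-entropy loss
        h = np.tanh(xb @ W1 + b1)
        z = sigmoid(h @ W2 + b2)
        # Step 2: backpropagate with the chain rule
        dz = (z - yb) / len(xb)                      # gradient w.r.t. the pre-sigmoid output
        dW2, db2 = h.T @ dz, dz.sum(axis=0)
        dh = dz @ W2.T * (1.0 - h ** 2)              # tanh'(a) = 1 - tanh(a)^2
        dW1, db1 = xb.T @ dh, dh.sum(axis=0)
        # Step 3: update the weights with gradient descent
        W1, b1, W2, b2 = W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2
```

<br>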
+ + +**31. Parameter tuning** + +⟶ + +
+ + +**32. Weights initialization** + +⟶ + +
+ + +**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** + +⟶ + +
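A common concrete form is the Glorot/Xavier uniform rule, where the range of the initial weights depends on the layer's fan-in and fan-out; a small NumPy sketch:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, seed=0):
    """Weights ~ U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.default_rng(seed).uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(256, 128)   # e.g. a 256 -> 128 fully connected layer
```

<br>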
+ + +**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** + +⟶ + +
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ + +
+ + +**36. [Small, Medium, Large]** + +⟶ + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶ + +
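Assuming PyTorch and torchvision are available, the "freeze most layers, retrain the last layer" recipe looks roughly like the sketch below; resnet18 is used purely as a stand-in for a pre-trained backbone, and newer torchvision versions pass a `weights=` argument instead of `pretrained=`:

```python
import torch
import torchvision

num_classes = 5
model = torchvision.models.resnet18(pretrained=True)   # weights pre-trained on a huge dataset

for param in model.parameters():        # freeze all layers
    param.requires_grad = False

# replace and train only the final classification layer (softmax is applied in the loss)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
# the training loop would go here, updating only model.fc
```

<br>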
+ + +**38. Optimizing convergence** + +⟶ + +
+ + +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. +** + +⟶ + +
+ + +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶ + +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶ + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶ + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶ + +
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶ + +
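The update rules in the table, written as standalone NumPy update steps for a parameter array w and its gradient dw; the hyperparameter defaults shown are common choices, not values mandated by the cheatsheet:

```python
import numpy as np

# usage: state = {}; w = adam_step(w, dw, state)  (one state dict per parameter)

def momentum_step(w, dw, state, lr=0.01, beta=0.9):
    state["v"] = beta * state.get("v", 0.0) + (1.0 - beta) * dw   # dampens oscillations
    return w - lr * state["v"]

def rmsprop_step(w, dw, state, lr=0.001, beta=0.9, eps=1e-8):
    state["s"] = beta * state.get("s", 0.0) + (1.0 - beta) * dw ** 2
    return w - lr * dw / (np.sqrt(state["s"]) + eps)              # controls oscillations

def adam_step(w, dw, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    state["t"] = state.get("t", 0) + 1
    state["v"] = beta1 * state.get("v", 0.0) + (1.0 - beta1) * dw
    state["s"] = beta2 * state.get("s", 0.0) + (1.0 - beta2) * dw ** 2
    v_hat = state["v"] / (1.0 - beta1 ** state["t"])              # bias correction
    s_hat = state["s"] / (1.0 - beta2 ** state["t"])
    return w - lr * v_hat / (np.sqrt(s_hat) + eps)
```

<br>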
+ + +**46. Regularization** + +⟶ + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶ + +
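A NumPy sketch of inverted dropout with the 'keep' parameter 1-p, as most frameworks implement it; dividing by keep_prob keeps the expected activations unchanged:

```python
import numpy as np

def dropout_forward(x, keep_prob=0.8, training=True, seed=None):
    """Zero out units with probability p = 1 - keep_prob and rescale the rest."""
    if not training:
        return x                                # dropout is disabled at test time
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob                 # inverted dropout
```

<br>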
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ + +
+ +**50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
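The three penalties in the table, as terms that would be added to the training loss for a weight matrix W; lambda and alpha are regularization hyperparameters chosen by the user:

```python
import numpy as np

def lasso_penalty(W, lam=1e-3):
    return lam * np.abs(W).sum()                  # L1: shrinks coefficients to 0

def ridge_penalty(W, lam=1e-3):
    return lam * (W ** 2).sum()                   # L2: makes coefficients smaller

def elastic_net_penalty(W, lam=1e-3, alpha=0.5):
    # tradeoff between variable selection (L1) and small coefficients (L2)
    return lam * (alpha * np.abs(W).sum() + (1 - alpha) * (W ** 2).sum())
```

<br>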
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶ + +
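A patience-based sketch of early stopping; the training step and the validation curve below are simulated placeholders so the snippet runs on its own, and the real ones would come from your training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_one_epoch():          # placeholder for a real training step
    pass

def validate(epoch):            # simulated validation loss: improves, then degrades
    return (epoch - 10) ** 2 / 100 + rng.normal(scale=0.01)

best_val_loss, best_epoch, patience, wait = float("inf"), 0, 3, 0

for epoch in range(100):
    train_one_epoch()
    val_loss = validate(epoch)
    if val_loss < best_val_loss:                 # validation loss still improving
        best_val_loss, best_epoch, wait = val_loss, epoch, 0
    else:
        wait += 1
        if wait >= patience:                     # plateau or increase: stop training
            break

print("stopped at epoch", epoch, "- best epoch:", best_epoch)
```

<br>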
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ + +
+ + +**53. Good practices** + +⟶ + +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +⟶ + +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ + +
+ + +**57. [Formula, Comments]** + +⟶ + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ + +
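A sketch of gradient checking with the centered-difference formula on a toy loss whose analytical gradient is known; the relative error plays the role of the sanity check, and h trades numerical instability against approximation error:

```python
import numpy as np

def loss(w):                       # toy loss with a known analytical gradient
    return 0.5 * np.sum(w ** 2) + np.sum(np.sin(w))

def analytical_grad(w):
    return w + np.cos(w)

def numerical_grad(f, w, h=1e-5):  # two loss evaluations per dimension (expensive)
    grad = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        grad[i] = (f(w + e) - f(w - e)) / (2.0 * h)
    return grad

w = np.random.default_rng(0).normal(size=5)
num, ana = numerical_grad(loss, w), analytical_grad(w)
rel_error = np.abs(num - ana) / np.maximum(np.abs(num) + np.abs(ana), 1e-12)
print(rel_error.max())             # should be tiny if the analytical gradient is correct
```

<br>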
+ + +**60. The Deep Learning cheatsheets are now available in [target language]. + +⟶ + + +**61. Original authors** + +⟶ + +
+ +**62.Translated by X, Y and Z** + +⟶ + +
+ +**63.Reviewed by X, Y and Z** + +⟶ + +
+ +**64.View PDF version on GitHub** + +⟶ + +
+ +**65.By X and Y** + +⟶ + +
From b14d774d1af02efb0fb934f42a498b2dfa4ba0b1 Mon Sep 17 00:00:00 2001 From: KwangHyeokAhn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Wed, 2 Jan 2019 19:38:32 +0900 Subject: [PATCH 12/18] Update deep-learning-tips-and-tricks.md --- ko/deep-learning-tips-and-tricks.md | 102 ++++++++++++++-------------- 1 file changed, 51 insertions(+), 51 deletions(-) diff --git a/ko/deep-learning-tips-and-tricks.md b/ko/deep-learning-tips-and-tricks.md index 347234ec2..25df83ffd 100644 --- a/ko/deep-learning-tips-and-tricks.md +++ b/ko/deep-learning-tips-and-tricks.md @@ -4,196 +4,196 @@ **1. Deep Learning Tips and Tricks cheatsheet** -⟶ - +⟶ 딥 러닝 팁과 트릭 치트시트 +
**2. CS 230 - Deep Learning** -⟶ +⟶ CS230 - 딥 러닝
**3. Tips and tricks** -⟶ +⟶ 팁과 트릭 치트시트
**4. [Data processing, Data augmentation, Batch normalization]** -⟶ +⟶ [데이터 처리, 데이터 증가, 배치 정규화]
**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** -⟶ +⟶ [신경망 학습, 에포크, 미니-배치, 크로스-엔트로피 손실, 역전파, 경사하강법, 가중치 업데이트, 그레디언트 확인]
**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** -⟶ +⟶ [parameter 조정, Xavier 초기화, 전이학습, 학습률, 데이터 맞춤 학습률]
**7. [Regularization, Dropout, Weight regularization, Early stopping]** -⟶ +⟶ [정규화, 드랍아웃, 가중치 정규화, 이른 정지]
**8. [Good practices, Overfitting small batch, Gradient checking]** -⟶ +⟶ [좋은 습관, 오버피팅 스몰 배치, 그레디언트 확인]
**9. View PDF version on GitHub** -⟶ +⟶ GitHub에서 PDF 버전을 확인할 수 있습니다.
**10. Data processing** -⟶ +⟶ 데이터 처리
**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** -⟶ +⟶ 데이터 증가 - 딥러닝 모델들은 일반적으로 제대로 학습되기 위해 많은 양의 데이터를 필요로 합니다. 데이터 증가 기술을 사용하여 기존의 데이터에서 더 많은 데이터를 얻는 것은 종종 유용합니다. 주요 내용은 아래 표에 요약되어 있습니다. 보다 정확하게, 다음의 입력 이미지가 주어졌을 때 우리가 적용할 수 있는 기술들은 다음과 같습니다: <br>
**12. [Original, Flip, Rotation, Random crop]** -⟶ +⟶ [원본, 반전, 회전, 랜덤 이미지 패치]
**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** -⟶ +⟶ [수정 없는 이미지, 원본 이미지의 의미가 훼손되지 않는 축을 기준으로 좌우 반전, 약간의 각도로 회전, 부정확한 수평선 보정을 시뮬레이션, 이미지의 한 부분에 임의로 초점을 맞춤, 몇몇 무작위 이미지 패치는 연속으로 적용될 수 있음] <br>
**14. [Color shift, Noise addition, Information loss, Contrast change]** -⟶ +⟶ [색상변환, 잡음 추가, 정보 손실, 명암대비 변경]
**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** -⟶ +⟶ [RGB의 뉘앙스는 약간 변경됩니다, 빛 노출로 발생할 수 있는 잡음을 포착할 수 있습니다, 잡음 추가, 인풋의 품질변동에 대한 허용오차 증대, 이미지의 일부분 무시, 손실된 이미지 일부분을 모방할 가능성, 밝기 변화, 하루 중 시간에 따른 노출 변화 제어 ]
**16. Remark: data is usually augmented on the fly during training.** -⟶ +⟶ 비고 : 데이터는 일반적으로 학습중에 증가 됩니다.
**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ +⟶ 배치 정규화 - 배치{xi}를 정규화하는 하이퍼파라미터 γ,β 단계입니다. μB,σ2B를 우리가 배치에 정정하고자하는 평균과 분산으로 표기함으로써, 다음과 같이 진행됩니다.
**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ +⟶ 일반적으로 완전연결/컨볼루셔널 계층 이후와 비선형 계층 이전에 사용되며 학습률을 높이고 초기화에 대한 의존성을 줄이는 데 그 목적이 있습니다.
**19. Training a neural network** -⟶ +⟶ 신경망 학습
**20. Definitions** -⟶ +⟶ 정의
**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** -⟶ +⟶ 에포크 - 모델 학습의 맥락에서, 에포크는 모델이 전체 트레이닝 셋의 가중치를 업데이트 하는 한 번의 반복을 뜻하는 용어입니다.
**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** -⟶ +⟶ 미니-배치 경사하강법 - 학습 단계에서, 가중치 업데이트는 일반적으로 계산 복잡성이나 잡음 문제로 인한 하나의 데이터 포인트로 인해 전체 트레이닝 셋을 기반으로하지 않습니다 대신에, 업데이트 단계는
**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** -⟶ +⟶ 손실함수 -
**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ +⟶ 크로스-엔트로피 로스 -
**25. Finding optimal weights** -⟶ +⟶ 최적의 가중치 찾기
**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** -⟶ +⟶ 역전파 -
**27. Using this method, each weight is updated with the rule:** -⟶ +⟶
**28. Updating weights ― In a neural network, weights are updated as follows:** -⟶ +⟶ 가중치 업데이트 -
@@ -207,49 +207,49 @@ **30. [Forward propagation, Backpropagation, Weights update]** -⟶ +⟶ [순전파, 역전파, 가중치 업데이트]
**31. Parameter tuning** -⟶ +⟶ 파라미터 조정
**32. Weights initialization** -⟶ +⟶ 가중치 초기화
**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** -⟶ +⟶ Xavier 초기화 -
**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** -⟶ +⟶ 전이학습 -
**35. [Training size, Illustration, Explanation]** -⟶ +⟶ [학습 크기, 삽화, 설명]
**36. [Small, Medium, Large]** -⟶ +⟶ [작음, 중간, 큰]
@@ -271,35 +271,35 @@ **39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. ** -⟶ +⟶ 학습률 -
**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶ +⟶ 데이터 맞춤 학습률 -
**41. [Method, Explanation, Update of w, Update of b]** -⟶ +⟶ [방법, 설명, w 업데이트, b 업데이트]
**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** -⟶ +⟶ [모멘텀, ]
**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** -⟶ +⟶ [RMSprop. ]
@@ -313,42 +313,42 @@ **45. Remark: other methods include Adadelta, Adagrad and SGD.** -⟶ +⟶ 비고 : 이외 방법으로 Adadelta, Adagrad 그리고 SGD가 포함됩니다.
**46. Regularization** -⟶ +⟶ 정규화
**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** -⟶ +⟶ 드랍아웃 -
**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** -⟶ +⟶ 비고 :
**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** -⟶ +⟶ 가중치 정규화 -
**50. [LASSO, Ridge, Elastic Net]** -⟶ +⟶[라쏘, ]
@@ -402,7 +402,7 @@ **57. [Formula, Comments]** -⟶ +⟶ [공식, 언급]
@@ -423,12 +423,12 @@ **60. The Deep Learning cheatsheets are now available in [target language]. -⟶ +⟶ 딥 러닝 치트시트는 한국어로 이용가능 합니다. **61. Original authors** -⟶ +⟶ 원 저자
@@ -446,7 +446,7 @@ **64.View PDF version on GitHub** -⟶ +⟶ GitHub에서 PDF 버전으로 보실 수 있습니다.
From 374732342d824ecd1c17e551e1b245ab755c821e Mon Sep 17 00:00:00 2001 From: KwangHyeokAhn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 3 Jan 2019 09:09:00 +0900 Subject: [PATCH 13/18] Update deep-learning-tips-and-tricks.md --- ko/deep-learning-tips-and-tricks.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/ko/deep-learning-tips-and-tricks.md b/ko/deep-learning-tips-and-tricks.md index 25df83ffd..1c311ecd3 100644 --- a/ko/deep-learning-tips-and-tricks.md +++ b/ko/deep-learning-tips-and-tricks.md @@ -151,21 +151,21 @@ **22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** -⟶ 미니-배치 경사하강법 - 학습 단계에서, 가중치 업데이트는 일반적으로 계산 복잡성이나 잡음 문제로 인한 하나의 데이터 포인트로 인해 전체 트레이닝 셋을 기반으로하지 않습니다 대신에, 업데이트 단계는 +⟶ 미니-배치 경사하강법 - 학습 단계에서, 가중치 업데이트는 일반적으로 계산 복잡성이나 잡음 문제로 인한 하나의 데이터 포인트로 인해 전체 트레이닝 셋을 기반으로하지 않습니다 대신에, 업데이트 단계는 배치내에 있는 여러 데이터 포인트들을 튜닝할 수 있는 하이퍼파라미터인 미니 배치에서 진행됩니다.
**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** -⟶ 손실함수 - +⟶ 손실함수 - 주어진 모델이 어떻게 수행되는지를 정량화하기 위해, 손실 함수 L은 보통 실제 출력값 y가 예측 모델 출력값 z에 의해 정확하게 예측되는 정도를 평가하는 데 사용됩니다.
**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ 크로스-엔트로피 로스 - +⟶ 크로스-엔트로피 손실 - 신경망 학습에서 이진분류의 맥락으로 접근하면, 크로스-엔트로피 손실 L(z,y)는 일반적으로 사용되며 다음과 같이 정의됩니다.
@@ -179,28 +179,28 @@ **26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** -⟶ 역전파 - +⟶ 역전파 - 역전파는 실제 출력값과 원하는 출력값을 계산하여 신경망의 가중치를 업데이트 하는 방법입니다. 각 가중치 w에 대한 미분은 체인규칙을 사용하여 계산됩니다.
**27. Using this method, each weight is updated with the rule:** -⟶ +⟶ 이러한 방법을 사용하여, 각각의 가중치는 아래와 같은 규칙에 의해 업데이트 됩니다 :
**28. Updating weights ― In a neural network, weights are updated as follows:** -⟶ 가중치 업데이트 - +⟶ 가중치 업데이트 - 신경망에서, 다음과 같은 방법으로 가중치는 업데이트 됩니다 :
**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** -⟶ +⟶ [1단계 : ]
From 3b3cc3a10f3fca9794ce2d4a91e4ad6bd20ecb1f Mon Sep 17 00:00:00 2001 From: KwangHyeokAhn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 4 Jan 2019 18:57:59 +0900 Subject: [PATCH 14/18] Update deep-learning-tips-and-tricks.md --- ko/deep-learning-tips-and-tricks.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/ko/deep-learning-tips-and-tricks.md b/ko/deep-learning-tips-and-tricks.md index 1c311ecd3..0a7b34d08 100644 --- a/ko/deep-learning-tips-and-tricks.md +++ b/ko/deep-learning-tips-and-tricks.md @@ -200,7 +200,7 @@ **29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** -⟶ [1단계 : ] +⟶ [1단계 : 트레이닝 데이터의 배치를 가져와 순전파를 진행하여 손실을 계산합니다. 2단계 : 각 가중치와 관련하여 그레디언트 손실값을 얻기 위해 역전파를 하게됩니다. 3단계 : 네트워크의 가중치를 업데이트 하기 위해서 그레디언트를 사용합니다.]
@@ -228,14 +228,14 @@ **33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** -⟶ Xavier 초기화 - +⟶ Xavier 초기화 - 완전 무작위 방식으로 가중치 초기화하는 대신, Xavier 초기화는 설계에 고유 한 특성을 고려한 초기 가중치를 가질 수 있습니다.
**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** -⟶ 전이학습 - +⟶ 전이학습 - 딥 러닝 학습 모델을 학습하기 위해서는 많은 양의 데이터가 필요로하며 무엇 보다도 많은 시간을 필요로 합니다. 학습을 위해 수일,수주의 시간이 걸린 거대한 데이터 셋의 사전 훈련 된 가중치를 활용하며 유스 케이스 활용에 유용합니다. 현재 보유하고있는 데이터의 양에 따라 여러가지 활용 방법이 있습니다.
@@ -256,14 +256,14 @@ **37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** -⟶ +⟶ [모든 층을 고정시키고, softmax에서 가중치 학습, 대부분의 층을 고정시키고, 마지막 층과 softmax에서 가중치 학습, 사전학습 된 가중치를 초기화 하여 layers 및 softamx에 가중치를 학습합니다.]
**38. Optimizing convergence** -⟶ +⟶ convergence 최적화
@@ -271,14 +271,14 @@ **39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. ** -⟶ 학습률 - +⟶ 학습률 - α 또는 때때로 η로 표기되는 학습률은 가중치가 어느 속도로 업데이트 되는지를 나타내줍니다. 이것은 고정되거나 또는 적응적으로 변화될 수 있습니다. 현재 가장 널리 사용되는 방법은 Adam이며, 학습률을 조정하는 방법입니다.
**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶ 데이터 맞춤 학습률 - +⟶ 데이터 맞춤 학습률 - 모델 학습시 학습 속도가 달라지며 학습 시간이 단축되고 수치적인 최적화 솔루션이 향상 될 수 있습니다. Adam 최적화가 가장 일반적으로 사용되는 기술이지만 이외 방법들도 유용합니다. 아래 표에 요약되어 있습니다. :
@@ -367,35 +367,35 @@ **52. [Error, Validation, Training, early stopping, Epochs]** -⟶ +⟶ [에러, ]
**53. Good practices** -⟶ +⟶ [좋은 습관]
**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** -⟶ +⟶ 과적합 작은 배치 -
**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** -⟶ +⟶ 그레디언트 확인 -
**56. [Type, Numerical gradient, Analytical gradient]** -⟶ +⟶[종류, ]
From cc6cc86fe05d12b87ed0f8400b54424c6da6e700 Mon Sep 17 00:00:00 2001 From: KwangHyeokAhn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Mon, 7 Jan 2019 08:57:18 +0900 Subject: [PATCH 15/18] Update deep-learning-tips-and-tricks.md --- ko/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/deep-learning-tips-and-tricks.md b/ko/deep-learning-tips-and-tricks.md index 0a7b34d08..27fa7c810 100644 --- a/ko/deep-learning-tips-and-tricks.md +++ b/ko/deep-learning-tips-and-tricks.md @@ -292,7 +292,7 @@ **42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** -⟶ [모멘텀, ] +⟶ [모멘텀, 감쇠진동, 확률적 경사하강법(SGD)을 개선, 튜닝할 2가지 파라미터]
From adfd1fc0de28283b4be6ebbd1490ec582efa66c8 Mon Sep 17 00:00:00 2001 From: KwangHyeokAhn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 18 Jan 2019 11:47:31 +0900 Subject: [PATCH 16/18] Update deep-learning-tips-and-tricks.md --- ko/deep-learning-tips-and-tricks.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/ko/deep-learning-tips-and-tricks.md b/ko/deep-learning-tips-and-tricks.md index 27fa7c810..92d68f1aa 100644 --- a/ko/deep-learning-tips-and-tricks.md +++ b/ko/deep-learning-tips-and-tricks.md @@ -235,7 +235,7 @@ **34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** -⟶ 전이학습 - 딥 러닝 학습 모델을 학습하기 위해서는 많은 양의 데이터가 필요로하며 무엇 보다도 많은 시간을 필요로 합니다. 학습을 위해 수일,수주의 시간이 걸린 거대한 데이터 셋의 사전 훈련 된 가중치를 활용하며 유스 케이스 활용에 유용합니다. 현재 보유하고있는 데이터의 양에 따라 여러가지 활용 방법이 있습니다. +⟶ 전이학습 - 딥 러닝 학습 모델을 학습하기 위해서는 많은 양의 데이터가 필요로하며 무엇 보다도 많은 시간을 필요로 합니다. 학습을 위해 수일,수주의 시간이 걸린 거대한 데이터 셋의 사전 훈련 된 가중치를 활용하며 케이스 활용에 유용합니다. 현재 보유하고있는 데이터의 양에 따라 여러가지 활용 방법이 있습니다.
@@ -299,21 +299,21 @@ **43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** -⟶ [RMSprop. ] +⟶ [RMSprop, 루트평균제곱전파, 진동제어를 통해 학습 알고리즘 속도를 향상]
**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** -⟶ +⟶ [아담, 적응 모멘트 추정, 가장 대중적인 방법, 4개의 파라미터를 조정]
**45. Remark: other methods include Adadelta, Adagrad and SGD.** -⟶ 비고 : 이외 방법으로 Adadelta, Adagrad 그리고 SGD가 포함됩니다. +⟶ 주석 : 이외 방법으로 Adadelta, Adagrad 그리고 SGD가 포함됩니다.
@@ -327,47 +327,47 @@ **47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** -⟶ 드랍아웃 - +⟶ 드랍아웃 - 드랍아웃은 확률 p>0인 뉴런을 제거하여 훈련 데이터의 과적합을 예방하기 위해 신경망에서 사용되는 기술입니다. 모델이 특정 변수 셋에 너무 의존하는것을 피하도록 해야합니다.
**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** -⟶ 비고 : +⟶ 주석 : 대부분의 딥러닝 프레임워크에서 1-p '유지'파라미터 변수 통해 드랍아웃의 매개변수화 합니다.
**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** -⟶ 가중치 정규화 - +⟶ 가중치 정규화 - 가중치가 너무 크지 않고 모델이 트레이닝 셋에 과적합되지 않는것을 확인하기 위해서 정규화 기법이 일반적으로 가중치 모델에 사용됩니다. 주요 내용은 아래 표에 요약되어 있습니다.
**50. [LASSO, Ridge, Elastic Net]** -⟶[라쏘, ] +⟶[라쏘, 리지, 엘라스틱 넷]
**50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ +⟶ 계수를 0으로 축소합니다, 변수 선택에 좋습니다, 계수를 더 작게 만듭니다, 변수 선택과 작은 계수간에 거래
**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** -⟶ +⟶ 조기정지 - 해당 정규화 기술은 검증손실이 안정기에 도달하거나 증가하기 시작하는 즉시 트레이닝 과정을 중지합니다.
**52. [Error, Validation, Training, early stopping, Epochs]** -⟶ [에러, ] +⟶ [에러, 검증, 트레이닝, 조기정지, 에포크]
@@ -381,7 +381,7 @@ **54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** -⟶ 과적합 작은 배치 - +⟶ 과적합 작은 배치 - 모델 디버깅시, 종종 사용되는 방법으로 빠르게 테스트를 진행하여 모델 구조상 자체의 중대한 문제가 있는지 확인할 수 있습니다. 특히,
From 7e7d5e255d61425efa7875eff9630398e0cde81f Mon Sep 17 00:00:00 2001 From: KwangHyeokAhn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 18 Jan 2019 13:56:37 +0900 Subject: [PATCH 17/18] Update deep-learning-tips-and-tricks.md --- ko/deep-learning-tips-and-tricks.md | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/ko/deep-learning-tips-and-tricks.md b/ko/deep-learning-tips-and-tricks.md index 92d68f1aa..0b3f300f4 100644 --- a/ko/deep-learning-tips-and-tricks.md +++ b/ko/deep-learning-tips-and-tricks.md @@ -354,7 +354,7 @@ **50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ 계수를 0으로 축소합니다, 변수 선택에 좋습니다, 계수를 더 작게 만듭니다, 변수 선택과 작은 계수간에 거래 +⟶ 계수를 0으로 축소합니다, 변수 선택에 좋습니다, 계수를 더 작게 만듭니다, 변수 선택과 작은 계수간에 트레이드 오프
@@ -381,21 +381,20 @@ **54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** -⟶ 과적합 작은 배치 - 모델 디버깅시, 종종 사용되는 방법으로 빠르게 테스트를 진행하여 모델 구조상 자체의 중대한 문제가 있는지 확인할 수 있습니다. 특히, +⟶ 과적합 작은 배치 - 모델 디버깅시, 종종 사용되는 방법으로 빠르게 테스트를 진행하여 모델 구조상 자체의 중대한 문제가 있는지 확인할 수 있습니다. 특히, 모델이 적절히 학습할 수 있는지 확인하기 위해 네트워크 내부에 미니배치가 전달되어 오버핏이 되는지 확입니다. 만약 그럿지 못하다면, 모델이 너무 복잡하거나 미니배치에 충분히 오버핏될 복잡도가 떨어지는것을 의미합니다, 일반 크기의 트레이닝 셋 또한 동일하다고 볼 수 있습니다.
**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** -⟶ 그레디언트 확인 - - +⟶ 그레디언트 확인 - 그레디언트 확인은 신경망의 역방향 시행중에 사용되는 방법입니다. 주어진 점에서 분석 그레디언트의 값을 수치 그레디언트와 비교하여 정확성에 대한 민감정도 검사기 역할을 수행합니다.
**56. [Type, Numerical gradient, Analytical gradient]** -⟶[종류, ] +⟶[종류, 수치기울기, 분석 그레디언트]
@@ -409,14 +408,14 @@ **58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** -⟶ +⟶[비용; 차원당 두번씩 손실값을 계산해야 합니다, 분석 구현의 정확성을 확인하는데 사용됩니다, h 선택에 있어 너무 작지 않으며(수치적 불안정성), 너무 크지않은(약한 그레디언트)상에서 트레이드 오프가 필요로합니다.]
**59. ['Exact' result, Direct computation, Used in the final implementation]** -⟶ +⟶['정확한' 결과, 직접계산, 최종 수행단계에서 사용]
@@ -434,13 +433,13 @@ **62.Translated by X, Y and Z** -⟶ +⟶ X,Y 그리고 Z로 번역됩니다.
**63.Reviewed by X, Y and Z** -⟶ +⟶ X,Y 그리고 Z에 의해 검토됩니다.
@@ -452,6 +451,6 @@ **65.By X and Y** -⟶ +⟶ X와 Y로
From ac35fced5e9037b47c63df0d91d5654dbdb512b4 Mon Sep 17 00:00:00 2001 From: KwangHyeokAhn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 18 Jan 2019 14:01:48 +0900 Subject: [PATCH 18/18] Update deep-learning-tips-and-tricks.md --- ko/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/deep-learning-tips-and-tricks.md b/ko/deep-learning-tips-and-tricks.md index 0b3f300f4..5e3cf07a0 100644 --- a/ko/deep-learning-tips-and-tricks.md +++ b/ko/deep-learning-tips-and-tricks.md @@ -109,7 +109,7 @@ **16. Remark: data is usually augmented on the fly during training.** -⟶ 비고 : 데이터는 일반적으로 학습중에 증가 됩니다. +⟶ 주석: 데이터는 일반적으로 학습중에 증가 됩니다.