From ec1dfc0deeea35ce3fab1d2d0cf735c8fd5e6065 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 14:08:41 +0200 Subject: [PATCH 01/42] Polish language translation templates --- README.md | 16 +- pl/cheatsheet-deep-learning.md | 291 ++++++++++ ...tsheet-machine-learning-tips-and-tricks.md | 257 +++++++++ pl/cheatsheet-supervised-learning.md | 519 ++++++++++++++++++ pl/cheatsheet-unsupervised-learning.md | 299 ++++++++++ pl/refresher-linear-algebra.md | 315 +++++++++++ pl/refresher-probability.md | 347 ++++++++++++ 7 files changed, 2036 insertions(+), 8 deletions(-) create mode 100644 pl/cheatsheet-deep-learning.md create mode 100644 pl/cheatsheet-machine-learning-tips-and-tricks.md create mode 100644 pl/cheatsheet-supervised-learning.md create mode 100644 pl/cheatsheet-unsupervised-learning.md create mode 100644 pl/refresher-linear-algebra.md create mode 100644 pl/refresher-probability.md diff --git a/README.md b/README.md index 3e54ce51e..a36ab6c20 100644 --- a/README.md +++ b/README.md @@ -12,14 +12,14 @@ This repository aims at collaboratively translating our [Machine Learning cheats |Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%| |Linear algebra|0%|0%|0%|**100%**|0%|0%|0%| -|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)|[Türkçe](https://github.com/shervinea/cheatsheet-translation/tree/master/tr)|[Русский](https://github.com/shervinea/cheatsheet-translation/tree/master/ru) -|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|0%|0%|0%|0%|0%|0%|0%|0%| -|Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%| -|Unsupervised learning|0%|0%|0%|0%|0%|0%|0%|0%| -|ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%| -|Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%|0%| -|Linear algebra|0%|0%|0%|0%|0%|0%|0%|0%| +|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)|[Türkçe](https://github.com/shervinea/cheatsheet-translation/tree/master/tr)|[Русский](https://github.com/shervinea/cheatsheet-translation/tree/master/ru) | [Polski](https://github.com/shervinea/cheatsheet-translation/tree/master/pl)| +|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| +|Deep learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|Unsupervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|Linear algebra|0%|0%|0%|0%|0%|0%|0%|0%|0%| If your favorite language is missing, please feel free to add it! diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md new file mode 100644 index 000000000..642a4cd01 --- /dev/null +++ b/pl/cheatsheet-deep-learning.md @@ -0,0 +1,291 @@ +**1. Deep Learning cheatsheet** + +⟶ + +
+ +**2. Neural Networks** + +⟶ + +
+ +**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** + +⟶ + +
+ +**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** + +⟶ + +
+ +**5. [Input layer, hidden layer, output layer]** + +⟶ + +
+ +**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ + +
+ +**7. where we note w, b, z the weight, bias and output respectively.** + +⟶ + +
+ +**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** + +⟶ + +
+ +**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** + +⟶ + +
+ +**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ + +
+ +**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ + +
+ +**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** + +⟶ + +
+ +**13. As a result, the weight is updated as follows:** + +⟶ + +
+ +**14. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ + +
+ +**15. Step 1: Take a batch of training data.** + +⟶ + +
+ +**16. Step 2: Perform forward propagation to obtain the corresponding loss.** + +⟶ + +
+ +**17. Step 3: Backpropagate the loss to get the gradients.** + +⟶ + +
+ +**18. Step 4: Use the gradients to update the weights of the network.** + +⟶ + +
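The four steps above translate almost line for line into array code. The sketch below is an editorial illustration only and not part of the cheatsheet being translated: a single hidden-layer network with a sigmoid output and an assumed binary cross-entropy loss, where every name (X, y, W1, lr, ...) is a made-up placeholder.

```python
import numpy as np

# Minimal sketch of the four steps: batch -> forward pass -> backpropagation -> update.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                          # Step 1: a batch of training data
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)  # toy binary labels

W1, b1 = 0.1 * rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = 0.1 * rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.1                                              # learning rate (alpha)

for _ in range(100):
    # Step 2: forward propagation and the corresponding loss
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))          # sigmoid output
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Step 3: backpropagate the loss (chain rule) to get the gradients
    dz = (p - y) / len(X)
    dW2, db2 = h.T @ dz, dz.sum(axis=0)
    dh = (dz @ W2.T) * (1 - h ** 2)                   # derivative of tanh
    dW1, db1 = X.T @ dh, dh.sum(axis=0)

    # Step 4: use the gradients to update the weights of the network
    W1, b1 = W1 - lr * dW1, b1 - lr * db1
    W2, b2 = W2 - lr * dW2, b2 - lr * db2
```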
+ +**19. Dropout ― Dropout is a technique meant to prevent overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p.** + +⟶ + +
+ +**20. Convolutional Neural Networks** + +⟶ + +
+ +**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** + +⟶ + +
+ +**22. Batch normalization ― It is a step of hyperparameters γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:** + +⟶ + +
+ +**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ + +
+ +**24. Recurrent Neural Networks** + +⟶ + +
+ +**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** + +⟶ + +
+ +**26. [Input gate, forget gate, gate, output gate]** + +⟶ + +
+ +**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** + +⟶ + +
+ +**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** + +⟶ + +
+ +**29. Reinforcement Learning and Control** + +⟶ + +
+ +**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** + +⟶ + +
+ +**33. S is the set of states** + +⟶ + +
+ +**34. A is the set of actions** + +⟶ + +
+ +**35. {Psa} are the state transition probabilities for s∈S and a∈A** + +⟶ + +
+ +**36. γ∈[0,1[ is the discount factor** + +⟶ + +
+ +**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** + +⟶ + +
+ +**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** + +⟶ + +
+ +**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** + +⟶ + +
+ +**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** + +⟶ + +
+ +**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** + +⟶ + +
+ +**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** + +⟶ + +
+ +**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** + +⟶ + +
+ +**44. 1) We initialize the value:** + +⟶ + +
+ +**45. 2) We iterate the value based on the values before:** + +⟶ + +
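For illustration only, the two steps above can be run on a small made-up MDP; the transition probabilities P and rewards R below are random placeholders rather than anything taken from the cheatsheet.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, :] = Psa
R = rng.normal(size=(n_states, n_actions))                        # reward R(s, a)

V = np.zeros(n_states)                                 # 1) initialize the value
for _ in range(200):                                   # 2) iterate from the previous values
    Q = R + gamma * np.einsum("asn,n->sa", P, V)       # Q(s,a) = R(s,a) + γ Σ_s' Psa(s') V(s')
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)                              # greedy policy w.r.t. the final values
```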
+ +**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** + +⟶ + +
+ +**47. times took action a in state s and got to s′** + +⟶ + +
+ +**48. times took action a in state s** + +⟶ + +
+ +**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** + +⟶ diff --git a/pl/cheatsheet-machine-learning-tips-and-tricks.md b/pl/cheatsheet-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..5dd821561 --- /dev/null +++ b/pl/cheatsheet-machine-learning-tips-and-tricks.md @@ -0,0 +1,257 @@ +**1. Machine Learning tips and tricks cheatsheet** + +⟶ + +
+ +**2. Classification metrics** + +⟶ + +
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶ + +
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶ + +
+ +**5. [Predicted class, Actual class]** + +⟶ + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶ + +
+ +**7. [Metric, Formula, Interpretation]** + +⟶ + +
+ +**8. Overall performance of model** + +⟶ + +
+ +**9. How accurate the positive predictions are** + +⟶ + +
+ +**10. Coverage of actual positive sample** + +⟶ + +
+ +**11. Coverage of actual negative sample** + +⟶ + +
+ +**12. Hybrid metric useful for unbalanced classes** + +⟶ + +
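For reference, every metric in the table above follows directly from the four confusion-matrix counts. The sketch below uses invented counts purely to make the formulas concrete; it is an editorial aside, not part of the translated material.

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, tn, fp, fn = 40, 45, 5, 10

accuracy    = (tp + tn) / (tp + tn + fp + fn)                 # overall performance
precision   = tp / (tp + fp)                                  # how accurate the positive predictions are
recall      = tp / (tp + fn)                                  # coverage of actual positive samples (TPR)
specificity = tn / (tn + fp)                                  # coverage of actual negative samples (TNR)
f1          = 2 * precision * recall / (precision + recall)   # hybrid metric for unbalanced classes

print(accuracy, precision, recall, specificity, f1)
```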
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:** + +⟶ + +
+ +**14. [Metric, Formula, Equivalent]** + +⟶ + +
+ +**15. AUC ― The area under the receiver operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶ + +
+ +**16. [Actual, Predicted]** + +⟶ + +
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶ + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶ + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶ + +
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶ + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶ + +
+ +**22. Model selection** + +⟶ + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ + +
+ +**24. [Training set, Validation set, Testing set]** + +⟶ + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +⟶ + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶ + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +⟶ + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶ + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶ + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶ + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶ + +
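A minimal sketch of the k-fold procedure described above, assuming a caller-supplied train_and_score placeholder for whatever model is being validated; the least-squares scorer and the toy data at the end are illustrative only.

```python
import numpy as np

def k_fold_cv_error(X, y, train_and_score, k=5, seed=0):
    """Average held-out error over k folds (the cross-validation error)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                                                      # held-out fold
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])  # the k-1 other folds
        errors.append(train_and_score(X[train], y[train], X[val], y[val]))
    return np.mean(errors)

def ls_error(X_tr, y_tr, X_va, y_va):
    """Toy model: least-squares fit, mean squared error on the held-out fold."""
    theta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return np.mean((X_va @ theta - y_va) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(k_fold_cv_error(X, y, ls_error, k=5))
```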
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ + +
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
+ +**35. Diagnostics** + +⟶ + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶ + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶ + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶ + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶ + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶ + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶ + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶ + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶ + +
diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md new file mode 100644 index 000000000..3aa1f452d --- /dev/null +++ b/pl/cheatsheet-supervised-learning.md @@ -0,0 +1,519 @@ +**1. Supervised Learning cheatsheet** + +⟶ + +
+ +**2. Introduction to Supervised Learning** + +⟶ + +
+ +**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +⟶ + +
+ +**4. Type of prediction ― The different types of predictive models are summed up in the table below:** + +⟶ + +
+ +**5. [Regression, Classifier, Outcome, Examples]** + +⟶ + +
+ +**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +⟶ + +
+ +**7. Type of model ― The different models are summed up in the table below:** + +⟶ + +
+ +**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +⟶ + +
+ +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** + +⟶ + +
+ +**10. Notations and general concepts** + +⟶ + +
+ +**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +⟶ + +
+ +**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +⟶ + +
+ +**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +⟶ + +
+ +**14. [Linear regression, Logistic regression, SVM, Neural Network]** + +⟶ + +
+ +**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +⟶ + +
+ +**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +⟶ + +
+ +**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** + +⟶ + +
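The update rule and the batch versus stochastic distinction above can be sketched for a least-squares cost as follows; the data, step sizes and iteration counts are arbitrary placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]     # design matrix with an intercept column
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=100)

# Batch gradient descent: one update per pass over all m examples.
alpha, theta = 0.1, np.zeros(3)
for _ in range(500):
    grad = X.T @ (X @ theta - y) / len(y)              # averaged least-squares gradient
    theta -= alpha * grad

# Stochastic gradient descent: one update per training example.
theta_sgd = np.zeros(3)
for _ in range(20):
    for i in rng.permutation(len(y)):
        grad_i = (X[i] @ theta_sgd - y[i]) * X[i]
        theta_sgd -= 0.01 * grad_i
```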
+ +**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** + +⟶ + +
+ +**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +⟶ + +
+ +**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** + +⟶ + +
+ +**21. Linear models** + +⟶ + +
+ +**22. Linear regression** + +⟶ + +
+ +**23. We assume here that y|x;θ∼N(μ,σ2)** + +⟶ + +
+ +**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:** + +⟶ + +
+ +**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** + +⟶ + +
+ +**26. Remark: the update rule is a particular case of the gradient ascent.** + +⟶ + +
+ +**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** + +⟶ + +
+ +**28. Classification and logistic regression** + +⟶ + +
+ +**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +⟶ + +
+ +**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +⟶ + +
+ +**31. Remark: there is no closed form solution for the case of logistic regressions.** + +⟶ + +
+ +**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** + +⟶ + +
+ +**33. Generalized Linear Models** + +⟶ + +
+ +**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +⟶ + +
+ +**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** + +⟶ + +
+ +**36. Here are the most common exponential distributions summed up in the following table:** + +⟶ + +
+ +**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +⟶ + +
+ +**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** + +⟶ + +
+ +**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +⟶ + +
+ +**40. Support Vector Machines** + +⟶ + +
+ +**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** + +⟶ + +
+ +**42: Optimal margin classifier ― The optimal margin classifier h is such that:** + +⟶ + +
+ +**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** + +⟶ + +
+ +**44. such that** + +⟶ + +
+ +**45. support vectors** + +⟶ + +
+ +**46. Remark: the line is defined as wTx−b=0.** + +⟶ + +
+ +**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +⟶ + +
+ +**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** + +⟶ + +
+ +**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||2/(2σ2)) is called the Gaussian kernel and is commonly used.** + +⟶ + +
+ +**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +⟶ + +
+ +**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +⟶ + +
+ +**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +⟶ + +
+ +**53. Remark: the coefficients βi are called the Lagrange multipliers.** + +⟶ + +
+ +**54. Generative Learning** + +⟶ + +
+ +**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** + +⟶ + +
+ +**56. Gaussian Discriminant Analysis** + +⟶ + +
+ +**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +⟶ + +
+ +**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +⟶ + +
+ +**59. Naive Bayes** + +⟶ + +
+ +**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +⟶ + +
+ +**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** + +⟶ + +
+ +**62. Remark: Naive Bayes is widely used for text classification and spam detection.** + +⟶ + +
+ +**63. Tree-based and ensemble methods** + +⟶ + +
+ +**64. These methods can be used for both regression and classification problems.** + +⟶ + +
+ +**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** + +⟶ + +
+ +**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** + +⟶ + +
+ +**67. Remark: random forests are a type of ensemble methods.** + +⟶ + +
+ +**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +⟶ + +
+ +**69. [Adaptive boosting, Gradient boosting]** + +⟶ + +
+ +**70. High weights are put on errors to improve at the next boosting step** + +⟶ + +
+ +**71. Weak learners trained on remaining errors** + +⟶ + +
+ +**72. Other non-parametric approaches** + +⟶ + +
+ +**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ + +
+ +**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ + +
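A minimal sketch matching the k-NN description above (majority vote among the k nearest training points, assuming Euclidean distance); the toy data is invented.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify one query point by majority vote among its k nearest neighbours."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances to all training points
    nearest = np.argsort(dists)[:k]                     # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(knn_predict(X, y, np.array([0.5, 0.5]), k=7))
```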
+ +**75. Learning Theory** + +⟶ + +
+ +**76. Union bound ― Let A1,...,Ak be k events. We have:** + +⟶ + +
+ +**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** + +⟶ + +
+ +**78. Remark: this inequality is also known as the Chernoff bound.** + +⟶ + +
+ +**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +⟶ + +
+ +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +⟶ + +
+ +**81: the training and testing sets follow the same distribution ** + +⟶ + +
+ +**82. the training examples are drawn independently** + +⟶ + +
+ +**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** + +⟶ + +
+ +**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +⟶ + +
+ +**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +⟶ + +
+ +**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +⟶ + +
+ +**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** + +⟶ diff --git a/pl/cheatsheet-unsupervised-learning.md b/pl/cheatsheet-unsupervised-learning.md new file mode 100644 index 000000000..5826ff44b --- /dev/null +++ b/pl/cheatsheet-unsupervised-learning.md @@ -0,0 +1,299 @@ +**1. Unsupervised Learning cheatsheet** + +⟶ + +
+ +**2. Introduction to Unsupervised Learning** + +⟶ + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ + +
+ +**5. Clustering** + +⟶ + +
+ +**6. Expectation-Maximization** + +⟶ + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ + +
+ +**8. [Setting, Latent variable z, Comments]** + +⟶ + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶ + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ + +
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ + +
+ +**14. k-means clustering** + +⟶ + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ + +
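A compact sketch of the alternating k-means steps and the distortion function above. Initializing the centroids by sampling k data points is only one common choice, and empty clusters are deliberately not handled in this illustration.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]             # random centroid initialization
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = d.argmin(axis=1)                                      # cluster assignment step
        new_mu = np.array([X[c == j].mean(axis=0) for j in range(k)])  # means update step
        if np.allclose(new_mu, mu):                               # convergence
            break
        mu = new_mu
    distortion = np.sum((X - mu[c]) ** 2)                         # distortion J(c, μ)
    return c, mu, distortion
```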
+ +**19. Hierarchical clustering** + +⟶ + +
+ +**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** + +⟶ + +
+ +**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** + +⟶ + +
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ + +
+ +**24. Clustering assessment metrics** + +⟶ + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ + +
+ +**29. Dimension reduction** + +⟶ + +
+ +**30. Principal component analysis** + +⟶ + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**34. diagonal** + +⟶ + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ + +
+ +**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ + +
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
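The four PCA steps above, sketched with NumPy's symmetric eigendecomposition; the correlated toy data at the end is invented.

```python
import numpy as np

def pca(X, k):
    # Step 1: normalize each feature to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: the symmetric matrix (1/m) Σ_i x(i) x(i)^T on the normalized data.
    sigma = Z.T @ Z / len(Z)
    # Step 3: orthogonal eigenvectors of the k largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(sigma)              # eigh returns ascending eigenvalues
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk).
    return Z @ U

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
X_reduced = pca(X, k=2)
```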
+ +**43. Independent component analysis** + +⟶ + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ + diff --git a/pl/refresher-linear-algebra.md b/pl/refresher-linear-algebra.md new file mode 100644 index 000000000..a824025f7 --- /dev/null +++ b/pl/refresher-linear-algebra.md @@ -0,0 +1,315 @@ +**1. Linear Algebra and Calculus refresher** + +⟶ + +
+ +**2. General notations** + +⟶ + +
+ +**3. Definitions** + +⟶ + +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ + +
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ + +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ + +
+ +**7. Main matrices** + +⟶ + +
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ + +
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ + +
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ + +
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +⟶ + +
+ +**12. Matrix operations** + +⟶ + +
+ +**13. Multiplication** + +⟶ + +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +⟶ + +
+ +**15. inner product: for x,y∈Rn, we have:** + +⟶ + +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +⟶ + +
+ +**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** + +⟶ + +
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ + +
+ +**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** + +⟶ + +
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ + +
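A quick NumPy check of the shapes stated above: the inner and outer vector products, the matrix-vector product landing in Rm, and the matrix-matrix product landing in Rm×p. The small integer arrays are arbitrary.

```python
import numpy as np

m, n, p = 4, 3, 2
A = np.arange(m * n).reshape(m, n)        # A in R^(m x n)
B = np.arange(n * p).reshape(n, p)        # B in R^(n x p)
x = np.arange(n)                          # x in R^n
y = np.arange(n) + 1                      # y in R^n

print(x @ y)                  # inner product x^T y, a scalar
print(np.outer(x, y).shape)   # (3, 3): outer product x y^T
print((A @ x).shape)          # (4,):   a vector in R^m
print((A @ B).shape)          # (4, 2): a matrix in R^(m x p)
```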
+ +**21. Other operations** + +⟶ + +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ + +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ + +
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ + +
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ + +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ + +
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ + +
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ + +
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ + +
+ +**30. Matrix properties** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ + +
+ +**33. [Symmetric, Antisymmetric]** + +⟶ + +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ + +
+ +**35. N(ax)=|a|N(x) for a scalar** + +⟶ + +
+ +**36. if N(x)=0, then x=0** + +⟶ + +
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ + +
+ +**38. [Norm, Notation, Definition, Use case]** + +⟶ + +
+ +**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** + +⟶ + +
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ + +
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ + +
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ + +
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ + +
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**46. diagonal** + +⟶ + +
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ + +
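A short numerical check of the spectral theorem and the singular-value decomposition above, using numpy.linalg on random placeholder matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                                   # a real symmetric matrix

lam, U = np.linalg.eigh(A)                          # A = U diag(lam) U^T, U orthogonal
print(np.allclose(U @ np.diag(lam) @ U.T, A))       # True
print(np.allclose(U.T @ U, np.eye(4)))              # True: U is orthogonal

B = rng.normal(size=(5, 3))
U2, s, Vt = np.linalg.svd(B, full_matrices=False)   # singular-value decomposition
print(np.allclose(U2 @ np.diag(s) @ Vt, B))         # True
```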
+ +**48. Matrix calculus** + +⟶ + +
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ + +
+ +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +⟶ + +
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ + +
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ + +
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +⟶ diff --git a/pl/refresher-probability.md b/pl/refresher-probability.md new file mode 100644 index 000000000..db03157d5 --- /dev/null +++ b/pl/refresher-probability.md @@ -0,0 +1,347 @@ +**1. Probabilities and Statistics refresher** + +⟶ + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶ + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ + +
+ +**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** + +⟶ + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ + +
+ +**12. Conditional Probability** + +⟶ + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ + +
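A worked numeric instance of Bayes' rule and its extended form, using a made-up 1% prevalence and made-up test error rates; all numbers are hypothetical.

```python
# Hypothetical screening test: P(A) = 0.01, P(+|A) = 0.95, P(+|not A) = 0.05.
p_a = 0.01
p_pos_given_a = 0.95
p_pos_given_not_a = 0.05

# Extended form of Bayes' rule with the partition {A, not A}.
p_pos = p_pos_given_a * p_a + p_pos_given_not_a * (1 - p_a)
p_a_given_pos = p_pos_given_a * p_a / p_pos
print(p_a_given_pos)   # about 0.161: a positive result still leaves A fairly unlikely
```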
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ + +
+ +**19. Random Variables** + +⟶ + +
+ +**20. Definitions** + +⟶ + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ + +
+ +**23. Remark: we have P(a<X⩽b)=F(b)−F(a)** + +⟶ + +
+ +**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** + +⟶ + +
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ + +
+ +**32. Probability Distributions** + +⟶ + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ + +
+ +**35. [Type, Distribution]** + +⟶ + +
+ +**36. Jointly Distributed Random Variables** + +⟶ + +
+ +**37. Marginal density and cumulative distribution ― From the joint probability density function fXY, we have** + +⟶ + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ + +
+ +**45. Parameter estimation** + +⟶ + +
+ +**46. Definitions** + +⟶ + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ + +
+ +**51. Estimating the mean** + +⟶ + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶ + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶ + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ + +
+ +**55. Estimating the variance** + +⟶ + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ + +
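A quick simulation of the statements above (unbiased sample mean and sample variance, and the Central Limit Theorem), using an exponential distribution whose mean and variance are known in closed form; the sample size and seed are arbitrary.

```python
import numpy as np

# Exponential(scale=2) has mean mu = 2 and variance sigma^2 = 4, a convenient non-Gaussian case.
rng = np.random.default_rng(0)
mu, sigma, n = 2.0, 2.0, 50

samples = rng.exponential(scale=2.0, size=(10_000, n))
x_bar = samples.mean(axis=1)                 # sample mean of each random sample
s2 = samples.var(axis=1, ddof=1)             # unbiased sample variance (ddof=1)

z = np.sqrt(n) * (x_bar - mu) / sigma        # CLT: approximately N(0, 1)
print(x_bar.mean(), s2.mean())               # close to mu and sigma^2
print(z.mean(), z.std())                     # close to 0 and 1
```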
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ + +
From e6a3171119dca2bf1479eaa1bf3077a22550270a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 15:56:27 +0200 Subject: [PATCH 02/42] Polish language translation - ML --- ...tsheet-machine-learning-tips-and-tricks.md | 86 +++++++++---------- 1 file changed, 43 insertions(+), 43 deletions(-) diff --git a/pl/cheatsheet-machine-learning-tips-and-tricks.md b/pl/cheatsheet-machine-learning-tips-and-tricks.md index 5dd821561..69bfe338d 100644 --- a/pl/cheatsheet-machine-learning-tips-and-tricks.md +++ b/pl/cheatsheet-machine-learning-tips-and-tricks.md @@ -1,257 +1,257 @@ **1. Machine Learning tips and tricks cheatsheet** -⟶ +⟶ Uczenie maszynowe - ściąga z poradami
**2. Classification metrics** -⟶ +⟶ Miary efektywności klasyfikatorów
**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** -⟶ +⟶ W przypadku klasyfikacji binarnej, następujące miary są użyteczne do ustalenia efektywności modelu.
**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** -⟶ +⟶ Macierz pomyłek - Macierz pomyłek jest wykorzystywana w celu przedstawienia bardziej całościowego obrazu efektywności modelu. Definiuje się ją w następujący sposób:
**5. [Predicted class, Actual class]** -⟶ +⟶ [Klasa predykowana, Klasa rzeczywista]
**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** -⟶ +⟶ Główne miary - Następujące miary często wykorzystywane są do ustalenia efektywności modelu:
**7. [Metric, Formula, Interpretation]** -⟶ +⟶ [Miara, Wzór, Interpretacja]
**8. Overall performance of model** -⟶ +⟶ Dokładność - całościowa efektywność modelu
**9. How accurate the positive predictions are** -⟶ +⟶ Precyzja - jak dokładne są predykcje pozytywne
**10. Coverage of actual positive sample** -⟶ +⟶ Czułość - stosunek wyników prawdziwie dodatnich do sumy prawdziwie dodatnich i fałszywie ujemnych
**11. Coverage of actual negative sample** -⟶ +⟶ Swoistość - stosunek wyników prawdziwie ujemnych do sumy prawdziwie ujemnych i fałszywie dodatnich
**12. Hybrid metric useful for unbalanced classes** -⟶ +⟶ Hybrydowa miara, przydatna przy niezbalansowanych klasach
**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:** -⟶ +⟶ ROC - jest to wykres TPR do FPR przy zmiennym progu. Podsumowanie tych miar znajduje się w tabeli poniżej:
**14. [Metric, Formula, Equivalent]** -⟶ +⟶ [Miara, Wzór, Odpowiednik]
**15. AUC ― The area under the receiver operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** -⟶ +⟶ AUC - Powierzchnia pola pod ROC, zwana także AUC lub AUROC, jest to powierzchnia pola pod wykresem ROC, jak to pokazano na wykresie obok:
**16. [Actual, Predicted]** -⟶ +⟶ Rzeczywiste, Predykowane
**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** -⟶ +⟶ Miary podstawowe - Mając model regresyjny f, następujące miary są często używane do sprawdzenia efektywności modelu:
**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** -⟶ +⟶ [Całkowita suma kwadratów, Wyjaśniona suma kwadratów, Pozostała suma kwadratów]
**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** -⟶ +⟶ Współczynnik determinacji - często zapisywany jako R2 lub r2, jest miarą tego, jak dobrze zaobserwowane wyniki są replikowane przez model. Definiuje się go następująco:
**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** -⟶ +⟶ Główne miary - Następujące miary często wykorzystywane są do ustalenia efektywności modelu regresyjnego. Opierają się one na ilości zmiennych n, które model wykorzystuje:
**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** -⟶ +⟶ gdzie L jest prawdopodobieństwem i ˆσ2 jest estymatą wariancji związanej z każdą odpowiedzią.
**22. Model selection** -⟶ +⟶ Wybór modelu
**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** -⟶ +⟶ Słownictwo - Przy wybieraniu modelu rozróżniamy 3 różne porcje danych. Określamy je następująco:
**24. [Training set, Validation set, Testing set]** -⟶ +⟶ [Zbiór treningowy, Zbiór walidacyjny, Zbiór testowy]
**25. [Model is trained, Model is assessed, Model gives predictions]** -⟶ +⟶ [Model jest trenowany, Model jest sprawdzany, Model generuje predykcje]
**26. [Usually 80% of the dataset, Usually 20% of the dataset]** -⟶ +⟶ [Zazwyczaj 80% zbioru danych, Zazwyczaj 20% zbioru danych]
**27. [Also called hold-out or development set, Unseen data]** -⟶ +⟶ [Zwany także zbiorem zachowanym albo zbiorem deweloperskim, Niewidziane dane]
**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** -⟶ +⟶ Po wyborze modelu, szkolimy go na całym zbiorze danych (treningowy + walidacyjny) i testujemy na niewidzianym zbiorze (testowy). Zbiory są przedstawione na obrazkach poniżej:
**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** -⟶ +⟶ Walidacja krzyżowa - Cross-validation, zapisywana także jako CV, jest metodą która zakłada że przy wyborze modelu nie opieramy się tylko na jednych danych treningowych. Różne rodzaje tej metody opisane są poniżej w tabeli:
**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** -⟶ +⟶ [Trenowanie na k-1 podzbiorach i sprawdzanie na pozostałym podzbiorze, Trenowanie na n-p obserwacjach i sprawdzanie na p pozostałych]
**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** -⟶ +⟶ [Zazwyczaj k=5 lub 10, przypadek przy p=1 zwany jest leave-one-out]
**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** -⟶ +⟶ Najczęściej stosowanym rodzajem walidacji krzyżowej jest metoda zwana k-fold cross-validation (k-krotna walidacja krzyżowa). Dzieli ona dane treningowe na k równych podzbiorów. Model jest trenowany na k-1 podzbiorach i testowany na pozostałym jednym podzbiorze. Proces powtarzany jest k razy przy zmianie podzbioru walidacyjnego na następny. Błąd jest liczony jako średnia błędów ze wszystkich podzbiorów walidacyjnych.
**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** -⟶ +⟶ Regularyzacja - jest to proces mający na celu uniknięcie nadmiernego dopasowania (overfitting) modelu do danych treningowych i uniknięcie wysokiej wariancji modelu. Tabela obok przedstawia rodzaje często stosowanych metod regularyzacyjnych:
**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ +⟶ [Zmniejsza współczynniki do 0, Dobra do doboru zmiennych, Zmniejsza współczynniki, Rozwiązanie pośrednie pomiędzy doborem zmiennych a małymi współczynnikami]
**35. Diagnostics** -⟶ +⟶ Diagnostyka
**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** -⟶ +⟶ Niewystarczające dopasowanie (bias, underfitting) - jest to różnica pomiędzy predykowanymi wynikami a wynikami rzeczywistymi. Predykcje modelu cechuje mała wariancja i słabe dopasowanie do danych treningowych.
**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** -⟶ +⟶ Nadmierne dopasowanie (variance, overfitting) - predykcje modelu cechuje duża wariancja i dobre dopasowanie do danych treningowych.
**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** -⟶ +⟶ Nadmierne/Niewystarczające dopasowanie modelu - im prostszy model, tym będzie bardziej niewystarczająco dopasowany; im bardziej złożony, tym będzie bardziej nadmiernie dopasowany.
**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** -⟶ +⟶ [Objawy, Regresja, Klasyfikacja, Deep learning, Co zrobić?]
**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** -⟶ +⟶ [Wysoki błąd treningowy, Błąd treningowy zbliżony do błędu testowego, Niewystarczające dopasowanie, Błąd treningowy odrobinę mniejszy niż błąd testowy, Bardzo mały błąd treningowy, Błąd treningowy o wiele mniejszy niż błąd testowy, Nadmierne dopasowanie]
**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** -⟶ +⟶ [Uczyń model bardziej złożonym, Dodaj zmiennych, Ucz model dłużej, Zastosuj regularyzację, Zdobądź więcej danych]
**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** -⟶ +⟶ Analiza błędu - Jest to analiza głównych powodów różnicy efektywności modelu testowanego i modelu doskonałego. W celu poprawy efektywności modelu.
**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** -⟶ +⟶ Analiza narzędziowa - analiza głównych powodów różnicy efektywności modelu testowanego i modelu podstawowego. W celu uproszczenia modelu.
From c9aacb1ef908ef38b6c84ae55b4e44f49f11ba3a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 16:01:09 +0200 Subject: [PATCH 03/42] Polish language translation - ML --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a36ab6c20..99ef548d2 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ This repository aims at collaboratively translating our [Machine Learning cheats |Deep learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| |Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| |Unsupervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| -|ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%|100%| |Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%|0%|0%| |Linear algebra|0%|0%|0%|0%|0%|0%|0%|0%|0%| From 1d5987b544b6c17b714563080a3b5f53c0d56c4d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 16:07:20 +0200 Subject: [PATCH 04/42] Polish language translation - ML --- pl/cheatsheet-machine-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-machine-learning-tips-and-tricks.md b/pl/cheatsheet-machine-learning-tips-and-tricks.md index 69bfe338d..b931d4184 100644 --- a/pl/cheatsheet-machine-learning-tips-and-tricks.md +++ b/pl/cheatsheet-machine-learning-tips-and-tricks.md @@ -90,7 +90,7 @@ **16. [Actual, Predicted]** -⟶ Rzeczywiste, Predykowane +⟶ [Rzeczywiste, Predykowane]
From a3d18f7438ff6d0d3e9d3b16041a041c29925bd8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 16:08:41 +0200 Subject: [PATCH 05/42] Polish language translation - ML --- pl/cheatsheet-machine-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-machine-learning-tips-and-tricks.md b/pl/cheatsheet-machine-learning-tips-and-tricks.md index b931d4184..c6730c850 100644 --- a/pl/cheatsheet-machine-learning-tips-and-tricks.md +++ b/pl/cheatsheet-machine-learning-tips-and-tricks.md @@ -252,6 +252,6 @@ **43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** -⟶ Analiza narzędziowa - analiza głównych powodów różnicy efektywności modelu testowanego i modelu podstawowego. W celu uproszczenia modelu. +⟶ Analiza ablacyjna - analiza głównych powodów różnicy efektywności modelu testowanego i modelu podstawowego. W celu uproszczenia modelu.
From dcd71e0a7a7f300bf41718c7b449107822573621 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 20:33:10 +0200 Subject: [PATCH 06/42] Polish language translation - ML --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index a9d2ef4af..903965847 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -25,3 +25,6 @@ Please input your name in the form: First name Last Name --te --zh + +--pl + Michał Jamry From af7eef13d9450b030a628652739c4c3b5d740e11 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 22:06:21 +0200 Subject: [PATCH 07/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 642a4cd01..3bb3fbba4 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -1,24 +1,24 @@ **1. Deep Learning cheatsheet** -⟶ +⟶ Deep Learning - ściąga
**2. Neural Networks** -⟶ +⟶ Sieci neuronowe
**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** -⟶ +⟶ Sieci neuronowe to klasa modeli zbudowanych z warstw. Często wykorzystywane rodzaje sieci neuronowych to splotowe i rekurencyjne sieci neuronowe.
**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** -⟶ +⟶ Architecture
From 7252da8a7c36983649b9d771507aea65488ec25b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 22:32:45 +0200 Subject: [PATCH 08/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 3bb3fbba4..2feaea9c8 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -18,43 +18,43 @@ **4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** -⟶ Architecture +⟶ Architektura - słownictwo związane z sieciami neuronowymi jest opisane poniżej:
**5. [Input layer, hidden layer, output layer]** -⟶ +⟶ [Warstwa wejściowa, warstwa ukryta, warstwa wyjściowa]
**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** -⟶ +⟶ Przez i rozumiemy i-tą warstwę sieci a przez j, j-ty neuron warstwy, mamy więc:
**7. where we note w, b, z the weight, bias and output respectively.** -⟶ +⟶ gdzie w to wagi (współczynniki), b to wyraz wolny funkcji i z to wynik.
**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** -⟶ +⟶ Funkcja aktywacji - Funkcje aktywacji stosowane są po wyliczeniu warstwy ukrytej w celu wprowadzenia nieliniowości do modelu. Oto najczęściej stosowane:
**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** -⟶ +⟶ [Sigmoid, Tanh, ReLU, Leaky ReLU]
**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ +⟶ Błąd - W kontekście sieci neuronowych
@@ -240,52 +240,52 @@ **41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** -⟶ +⟶ Równanie Bellmana -
**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** -⟶ +⟶ Przypomnienie: zauważamy, że optymalna strategia π∗ dla danego stanu s jest taka, że:
**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** -⟶ +⟶ Algorytm iteracyjnego ustalania wartości zmiennej - algorytm ten składa się z dwóch kroków:
**44. 1) We initialize the value:** -⟶ +⟶ Inicjalizujemy zmienną wartością:
**45. 2) We iterate the value based on the values before:** -⟶ +⟶ W iteracyjny sposób ustalamy wartość zmiennej w oparciu o wartość poprzedniej zmiennej:
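A minimal NumPy sketch of the two steps above (initialize the value, then apply the Bellman update repeatedly). The 3-state, 2-action MDP — transition tensor `P`, reward vector `R` and discount `gamma` — is invented purely for illustration:

```python
import numpy as np

# Invented MDP: 2 actions, 3 states; P[a, s, s'] are transition probabilities.
P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]],
              [[0.0, 0.5, 0.5], [0.3, 0.0, 0.7], [0.2, 0.0, 0.8]]])
R = np.array([0.0, 1.0, 10.0])   # reward for being in each state
gamma = 0.9

# 1) initialize the value
V = np.zeros(3)

# 2) iterate the value based on the values before (Bellman optimality update)
for _ in range(200):
    V = R + gamma * np.max(P @ V, axis=0)   # P @ V has shape (actions, states)

print(V)
```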
**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** -⟶ +⟶ Szacowanie maksymalnego prawdopodobieństwa - Estymaty maksymalnego prawdopodobieństwo dla poszczególnych przejść pomiędzy stanami wygląda następująco:
**47. times took action a in state s and got to s′** -⟶ +⟶ ile razy podjęto działanie a w stanie s i otrzymano stan s'
**48. times took action a in state s**

⟶ ile razy podjęto działanie a w stanie s

<br>
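A small sketch of the maximum likelihood estimate from the three items above: count how many times action a taken in state s led to s′ and divide by how many times a was taken in s. The list of transitions is made up for the example:

```python
from collections import Counter

# Invented experience tuples (state, action, next_state)
transitions = [(0, 'a', 1), (0, 'a', 1), (0, 'a', 0), (1, 'b', 2), (1, 'b', 2)]

triple_counts = Counter((s, a, s2) for s, a, s2 in transitions)  # times took a in s and got to s'
pair_counts = Counter((s, a) for s, a, _ in transitions)         # times took a in s

def p_hat(s, a, s2):
    """Estimated P_sa(s') = #(s, a, s') / #(s, a)."""
    return triple_counts[(s, a, s2)] / pair_counts[(s, a)]

print(p_hat(0, 'a', 1))  # 2/3
```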
**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** -⟶ +⟶ Q-learning ― Q-learning jest sposobem bezmodelowego estymowania Q, które wygląda w następujący sposób: From a89e93888886e46ea2f954c2a0e3f105678e4abb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 22:45:02 +0200 Subject: [PATCH 09/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 2feaea9c8..fc8607378 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -222,25 +222,25 @@ **38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** -⟶ +⟶ Strategia - Strategia π jest funkcją π:S⟶A, która mapuje stany na działania.
**39. Remark: we say that we execute a given policy π if given a state a we take the action a=π(s).** -⟶ +⟶ Przypomnienie: mówimy, że wykonujemy daną strategię π w danym stanie s, gdy wykonujemy działanie a=π(s).
**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** -⟶ +⟶ Funkcja wartości ― Dla danej strategii π w danym stanie s, definiujemy wartość funkcji Vπ w następujący sposób:
**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** -⟶ Równanie Bellmana - +⟶ Równanie Bellmana - Optymalne równania Bellmana charakteryzują wartość funkcji Vπ∗ optymalnej strategii π∗:
From 651f0c197ac93b3a9f616f00e32b7b35def6eb96 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 22:50:08 +0200 Subject: [PATCH 10/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index fc8607378..bf09b9586 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -270,7 +270,7 @@ **46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** -⟶ Szacowanie maksymalnego prawdopodobieństwa - Estymaty maksymalnego prawdopodobieństwo dla poszczególnych przejść pomiędzy stanami wygląda następująco: +⟶ Szacowanie maksymalnego prawdopodobieństwa - Szacowanie maksymalnego prawdopodobieństwo dla poszczególnych przejść pomiędzy stanami wygląda następująco:
From be7f32c3d85b4ac1e385164c167536bb76710182 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:01:06 +0200 Subject: [PATCH 11/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index bf09b9586..f50609dff 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -162,31 +162,31 @@ **28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** -⟶ +⟶ LSTM ―
**29. Reinforcement Learning and Control** -⟶ +⟶ Uczenie Wspomagane i Kontrola
**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** -⟶ +⟶ Celem uczenia wspomaganego jest nauczenie agenta tego, w jaki sposób ewoluować w danym środowisku.
**31. Definitions** -⟶ +⟶ Definicje:
**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:**

⟶ Proces decyzyjny Markowa ― Proces decyzyjny Markowa (MDP) jest 5-krotką (S,A,{Psa},γ,R), gdzie:

<br>
@@ -194,29 +194,29 @@ ⟶ -
+
S jest zbiorem stanów **34. A is the set of actions** -⟶ +⟶ A jest zbiorem działań
**35. {Psa} are the state transition probabilities for s∈S and a∈A** -⟶ +⟶ {Psa} to zbiór prawdopodobieństw przejść pomiędzy stanami dla s∈S i a∈A
**36. γ∈[0,1[ is the discount factor** -⟶ +⟶ γ∈[0,1[ jest współczynnikiem dyskontującym.
**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** -⟶ +⟶ R:S×A⟶R lub R:S⟶R to funkcja nagrody, którą algorytm ma za zadanie zmaksymalizować.
@@ -246,7 +246,7 @@ **42. Remark: we note that the optimal policy π∗ for a given state s is such that:** -⟶ Przypomnienie: zauważamy, że optymalna strategia π∗ dla danego stanu s jest taka, że: +⟶ Przypomnienie: zauważmy, że optymalna strategia π∗ dla danego stanu s jest taka, że:
From e18b85fadd674b1a79cef83b417071529d0c69c0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:08:33 +0200 Subject: [PATCH 12/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index f50609dff..bf6a6c3ff 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -138,31 +138,31 @@ **24. Recurrent Neural Networks** -⟶ +⟶ Rekurencyjne Sieci Neuronowe
**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** -⟶ +⟶ Rodzaje bramek ― Przedstawiamy różne rodzaje bramek, które możemy spotkać w typowych sieciach rekurencyjnych (RNN):
**26. [Input gate, forget gate, gate, output gate]**

⟶ [Bramka wejściowa, bramka zapominająca, bramka, bramka wyjściowa]

<br>
**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]**

⟶ [Pisać do komórki, czy nie?, Wyczyścić komórkę, czy nie?, Jak dużo zapisać do komórki?, Jak dużo ujawnić komórce?]

<br>
**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** -⟶ LSTM ― +⟶ LSTM ― Długa krótkoterminowa sieć neuronowa (LSTM) to rodzaj sieci rekurencyjnej (RNN), która radzi sobie z problemem zanikającego gradientu poprzez wykorzystanie bramek zapominających.
From 0fed2b09142836cf314f2b348f7cef4347c6cd8e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:26:16 +0200 Subject: [PATCH 13/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index bf6a6c3ff..fb8ec25de 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -54,7 +54,7 @@ **10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ Błąd - W kontekście sieci neuronowych +⟶
@@ -114,25 +114,25 @@ **20. Convolutional Neural Networks** -⟶ +⟶ Konwolucyjne Sieci Neuronowe
**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:**

⟶ Wymagania warstwy konwolucyjnej ― Zauważając, że W to rozmiar danych wejściowych, F to rozmiar neuronów warstwy konwolucyjnej, a P to rozmiar uzupełnienia zerami, liczbę neuronów N mieszczących się w danej objętości określamy następująco:

<br>
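A one-line check of the usual sizing formula N = (W − F + 2P)/S + 1; the stride S does not appear in the text above, so treating it as an extra parameter is an assumption:

```python
def conv_output_size(W, F, P, S=1):
    """Number of neurons that fit along one dimension, assuming the
    standard formula N = (W - F + 2P) / S + 1 with stride S."""
    N = (W - F + 2 * P) / S + 1
    assert N.is_integer(), "hyperparameters do not tile the input exactly"
    return int(N)

print(conv_output_size(W=32, F=5, P=2, S=1))  # 32
```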
**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ +⟶ Normalizacja pakietu (Batch normalization) - Jest to krok w którym hiperparametry γ,β są wykorzystywane do normalizacji pakietu {xi}. Zauważając, że μB to średnia, a σ2B to wariancja, to normalizacja pakiet wygląda następująca:
**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ +⟶ Jest ona zazwyczaj stosowana po warstwie pełnej lub konwolucyjnej, a przed zastosowaniem nieliniowej funkcji aktywacyjnej i ma na celu umożliwienie stosowania dużego współczynnika uczenia i zmniejszenia zależności od inicjalizacji.
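A minimal sketch of the normalization step described in the two items above, assuming the standard form γ·(x−μB)/√(σ²B+ε)+β, where the small ε is added only for numerical stability:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch x (shape: batch, features) with learnable gamma, beta."""
    mu = x.mean(axis=0)            # mu_B
    var = x.var(axis=0)            # sigma^2_B
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(8, 4)
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ~0 mean, ~1 std per feature
```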
From c2242654916df1c05f37882b1e0b626655714261 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:28:34 +0200 Subject: [PATCH 14/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index fb8ec25de..31afc10b8 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -288,4 +288,4 @@ **49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** -⟶ Q-learning ― Q-learning jest sposobem bezmodelowego estymowania Q, które wygląda w następujący sposób: +⟶ Q-learning ― Q-learning jest bezmodelowym sposobem estymowania Q, który wygląda następująco: From b51c33573a4fbb29e2e79b3cc646dbc15c3055e6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:29:05 +0200 Subject: [PATCH 15/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 31afc10b8..1214494b7 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -204,7 +204,7 @@ **35. {Psa} are the state transition probabilities for s∈S and a∈A** -⟶ {Psa} to zbiór prawdopodobieństw przejść pomiędzy stanami dla s∈S i a∈A +⟶ {Psa} to zbiór prawdopodobieństw przejść pomiędzy stanami gdzie s∈S i a∈A
From 50dd4b6d21bb0e5378ca3af9896041cfdf706d6a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:29:54 +0200 Subject: [PATCH 16/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 1214494b7..89245a30d 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -126,7 +126,7 @@ **22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ Normalizacja pakietu (Batch normalization) - Jest to krok w którym hiperparametry γ,β są wykorzystywane do normalizacji pakietu {xi}. Zauważając, że μB to średnia, a σ2B to wariancja, to normalizacja pakiet wygląda następująca: +⟶ Normalizacja pakietu (Batch normalization) ― Jest to krok w którym hiperparametry γ,β są wykorzystywane do normalizacji pakietu {xi}. Zauważając, że μB to średnia, a σ2B to wariancja, to normalizacja pakiet wygląda następująca:
From ef93adb936eee5315e27bb4727ecae015899675a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:44:18 +0200 Subject: [PATCH 17/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 89245a30d..c64d62132 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -60,55 +60,55 @@ **11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** -⟶ +⟶ Współczynnik uczenia ― Współczynnik uczenia, często zapisywany jako α lub rzadziej η, określa z jaką szybkością będą aktualizowane wagi. Może on mieć wartość stałą lub zmienną. Obecnie najpopularniejszą metodą optymalizacji funkcji kosztu jest metoda Adam, która dostosowuje wartość współczynnika uczenia.
**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** -⟶ +⟶ Propagacja wsteczna ― Propagacja wsteczna jest metodą aktualizacji wag w sieci neuronowej, która bierze pod uwagę różnice pomiędzy wynikiem uzyskanym, a oczekiwanym (koszt). Pochodna cząstkowa względem wagi w jest liczona z wykorzystaniem zasady złożenia pochodnych funkcji i wygląda następująco:
**13. As a result, the weight is updated as follows:** -⟶ +⟶ W wyniku czego, wagi są aktualizowane w następujący sposób:
**14. Updating weights ― In a neural network, weights are updated as follows:** -⟶ +⟶ Aktualizacja wag ― W sieci neuronowej, wagi są aktualizowane w następujący sposób:
**15. Step 1: Take a batch of training data.** -⟶ +⟶ Krok 1: Pobierz pakiet danych treningowych.
**16. Step 2: Perform forward propagation to obtain the corresponding loss.** -⟶ +⟶ Krok 2: dokonaj propagacji do przodu aby uzyskać wartość kosztu.
**17. Step 3: Backpropagate the loss to get the gradients.** -⟶ +⟶ Step 3: Z wykorzystaniem propagacji wstecznej użyj koszt aby uzyskać gradient.
**18. Step 4: Use the gradients to update the weights of the network.** -⟶ +⟶ Krok 4: Wykorzystaj gradient aby zaktualizować wagi w sieci neuronowej.
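A tiny NumPy sketch of the four steps above for a one-layer sigmoid classifier; the data, batch size and learning rate are invented placeholders, not values from the cheatsheet:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))                  # Step 1: a batch of training data
y = (X[:, 0] > 0).astype(float)
w, b, alpha = np.zeros(3), 0.0, 0.1

for _ in range(100):
    # Step 2: forward propagation and the corresponding loss
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Step 3: backpropagate the loss to get the gradients
    grad_z = (p - y) / len(y)
    grad_w, grad_b = X.T @ grad_z, grad_z.sum()

    # Step 4: use the gradients to update the weights
    w -= alpha * grad_w
    b -= alpha * grad_b

print(round(loss, 3))
```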
**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** -⟶ +⟶ Dropout ― Dropout jest techniką zapobiegania nadmiernemu dopasowaniu (overfitting) do danych treningowych poprzez pomijanie niektórych neuronów w sieci. W praktyce, neurony są pomijane z prawdopodobieństwem p lub nie są pomijane z prawdopodobieństwem 1-p
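A short sketch of this idea in its common inverted-dropout form; the 1/(1−p) rescaling is an implementation choice that the text above does not mention:

```python
import numpy as np

def dropout(a, p, training=True):
    """Drop each unit of activation a with probability p (keep with 1 - p).
    The 1/(1-p) rescaling keeps the expected activation unchanged."""
    if not training or p == 0.0:
        return a
    mask = (np.random.rand(*a.shape) >= p).astype(a.dtype)
    return a * mask / (1.0 - p)

a = np.ones((2, 5))
print(dropout(a, p=0.5))
```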
From 69f902f283b12c742f34fdbfdcfbf1c468bfbb09 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:48:16 +0200 Subject: [PATCH 18/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index c64d62132..223cf30d1 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -54,7 +54,7 @@ **10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ +⟶ Koszt logarytmiczny (Cross-entropy loss) ― W kontekście sieci neuronowych koszt logarytmiczny L(z,y) jest często stosowany i wygląda następująco:
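A small sketch of the loss just defined, L(z,y) = −[y log(z) + (1−y) log(1−z)], with a clipping epsilon added only to avoid log(0):

```python
import numpy as np

def cross_entropy(z, y, eps=1e-12):
    """Cross-entropy for a predicted probability z and a true label y in {0, 1}."""
    z = np.clip(z, eps, 1.0 - eps)
    return -(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

print(cross_entropy(0.9, 1))   # small loss: confident and correct
print(cross_entropy(0.9, 0))   # large loss: confident and wrong
```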
From 4f263e9149285430bf99324d337c50ca5237f5ca Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Fri, 14 Sep 2018 23:58:33 +0200 Subject: [PATCH 19/42] Polish language translation - DL --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 99ef548d2..26ec7ff6b 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ This repository aims at collaboratively translating our [Machine Learning cheats |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)|[Türkçe](https://github.com/shervinea/cheatsheet-translation/tree/master/tr)|[Русский](https://github.com/shervinea/cheatsheet-translation/tree/master/ru) | [Polski](https://github.com/shervinea/cheatsheet-translation/tree/master/pl)| |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|Deep learning|0%|0%|0%|0%|0%|0%|0%|0%|100%| |Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| |Unsupervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| |ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%|100%| From b41f71c3ece64f9890e3559ba7be6bfbbc98aaed Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sat, 15 Sep 2018 08:03:52 +0200 Subject: [PATCH 20/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 223cf30d1..4441230ce 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -226,7 +226,7 @@
-**39. Remark: we say that we execute a given policy π if given a state a we take the action a=π(s).** +**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** ⟶ Przypomnienie: mówimy, że wykonujemy daną strategię π w danym stanie s, gdy wykonujemy działanie a=π(s). From d04cece445c3ffb2428992eccda7e05453f2f333 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sat, 15 Sep 2018 08:31:21 +0200 Subject: [PATCH 21/42] Polish language translation - DL consistent terminalogy fix --- pl/cheatsheet-deep-learning.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 4441230ce..b9a9fa333 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -54,7 +54,7 @@ **10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ Koszt logarytmiczny (Cross-entropy loss) ― W kontekście sieci neuronowych koszt logarytmiczny L(z,y) jest często stosowany i wygląda następująco: +⟶ Strata logarytmiczna (Cross-entropy loss) ― W kontekście sieci neuronowych strata logarytmiczna L(z,y) jest często stosowany i wygląda następująco:
@@ -90,13 +90,13 @@ **16. Step 2: Perform forward propagation to obtain the corresponding loss.** -⟶ Krok 2: dokonaj propagacji do przodu aby uzyskać wartość kosztu. +⟶ Krok 2: dokonaj propagacji do przodu aby uzyskać wartość straty.
**17. Step 3: Backpropagate the loss to get the gradients.**

-⟶ Step 3: Z wykorzystaniem propagacji wstecznej użyj koszt aby uzyskać gradient.
+⟶ Krok 3: Z wykorzystaniem propagacji wstecznej użyj straty, aby uzyskać gradienty.

<br>
From b30c9b3c7ab9e6381e98b992d38f6a7d208d2aed Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sat, 15 Sep 2018 08:32:40 +0200 Subject: [PATCH 22/42] Polish language translation - supervised learning --- pl/cheatsheet-supervised-learning.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index 3aa1f452d..e80339867 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -1,84 +1,84 @@ **1. Supervised Learning cheatsheet** -⟶ +⟶ Uczenie nadzorowane - ściąga
**2. Introduction to Supervised Learning** -⟶ +⟶ Wprowadzenie do Uczenia nadzorowanego
**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** -⟶ +⟶ Mając zbiór danych {x(1),...,x(m)} i powiązany z nimi zbiór wyników {y(1),...,y(m)}, chcemy zbudować klasyfikator, który nauczy się predykcji y na podstawie x.
**4. Type of prediction ― The different types of predictive models are summed up in the table below:** -⟶ +⟶ Rodzaje predykcji ― Różne rodzaje predykcji opisane są w tabelce poniżej:
**5. [Regression, Classifier, Outcome, Examples]** -⟶ +⟶ [Regresja, Klasyfikacja, Wynik, Przykład]
**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** -⟶ +⟶ [Ciągłość, Klasa, Regresja liniowa, Regresja logistyczna, SVM, Naive Bayes]
**7. Type of model ― The different models are summed up in the table below:** -⟶ +⟶ Rodzaj modelu ― Różne rodzaje modeli opisane są w tabelce poniżej:
**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** -⟶ +⟶ [Model dyskryminacyjny, Model generatywny, Cel, Co jest uczone?, Obrazek, Przykład]
**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** -⟶ +⟶ [Bezpośrednia estymata P(y|x), Estymata P(x|y) aby wydedukować P(y|x), Rozgraniczenie decyzyjne, Rozkład prawdopodobieństwa danych, Regresja, SVM, GDA, Naive Bayes]
**10. Notations and general concepts** -⟶ +⟶ Zapis i stwierdzenia ogólne
**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).**

⟶ Hipoteza ― Hipotezę zapisujemy jako hθ i jest ona wybranym przez nas modelem. Dla danych wejściowych x(i) model tworzy predykcję wyniku hθ(x(i)).

<br>
**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:**

⟶ Funkcja straty - Funkcja straty jest funkcją L:(z,y)∈R×Y⟼L(z,y)∈R, która bierze jako wejście predykowany wynik modelu z oraz odpowiadający mu wynik rzeczywisty y i wyraża, jak bardzo się od siebie różnią. Często stosowane funkcje straty przedstawione są w tabelce poniżej:

<br>
**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]**

⟶ [Błąd najmniejszych kwadratów, Strata logistyczna, Strata Hinge'a, Strata logarytmiczna (Cross-entropy)]

<br>
**14. [Linear regression, Logistic regression, SVM, Neural Network]** -⟶ +⟶ [Regresja liniowa, Regresja logistyczna, SVM, Sieć neuronowa]
From 535ac5f5766fd57ec59bed9d164197171988a58b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sat, 15 Sep 2018 08:42:36 +0200 Subject: [PATCH 23/42] Polish language translation - DL --- pl/cheatsheet-deep-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index b9a9fa333..3a4424a22 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -12,7 +12,7 @@ **3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** -⟶ Sieci neuronowe to klasa modeli zbudowanych z warstw. Często wykorzystywane rodzaje sieci neuronowych to splotowe i rekurencyjne sieci neuronowe. +⟶ Sieci neuronowe to klasa modeli zbudowanych z warstw. Często wykorzystywane rodzaje sieci neuronowych to konwolucyjne i rekurencyjne sieci neuronowe.
From dd6d92e3d6387d6c7aaafa0a5a4eb9b082dc1f11 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sat, 15 Sep 2018 10:42:28 +0200 Subject: [PATCH 24/42] Polish language translation - supervised learning --- pl/cheatsheet-supervised-learning.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index e80339867..ae800186a 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -84,79 +84,79 @@ **15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** -⟶ +⟶ Funkcja kosztu - Funkcja kosztu J jest często używana w celu określenia efektywności modelu, definiuje sie ją za pomocą funkcji straty L w następujący sposób:
**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** -⟶ +⟶ Schodzenie gradientu (Gradient descent) ― Przyjmując, że współczynnik uczenia to α∈R, zasadę aktualizacji przy schodzeniu gradientu można wyrazić za pomocą współczynnika uczenia i funkcji kosztu J w następujący sposób:
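A generic sketch of the update rule above, θ ← θ − α∇J(θ); the toy cost J(θ) = (θ−3)² and the value α = 0.1 are invented for illustration:

```python
import numpy as np

def gradient_descent(grad_J, theta0, alpha=0.1, n_iter=100):
    """Repeatedly apply theta <- theta - alpha * grad_J(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        theta = theta - alpha * grad_J(theta)
    return theta

# Toy cost J(theta) = (theta - 3)^2 with gradient 2*(theta - 3); minimum at 3.
print(gradient_descent(lambda t: 2 * (t - 3), theta0=[0.0]))  # converges to [3.]
```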
**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** -⟶ +⟶ Przypomnienie: Stochastyczne schodzenie gradientu (Stochastic Gradient Descent, SGD) aktualizuje współczynniki funkcji (wagi) w oparciu o każdy przykład z danych treningowych z osobna, a pakietowe schodzenie gradientu (batch gradient descent) aktualizuje je na podstawie całego pakietu (podzbioru) przykładów z danych treningowych.
**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** -⟶ +⟶ Prawdopodobieństwo ― Prawdopodobieństwo modelu L(θ) przy parametrze θ jest wykorzystywane do znalezienia optymalnego parametru θ, maksymalizującego prawdopodobieństwo. W praktyce, używamy prawdopodobieństwa logarytmicznego ℓ(θ)=log(L(θ)) które łatwiej zoptymalizować. Mamy więc:
**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** -⟶ +⟶ Algorytm Newtona ― Algorytm Newtona to numeryczna metoda znalezienia takiego parametru θ, dla którego ℓ′(θ)=0. Zasada jego aktualizacji:
**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** -⟶ +⟶ Przypomnienie: wielowymiarowa generalizacja, znana także jako metoda Newtona-Raphsona, ma następującą zasadę aktualizacji:
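A minimal sketch of the one-dimensional Newton update θ ← θ − ℓ′(θ)/ℓ″(θ) described above; the toy log-likelihood is invented:

```python
def newton_1d(l_prime, l_double_prime, theta=0.0, n_iter=20):
    """Find theta with l'(theta) = 0 via theta <- theta - l'(theta) / l''(theta)."""
    for _ in range(n_iter):
        theta = theta - l_prime(theta) / l_double_prime(theta)
    return theta

# Toy concave log-likelihood l(theta) = -(theta - 2)^2, so l'(theta) = -2*(theta - 2).
print(newton_1d(lambda t: -2 * (t - 2), lambda t: -2.0))  # -> 2.0 in one step
```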
**21. Linear models**

⟶ Modele liniowe

<br>
**22. Linear regression** -⟶ +⟶ Regresja liniowa
**23. We assume here that y|x;θ∼N(μ,σ2)**

⟶ Zakładamy tutaj, że y|x;θ∼N(μ,σ2)

<br>
**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:**

⟶ Równania normalne - Przyjmując za X macierz danych, wartość θ minimalizująca funkcję kosztu ma rozwiązanie w postaci zamkniętej:

<br>
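A short sketch of the closed-form solution θ = (XᵀX)⁻¹Xᵀy; the synthetic data is invented, and `np.linalg.solve` is used instead of an explicit inverse as a standard numerical choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # design matrix with intercept
y = 2.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=50)

# theta = (X^T X)^(-1) X^T y, solved without forming the inverse explicitly
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta.round(2))   # approximately [2., 3.]
```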
**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** -⟶ +⟶ Algorytm aproksymacji średniokwadratowej - Przyjmując, że α to współczynnik uczenia, zasada aktualizacji aproksymacji średniokwadratowej (Least Mean Square, LMS) z wykorzystaniem m przykładów z danych treningowych (zwana także algorytmem Widrow-Hoffa) wygląda następująco:
**26. Remark: the update rule is a particular case of the gradient ascent.** -⟶ +⟶ Przypomnienie: zasada aktualizacji to szczególny przypadek wchodzenia gradientu.
**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶ +⟶ LWR ―
From b98f83bb188d799361709d3898e475b63ce218a8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sat, 15 Sep 2018 11:05:41 +0200 Subject: [PATCH 25/42] Polish language translation - supervised learning --- pl/cheatsheet-supervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index ae800186a..b943090e6 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -102,7 +102,7 @@ **18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** -⟶ Prawdopodobieństwo ― Prawdopodobieństwo modelu L(θ) przy parametrze θ jest wykorzystywane do znalezienia optymalnego parametru θ, maksymalizującego prawdopodobieństwo. W praktyce, używamy prawdopodobieństwa logarytmicznego ℓ(θ)=log(L(θ)) które łatwiej zoptymalizować. Mamy więc: +⟶ Prawdopodobieństwo ― Prawdopodobieństwo modelu L(θ) przy parametrze θ jest wykorzystywane do znalezienia optymalnego parametru θ poprzez maksymalizacje prawdopodobieństwa. W praktyce, używamy prawdopodobieństwa logarytmicznego ℓ(θ)=log(L(θ)) które łatwiej zoptymalizować (logspace). Mamy więc:
From baf8e1a853c37c6563f1f25096c789c556099c3d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sat, 15 Sep 2018 18:56:15 +0200 Subject: [PATCH 26/42] Polish language translation - supervised learning --- README.md | 4 ++-- pl/cheatsheet-supervised-learning.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 3093b45a7..454f0f927 100644 --- a/README.md +++ b/README.md @@ -14,10 +14,10 @@ This repository aims at collaboratively translating our [Machine Learning cheats |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)|[Türkçe](https://github.com/shervinea/cheatsheet-translation/tree/master/tr)|[Русский](https://github.com/shervinea/cheatsheet-translation/tree/master/ru) | [Polski](https://github.com/shervinea/cheatsheet-translation/tree/master/pl)| |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|0%|0%|0%|0%|0%|0%|0%|0%|100%| +|Deep learning|0%|0%|0%|0%|0%|0%|0%|0%|**100%**| |Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| |Unsupervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| -|ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%|100%| +|ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%|**100%**| |Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%|0%|0%| |Linear algebra|0%|0%|0%|0%|0%|0%|0%|0%|0%| diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index b943090e6..45b29f734 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -156,13 +156,13 @@ **27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶ LWR ― +⟶ LWR ― XXXXXXXXXXXXXXXXXXXXXXXXXXX
**28. Classification and logistic regression** -⟶ +⟶ Klasyfikacja i regresja logistyczna
From c582ef4d16848ac5135ef4cb69c204e1f9e0a57e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sun, 16 Sep 2018 01:07:31 +0200 Subject: [PATCH 27/42] Polish language translation - supervised learning --- README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 392275d3a..402b9a488 100644 --- a/README.md +++ b/README.md @@ -12,14 +12,14 @@ This repository aims at collaboratively translating our [Machine Learning cheats |Probabilities and Statistics|0%|0%|0%|**100%**|0%|0%|0%| |Linear algebra|0%|0%|0%|**100%**|0%|0%|0%| -|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)|[Türkçe](https://github.com/shervinea/cheatsheet-translation/tree/master/tr)|[Русский](https://github.com/shervinea/cheatsheet-translation/tree/master/ru) -|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|0%|0%|0%|0%|0%|0%|**100%**|0%| -|Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%| -|Unsupervised learning|0%|0%|0%|0%|0%|0%|0%|0%| -|ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%| -|Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%|0%| -|Linear algebra|0%|0%|0%|0%|0%|0%|0%|0%| +|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)|[Türkçe](https://github.com/shervinea/cheatsheet-translation/tree/master/tr)|[Русский](https://github.com/shervinea/cheatsheet-translation/tree/master/ru)|[Polski](https://github.com/shervinea/cheatsheet-translation/tree/master/pl) +|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| +|Deep learning|0%|0%|0%|0%|0%|0%|**100%**|0%|**100%**| +|Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|Unsupervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%|**100%**| +|Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|Linear algebra|0%|0%|0%|0%|0%|0%|0%|0%|0%| If your favorite language is missing, please feel free to add it! From 718a235786629e91e4406ac430f415078c7151ea Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 08:36:28 +0200 Subject: [PATCH 28/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index 45b29f734..e4f4cd3af 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -156,7 +156,7 @@ **27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶ LWR ― XXXXXXXXXXXXXXXXXXXXXXXXXXX +⟶ LWR ― Regresja ważona lokalnie, jest odmianą regresji liniowej, w której waży się każdy przykład ze zbioru treningowego funkcją kosztu w(i)(x), która jest zdefiniowana z wykorzystaniem parametru t∈R w sposób następujący:
**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:**

⟶ Funkcja sigmoidalna - Funkcja sigmoidalna g, znana także jako funkcja logistyczna, jest zdefiniowana w następujący sposób:

<br>
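A one-line sketch of the function just defined, g(z) = 1/(1+e^(−z)):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # values squashed into (0, 1)
```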
**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** -⟶ +⟶ Regresja logistyczna ― Zakładając, że y|x;θ∼Bernoulli(ϕ). Mamy następującą formułę:
**31. Remark: there is no closed form solution for the case of logistic regressions.** -⟶ +⟶ Przypomnienie: nie istnieje zamknięte rozwiązanie przypadku regresji logistycznej.
**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -⟶ +⟶ Regresja softmax ―
From 268ed89fe91cb997ed7bd479526b8d41e8c4d991 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 10:18:42 +0200 Subject: [PATCH 29/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index e4f4cd3af..d100bbfe8 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -186,49 +186,49 @@ **32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -⟶ Regresja softmax ― +⟶ Regresja softmax ― Regresja softmax, zwana także wieloklasową regresją logistyczną, używana jest jako uogólnienie regresji logistycznej w przypadku, gdy mamy więcej niż 2 klasy wynikowe. Konwencją jest, że θK=0, czyni to parametr Bernoulliego ϕi każdej klasy i równy:
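A small sketch of the softmax mapping underlying this model; the max-subtraction is a routine numerical-stability trick, not part of the definition above:

```python
import numpy as np

def softmax(scores):
    """phi_i = exp(s_i) / sum_j exp(s_j), computed with a max shift for stability."""
    s = scores - np.max(scores)
    e = np.exp(s)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])).round(3))   # probabilities summing to 1
```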
**33. Generalized Linear Models**

⟶ Generalne modele liniowe

<br>
**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:**

⟶ Rodzina wykładnicza ― O klasie rozkładu mówi się, że należy do rodziny wykładniczej, jeśli można ją zapisać z wykorzystaniem parametrów naturalnych, zwanych także kanonicznymi parametrami η, wystarczającej statystyki T(y) i podziału logarytmicznego funkcji a(η) w następujący sposób:

<br>
**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** -⟶ +⟶ Przypomnienie: często zdarzy się, że T(y)=y. Więc exp(-a(η)) może być rozumiany jako parametr normalizujący, który zapewni, że suma prawdopodobieństw będzie wynosiła 1.
**36. Here are the most common exponential distributions summed up in the following table:** -⟶ +⟶ W tabeli przedstawione są najczęściej spotykane rozkłady wykładnicze:
**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** -⟶ +⟶ [Rozkład, Bernoulli, Gaussian, Poisson, Geometric]
**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:**

⟶ Założenia generalnych modeli liniowych ― generalne modele liniowe mają za zadanie przewidzieć losową zmienną y jako funkcję x∈Rn+1 i opierają się na 3 założeniach:

<br>
**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -⟶ +⟶
From 6f58cc44ba1db7eba0cd2f17aa714abf539a44cb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 10:58:43 +0200 Subject: [PATCH 30/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index d100bbfe8..ea6ea0e48 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -228,61 +228,61 @@ **39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -⟶ +⟶ Przypomnienie: zwykła metoda najmniejszych kwadratów i regresja logistyczna to przypadki szczególne generalnych modeli liniowych.
**40. Support Vector Machines** -⟶ +⟶ Maszyny wektorów nośnych (Support Vector Machines)
**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** -⟶ +⟶ Celem maszyn wektorów nośnych jest znalezienie hiperpłaszczyzny, która maksymalizuje margines pomiędzy przykładami oddzielnych klas.
**42: Optimal margin classifier ― The optimal margin classifier h is such that:** -⟶ +⟶ Klasyfikator optymalnego marginesu ― Klasyfikator optymalnego marginesu h jest opisany następująco:
**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** -⟶ +⟶ gdzie (w,b)∈Rn×R jest rozwiązaniem następującego problemu optymalizacyjnego:
**44. such that** -⟶ +⟶ takich, że
**45. support vectors** -⟶ +⟶ wektory nośne
**46. Remark: the line is defined as wTx−b=0.** -⟶ +⟶ Przypomnienie: linia zdefiniowana jest jako wTx−b=0.
**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** -⟶ +⟶ Strata Hinge'a ― Strata Hinge'a jest wykorzystywana w maszynach wektorów nośnych, definiowana jest następująco:
**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** -⟶ +⟶
From abdace2a4fd4437f3fb57ed171c3d41f869e25da Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 12:03:43 +0200 Subject: [PATCH 31/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index ea6ea0e48..b737499cc 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -282,85 +282,85 @@ **48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** -⟶ +⟶ Jądro ― Mając mapowanie ϕ, definiujemy jądro K jako:
**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** -⟶ +⟶ W praktyce, jądro K zdefiniowane jako K(x,z)=exp(−||x−z||22σ2) nazywane jest Jądrem Gaussa i jest powszechnie używane.
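A short sketch of the Gaussian kernel K(x,z) = exp(−‖x−z‖²/(2σ²)) mentioned above; σ = 1 is an arbitrary choice:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, float) - np.asarray(z, float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

print(gaussian_kernel([0.0, 0.0], [1.0, 1.0]))   # closer points give values nearer 1
```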
**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** -⟶ +⟶ [Nieliniowa separowalność, Użycie mapowania jądra, Rozgraniczenie decyzyjne w oryginalnej przestrzeni]
**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.**

⟶ Przypomnienie: mówimy, że używamy "kernel trick" do obliczenia funkcji kosztu wykorzystując jądro, ponieważ w rzeczywistości nie potrzebujemy znać mapowania ϕ, które często bywa skomplikowane. W zamian, jedynie wartości K(x,z) są potrzebne.

<br>
**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** -⟶ +⟶ Lagrangian ― Definiujemy Lagrangian L(w,b) następująco:
**53. Remark: the coefficients βi are called the Lagrange multipliers.**

⟶ Przypomnienie: współczynniki βi nazywane są mnożnikami Lagrange'a.

<br>
**54. Generative Learning** -⟶ +⟶ Uczenie generatywne
**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ +⟶ Model generatywny po pierwsze stara się nauczyć jak dane są generowane poprzez estymacje P(x|y), które możemy użyć do estymacji P(y|x) korzystając z reguły Bayesa.
**56. Gaussian Discriminant Analysis** -⟶ +⟶ Analiza dykryminanty Gaussa
**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:**

⟶ Założenia ― Analiza dyskryminanty Gaussa zakłada, że y oraz x|y=0 i x|y=1 są takie, że:

<br>
**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** -⟶ +⟶ Estymacja ― Następująca tabela przedstawia estymaty, które widać przy maksymalizacji prawdopodobieństwa:
**59. Naive Bayes** -⟶ +⟶ Naiwny klasyfikator bayesowski
**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** -⟶ +⟶ Założenie ― Naiwny klasyfikator bayesowski zakłada, że cechy (features) każdego przykładu są niezależne.
**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** -⟶ +⟶ Rozwiązanie ―
From 941519f67d479bfe62f7027aac5ddca697450ead Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 12:13:27 +0200 Subject: [PATCH 32/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index b737499cc..4ee30bd0c 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -360,37 +360,37 @@ **61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** -⟶ Rozwiązanie ― +⟶ Rozwiązanie ― Maksymalizując logarytmiczne prawdopodobieństwo otrzymujemy następująze rozwiązanie z k∈{0,1},l∈[[1,L]]
**62. Remark: Naive Bayes is widely used for text classification and spam detection.** -⟶ +⟶ Przypomnienie: Naiwny klasyfikator bayesowski jest powszechnie używany do klasyfikacji tekstu i detekcji spamu.
**63. Tree-based and ensemble methods** -⟶ +⟶ Metody oparte o drzewa i "ensembling"
**64. These methods can be used for both regression and classification problems.** -⟶ +⟶ Te metody mogą być używane zarówno do problemów regresyjnych jak i klasyfikacyjnych.
**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** -⟶ +⟶ CART ― Drzewa klasyfikacyjne i regresyjne (Classification and Regression Trees), zwane także drzewami decyzyjnymi, mogą być reprezentowane jako drzewa binarne. Zaletą tych metod jest ich wysoka interpretowalność.
**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -⟶ +⟶ Las losowy
From 5053f004ea2e3be83ed5752025ec14a028d0ce88 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 12:31:15 +0200 Subject: [PATCH 33/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index 4ee30bd0c..4985f4598 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -390,55 +390,55 @@ **66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -⟶ Las losowy +⟶ Las losowy ― Jest to metoda oparta na drzewach, która wykorzystuje dużą ilość drzew decyzyjnych, opeartych na losowo dobieranych cechach. W przeciwieństwie do prostego drzewa decyzyjnego, jest on wysoce nieinterpretowalny, jednak dobra efektywność czyni go popularnym algorytmem.
**67. Remark: random forests are a type of ensemble methods.** -⟶ +⟶ Przypomnienie: las losowy jest rodzajem algorytmu opartego na ensemblingu.
**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:**

⟶ Boostowanie ― Pomysł polega na połączeniu kilku słabszych modeli w celu utworzenia jednego silniejszego. Poniżej przedstawione są główne rodzaje:

<br>
**69. [Adaptive boosting, Gradient boosting]** -⟶ +⟶ [Boostowanie adaptacyjne, Boostowanie gradientowe]
**70. High weights are put on errors to improve at the next boosting step** -⟶ +⟶ Duża waga kładziona jest na błędy w celu polepszenia wyniku w następnym kroku boostującym
**71. Weak learners trained on remaining errors** -⟶ +⟶ Słabe modele trenowane są na pozostałych błędach
**72. Other non-parametric approaches** -⟶ +⟶ Inne nie sparametryzowane podejścia
**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** -⟶ +⟶ k-najbliżsi sąsiedzi ― Algorytm k-najbliższych sąsiadów, znany powszechnie jako k-NN, jest nie sparametryzowanym podejściem, gdzie przynależność danego przykładu do danej klasy zależy od przynależności k-najbliższych punktów. Może być wykorzystywane zarówno przy klasyfikacji, jak i regresji.
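A minimal sketch of the k-NN rule described above, using Euclidean distance (a common but here assumed choice) and a majority vote; the four training points are invented:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by the majority label among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))   # -> 0
```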
**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -⟶ +⟶ Re
From d673ca072f73edee4850ce5d23f2b70d136e19c1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 12:41:26 +0200 Subject: [PATCH 34/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index 4985f4598..9e0469832 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -438,37 +438,37 @@ **74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -⟶ Re +⟶ Przypomnienie: Im wyższy parametr k, tym niższe dopasowanie, im mniejszy parametr k, tym wyższe dopasowanie.
**75. Learning Theory** -⟶ +⟶ Teoria uczenia
**76. Union bound ― Let A1,...,Ak be k events. We have:**

⟶ Nierówność Boole'a (Boole's inequality, union bound) ― Przyjmując, że A1,...,Ak to k zdarzeń, mamy:

<br>
**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:**

⟶ Nierówność Hoeffding'a ― Przyjmując, że Z1,...,Zm są m niezależnymi zmiennymi pobranymi z rozkładu Bernoulli'ego o parametrze ϕ. Przyjmując, że ˆϕ jest średnią próbki i γ>0 jest ustalone, mamy:

<br>
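A tiny numeric sketch of the bound above, assuming its usual form P(|ˆϕ−ϕ|>γ) ≤ 2·exp(−2γ²m):

```python
import math

def hoeffding_bound(gamma, m):
    """Upper bound 2*exp(-2*gamma^2*m) on P(|phi_hat - phi| > gamma)."""
    return 2.0 * math.exp(-2.0 * gamma ** 2 * m)

print(hoeffding_bound(gamma=0.1, m=500))   # about 9.1e-5
```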
**78. Remark: this inequality is also known as the Chernoff bound.** -⟶ +⟶ Przypomnienie: nierówność ta zwana jest także "Chernoff bound".
**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:**

⟶ Błąd uczenia ― Dla danego klasyfikatora h definiujemy błąd treningowy ˆϵ(h), znany także jako błąd empiryczny lub ryzyko empiryczne, w następujący sposób:

<br>
From 4e567c4181cb633e2e068850303d066289742040 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 12:43:14 +0200 Subject: [PATCH 35/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index 9e0469832..0bb3ea170 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -472,13 +472,13 @@
-**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** -⟶ +⟶
-**81: the training and testing sets follow the same distribution ** +**81: the training and testing sets follow the same distribution** ⟶ From 26a1310e769dec4d9a43468aacccf99795eea8be Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 12:50:20 +0200 Subject: [PATCH 36/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index 0bb3ea170..0fac40d62 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -474,31 +474,31 @@ **80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** -⟶ +⟶ Probably Approximately Correct (PAC) ― PAC jest sposobem, którym dowiedziono wiele teorii Teorii Uczenia i ma następujący zbiór założeń:
**81: the training and testing sets follow the same distribution** -⟶ +⟶ zbiór treningowy i testowy mają taki sam rozkład.
**82. the training examples are drawn independently** -⟶ +⟶ przykłady ze zbioru treningowego są wybierane niezależnie.
**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** -⟶ +⟶ Shattering ― Mając zbiór S={x(1),...,x(d)} i zbiór klasyfikatorów H, mówimy, że H "shatteruje" S jeśli dla jakiegokolwiek zbioru etykiet {y(1),...,y(d)}, mamy:
**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** -⟶ +⟶
From 627dfb20608b56ebb44fd2bbf9213c1c1e64d36b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Thu, 20 Sep 2018 13:18:31 +0200 Subject: [PATCH 37/42] [pl] supervised learning --- pl/cheatsheet-supervised-learning.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index 0fac40d62..2add4bafd 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -500,20 +500,20 @@ ⟶ -
+
Teoria górnej granicy (Upper bound theorem) ― Przyjmijmy, że H jest skończoną klasą hipotez taką, że |H|=k, oraz że δ i rozmiar próbki m są ustalone. Wtedy, z prawdopodobieństwem co najmniej 1−δ, mamy:
+
Wymiar Vapnika-Chervonenkisa ― Wymiar VC danej nieskończonej klasy hipotez H, zapisywany VC(H), jest rozmiarem największego zbioru, który jest "shattered" przez H.

**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.**

⟶ Przypomnienie: wymiar VC z H={zbiór liniowych klasyfikatorów w 2 wymiarach} wynosi 3.

<br>
**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** -⟶ +⟶ Teoria Vapnika ― Przyjmując że mamy H, które VC(H)=d i m jest liczbą przykładów treningowych. Z prawdopodobieństwem co najmniej 1−δ mamy: From 1dfa111c59d0b4d89ca6d01a67fda3aee72901e3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sun, 23 Sep 2018 17:46:47 +0200 Subject: [PATCH 38/42] Polish language translation - supervised learning --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 9ad6fc87c..e81f7caaa 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ This repository aims at collaboratively translating our [Machine Learning cheats |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)|[Türkçe](https://github.com/shervinea/cheatsheet-translation/tree/master/tr)|[Русский](https://github.com/shervinea/cheatsheet-translation/tree/master/ru)|[Polski](https://github.com/shervinea/cheatsheet-translation/tree/master/pl) |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|0%|0%|0%|0%|0%|0%|**100%**|0%|**100%**| -|Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| +|Supervised learning|0%|0%|0%|0%|0%|0%|0%|0%|**100%**| |Unsupervised learning|0%|0%|0%|0%|0%|0%|0%|0%|0%| |ML tips and tricks|0%|0%|0%|0%|0%|0%|0%|0%|**100%**| |Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%|0%|0%| From ba48544e13d5e26b76749a88e51da4727cd5946a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sun, 23 Sep 2018 17:54:31 +0200 Subject: [PATCH 39/42] Polish language translation - deep learning missing questions --- pl/cheatsheet-deep-learning.md | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/pl/cheatsheet-deep-learning.md b/pl/cheatsheet-deep-learning.md index 3a4424a22..7e64938b7 100644 --- a/pl/cheatsheet-deep-learning.md +++ b/pl/cheatsheet-deep-learning.md @@ -289,3 +289,33 @@ **49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** ⟶ Q-learning ― Q-learning jest bezmodelowym sposobem estymowania Q, który wygląda następująco: + +
+ +**50. View PDF version on GitHub** + +⟶ + +
Przejrzyj wersję PDF na GitHubie
+
+**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]**
+
+⟶ [Sieci neuronowe, Architektura, Funkcja aktywacji, Propagacja wsteczna, Dropout]
+
+<br>
+
+**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]**
+
+⟶ [Konwolucyjne Sieci Neuronowe, Warstwa konwolucyjna, Normalizacja wsadowa (Batch normalization)]
+
+<br>
+ +**53. [Recurrent Neural Networks, Gates, LSTM]** + +⟶ [Rekurencyjne Sieci Neuronowe, Bramki, LSTM] + +
+
+**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]**
+
+⟶ [Uczenie ze wzmocnieniem (Reinforcement learning), Procesy decyzyjne Markowa, Iteracja wartości/strategii, Przybliżone programowanie dynamiczne, Wyszukiwanie strategii]

From 306d178668a9341e7db5fd3a5f0206befd3ac98b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Jamry?=
Date: Sun, 23 Sep 2018 18:04:57 +0200
Subject: [PATCH 40/42] Polish language translation - ml tips&tricks missing
 questions

---
 ...tsheet-machine-learning-tips-and-tricks.md | 28 +++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/pl/cheatsheet-machine-learning-tips-and-tricks.md b/pl/cheatsheet-machine-learning-tips-and-tricks.md
index c6730c850..ddec59ee5 100644
--- a/pl/cheatsheet-machine-learning-tips-and-tricks.md
+++ b/pl/cheatsheet-machine-learning-tips-and-tricks.md
@@ -255,3 +255,31 @@
 ⟶ Analiza ablacyjna - analiza głównych powodów różnicy efektywności modelu testowanego i modelu podstawowego. W celu uproszczenia modelu.
+
+<br>
+ +**44. Regression metrics** + +⟶ Miary regresji + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +⟶ [Miary klasyfikacji, macierz pomyłek, dokładność, precyzja, czułość, F1, ROC] + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +⟶ [Miary regresji, R kwadrat, CP Mallow'a, AIC, BIC] + +
+ +**47. [Model selection, cross-validation, regularization]** + +⟶ [Wybór modelu, walidacja krzyżowa, regularyzacja] + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶ [Diagnostyka, Niedostateczne/nadmierne dopasowanie modelu, analiza ablacyjna/błędu] From 1ced416d6e0f97f3f2cbef3b7d38453bf24551f4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sun, 23 Sep 2018 18:12:00 +0200 Subject: [PATCH 41/42] Polish language translation - supervised learning missing questions --- pl/cheatsheet-supervised-learning.md | 50 +++++++++++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/pl/cheatsheet-supervised-learning.md b/pl/cheatsheet-supervised-learning.md index 2add4bafd..8cfba9a40 100644 --- a/pl/cheatsheet-supervised-learning.md +++ b/pl/cheatsheet-supervised-learning.md @@ -330,7 +330,7 @@ **56. Gaussian Discriminant Analysis** -⟶ Analiza dykryminanty Gaussa +⟶ Analiza dyskryminanty Gaussa
@@ -517,3 +517,51 @@ **87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** ⟶ Teoria Vapnika ― Przyjmując że mamy H, które VC(H)=d i m jest liczbą przykładów treningowych. Z prawdopodobieństwem co najmniej 1−δ mamy: + +
+
+**88. [Introduction, Type of prediction, Type of model]**
+
+⟶ [Wprowadzenie, Rodzaje predykcji, Rodzaje modelu]
+
+<br>
+
+**89. [Notations and general concepts, loss function, gradient descent, likelihood]**
+
+⟶ [Notacja i pojęcia ogólne, funkcja straty, metoda gradientu prostego (gradient descent), wiarygodność (likelihood)]
+
+<br>
+
+**90. [Linear models, linear regression, logistic regression, generalized linear models]**
+
+⟶ [Modele liniowe, regresja liniowa, regresja logistyczna, uogólnione modele liniowe]
+
+<br>
+
+**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]**
+
+⟶ [Maszyny wektorów nośnych, Klasyfikator optymalnego marginesu, Strata zawiasowa (hinge loss), Jądro (kernel)]
+
+<br>
+ +**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** + +⟶ [Uczenie generatywne, Analiza dyskryminanty Gaussa, Naiwny Klasyfikator Bayesowski (Naive Bayes)] + +
+
+**93. [Trees and ensemble methods, CART, Random forest, Boosting]**
+
+⟶ [Drzewa i metody zespołowe (ensemble methods), CART, Las losowy, Boosting]
+
+<br>
+ +**94. [Other methods, k-NN]** + +⟶ [Inne metody, k-NN] + +
+ +**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** + +⟶ [Teoria uczenia, Nierówność Hoeffding'a, PAC, wymiary VC] From b928903747de86a83e54ea5fe1b916627ab98385 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Jamry?= Date: Sun, 23 Sep 2018 19:23:35 +0200 Subject: [PATCH 42/42] Polish language translation - adding missing questions to templates --- pl/cheatsheet-unsupervised-learning.md | 41 ++++++++++++++++++++++++++ pl/refresher-linear-algebra.md | 26 +++++++++++++++- pl/refresher-probability.md | 34 +++++++++++++++++++++ 3 files changed, 100 insertions(+), 1 deletion(-) diff --git a/pl/cheatsheet-unsupervised-learning.md b/pl/cheatsheet-unsupervised-learning.md index 5826ff44b..1bf117d72 100644 --- a/pl/cheatsheet-unsupervised-learning.md +++ b/pl/cheatsheet-unsupervised-learning.md @@ -297,3 +297,44 @@ dimensions by maximizing the variance of the data as follows:** ⟶ +
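The hunk above picks up at item 50 of the unsupervised-learning sheet ("...dimensions by maximizing the variance of the data as follows"), whose step-by-step figure is likewise an image. A rough numpy sketch of those steps on randomly generated data; the data shape and k=2 are assumptions made only for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # toy data: 100 samples, 5 features (illustrative)
k = 2                            # assumed target dimension

# Step 1: standardise each feature
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: eigendecomposition of the empirical covariance matrix
cov = (Xs.T @ Xs) / Xs.shape[0]
eigvals, eigvecs = np.linalg.eigh(cov)            # eigh: symmetric input, ascending eigenvalues

# Step 3: keep the k eigenvectors with the largest eigenvalues (maximum-variance directions)
top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]

# Step 4: project the data onto that k-dimensional subspace
Z = Xs @ top_k
print(Z.shape)   # (100, 2)
```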
+ +**51. The Machine Learning cheatsheets are now available in German.** + +⟶ + +
+ +**52. Original authors** + +⟶ + +
+ +**53. Translated by X, Y and Z** + +⟶ + +
+ +**54. Reviewed by X, Y and Z** + +⟶ + +
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶ + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ diff --git a/pl/refresher-linear-algebra.md b/pl/refresher-linear-algebra.md index a824025f7..a6b440d1e 100644 --- a/pl/refresher-linear-algebra.md +++ b/pl/refresher-linear-algebra.md @@ -22,7 +22,7 @@
-**5. Matrix ― We note A∈Rm×n a matrix with n rows and m, where Ai,j∈R is the entry located in the ith row and jth column:** +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** ⟶ @@ -313,3 +313,27 @@ **53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** ⟶ + +
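Item 53 above refers to a figure of gradient identities that the template does not reproduce. The identities usually quoted in this context, assuming all matrix products are well defined and, for the determinant rule, that A is square and invertible:

```latex
\[
\nabla_A \, \mathrm{tr}(AB) = B^{T}
\qquad
\nabla_{A^{T}} f(A) = \left(\nabla_A f(A)\right)^{T}
\]
\[
\nabla_A \, \mathrm{tr}(ABA^{T}C) = CAB + C^{T}AB^{T}
\qquad
\nabla_A |A| = |A|\,(A^{-1})^{T}
\]
```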
+ +**54. [General notations, Definitions, Main matrices]** + +⟶ + +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +⟶ + +
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ + +
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +⟶ diff --git a/pl/refresher-probability.md b/pl/refresher-probability.md index db03157d5..5c9b34656 100644 --- a/pl/refresher-probability.md +++ b/pl/refresher-probability.md @@ -345,3 +345,37 @@ ⟶
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶