The topics I have mentioned here are part of my unfinished study. Please consider reading them one by one to get a quick idea of the basics.
- Initially we look at the Sigmoid function, sigmoid(x) = 1 / (1 + e^-x), which is commonly used as the activation function of the neurons (a short code sketch follows below).
- The activation function of a node decides whether or not the neuron's output is to be considered, i.e., whether it is activated. Read this amazing article on Activation Functions; the author explains everything we need to know on this topic.
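To make this concrete, here is a minimal sketch of the sigmoid in plain Python (the function name `sigmoid` and the sample inputs are mine, for illustration only). It squashes any real input into the range (0, 1):

```python
import math

def sigmoid(x):
    """Sigmoid activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# The stronger the weighted input, the closer the output is to 1.
print(sigmoid(-5.0))  # ~0.0067 -> mostly inactive
print(sigmoid(0.0))   # 0.5     -> undecided
print(sigmoid(5.0))   # ~0.9933 -> mostly active
```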
- Next comes the Cost function.
Cost function – calculates the difference between the desired output and the real output, i.e., the error. It expresses how wrong our model is; therefore, a lower cost value implies a better output.
The required point is the global minimum, or failing that the minimum nearest to x (for the curve considered here, only one such point exists). Here x represents the weight. As a result, we get the best value of W for which the cost function is minimum => minimum error => desired output.
Training a network means minimizing the cost function.
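As an illustration, here is a minimal sketch assuming mean squared error as the cost, which is one common choice (the function name `mse_cost` and the sample outputs are assumptions for the example, not from the study above):

```python
def mse_cost(desired, actual):
    """Mean squared error: the average squared difference
    between the desired outputs and the real outputs."""
    n = len(desired)
    return sum((d - a) ** 2 for d, a in zip(desired, actual)) / n

# A lower cost means the real outputs sit closer to the desired ones.
print(mse_cost([1.0, 0.0, 1.0], [0.9, 0.1, 0.8]))  # small error -> good model
print(mse_cost([1.0, 0.0, 1.0], [0.1, 0.9, 0.2]))  # large error -> bad model
```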
- So, how do we obtain the optimum weight? In other words, how do we plot the above graph? (The above graph is the representation of the cost function w.r.t. the weight, for one neuron. Plotting the graph by brute force would take about 10^27 seconds for n = 1000 weights/neurons, which is longer than the age of the universe.)
- The solution is Gradient Descent.
Gradient Descent – helps us find which way is downhill on the graph so that we can find the local/global minimum, i.e., the weight with the least cost. To speed things up, we use the derivative.
Derivative – the rate of change of the error w.r.t. the change in weight (dE/dW); in other words, the slope of the curve. If
dE/dW = +ve, the cost curve is going uphill at that point (undesired, so we step the other way);
dE/dW = -ve, the cost curve is going downhill (desired, as it takes us towards the minimum);
dE/dW = 0, we are at a minimum (the desired point).
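Putting it together, here is a minimal sketch of gradient descent on a toy one-weight cost (the cost E(W) = (W - 3)^2, the starting weight, and the learning rate are all assumed values for illustration; in a real network, dE/dW is computed from the training data rather than from a closed-form formula):

```python
def cost(w):
    """Toy cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def d_cost_dw(w):
    """Derivative dE/dW: the slope of the cost curve at w."""
    return 2.0 * (w - 3.0)

w = 10.0             # arbitrary starting weight (assumed)
learning_rate = 0.1  # step size (assumed)

for step in range(50):
    slope = d_cost_dw(w)
    # +ve slope -> downhill is to the left, so subtracting moves w left.
    # -ve slope -> subtracting a negative value moves w right.
    # Either way, we walk downhill until the slope is ~0 (the minimum).
    w -= learning_rate * slope

print(round(w, 4))        # ~3.0 -> the weight with minimum cost
print(round(cost(w), 6))  # ~0.0 -> minimum error
```

Stepping opposite to the sign of dE/dW is exactly the uphill/downhill rule above: the update rule W := W - learning_rate * dE/dW always moves the weight toward lower cost.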