#4 #5 Further updates to explanatory text.
stevehadd committed Jul 12, 2022
1 parent e884066 commit 9004c75
53 changes: 21 additions & 32 deletions 03_algorithm_selection.ipynb
@@ -1560,13 +1560,11 @@
"\n",
"A single artifical neuron (hereafter just referred to as a neuron), is essentially a linear weighted sum of inputs plus a constant term, to which a threshold operation is applied to the result, so that the output is 1 (activated) or 0 (not activated). The cell is known as a *perceptron*. The perceptron can be trained by iteratively updating the weights so that the error in the output for example training data is reduced and minimised. This is usually done through process called *gradient descent*. A perceptron can be updated to output a range between 0 and 1, rather than binary 0 or 1 output, using a sigmoid function rather than a threshold operation. This results in what is known as a *sigmoid neuron*. \n",
"\n",
"Neural networks are a series of these neurons joined together in a network of layers. Initially the comuptational cost of more than \n",
"Neural networks are a series of these neurons joined together in a network of layers. Initially the comuptational cost of updating the weights for more than a few neurons was prohibitive. As computer became more powerful so more neurons were used, and with more layer. A layer is where the input of one neurons becomes the input for a subsequent neurons. How neurons are connected together is what determines *network architecture*. Initially There was one layer into which inputs where feed, then the outputs all went to 1 neuron to produce a single output. Subsequently layers were added in between which directly connected to neither input nor outpout, which are termed hidden layers. A simple feedforward fully connected network is usually visualised as follows (from wikipedia):\n",
"![A feedforward network with 1 hidden layer](images/Colored_neural_network.svg)\n",
"\n",
"Description of Neural networks\n",
"* Explain a single perceptron\n",
"* History\n",
"* Weighted sum plus threshold (binary activation)\n",
"* Non linear activation, sigmoid neuron\n",
"The way gradient descent works is to calculate the change in the loss or cost function as you change each weight in turn. As the network grows and thenumber of weights grows with it, this quickly becomes expensive, and numerical issues arose with training the weights to produce a good result. Training is now done through a technique called \"back-propoagation\", which efficiently calculates the gradients in the the weights one layer at a time, and uses the the previously calculated gradients for each layer moving backwards to calculate the gradient for the next (hence the name back-propogation and the gradients being calculated propogates backwards like a wave. \n",
"Further optimisations have been introduced as networks have grown, such stochastic gradient descent where subsets of training data are used in *batches* to update the weights. The mathematics around this quickly becomes very complex, so consult the references for more on the mathematical details, which are very interesting!\n",
"\n",
"Key terms\n",
"* Neuron - a computing unit consisting of a weighted combination of inputs, loosely mimicing a biological neuron found in animal brains. \n",
@@ -1576,34 +1574,25 @@
"* Bias - the constant term\n",
"* Sigmoid function - A non-linear function applied to the linear weighted sum to ensure the output is in the range 0 to 1.\n",
"\n",
"Multi layer\n",
"* How are they joined together?\n",
"\n",
"How do we train? \n",
"* Gradient descent\n",
"* Stochastic gradient decscent\n",
"* Back propagation \n",
"\n",
"Key terms\n",
"* gradient descent\n",
"* Learning rate\n",
"* back propagation\n",
"* mini batch\n",
"* epoch\n",
"* copst function\n",
"* Gradient descent - The process of updating the weights of a neural network, based on the partial derivatives of the cost function with respect to each of the weights in the network, updating the weights \"towards\" the direction of steepest descent of the cost function.\n",
"* Learning rate - The scaling factor of the movement \"down the slope\" during gradient descent. The larger the learning rate the faster the movement of weights towards an optimum, but the greater the change of finding a local optimum.\n",
"* back propagation - the process of more efficiently calculating the cost function gradients for a network by calculating one layer at a time, and saving total computation in this way.\n",
"* mini batch - In gradient descent, when training using stochastic gradient descent, a mini-batch is the subset of input data for which weight gradient are calculated together.\n",
"* epoch - Several batches of processing of updating weights, which cumulutavely cover the whole input dataset.\n",
"* cost function - the different between the ground truth the network is trying to predict and the actual predictions made by the network.\n",
"\n",
"Hyperparameters\n",
"\n",
"* learning rate (see above\n",
"* batch size - the input of input points used for calculate a batch of gradients in stochastic gradient descent.\n",
"* solver - The variant of gradient descent with back propopagation used for training\n",
"* maximum iterations - The total number of training loops (usually epochs) before terminating (if another stopping condition is not reached).\n",
" \n",
"Types of NN\n",
"* feed forward - where predictions move forward through the network from input to output\n",
"* Convolutional Neural Network - Where some weights are shared, in the form of convolutional kernels (like image processing). The network learns these kernels along with other weights.\n",
"* Recurrent Neural Network - A network where some outputs feedback as input to previous layers. With careful arrangement, this allows the network to have a \"memory\" of previous input. This is used to learn data in a series or sequence, e.g. a time series variable or the words in a sentence.\n",
"* Graphical Neural Network - A neural network structured around a graph representation of data.\n",
" "
]
},
@@ -1614,7 +1603,7 @@
"source": [
"## Example - Scorates Radiation Model Emulation\n",
"\n",
"Explain problem and dataset"
"In our first example, we will be trying to emulate the socrates radiation scheme in the Unified Model. his is a supervised regression problem, trying to emulate the very complex function represented by the radiation scheme,"
]
},
{
@@ -2653,9 +2642,9 @@
],
"metadata": {
"kernelspec": {
"display_name": ".conda-ml-weather-tutorial-tf Python (Conda)",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "conda-env-.conda-ml-weather-tutorial-tf-py"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
