
Merge pull request #11 from AutoResearch/10-docs-make-title-case-coherent-with-autora-parent

docs: strong title case for headings
younesStrittmatter authored Jun 24, 2023
2 parents b765efe + 2d7e6b7 commit 4426310
Showing 2 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/Basic Usage.ipynb
@@ -83,7 +83,7 @@
"collapsed": false
},
"source": [
"## Set up the DARTS Regressor\n",
"## Set Up The DARTS Regressor\n",
"\n",
"We will use the DARTS Regressor to predict the outcomes. There are a number of parameters that determine how the architecture search is performed. The most important ones are listed below:\n",
"\n",
8 changes: 4 additions & 4 deletions docs/meta-parameters.md
@@ -2,19 +2,19 @@

Meta parameters are used to control the search space and the search algorithm. DARTS has quite a lot of those parameters. This section provides a basic overview of all parameters along with a description of their effects.

- ## General DARTS meta parameters
+ ## General DARTS Meta Parameters

- **`num_graph_nodes`**: The number of latent variables used to represent the model.
- **`max_epochs`**: The maximum number of epochs to run DARTS. This corresponds to the total number of architecture updates. These updates affect the architecture weights $\alpha$ indicating the relative contribution of each operation for a given computation step.

- ## Meta parameters for the architecture updates
+ ## Meta Parameters For The Architecture Updates
The following parameters affect the updating of the architecture weights $\alpha$:

- **`arch_learning_rate_max`**: The initial (maximum) learning rate for the architecture updates. The higher the learning rate, the larger the steps taken to update the architecture weights. The learning rate decays with each epoch.
- **`arch_weight_decay`**: The weight decay for the architecture weights. The higher the weight decay, the more strongly large architecture weights are pushed toward zero.
- **`arch_weight_decay_df`**: An additional weight decay that scales with the number of parameters (degrees of freedom) per operation. The higher this weight decay, the more DARTS will favor operations with few parameters.
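The effect of a weight-decay term can be sketched as a single gradient step in plain Python. This is an illustrative toy, not the DARTS implementation; the function and variable names are made up for the example:

```python
# Sketch: one SGD-style update where a decay term penalizes large weights,
# mirroring the effect described for `arch_weight_decay`.

def decayed_step(weight, grad, lr, weight_decay):
    """One gradient step; the decay term pulls the weight toward zero."""
    return weight - lr * (grad + weight_decay * weight)

w = 2.0
# With a zero gradient, only the decay term acts on the weight:
light = decayed_step(w, grad=0.0, lr=0.1, weight_decay=0.01)  # small pull toward zero
heavy = decayed_step(w, grad=0.0, lr=0.1, weight_decay=0.5)   # strong pull toward zero
print(light, heavy)  # the heavier decay shrinks the weight more
```
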

- ## Meta parameters for the parameter updates
+ ## Meta Parameters For The Parameter Updates
The following parameters affect the updating of the parameters associated with each operation:

- **`param_updates_per_epoch`**: The number of steps taken by the parameter optimizer per epoch. Once the architecture updates are complete, the parameters associated with each operation are updated by a stochastic gradient descent over this number of steps.
@@ -23,7 +23,7 @@ The following parameters affect the updating of the parameters associated with each operation:
- **`param_momentum`**: The momentum for the parameter updates. The higher the momentum, the more each update step is influenced by the preceding steps.
- **`param_weight_decay`**: The weight decay for the parameters. The higher the weight decay, the more strongly large parameter values are pushed toward zero.
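The role of `param_momentum` can be illustrated with a classical SGD-with-momentum update. This is a generic sketch of the momentum mechanism, not the actual DARTS optimizer code:

```python
# Sketch: classical momentum accumulates a velocity, so previous steps
# carry over into the next one.

def momentum_step(weight, velocity, grad, lr, momentum):
    """One SGD-with-momentum update; returns the new weight and velocity."""
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity

w, v = 1.0, 0.0
for _ in range(3):
    w, v = momentum_step(w, v, grad=1.0, lr=0.1, momentum=0.9)
# Even with a constant gradient, each step is larger than the last,
# because the velocity from earlier steps keeps contributing.
print(w, v)
```
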

- ## Meta parameters for the classifier
+ ## Meta Parameters For The Classifier
The final output of the DARTS model is computed by concatenating all edges in the computation graph into a single vector and then adding a linear classifier. The linear classifier can attach a coefficient to each edge (weighing the contribution of that edge to the final output), and it can add a constant bias term. The following parameters affect the behavior of the classifier:

- **`train_classifier_coefficients`**: If set to `True`, the classifier coefficient of each edge will be trained (otherwise each coefficient is set to `1`, reflecting an equal contribution of each edge to the final output).
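The classifier stage described above amounts to a weighted sum of the edge outputs plus an optional bias. A minimal sketch (the names are illustrative, not taken from the DARTS code):

```python
# Sketch: linear read-out over the concatenated edge outputs,
# with one coefficient per edge and an optional bias term.

def classify(edge_outputs, coefficients, bias=0.0):
    """Weighted sum of edge outputs plus a bias."""
    return sum(c * e for c, e in zip(coefficients, edge_outputs)) + bias

edges = [0.5, -1.0, 2.0]
# With untrained coefficients fixed at 1, every edge contributes equally:
equal = classify(edges, [1.0, 1.0, 1.0])  # -> 1.5
# Trained coefficients re-weight each edge's contribution:
weighted = classify(edges, [0.2, 0.0, 1.0], bias=0.1)
print(equal, weighted)
```
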
