
Replace Candidate by a Parameter #459

Merged
merged 21 commits on Jan 29, 2020
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,18 @@

## master

- `Candidate` class is removed, and is completely replaced by `Parameter` [#459](https://github.com/facebookresearch/nevergrad/pull/459)
- New parametrization is now as efficient as in v0.3.0 (see the CHANGELOG for v0.3.1 for context)
- `CandidateMaker` (`optimizer.create_candidate`) raises `DeprecationWarning`s since new candidates/parameters
can be straightforwardly created with `parameter.spawn_child(new_value=new_value)` (see the sketch below)
- Optimizers can now hold any parametrization, not just `Instrumentation`. This for instance means that when you
do `OptimizerClass(instrumentation=12, budget=100)`, the instrumentation (and therefore the candidates) will be of class
`ng.p.Array` (and not `ng.p.Instrumentation`), and their attribute `value` will be the corresponding `np.ndarray` value.
You can still use `args` and `kwargs` if you want, but it's no longer needed!
- Old `instrumentation` classes now raise deprecation warnings, and will disappear in versions >0.3.2.
Hence, prefer using parameters from `ng.p` over `ng.var`, and avoid using `ng.Instrumentation` altogether if
you don't need it anymore (or import it through `ng.p.Instrumentation`).
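
A minimal sketch of the workflow described by the entries above (assuming the API at the state of this PR; `square` is a placeholder objective):

```python
import numpy as np
import nevergrad as ng


def square(x: np.ndarray) -> float:  # placeholder objective
    return float(np.sum((x - 0.5) ** 2))


# the parametrization is an ng.p.Array of dimension 12, not an Instrumentation
optimizer = ng.optimizers.OnePlusOne(instrumentation=12, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation.value)  # an np.ndarray of shape (12,)

# spawn a new candidate/parameter instead of using optimizer.create_candidate
child = recommendation.spawn_child(new_value=np.zeros(12))
optimizer.tell(child, square(child.value))
```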

## v0.3.1 (2020-01-23)

**Note**: this is the first step to propagate the instrumentation/parametrization framework.
7 changes: 3 additions & 4 deletions README.md
@@ -72,15 +72,14 @@ def square(x):
optimizer = ng.optimizers.OnePlusOne(instrumentation=2, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation) # optimal args and kwargs
>>> Candidate(args=(array([0.500, 0.499]),), kwargs={})
>>> Array{(2,)}[recombination=average,sigma=1.0]:[0.49971112 0.5002944 ]
```

`recommendation` holds the optimal attributes `args` and `kwargs` found by the optimizer for the provided function.
In this example, the optimal value will be found in `recommendation.args[0]` and will be a `np.ndarray` of size 2.

`instrumentation=n` is a shortcut to state that the function has only one variable, of dimension `n`.
See the [instrumentation tutorial](docs/instrumentation.md) for more complex instrumentations.

`recommendation` holds the optimal value(s) found by the optimizer for the provided function. It can be
directly accessed through `recommendation.value`, which is here a `np.ndarray` of size 2.
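
Equivalently (a minimal sketch, assuming the new `ng.p.Array` parameter class and the `square` function defined above), the `instrumentation=2` shortcut amounts to passing an explicit parameter:

```python
import nevergrad as ng

param = ng.p.Array(shape=(2,))  # same search space as instrumentation=2
optimizer = ng.optimizers.OnePlusOne(instrumentation=param, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation.value)  # np.ndarray of size 2
```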

You can print the full list of optimizers with:
```python
47 changes: 21 additions & 26 deletions docs/machinelearning.md
@@ -7,29 +7,24 @@ def myfunction(lr, num_layers, arg3, arg4, other_anything):
    return -accuracy  # something to minimize
```

You should define how it must be instrumented, i.e. what are the arguments you want to optimize upon, and on which space they are defined. If you have both continuous and discrete parameters, you have a good initial guess, maybe just use `OrderedDiscrete`, `UnorderedDiscrete` for all discrete variables, `Array` for all your continuous variables, and use `PortfolioDiscreteOnePlusOne` as optimizer.
You should define how it must be instrumented, i.e. which arguments you want to optimize upon and on which space they are defined. If you have both continuous and discrete parameters and a good initial guess, you may just use `TransitionChoice` for all discrete variables, `Array` for all your continuous variables, and `PortfolioDiscreteOnePlusOne` as the optimizer.

```python
import nevergrad as ng
# instrument learning rate and number of layers, keep arg3 to 3 and arg4 to 4
lr = ng.var.Log(0.0001, 1) # log distributed between 0.001 and 1
num_layers = ng.var.OrderedDiscrete([4, 5, 6])
instrumentation = ng.Instrumentation(lr, num_layers, 3., arg4=4)
lr = ng.p.Log(a_min=0.0001, a_max=1)  # log distributed between 0.0001 and 1
num_layers = ng.p.TransitionChoice([4, 5, 6])
instrumentation = ng.p.Instrumentation(lr, num_layers, 3., arg4=4)
```
Make sure `instrumentation.value` holds your initial guess. It is automatically populated, but can be updated manually (just set `value` to what you want).
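
For instance (a minimal sketch, assuming `Instrumentation.value` is the `(args, kwargs)` tuple described in the parametrization docs and that it accepts direct assignment; the numbers are hypothetical):

```python
print(instrumentation.value)                            # automatically populated initial guess
instrumentation.value = ((0.001, 5, 3.0), {"arg4": 4})  # override it with your own guess
```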

Just take care that the default value (your initial guess) is at the middle in the list of possible values for `OrderedDiscrete`, and 0 for `Array` (you can modify this with `Array` methods). You can check that things are correct by checking that for zero you get the default:
```python
args, kwargs = instrumentation.data_to_arguments([0] * instrumentation.dimension)
print(args, kwargs)
```

The fact that you use ordered discrete variables is not a big deal because by nature `PortfolioDiscreteOnePlusOne` will ignore the order. This algorithm is quite stable.
The fact that you use (ordered) discrete variables through `TransitionChoice` is not a big deal because by nature `PortfolioDiscreteOnePlusOne` will ignore the order. This algorithm is quite stable.

If you have more budget, a cool possibility is to use `CategoricalSoftmax` for all discrete variables and then apply `TwoPointsDE`. You might also compare this to `DE` (classical differential evolution). This might need a budget in the hundreds.
If you have more budget, a cool possibility is to use `Choice` for all discrete variables and then apply `TwoPointsDE`. You might also compare this to `DE` (classical differential evolution). This might need a budget in the hundreds.

If you want to double-check that you are not worse than random search, you might use `RandomSearch`.

If you want something fully parallel (the number of workers can be equal to the budget), then you might use `ScrHammersleySearch`, which includes the discrete case. Then, you should use `OrderedDiscrete` rather than `CategoricalSoftmax`. This does not have the traditional drawback of grid search and should still be more uniform than random. By nature `ScrHammersleySearch` will deal correctly with `OrderedDiscrete` type for discrete variables.
If you want something fully parallel (the number of workers can be equal to the budget), then you might use `ScrHammersleySearch`, which includes the discrete case. Then, you should use `TransitionChoice` rather than `Choice`. This does not have the traditional drawback of grid search and should still be more uniform than random. By nature `ScrHammersleySearch` will deal correctly with the `TransitionChoice` type for discrete variables.

If you are optimizing weights in reinforcement learning, you might use `TBPSA` (high noise) or `CMA` (low noise).
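
As an illustration, here is a hedged sketch of the ask/tell loop with the `instrumentation` built above (it assumes `myfunction` is actually implemented; `other_anything=None` is a placeholder for arguments you do not optimize):

```python
optimizer = ng.optimizers.PortfolioDiscreteOnePlusOne(instrumentation=instrumentation, budget=100)
for _ in range(optimizer.budget):
    candidate = optimizer.ask()
    loss = myfunction(*candidate.args, other_anything=None, **candidate.kwargs)
    optimizer.tell(candidate, loss)
recommendation = optimizer.provide_recommendation()
print(recommendation.value)  # (args, kwargs) of the best parameters found
```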

@@ -58,7 +53,7 @@ print("Optimization of continuous hyperparameters =========")
def train_and_return_test_error(x):
    return np.linalg.norm([int(50. * abs(x_ - 0.2)) for x_ in x])

instrumentation = ng.Instrumentation(ng.var.Array(300)) # optimize on R^300
instrumentation = ng.p.Array(shape=(300,))  # optimize on R^300

budget = 1200 # How many trainings we will do before concluding.

@@ -130,16 +125,16 @@ This function must then be instrumented in order to let the optimizer know what a
import nevergrad as ng
# argument transformation
# Optimization of mixed (continuous and discrete) hyperparameters.
arg1 = ng.var.OrderedDiscrete(["a", "b"]) # 1st arg. = positional discrete argument
arg1 = ng.p.TransitionChoice(["a", "b"]) # 1st arg. = positional discrete argument
# We apply a softmax for converting real numbers to discrete values.
arg2 = ng.var.SoftmaxCategorical(["a", "c", "e"]) # 2nd arg. = positional discrete argument
value = ng.var.Gaussian(mean=1, std=2) # the 4th arg. is a keyword argument with Gaussian prior
arg2 = ng.p.Choice(["a", "c", "e"]) # 2nd arg. = positional discrete argument
value = ng.p.Scalar(init=1.0).set_mutation(sigma=2) # the 4th arg. is a keyword argument with Gaussian prior

# create the instrumentation
# the 3rd arg. is a positional arg. which will be kept constant to "blublu"
instrumentation = ng.Instrumentation(arg1, arg2, "blublu", value=value)
instru = ng.p.Instrumentation(arg1, arg2, "blublu", value=value)

print(instrumentation.dimension) # 5 dimensional space
print(instru.dimension) # 5 dimensional space
```

The dimension is 5 because:
@@ -149,22 +144,22 @@ The dimension is 5 because:
- the 4th var. is a real number, represented by a single coordinate.

```python
args, kwargs = instrumentation.data_to_arguments([1, -80, -80, 80, 3])
print(args, kwargs)
>>> ('b', 'e', 'blublu') {'value': 7}
myfunction(*args, **kwargs)
>>> 8
instru.set_standardized_data([1, -80, -80, 80, 3])
print(instru.args, instru.kwargs)
>>> (('b', 'e', 'blublu'), {'value': 7.0})
myfunction(*instru.args, **instru.kwargs)
>>> 8.0
```

In this case:
- `args[0] == "b"` because 1 > 0 (the threshold is 0 here since there are 2 values).
- `args[1] == "e"` is selected because proba(e) = exp(80) / (exp(80) + exp(-80) + exp(-80)) = 1
- `args[2] == "blublu"` because it is kept constant
- `value == 7` because std * 3 + mean = 2 * 3 + 1 = 7
- `value == 7` because std * 3 + current_value = 2 * 3 + 1 = 7
The function therefore returns 7 + 1 = 8.


Then you can run the optimization as usual. PortfolioDiscreteOnePlusOne is quite a natural choice when you have a good initial guess and a mix of discrete and continuous variables; in this case, it might be better to use `OrderedDiscrete` rather than `SoftmaxCategorical`.  
Then you can run the optimization as usual. `PortfolioDiscreteOnePlusOne` is quite a natural choice when you have a good initial guess and a mix of discrete and continuous variables; in this case, it might be better to use `TransitionChoice` rather than `Choice`.  
`TwoPointsDE` is often excellent in the large scale case (budget in the hundreds).

4 changes: 2 additions & 2 deletions docs/optimization.md
@@ -110,12 +110,12 @@ All algorithms have strengths and weaknesses. Questionable rules of thumb could
## Optimizing machine learning hyperparameters

When optimizing hyperparameters, e.g. in machine learning, if you don't know which variables (see [instrumentation](instrumentation.md)) to use:
- use `SoftmaxCategorical` for discrete variables
- use `Choice` for discrete variables
- use `TwoPointsDE` with `num_workers` equal to the number of workers available to you.
See the [machine learning example](machinelearning.md) for more.
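
For instance, a hedged sketch following these recommendations (the objective, parameter names and the thread-pool executor are illustrative assumptions, not part of the library):

```python
from concurrent import futures

import nevergrad as ng


def score(activation: str, lr: float) -> float:  # hypothetical objective to minimize
    return abs(lr - 0.01) + (0.0 if activation == "relu" else 0.1)


instrumentation = ng.p.Instrumentation(
    activation=ng.p.Choice(["relu", "tanh", "sigmoid"]),  # discrete variable
    lr=ng.p.Log(a_min=1e-4, a_max=1.0),                   # continuous variable
)
optimizer = ng.optimizers.TwoPointsDE(instrumentation=instrumentation, budget=200, num_workers=4)
with futures.ThreadPoolExecutor(max_workers=optimizer.num_workers) as executor:
    recommendation = optimizer.minimize(score, executor=executor, batch_mode=False)
print(recommendation.kwargs)
```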

Or if you want something more aimed at robustly outperforming random search in highly parallel settings (one-shot):
- use `OrderedDiscrete` for discrete variables, taking care that the default value is in the middle.
- use `TransitionChoice` for discrete variables, taking care that the default value is in the middle.
- Use `ScrHammersleySearchPlusMiddlePoint` (`PlusMiddlePoint` only if you have continuous parameters or good default values for discrete parameters).


68 changes: 34 additions & 34 deletions docs/parametrization.md
@@ -1,58 +1,42 @@
# Parametrization

**Please note that parametrization is still a work in progress with heavy changes comming soon! We are trying to update it to make it simpler and simpler to use (all feedbacks are welcome ;) ), with the side effect that there will be breaking changes.**
**Please note that parametrization is still a work in progress and changes are on their way (including to this documentation)! We are trying to make it simpler and simpler to use (all feedback is welcome ;) ), with the side effect that there will be breaking changes.**

The aim of parametrization is to specify the parameters that the optimization should be performed upon.
The parametrization subpackage will help you do this thanks to:
- the `parameter` modules providing classes that should be used to specify each parameter.
- the `parameter` modules (accessed by the shortcut `nevergrad.p`) providing classes that should be used to specify each parameter.
- the `FolderFunction` which helps transform any code into a Python function in a few lines. This can be especially helpful to optimize parameters in non-Python 3.6+ code (C++, Octave, etc...) or parameters in scripts.

These tools turn a piece of code with parameters you want to optimize into a function defined on an n-dimensional continuous data space in which the optimization can easily be performed, and they define how these parameters can be mutated and combined together.

## Variables

6 types of variables are currently provided:
7 types of variables are currently provided:
- `Choice(items)`: describes a parameter which can take values within the provided list of (usually unordered categorical) items, and for which transitions are global (from one item to any other item). The returned element will be sampled as the softmax of the values on these dimensions. Be cautious: this process is non-deterministic and makes the function evaluation noisy.
- `TransitionChoice(items)`: describes a parameter which can take values within the provided list of (usually ordered) items, and for which transitions are local (from one item to close items).
- `Array(shape)`: describes a `np.ndarray` of any shape. The bounds of the array and the mutation of this array can be specified (see `set_bounds`, `set_mutation`). This makes it a very flexible type of variable. Eg. `Array(shape=(2, 3)).set_bounds(0, 2)` encodes for an array of shape `(2, 3)`, with values bounded between 0 and 2.
- `Scalar(dtype)`: describes a float (the default) or an int. It is implemented as a subclass of `Array`, and all `Array` methods are therefore available. Note that `Gaussian(a, b)` is equivalent to `Scalar().affined(a, b)`.
- `Log(a_min, a_max)`: describes log-distributed data between two bounds. Under the hood this uses a `Scalar` with appropriate specifications for bounds and mutations.

- `Instrumentation(*args, **kwargs)`: a container for other parameters. Values of parameters in the `args` will be returned as a `tuple` by `param.args`, and
values of parameters in the `kwargs` will be returned as a `dict` by `param.kwargs` (in practice, `param.value == (param.args, param.kwargs)`).
This serves to parametrize functions taking multiple arguments, since you can then call the function with `func(*param.args, **param.kwargs)`.
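
A short sketch instantiating the variables listed above (assuming the `ng.p` namespace introduced by this PR; the shapes, bounds and items are illustrative):

```python
import nevergrad as ng

choice = ng.p.Choice(["a", "b", "c"])               # unordered categorical, softmax-encoded
ordered = ng.p.TransitionChoice([4, 5, 6])          # ordered items with local transitions
array = ng.p.Array(shape=(2, 3)).set_bounds(0, 2)   # bounded 2x3 array (example from the text above)
scalar = ng.p.Scalar()                              # a single float
lr = ng.p.Log(a_min=0.001, a_max=1.0)               # log-distributed between the two bounds
instru = ng.p.Instrumentation(array, scalar, lr=lr)
print(instru.value)  # (args, kwargs), i.e. ((array value, scalar value), {"lr": lr value})
```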

## Parametrization

Parametrization helps you define the parameters you want to optimize upon.
Currently most algorithms make use of it to help convert the parameters into the "standardized data" space (a real vector space),
Currently most algorithms make use of it to help convert the parameters into the "standardized data" space (a vector space spanning all the real values),
where it is easier to define operations.

Let's define the parametrization for a function taking 3 positional arguments and one keyword argument `value`.
- `arg1 = ng.p.TransitionChoice(["a", "b"])` is the first positional argument, it encodes the choice through a single index which can mutate in a continuous way.
- `arg2 = ng.p.Choice(["a", "c", "e"])` is the second one, which can take 3 possible values without any order; the selection is made stochastically through the sampling of a softmax. It is encoded by 3 values (the softmax weights) in the "standardized space".
- the third argument will be kept constant to `blublu`
- `value = ng.p.Scalar()` which represents a scalar both in the parameter space and in the "standardized space"

We then define a parameter holding all these parameters, with a standardized space of dimension 5 (as the sum of the dimensions above):
```python
import nevergrad as ng

# argument transformation
arg1 = ng.p.TransitionChoice(["a", "b"]) # 1st arg. = positional discrete argument
arg2 = ng.p.Choice(["a", "c", "e"])  # 2nd arg. = positional discrete argument
value = ng.p.Scalar() # the 4th arg. is a keyword argument with Gaussian prior

# create the instrumented function
instrum = ng.p.Instrumentation(arg1, arg2, "blublu", value=value)
# the 3rd arg. is a positional arg. which will be kept constant to "blublu"
print(instrum.dimension) # 5 dimensional space

# The dimension is 5 because:
# - the 1st discrete variable has 2 possible values, represented by a hard thresholding in
# a 1-dimensional space, i.e. we add 1 coordinate to the continuous problem
# - the 2nd discrete variable has 3 possible values, represented by softmax, i.e. we add 3 coordinates to the continuous problem
# - the 3rd variable has no uncertainty, so it does not introduce any coordinate in the continuous problem
# - the 4th variable is a real number, represented by single coordinate.


instrum.set_standardized_data([1, -80, -80, 80, 3])
# prints (instrum.args, instrum.kwargs): (('b', 'e', 'blublu'), {'value': 7})
# b is selected because 1 > 0 (the threshold is 0 here since there are 2 values.
# e is selected because proba(e) = exp(80) / (exp(80) + exp(-80) + exp(-80))
# value=7 because 3 * std + mean = 7
instru = ng.p.Instrumentation(arg1, arg2, "blublu", value=value)
print(instru.dimension)
>>> 5
```


@@ -62,8 +46,24 @@ def myfunction(arg1, arg2, arg3, value=3):
    print(arg1, arg2, arg3)
    return value**2

optimizer = ng.optimizers.OnePlusOne(instrumentation=instrum, budget=100)
recommendation = optimizer.minimize(ifunc)
optimizer = ng.optimizers.OnePlusOne(instrumentation=instru, budget=100)
recommendation = optimizer.minimize(myfunction)
print(recommendation.value)
>>> (('b', 'e', 'blublu'), {'value': -0.00014738768964717153})
```



Here is a glimpse of what happens in the optimization space:
```python
instru.set_standardized_data([1, -80, -80, 80, 3])
print(instru.args, instru.kwargs)
>>> (('b', 'e', 'blublu'), {'value': 3.0})
```
With this code:
- b is selected because 1 > 0 (the index is 1 for values above 0, and 0 for values below 0, since there are 2 values).
- e is selected because proba(e) = exp(80) / (exp(80) + exp(-80) + exp(-80)) = 1 (checked numerically below)
- `value=3` because the last value of the standardized space (i.e. 3) corresponds to the value of the last keyword argument.
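
A quick numerical check of the softmax probabilities stated above (plain NumPy, not nevergrad internals):

```python
import numpy as np

weights = np.array([-80.0, -80.0, 80.0])  # standardized weights for ["a", "c", "e"]
probas = np.exp(weights - weights.max())
probas /= probas.sum()
print({k: float(p) for k, p in zip(["a", "c", "e"], probas)})  # 'e' gets probability ~1, 'a' and 'c' ~0
```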


1 change: 1 addition & 0 deletions nevergrad/__init__.py
@@ -10,6 +10,7 @@
from .instrumentation import variables as var
from .parametrization import parameter as p


__all__ = ["Instrumentation", "var", "optimizers", "families", "callbacks", "p"]

__version__ = "0.3.1"
13 changes: 6 additions & 7 deletions nevergrad/benchmark/experiments.py
@@ -56,7 +56,7 @@ def discrete2(seed: Optional[int] = None) -> Iterator[Experiment]:

@registry.register
def discrete(seed: Optional[int] = None) -> Iterator[Experiment]:
"""Discrete test bed, including useless variables, 5 values or 2 values per character.
"""Discrete test bed, including useless variables, 5 values or 2 values per character.
Poorly designed, should be reimplemented from scratch using a decent instrumentation."""
seedg = create_seed_generator(seed)
names = [n for n in ArtificialFunction.list_sorted_function_names() if "one" in n or "jump" in n]
@@ -191,7 +191,7 @@ def paramultimodal(seed: Optional[int] = None) -> Iterator[Experiment]:
    internal_generator = multimodal(seed, para=True)
    for xp in internal_generator:
        yield xp

# pylint: disable=redefined-outer-name
@registry.register
def yabbob(seed: Optional[int] = None, parallel: bool = False, big: bool = False, noise: bool = False, hd: bool = False) -> Iterator[Experiment]:
@@ -296,11 +296,10 @@ def illcondipara(seed: Optional[int] = None) -> Iterator[Experiment]:
yield Experiment(function, optim, budget=budget, num_workers=1, seed=next(seedg))


def _positive_sum(args_kwargs: Any) -> bool:
    args, kwargs = args_kwargs
    if kwargs or len(args) != 1 or not isinstance(args[0], np.ndarray):
        raise ValueError(f"Unexpected inputs {args} and {kwargs}")
    return float(np.sum(args[0])) > 0
def _positive_sum(data: np.ndarray) -> bool:
    if not isinstance(data, np.ndarray):
        raise ValueError(f"Unexpected inputs as np.ndarray, got {data}")
    return float(np.sum(data)) > 0


@registry.register
1 change: 0 additions & 1 deletion nevergrad/benchmark/plotting.py
@@ -168,7 +168,6 @@ def create_plots(df: pd.DataFrame, output_folder: PathLike, max_combsize: int =
            yindices = sorted(set([c[1] for c in df.unique(fixed)]))
        except TypeError:
            yindices = list(set([c[1] for c in df.unique(fixed)]))

        for _ in range(len(xindices)):
            best_algo += [[]]
        for i in range(len(xindices)):