
Replace Candidate by a Parameter #459

Merged
merged 21 commits on Jan 29, 2020
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,18 @@

## master

- `Candidate` class is removed, and is completely replaced by `Parameter` [#459](https://github.com/facebookresearch/nevergrad/pull/459)
- New parametrization is now as efficient as in v0.3.0 (see the CHANGELOG for v0.3.1 for context)
- `CandidateMaker` (`optimizer.create_candidate`) raises `DeprecationWarning`s since new candidates/parameters
can be straightforwardly created with `parameter.spawn_child(new_value=new_value)` (see the sketch below)
- Optimizers can now hold any parametrization, not just `Instrumentation`. This for instance means that when you
do `OptimizerClass(instrumentation=12, budget=100)`, the instrumentation (and therefore the candidates) will be of class
`ng.p.Array` (and not `ng.p.Instrumentation`), and their attribute `value` will be the corresponding `np.ndarray` value.
You can still use `args` and `kwargs` if you want, but it's no longer needed!
- Old `instrumentation` classes now raise deprecation warnings, and will disappear in versions >0.3.2.
Hence, prefer using parameters from `ng.p` over `ng.var`, and avoid using `ng.Instrumentation` altogether if
you don't need it anymore (or import it through `ng.p.Instrumentation`).
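
A minimal sketch of the workflow described by the entries above (assuming the API at the state of this PR; `square` is a placeholder objective):

```python
import numpy as np
import nevergrad as ng


def square(x: np.ndarray) -> float:  # placeholder objective
    return float(np.sum((x - 0.5) ** 2))


# the parametrization is an ng.p.Array of dimension 12, not an Instrumentation
optimizer = ng.optimizers.OnePlusOne(instrumentation=12, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation.value)  # an np.ndarray of shape (12,)

# spawn a new candidate/parameter instead of using optimizer.create_candidate
child = recommendation.spawn_child(new_value=np.zeros(12))
optimizer.tell(child, square(child.value))
```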

## v0.3.1 (2020-01-23)

**Note**: this is the first step to propagate the instrumentation/parametrization framework.
7 changes: 3 additions & 4 deletions README.md
@@ -72,15 +72,14 @@ def square(x):
optimizer = ng.optimizers.OnePlusOne(instrumentation=2, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation) # optimal args and kwargs
>>> Candidate(args=(array([0.500, 0.499]),), kwargs={})
>>> Array{(2,)}[recombination=average,sigma=1.0]:[0.49971112 0.5002944 ]
```

`recommendation` holds the optimal attributes `args` and `kwargs` found by the optimizer for the provided function.
In this example, the optimal value will be found in `recommendation.args[0]` and will be a `np.ndarray` of size 2.

`instrumentation=n` is a shortcut to state that the function has only one variable, of dimension `n`.
See the [instrumentation tutorial](docs/instrumentation.md) for more complex instrumentations.

`recommendation` holds the optimal value(s) found by the optimizer for the provided function. It can be
directly accessed through `recommendation.value`, which is here a `np.ndarray` of size 2.
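
Equivalently (a minimal sketch, assuming the new `ng.p.Array` parameter class and the `square` function defined above), the `instrumentation=2` shortcut amounts to passing an explicit parameter:

```python
import nevergrad as ng

param = ng.p.Array(shape=(2,))  # same search space as instrumentation=2
optimizer = ng.optimizers.OnePlusOne(instrumentation=param, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation.value)  # np.ndarray of size 2
```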

You can print the full list of optimizers with:
```python
47 changes: 21 additions & 26 deletions docs/machinelearning.md
@@ -7,29 +7,24 @@ def myfunction(lr, num_layers, arg3, arg4, other_anything):
    return -accuracy  # something to minimize
```

You should define how it must be instrumented, i.e. what are the arguments you want to optimize upon, and on which space they are defined. If you have both continuous and discrete parameters, you have a good initial guess, maybe just use `OrderedDiscrete`, `UnorderedDiscrete` for all discrete variables, `Array` for all your continuous variables, and use `PortfolioDiscreteOnePlusOne` as optimizer.
You should define how it must be instrumented, i.e. which arguments you want to optimize upon and on which space they are defined. If you have both continuous and discrete parameters and a good initial guess, you may just use `TransitionChoice` for all discrete variables, `Array` for all your continuous variables, and `PortfolioDiscreteOnePlusOne` as the optimizer.

```python
import nevergrad as ng
# instrument learning rate and number of layers, keep arg3 to 3 and arg4 to 4
lr = ng.var.Log(0.0001, 1) # log distributed between 0.001 and 1
num_layers = ng.var.OrderedDiscrete([4, 5, 6])
instrumentation = ng.Instrumentation(lr, num_layers, 3., arg4=4)
lr = ng.p.Log(a_min=0.0001, a_max=1)  # log distributed between 0.0001 and 1
num_layers = ng.p.TransitionChoice([4, 5, 6])
instrumentation = ng.p.Instrumentation(lr, num_layers, 3., arg4=4)
```
Make sure `instrumentation.value` holds your initial guess. It is automatically populated, but can be updated manually (just set `value` to what you want).
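
For instance (a minimal sketch, assuming `Instrumentation.value` is the `(args, kwargs)` tuple described in the parametrization docs and that it accepts direct assignment; the numbers are hypothetical):

```python
print(instrumentation.value)                            # automatically populated initial guess
instrumentation.value = ((0.001, 5, 3.0), {"arg4": 4})  # override it with your own guess
```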

Just take care that the default value (your initial guess) is at the middle in the list of possible values for `OrderedDiscrete`, and 0 for `Array` (you can modify this with `Array` methods). You can check that things are correct by checking that for zero you get the default:
```python
args, kwargs = instrumentation.data_to_arguments([0] * instrumentation.dimension)
print(args, kwargs)
```

The fact that you use ordered discrete variables is not a big deal because by nature `PortfolioDiscreteOnePlusOne` will ignore the order. This algorithm is quite stable.
The fact that you use (ordered) discrete variables through `TransitionChoice` is not a big deal because by nature `PortfolioDiscreteOnePlusOne` will ignore the order. This algorithm is quite stable.

If you have more budget, a cool possibility is to use `CategoricalSoftmax` for all discrete variables and then apply `TwoPointsDE`. You might also compare this to `DE` (classical differential evolution). This might need a budget in the hundreds.
If you have more budget, a cool possibility is to use `Choice` for all discrete variables and then apply `TwoPointsDE`. You might also compare this to `DE` (classical differential evolution). This might need a budget in the hundreds.

If you want to double-check that you are not worse than random search, you might use `RandomSearch`.

If you want something fully parallel (the number of workers can be equal to the budget), then you might use `ScrHammersleySearch`, which includes the discrete case. Then, you should use `OrderedDiscrete` rather than `CategoricalSoftmax`. This does not have the traditional drawback of grid search and should still be more uniform than random. By nature `ScrHammersleySearch` will deal correctly with `OrderedDiscrete` type for discrete variables.
If you want something fully parallel (the number of workers can be equal to the budget), then you might use `ScrHammersleySearch`, which includes the discrete case. Then, you should use `TransitionChoice` rather than `Choice`. This does not have the traditional drawback of grid search and should still be more uniform than random. By nature `ScrHammersleySearch` will deal correctly with the `TransitionChoice` type for discrete variables.

If you are optimizing weights in reinforcement learning, you might use `TBPSA` (high noise) or `CMA` (low noise).
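
As an illustration, here is a hedged sketch of the ask/tell loop with the `instrumentation` built above (it assumes `myfunction` is actually implemented; `other_anything=None` is a placeholder for arguments you do not optimize):

```python
optimizer = ng.optimizers.PortfolioDiscreteOnePlusOne(instrumentation=instrumentation, budget=100)
for _ in range(optimizer.budget):
    candidate = optimizer.ask()
    loss = myfunction(*candidate.args, other_anything=None, **candidate.kwargs)
    optimizer.tell(candidate, loss)
recommendation = optimizer.provide_recommendation()
print(recommendation.value)  # (args, kwargs) of the best parameters found
```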

@@ -58,7 +53,7 @@ print("Optimization of continuous hyperparameters =========")
def train_and_return_test_error(x):
    return np.linalg.norm([int(50. * abs(x_ - 0.2)) for x_ in x])

instrumentation = ng.Instrumentation(ng.var.Array(300)) # optimize on R^300
instrumentation = ng.p.Array(shape=(300,))  # optimize on R^300

budget = 1200 # How many trainings we will do before concluding.

@@ -130,16 +125,16 @@ This function must then be instrumented in order to let the optimizer know what a
import nevergrad as ng
# argument transformation
# Optimization of mixed (continuous and discrete) hyperparameters.
arg1 = ng.var.OrderedDiscrete(["a", "b"]) # 1st arg. = positional discrete argument
arg1 = ng.p.TransitionChoice(["a", "b"]) # 1st arg. = positional discrete argument
# We apply a softmax for converting real numbers to discrete values.
arg2 = ng.var.SoftmaxCategorical(["a", "c", "e"]) # 2nd arg. = positional discrete argument
value = ng.var.Gaussian(mean=1, std=2) # the 4th arg. is a keyword argument with Gaussian prior
arg2 = ng.p.Choice(["a", "c", "e"]) # 2nd arg. = positional discrete argument
value = ng.p.Scalar(init=1.0).set_mutation(sigma=2) # the 4th arg. is a keyword argument with Gaussian prior

# create the instrumentation
# the 3rd arg. is a positional arg. which will be kept constant to "blublu"
instrumentation = ng.Instrumentation(arg1, arg2, "blublu", value=value)
instru = ng.p.Instrumentation(arg1, arg2, "blublu", value=value)

print(instrumentation.dimension) # 5 dimensional space
print(instru.dimension) # 5 dimensional space
```

The dimension is 5 because:
@@ -149,22 +144,22 @@ The dimension is 5 because:
- the 4th var. is a real number, represented by a single coordinate.

```python
args, kwargs = instrumentation.data_to_arguments([1, -80, -80, 80, 3])
print(args, kwargs)
>>> ('b', 'e', 'blublu') {'value': 7}
myfunction(*args, **kwargs)
>>> 8
instru.set_standardized_data([1, -80, -80, 80, 3])
print(instru.args, instru.kwargs)
>>> (('b', 'e', 'blublu'), {'value': 7.0})
myfunction(*instru.args, **instru.kwargs)
>>> 8.0
```

In this case:
- `args[0] == "b"` because 1 > 0 (the threshold is 0 here since there are 2 values).
- `args[1] == "e"` is selected because proba(e) = exp(80) / (exp(80) + exp(-80) + exp(-80)) = 1
- `args[2] == "blublu"` because it is kept constant
- `value == 7` because std * 3 + mean = 2 * 3 + 1 = 7
- `value == 7` because std * 3 + current_value = 2 * 3 + 1 = 7
The function therefore returns 7 + 1 = 8.


Then you can run the optimization as usual. PortfolioDiscreteOnePlusOne is quite a natural choice when you have a good initial guess and a mix of discrete and continuous variables; in this case, it might be better to use `OrderedDiscrete` rather than `SoftmaxCategorical`.  
Then you can run the optimization as usual. `PortfolioDiscreteOnePlusOne` is quite a natural choice when you have a good initial guess and a mix of discrete and continuous variables; in this case, it might be better to use `TransitionChoice` rather than `Choice`.  
`TwoPointsDE` is often excellent in the large scale case (budget in the hundreds).

4 changes: 2 additions & 2 deletions docs/optimization.md
@@ -110,12 +110,12 @@ All algorithms have strengths and weaknesses. Questionable rules of thumb could
## Optimizing machine learning hyperparameters

When optimizing hyperparameters, e.g. in machine learning, if you don't know which variables (see [instrumentation](instrumentation.md)) to use:
- use `SoftmaxCategorical` for discrete variables
- use `Choice` for discrete variables
- use `TwoPointsDE` with `num_workers` equal to the number of workers available to you.
See the [machine learning example](machinelearning.md) for more.
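
For instance, a hedged sketch following these recommendations (the objective, parameter names and the thread-pool executor are illustrative assumptions, not part of the library):

```python
from concurrent import futures

import nevergrad as ng


def score(activation: str, lr: float) -> float:  # hypothetical objective to minimize
    return abs(lr - 0.01) + (0.0 if activation == "relu" else 0.1)


instrumentation = ng.p.Instrumentation(
    activation=ng.p.Choice(["relu", "tanh", "sigmoid"]),  # discrete variable
    lr=ng.p.Log(a_min=1e-4, a_max=1.0),                   # continuous variable
)
optimizer = ng.optimizers.TwoPointsDE(instrumentation=instrumentation, budget=200, num_workers=4)
with futures.ThreadPoolExecutor(max_workers=optimizer.num_workers) as executor:
    recommendation = optimizer.minimize(score, executor=executor, batch_mode=False)
print(recommendation.kwargs)
```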

Or if you want something more aimed at robustly outperforming random search in highly parallel settings (one-shot):
- use `OrderedDiscrete` for discrete variables, taking care that the default value is in the middle.
- use `TransitionChoice` for discrete variables, taking care that the default value is in the middle.
- Use `ScrHammersleySearchPlusMiddlePoint` (`PlusMiddlePoint` only if you have continuous parameters or good default values for discrete parameters).


68 changes: 34 additions & 34 deletions docs/parametrization.md
@@ -1,58 +1,42 @@
# Parametrization

**Please note that parametrization is still a work in progress with heavy changes comming soon! We are trying to update it to make it simpler and simpler to use (all feedbacks are welcome ;) ), with the side effect that there will be breaking changes.**
**Please note that parametrization is still a work in progress and changes are on their way (including to this documentation)! We are trying to make it simpler and simpler to use (all feedback is welcome ;) ), with the side effect that there will be breaking changes.**

The aim of parametrization is to specify the parameters that the optimization should be performed upon.
The parametrization subpackage will help you do this thanks to:
- the `parameter` modules providing classes that should be used to specify each parameter.
- the `parameter` modules (accessed by the shortcut `nevergrad.p`) providing classes that should be used to specify each parameter.
- the `FolderFunction` which helps transform any code into a Python function in a few lines. This can be especially helpful to optimize parameters in non-Python 3.6+ code (C++, Octave, etc...) or parameters in scripts.

These tools turn a piece of code with parameters you want to optimize into a function defined on an n-dimensional continuous data space in which the optimization can easily be performed, and they define how these parameters can be mutated and combined together.

## Variables

6 types of variables are currently provided:
7 types of variables are currently provided:
- `Choice(items)`: describes a parameter which can take values within the provided list of (usually unordered categorical) items, and for which transitions are global (from one item to any other item). The returned element will be sampled as the softmax of the values on these dimensions. Be cautious: this process is non-deterministic and makes the function evaluation noisy.
- `TransitionChoice(items)`: describes a parameter which can take values within the provided list of (usually ordered) items, and for which transitions are local (from one item to close items).
- `Array(shape)`: describes a `np.ndarray` of any shape. The bounds of the array and the mutation of this array can be specified (see `set_bounds`, `set_mutation`). This makes it a very flexible type of variable. Eg. `Array(shape=(2, 3)).set_bounds(0, 2)` encodes for an array of shape `(2, 3)`, with values bounded between 0 and 2.
- `Scalar(dtype)`: describes a float (the default) or an int. It is implemented as a subclass of `Array`, and all `Array` methods are therefore available. Note that `Gaussian(a, b)` is equivalent to `Scalar().affined(a, b)`.
- `Log(a_min, a_max)`: describes log-distributed data between two bounds. Under the hood this uses a `Scalar` with appropriate specifications for bounds and mutations.

- `Instrumentation(*args, **kwargs)`: a container for other parameters. Values of parameters in the `args` will be returned as a `tuple` by `param.args`, and
values of parameters in the `kwargs` will be returned as a `dict` by `param.kwargs` (in practice, `param.value == (param.args, param.kwargs)`).
This serves to parametrize functions taking multiple arguments, since you can then call the function with `func(*param.args, **param.kwargs)`.
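
A short sketch instantiating the variables listed above (assuming the `ng.p` namespace introduced by this PR; the shapes, bounds and items are illustrative):

```python
import nevergrad as ng

choice = ng.p.Choice(["a", "b", "c"])               # unordered categorical, softmax-encoded
ordered = ng.p.TransitionChoice([4, 5, 6])          # ordered items with local transitions
array = ng.p.Array(shape=(2, 3)).set_bounds(0, 2)   # bounded 2x3 array (example from the text above)
scalar = ng.p.Scalar()                              # a single float
lr = ng.p.Log(a_min=0.001, a_max=1.0)               # log-distributed between the two bounds
instru = ng.p.Instrumentation(array, scalar, lr=lr)
print(instru.value)  # (args, kwargs), i.e. ((array value, scalar value), {"lr": lr value})
```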

## Parametrization

Parametrization helps you define the parameters you want to optimize upon.
Currently most algorithms make use of it to help convert the parameters into the "standardized data" space (a real vector space),
Currently most algorithms make use of it to help convert the parameters into the "standardized data" space (a vector space spanning all the real values),
where it is easier to define operations.

Let's define the parametrization for a function taking 3 positional arguments and one keyword argument `value`.
- `arg1 = ng.p.TransitionChoice(["a", "b"])` is the first positional argument, it encodes the choice through a single index which can mutate in a continuous way.
- `arg2 = ng.p.Choice(["a", "c", "e"])` is the second one, which can take 3 possible values without any order; the selection is made stochastically through the sampling of a softmax. It is encoded by 3 values (the softmax weights) in the "standardized space".
- the third argument will be kept constant to `blublu`
- `value = ng.p.Scalar()` which represents a scalar both in the parameter space and in the "standardized space"

We then define a parameter holding all these parameters, with a standardized space of dimension 5 (as the sum of the dimensions above):
```python
import nevergrad as ng

# argument transformation
arg1 = ng.p.TransitionChoice(["a", "b"]) # 1st arg. = positional discrete argument
arg2 = ng.p.Choice(["a", "c", "e"])  # 2nd arg. = positional discrete argument
value = ng.p.Scalar() # the 4th arg. is a keyword argument with Gaussian prior

# create the instrumented function
instrum = ng.p.Instrumentation(arg1, arg2, "blublu", value=value)
# the 3rd arg. is a positional arg. which will be kept constant to "blublu"
print(instrum.dimension) # 5 dimensional space

# The dimension is 5 because:
# - the 1st discrete variable has 2 possible values, represented by a hard thresholding in
# a 1-dimensional space, i.e. we add 1 coordinate to the continuous problem
# - the 2nd discrete variable has 3 possible values, represented by softmax, i.e. we add 3 coordinates to the continuous problem
# - the 3rd variable has no uncertainty, so it does not introduce any coordinate in the continuous problem
# - the 4th variable is a real number, represented by single coordinate.


instrum.set_standardized_data([1, -80, -80, 80, 3])
# prints (instrum.args, instrum.kwargs): (('b', 'e', 'blublu'), {'value': 7})
# b is selected because 1 > 0 (the threshold is 0 here since there are 2 values.
# e is selected because proba(e) = exp(80) / (exp(80) + exp(-80) + exp(-80))
# value=7 because 3 * std + mean = 7
instru = ng.p.Instrumentation(arg1, arg2, "blublu", value=value)
print(instru.dimension)
>>> 5
```


@@ -62,8 +46,24 @@ def myfunction(arg1, arg2, arg3, value=3):
    print(arg1, arg2, arg3)
    return value**2

optimizer = ng.optimizers.OnePlusOne(instrumentation=instrum, budget=100)
recommendation = optimizer.minimize(ifunc)
optimizer = ng.optimizers.OnePlusOne(instrumentation=instru, budget=100)
recommendation = optimizer.minimize(myfunction)
print(recommendation.value)
>>> (('b', 'e', 'blublu'), {'value': -0.00014738768964717153})
```



Here is a glimpse of what happens in the optimization space:
```python
instru.set_standardized_data([1, -80, -80, 80, 3])
print(instru.args, instru.kwargs)
>>> (('b', 'e', 'blublu'), {'value': 3.0})
```
With this code:
- b is selected because 1 > 0 (the index is 1 for values above 0, and 0 for values below 0, since there are 2 values).
- e is selected because proba(e) = exp(80) / (exp(80) + exp(-80) + exp(-80)) = 1 (checked numerically below)
- `value=3` because the last value of the standardized space (i.e. 3) corresponds to the value of the last keyword argument.
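
A quick numerical check of the softmax probabilities stated above (plain NumPy, not nevergrad internals):

```python
import numpy as np

weights = np.array([-80.0, -80.0, 80.0])  # standardized weights for ["a", "c", "e"]
probas = np.exp(weights - weights.max())
probas /= probas.sum()
print({k: float(p) for k, p in zip(["a", "c", "e"], probas)})  # 'e' gets probability ~1, 'a' and 'c' ~0
```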


1 change: 1 addition & 0 deletions nevergrad/__init__.py
@@ -10,6 +10,7 @@
from .instrumentation import variables as var
from .parametrization import parameter as p


__all__ = ["Instrumentation", "var", "optimizers", "families", "callbacks", "p"]

__version__ = "0.3.1"
13 changes: 6 additions & 7 deletions nevergrad/benchmark/experiments.py
@@ -56,7 +56,7 @@ def discrete2(seed: Optional[int] = None) -> Iterator[Experiment]:

@registry.register
def discrete(seed: Optional[int] = None) -> Iterator[Experiment]:
"""Discrete test bed, including useless variables, 5 values or 2 values per character.
"""Discrete test bed, including useless variables, 5 values or 2 values per character.
Poorly designed, should be reimplemented from scratch using a decent instrumentation."""
seedg = create_seed_generator(seed)
names = [n for n in ArtificialFunction.list_sorted_function_names() if "one" in n or "jump" in n]
@@ -191,7 +191,7 @@ def paramultimodal(seed: Optional[int] = None) -> Iterator[Experiment]:
    internal_generator = multimodal(seed, para=True)
    for xp in internal_generator:
        yield xp

# pylint: disable=redefined-outer-name
@registry.register
def yabbob(seed: Optional[int] = None, parallel: bool = False, big: bool = False, noise: bool = False, hd: bool = False) -> Iterator[Experiment]:
@@ -296,11 +296,10 @@ def illcondipara(seed: Optional[int] = None) -> Iterator[Experiment]:
yield Experiment(function, optim, budget=budget, num_workers=1, seed=next(seedg))


def _positive_sum(args_kwargs: Any) -> bool:
    args, kwargs = args_kwargs
    if kwargs or len(args) != 1 or not isinstance(args[0], np.ndarray):
        raise ValueError(f"Unexpected inputs {args} and {kwargs}")
    return float(np.sum(args[0])) > 0
def _positive_sum(data: np.ndarray) -> bool:
    if not isinstance(data, np.ndarray):
        raise ValueError(f"Unexpected inputs as np.ndarray, got {data}")
    return float(np.sum(data)) > 0


@registry.register
1 change: 0 additions & 1 deletion nevergrad/benchmark/plotting.py
@@ -168,7 +168,6 @@ def create_plots(df: pd.DataFrame, output_folder: PathLike, max_combsize: int =
            yindices = sorted(set([c[1] for c in df.unique(fixed)]))
        except TypeError:
            yindices = list(set([c[1] for c in df.unique(fixed)]))

        for _ in range(len(xindices)):
            best_algo += [[]]
        for i in range(len(xindices)):