docs: New documentation about the "Refinements" concept
apparentlymart committed Feb 6, 2023
1 parent 7416265 commit 1291057
Showing 3 changed files with 305 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/concepts.md
@@ -75,6 +75,11 @@ promises to never produce an unknown value for an operation unless one of the
operands is itself unknown, and so applications can opt out of this additional
complexity by never providing unknown values as operands.

At minimum, an unknown value has a type constraint, which describes the set of
types that the final value could possibly have once known. In some cases we
can refine an unknown value with additional dynamic information, using
[Value Refinements](refinements.md).

## Type Equality and Type Conformance

Two types are said to be equal if they are exactly equivalent. Each type kind
9 changes: 9 additions & 0 deletions docs/marks.md
@@ -106,3 +106,12 @@ unmark the whole data structure first (e.g. using `Value.UnmarkDeep`) and then
decide what to do with those marks in order to ensure that if it makes sense
to propagate them through the serialization then they will get represented
somehow.

## Relationship to "Refinements"

The idea of annotating a value with additional information has some overlap
with the concept of [Refinements](refinements.md). However, the two have
different purposes and so different design details and tradeoffs.

For more details, see
[the corresponding section in the Refinements documentation](refinements.md#relationship-to-marks).
291 changes: 291 additions & 0 deletions docs/refinements.md
@@ -0,0 +1,291 @@
# Value Refinements

_Refinements_ are dynamic annotations associated with unknown values that
each shrink the range of possible values further than can be represented by
the type constraint alone.

Refining an unknown value allows certain operations against it to produce a
known result, and allows some operations to fail earlier than they would for
a fully-unknown value, by detecting from the refinement information alone
that a valid result is impossible.

Refinements always _shrink_ the range of an unknown value, and never grow it.
That makes it valid for some operations to ignore refinements and just treat
an unknown value as representing any possible value of its type constraint,
which is important so that downstream callers of `cty` are not burdened with
handling every refinement, or with immediately adding support for new kinds
of refinement if this model is extended in future releases.

However, note that `Value.RawEquals` _does_ take into account refinements, so
any tests that assert against the exact final value of an operation may need
to be updated after adopting a new version of `cty` which makes increased use
of refinements. `Value.RawEquals` is not intended as part of the _user model_
of `cty` and so this should not negatively impact the end-user-visible behavior
of an application using `cty`, although of course they might benefit from
more specific results from operations that can now take refinements into
account.
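
A minimal sketch of that sensitivity, using the `RefineNotNull` shorthand
described later in this document:

```go
// a and b have the same type constraint, but b carries a refinement,
// so RawEquals reports them as different values.
a := cty.UnknownVal(cty.String)
b := cty.UnknownVal(cty.String).RefineNotNull()
eq := a.RawEquals(b) // false: the refinements differ
```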

## How to refine a value

You can derive a more refined value from a less refined value by using the
`Value.Refine` method to obtain a _refinement builder_, which uses the
builder pattern to construct a new value with one or more extra refinements.

```go
val := cty.UnknownVal(cty.String).Refine().
    NotNull().
    StringPrefix("https://").
    NewValue()
```

The above snippet would produce a refined unknown value whose range is limited
only to non-null strings which start with the prefix `"https://"`. This
information can, in theory, allow `val.Equals(cty.NullVal(cty.String))` to
return `cty.False` rather than `cty.UnknownVal(cty.Bool)`, and allow a prefix
match against the string to return a known result.
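
In sketch form, continuing from the refined `val` in the snippet above:

```go
// val is refined as non-null, so a null comparison can in principle
// produce a known result; how much the current cty version exploits
// this refinement may vary.
isNull := val.Equals(cty.NullVal(cty.String)) // ideally cty.False
```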

In practice not all operations against unknown values can make full use of
unknown value refinements, but hopefully the coverage will increase over time.

Only unknown values can have refinements, because known values are already
refined by their concrete value: simple values like `cty.Zero` are constrained
to exactly one value, while some values like `cty.ListValEmpty(cty.DynamicPseudoType)`
represent a set of possible values -- all empty lists of any element type, in
this case.

However, the `Refine` operation _is_ also supported for known values and in that
case acts as a self-checking assertion that the known value does actually
meet the requirements. If you write your codepaths to unconditionally assign
refinements regardless of whether the value is known then your code will
self-check and raise a panic if the final known value doesn't match the
previously-promised refinements.
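
For example, this hedged sketch applies the same refinements to a known value
that satisfies them, exercising the self-checking behavior described above:

```go
// Self-checking: the known value meets the promised refinements, so the
// same known value is returned. Refining cty.NullVal(cty.String) with
// NotNull would panic instead, because that contradicts the promise.
v := cty.StringVal("https://example.com/").Refine().
    NotNull().
    StringPrefix("https://").
    NewValue()
```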

A similar rule applies when applying new refinements to already-refined values: it's
fine to describe a less specific refinement, which will therefore be ignored
because it adds no new information. It's an application bug to describe a
contradictory refinement, such as a new string prefix that doesn't match one
previously assigned.

## Value ranges

The `Refine()` method described above constructs a value with refinements. To
access the information from those refinements, use the `Value.Range` method to
obtain a `cty.ValueRange` object, which describes a superset of all of the
values that a particular value could have.

For example, you can use `val.Range().DefinitelyNotNull()` to test whether a
particular value is guaranteed to be non-null once it is finally known. This
again works for both known and unknown values, so e.g.
`cty.StringVal("foo").Range().DefinitelyNotNull()` returns `true` because
a known, non-null string value is _definitely not null_.
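
A short sketch of those range queries, using only the methods named above:

```go
// Range queries work uniformly for unknown and known values.
unknown := cty.UnknownVal(cty.String).Refine().NotNull().NewValue()
unknown.Range().DefinitelyNotNull() // true, from the refinement

known := cty.StringVal("foo")
known.Range().DefinitelyNotNull() // also true: known and non-null
```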

When writing operations that depend only on information that can be determined
from refinements it's valid to depend exclusively on `Value.Range` and rely on
the fact that the range of an already-known value is just a very narrow range
that covers only what that specific value covers.

The model of value ranges is imprecise, though: it's limited only to information
we can track for unknown values through refinements. Many operations will still
need a special codepath to handle the unknown case vs. the known case so they
can take into account the additional detail from the exact value once known.

## Available Refinements

The set of possible refinement types might grow over time, but the initial set
is focused on a narrow set of possibilities that seems likely to allow a number
of other operations either to produce known results from unknown input or to
rule particular input invalid despite it not yet being known.

The most notable restriction on refinements is that the available refinements
vary depending on the type constraint of the value being refined.

The least flexible case is `cty.DynamicVal` -- an unknown value of an unknown
type -- which is the one value that cannot be refined at all and will cause
a panic if you try. This is a pragmatic compromise for backward compatibility:
existing callers use patterns like `val == cty.DynamicVal` to test for this
specific special value, and any refinements of that value would make it no
longer equal.

Unknown values of built-in exact types, and also unknown values whose type
_kind_ is constrained even if the element/attribute types are not, can at
least be refined as being non-null, and because that is a common situation
there is a shorthand for it which avoids using the builder pattern:
`val.RefineNotNull()`.
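
For example, the following two expressions should be equivalent:

```go
m1 := cty.UnknownVal(cty.Map(cty.String)).Refine().NotNull().NewValue()
m2 := cty.UnknownVal(cty.Map(cty.String)).RefineNotNull() // shorthand
```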

All other possible refinements are type-constraint-specific, as shown in the
combined sketch after this list:

* `cty.String`

  For strings we can refine a known prefix of the string, which is intended
  for situations where the string represents some microsyntax with a
  known prefix, such as a URL of a particular known scheme.

  * `.StringPrefix(string)` specifies a known prefix of the final string.

    By default an unknown string has no known prefix, which is the same
    as the prefix being the empty string.

    Because `cty`'s model of strings is a sequence of Unicode grapheme
    clusters, `.StringPrefix` will quietly disregard trailing Unicode
    code units of the given prefix that might combine with other code
    units to form a new combined grapheme. This is a good safe default
    behavior for situations where the remainder of the string is under
    end-user control and might begin with combining diacritics or
    emoji variation sequences. Applications should not rely on the
    details of this heuristic because it may become more precise in
    later releases.

  * `.StringPrefixFull(string)` is like `.StringPrefix` but does not trim
    possibly-combining code units from the end of the given string.

    Applications must use this with care, making sure that they control
    the final string enough to guarantee that the subsequent additional
    code units will never combine with any characters in the given prefix.

* `cty.Number`

  For numbers we can refine both the lower and upper bound of possible values,
  with each boundary being either inclusive or exclusive.

  * `.NumberRangeLowerBound(cty.Value, bool)` refines the lower bound of
    possible values for an unknown number. The boolean argument represents
    whether the bound is _inclusive_.

    The given value must be a non-null `cty.Number` value. An unrefined
    number effectively has a lower bound of `(cty.NegativeInfinity, true)`.

  * `.NumberRangeUpperBound(cty.Value, bool)` refines the upper bound of
    possible values for an unknown number. The boolean argument represents
    whether the bound is _inclusive_.

    The given value must be a non-null `cty.Number` value. An unrefined
    number effectively has an upper bound of `(cty.PositiveInfinity, true)`.

  * `.NumberRangeInclusive(min, max cty.Value)` is a helper wrapper around
    the previous two methods that declares both an upper and lower bound
    at the same time, while specifying that both are inclusive bounds.

* `cty.List`, `cty.Set`, and `cty.Map` types

  For all collection types we can refine the lower and upper bound of the
  length of the collection. The boundaries on length are always inclusive
  and are integers, because it isn't possible to have a fraction of an
  element.

  * `.CollectionLengthLowerBound(int)` refines the lower bound of possible
    lengths for an unknown collection.

    An unrefined collection effectively has a lower bound of zero, because
    it's not possible for a collection to have a negative length.

  * `.CollectionLengthUpperBound(int)` refines the upper bound of possible
    lengths for an unknown collection.

    An unrefined collection has an upper bound that matches the largest
    valid Go slice index on the current platform, because `cty`'s
    collections are implemented in terms of Go's collection types.
    However, applications should typically not expose that specific value
    to users (it's an implementation detail) and should instead present
    the maximum value as an unconstrained length.

  * `.CollectionLength(int)` is a shorthand that refines both the lower and
    upper bounds to the same value. This is a helpful requirement to make
    whenever possible because it will often allow the final value to be
    a known collection with unknown elements, as described in
    [Refinement Value Collapse](#refinement-value-collapse).
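
A combined sketch of these type-specific refinements, chained through the
same builder pattern shown earlier:

```go
// An unknown number known to be between 0 and 10 inclusive.
count := cty.UnknownVal(cty.Number).Refine().
    NotNull().
    NumberRangeInclusive(cty.Zero, cty.NumberIntVal(10)).
    NewValue()

// An unknown list of strings with at most five elements.
items := cty.UnknownVal(cty.List(cty.String)).Refine().
    NotNull().
    CollectionLengthUpperBound(5).
    NewValue()
```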

Some built-in operations will automatically take into account refinements from
their input operands and propagate them in a suitable way to the result.
However, that is not a guarantee for all operations and so should be treated
as a "best effort" behavior which will hopefully become more precise in future
versions.

Behaviors implemented in downstream applications, such as custom functions
using [the function system](functions.md), might also take into account
refinements. If they do their work using only _operation methods_ on `Value`
then the handling of refinements might come for free. If they do work using
_integration methods_ instead then they will need to explicitly handle
refinements if desired. If they don't then by default the result from an
unknown input will be a totally-unrefined unknown value, though it will hopefully
still have a useful type constraint.

## Refinement Value Collapse

For some kinds of refinement it's possible to constrain the range so much that
only one possible value remains. In that case, the `.NewValue()` method of the
refinement builder might return a known value instead of an unknown value.

For example, if the lower bound and upper bound of a collection's length are
equal then the length of the collection is effectively known. For some lengths
of some collection kinds the refinement can collapse into a known collection
containing unknown values. For example, an unknown list that's known to have
exactly two values can be represented equivalently as a known list of length
two where both elements are unknown themselves.
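
A sketch of that collapse, assuming the current rules collapse exact-length
lists:

```go
// Refining the length to exactly 2 allows NewValue to return a known
// list whose two elements are themselves unknown.
pair := cty.UnknownVal(cty.List(cty.String)).Refine().
    CollectionLength(2).
    NewValue()
// pair may now be equivalent to:
//   cty.ListVal([]cty.Value{
//       cty.UnknownVal(cty.String),
//       cty.UnknownVal(cty.String),
//   })
```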

The exact details of how refinement collapse is decided might change in future
versions, but only in ways that can make results "more known". It would be a
breaking change to weaken a rule to produce unknown values in more cases, so
that kind of change would be reserved only for fixing an important bug or
design error.

## Refinements are Dynamic Only

Refinements belong to unknown values rather than to type constraints, and so
refining an unknown value does not change its type constraint.

This design is a tradeoff: making the refinements dynamic and implicit means
that it's possible to add more detailed refinements over time without making
breaking changes to explicit type information, but the downside is that
it isn't possible to represent refinements in any situation that is only
aware of types.

For example, it isn't currently possible to represent the idea of an unknown
map whose elements each have a further refinement applied, because the
refinements apply to the map itself and there are not yet any specific element
values for the element refinements to attach to.

(It would be possible in theory to allow refining an unknown collection with
meta-refinements about its hypothetical elements, but that is not currently
supported because it would mean that refinements would need to be resolved
recursively and that would be considerably more complex and expensive than
the current single-value-only refinements structure.)

## Refinements Under Serialization

Refinements are intentionally designed so that they only constrain the range
of an unknown value, and never expand it. This means that it should typically
be safe to discard refinements in situations like serialization where there
may not be any way to represent the refinements. After decoding, the unknown
value has a wider range, but that range should still be a superset of the true
range of the value. This is an example of the general rule that no operation
on an unknown value is _guaranteed_ to fully preserve the input refinements
or to consider them when calculating the result.

The official MessagePack serialization in particular does have some support
for retaining approximations of refinements as part of its serialization of
unknown values, using a MessagePack extension value. Some detail may still
be lost under round-tripping, but the output range should always be a superset
of the input range. As long as both the serializer and deserializer are using
the `cty/msgpack` sub-package, unknown values will propagate automatically
without any additional caller effort.
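
A hedged sketch of such a round-trip; `Marshal` and `Unmarshal` here are from
the `cty/msgpack` sub-package:

```go
import (
    "github.com/zclconf/go-cty/cty"
    "github.com/zclconf/go-cty/cty/msgpack"
)

func roundTrip() (cty.Value, error) {
    refined := cty.UnknownVal(cty.String).RefineNotNull()
    buf, err := msgpack.Marshal(refined, cty.String)
    if err != nil {
        return cty.NilVal, err
    }
    // The decoded value's range should be a superset of refined's range,
    // though some refinement detail may be lost along the way.
    return msgpack.Unmarshal(buf, cty.String)
}
```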

## Relationship to "Marks"

The idea of annotating a value with additional information has some overlap
with the concept of [Marks](marks.md). However, the two have different purposes
and so different design details and tradeoffs.

Marks should typically be used for additional information that is independent
of the specific type and value, such as marking a value as having come from
a sensitive location. The marking then propagates to all results from operations
on that value, usually without changing the behavior of that operation. In a
sense the mark represents the _origin_ of the value rather than the value
itself.

Refinements are instead directly part of the value. By reducing the possible
range of an unknown value placeholder, other downstream operations can in turn
produce a more refined result, or possibly even a known result from unknown
inputs. Refinements do not naively propagate from one value to the next, but
some operations will use the refinements of their operands to calculate a new
set of refinements for their result, with the rules varying on a case-by-case
basis depending on what calculation the operation represents.
