[GAP-34] Demand&Offer Specification Language Grammar #83

Open
wants to merge 3 commits into
base: master
@@ -0,0 +1,29 @@
---
gap: GAP-34
title: Demand/Offer Language Grammar Specifications
description: Grammar Specifications for elements of Demand&Offer language
author: stranger80
status: Draft
type: Standards
---

## Abstract
The Demand&Offer Specification Language [article](https://golem-network.gitbook.io/golem-infrastructure-documentation-develop/architecture/golem-demand-and-offer-specification-language), written to describe a Golem building block, includes only an informal specification of the language grammar (e.g. of constraint filter expressions). An expression resolver has been implemented as part of the `yagna` reference implementation, but it is not fully consistent and fails to process a number of edge cases. A formal grammar specification, with clarifying enhancements, is therefore proposed so that an unambiguous point of reference exists.

## Motivation
The proposed grammar fixes a number of errors in the `yagna` reference implementation of the Golem constraint expression resolver, and removes a number of known ambiguities in the original Demand&Offer Specification Language article.

## Specification
The Demand&Offer constraints expression grammar is specified [here](../../standards/constraints_grammar.md).

## Rationale
The proposed grammar has been largely based on LDAP Filters [RFC 4515](https://www.rfc-editor.org/rfc/rfc4515). This reference has been consciously selected to avoid "reinventing the wheel" and only modify the existing specification to suit the intended purpose.

## Backwards Compatibility
The proposed enhancements to the constraint expression language are not fully backwards compatible. However, it is assumed that the only working Demand/Offer implementations at present are those maintained by Golem Factory, and none of the "breaking" edge cases are used by current Golem Factory-maintained software.

## Security Considerations
N/A

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
5 changes: 4 additions & 1 deletion standards/README.md
@@ -1,6 +1,6 @@
# Golem Computation Resource Standards

## Version 0.7.1
## Version 0.8.0

This repository defines a framework for the specification of Standards for Computing Resources available on Golem ecosystem. It is supplementary to Golem Demand & Offer Specification Language and is meant to prescriptively implement structure in a broad space of conceivable services.

@@ -61,3 +61,6 @@ Properties (and usage conters) marked as `Deprecated` are still supported in the

## Standard properties - Cheat sheet
[Cheat sheet](cheat_sheet.md)

## References
[Demand & Offer Specification Language - Constraints Grammar](./constraints_grammar.md)
136 changes: 136 additions & 0 deletions standards/constraints_grammar.md
@@ -0,0 +1,136 @@
# Demand & Offer Specification Language - Constraints Grammar

## Demand & Offer Constraint Definition

A Demand & Offer constraint expression, like the LDAP search filter
(RFC 4515) it is based on, is represented as a string of
UTF-8 [RFC3629] encoded Unicode characters [Unicode] that is defined
by the following grammar, following the ABNF notation defined [here](https://www.rfc-editor.org/rfc/rfc822).
The filter format uses a prefix notation.

```
filter = LPAREN filtercomp RPAREN
filtercomp = and / or / not / item
and = AMPERSAND filterlist
or = VERTBAR filterlist
not = EXCLAMATION filter
filterlist = 1*filter
item = simple / present / substring
simple = attr filtertype assertionvalue
filtertype = equal / greater / less / greaterorequal / lessorequal
equal = EQUALS
greater = RANGLE
less = LANGLE
greaterorequal = RANGLE EQUALS
lessorequal = LANGLE EQUALS
present = attr EQUALS ASTERISK
substring = attr EQUALS [initial] any [final]
initial = assertionvalue
any = ASTERISK *(assertionvalue ASTERISK)
final = assertionvalue
assertionvalue = valueencoding
attr = attribute / typedattribute
attribute = attributedescription / attrwithaspect
attrwithaspect = attributedescription LBRACKET aspect RBRACKET
aspect = attributedescription
typedattribute = attribute DOLLAR typecharacter
typecharacter = "v" / "d" / "t"
valueencoding = 0*(normal / escaped)
normal = UTF1SUBSET
escaped = ESC HEX HEX
UTF1SUBSET = <see below>
LPAREN = "("
RPAREN = ")"
EXCLAMATION = "!"
AMPERSAND = "&"
ASTERISK = "*"
VERTBAR = "|"
EQUALS = "="
RANGLE = ">"
LANGLE = "<"
RBRACKET = "]"
LBRACKET = "["
ESC = "\"
DOLLAR = "$"
HEX = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "a" / "A" / "b" / "B" / "c" / "C" / "d" / "D" / "e" / "E" / "f" / "F"

```
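
As a rough illustration of the prefix structure above, the following Python sketch parses the `filter`, `and`, `or` and `not` rules by recursive descent and leaves each `item` as an unparsed string. It is not the `yagna` resolver; the function names `parse_filter` and `parse` and the tuple-based tree shape are assumptions made for this example, and whitespace handling is deliberately left out.

```python
def parse_filter(text, pos=0):
    """Parse one `filter` starting at text[pos]; return (node, next_pos)."""
    if pos >= len(text) or text[pos] != "(":
        raise ValueError(f"expected '(' at offset {pos}")
    pos += 1
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    op = text[pos]
    if op in "&|":
        # and / or: the operator is followed by a filterlist (1*filter)
        pos += 1
        children = []
        while pos < len(text) and text[pos] == "(":
            child, pos = parse_filter(text, pos)
            children.append(child)
        if not children:
            raise ValueError("filterlist must contain at least one filter")
        node = ("and" if op == "&" else "or", children)
    elif op == "!":
        # not: the operator is followed by exactly one filter
        child, pos = parse_filter(text, pos + 1)
        node = ("not", child)
    else:
        # item: everything up to the next ")" (literal parentheses in
        # values must be escaped, so the first ")" closes the item)
        end = text.find(")", pos)
        if end == -1:
            raise ValueError("missing ')'")
        item = text[pos:end]
        if not item or "(" in item:
            raise ValueError(f"malformed item at offset {pos}")
        node = ("item", item)
        pos = end
    if pos >= len(text) or text[pos] != ")":
        raise ValueError(f"expected ')' at offset {pos}")
    return node, pos + 1


def parse(text):
    """Parse a complete constraint expression and reject trailing input."""
    node, pos = parse_filter(text)
    if pos != len(text):
        raise ValueError(f"trailing input at offset {pos}")
    return node


print(parse("(&(objectClass=Person)(|(sn=Jensen)(cn=Babs J*)))"))
```

Keeping the item as a raw string separates the structural part of the grammar from the attribute and value rules, which can then be checked by their own routines.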

### `attributedescription`

The `attributedescription` rule specifies the allowed attribute names.
An attribute name may consist of any Unicode character apart from the following:

`(`, `)`, `[`, `]`, `=`, `<`, `>`, `\`, `$`
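
The character restriction above, together with the `attr`, `attrwithaspect` and `typedattribute` rules from the grammar, can be sketched with a regular expression. This is only an illustration: the helper name `parse_attr` is hypothetical, the property names in the usage lines are made up, and the requirement that a name be non-empty is an assumption, since the grammar does not state a minimum length.

```python
import re

# One or more characters other than ( ) [ ] = < > \ $
# (non-empty names are an assumption of this sketch).
_NAME = r"[^()\[\]=<>\\$]+"

# attr = attributedescription ["[" aspect "]"] ["$" typecharacter]
_ATTR_RE = re.compile(
    rf"(?P<name>{_NAME})(?:\[(?P<aspect>{_NAME})\])?(?:\$(?P<type>[vdt]))?"
)


def parse_attr(attr):
    """Split an `attr` into (name, aspect, typecharacter).

    Parts that are absent are returned as None.
    """
    match = _ATTR_RE.fullmatch(attr)
    if match is None:
        raise ValueError(f"invalid attribute: {attr!r}")
    return match.group("name"), match.group("aspect"), match.group("type")


print(parse_attr("some.property"))
print(parse_attr("some.property[aspect]$d"))
```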

### `valueencoding`

The `valueencoding` rule ensures that the entire filter string is a
valid UTF-8 string and provides that the octets that represent the
ASCII characters `*` (ASCII 0x2a), `(` (ASCII 0x28), `)` (ASCII
0x29), `\` (ASCII 0x5c), and `NUL` (ASCII 0x00) are represented as a
backslash `\` (ASCII 0x5c) followed by the two hexadecimal digits
representing the value of the encoded octet.

This simple escaping mechanism eliminates filter-parsing ambiguities
and allows any filter that can be represented in LDAP to be
represented as a NUL-terminated string. Other octets that are part
of the `normal` set may be escaped using this mechanism, for example,
non-printing ASCII characters.

For AssertionValues that contain UTF-8 character data, each octet of
the character to be escaped is replaced by a backslash and two hex
digits, which form a single octet in the code of the character. For
example, the filter checking whether the `cn` attribute contained a
value with the character `*` anywhere in it would be represented as
`(cn=*\2a*)`.
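
A minimal sketch of the escaping rule, assuming the caller wants only the five mandatory characters escaped and every other character passed through unchanged. The function name `escape_value` is hypothetical and not taken from any Golem library.

```python
# Code points that must always be escaped in an assertion value:
# "*", "(", ")", "\" and NUL.
_MUST_ESCAPE = {0x2A, 0x28, 0x29, 0x5C, 0x00}


def escape_value(value):
    """Return `value` with the mandatory characters written as \\XX."""
    out = []
    for ch in value:
        if ord(ch) in _MUST_ESCAPE:
            # Backslash followed by two hexadecimal digits.
            out.append(f"\\{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)


# Build a filter that matches any `cn` value containing a literal "*".
print("(cn=*" + escape_value("*") + "*)")
```

Escaping is applied to the value only, never to the whole filter, so that the structural `*`, `(` and `)` characters keep their meaning.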


### AttributeDescription

An `AttributeDescription` is a string of characters which may include any character apart from:

| Character |
|-----------|
| `=` |
| `<` |
| `>` |
| `(` |
| `)` |
| `[` |
| `]` |
| `$` |
| `\` |


### Whitespaces

Whitespace characters that separate the `filter` elements specified in the grammar above shall be ignored. This implies that a constraint expression can be freely formatted (e.g. with indentation), as long as the formatting whitespace is added outside of individual filter expressions (delimited with `()`).
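
One possible reading of this rule is sketched below: whitespace is dropped everywhere except between the parentheses of an individual item, where it is copied verbatim. The function name `strip_formatting` and the attribute names in the usage example are hypothetical, and treating whitespace directly inside an item's parentheses as significant is an assumption of this sketch, not something the text above settles.

```python
def strip_formatting(expr):
    """Remove formatting whitespace that lies outside individual items."""
    out = []
    i, n = 0, len(expr)
    while i < n:
        ch = expr[i]
        if ch.isspace():
            # Whitespace between structural elements is ignored.
            i += 1
            continue
        out.append(ch)
        i += 1
        if ch == "(":
            # Look past whitespace to see what this filter starts with.
            j = i
            while j < n and expr[j].isspace():
                j += 1
            if j < n and expr[j] not in "&|!(":
                # An item: copy it verbatim up to and including ")".
                end = expr.find(")", i)
                if end == -1:
                    end = n - 1
                out.append(expr[i:end + 1])
                i = end + 1
    return "".join(out)


formatted = """
(&
    (a=1)
    (|
        (b=x y)
        (c=2)
    )
)
"""
print(strip_formatting(formatted))
```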

## Examples

This section gives a few examples of search filters written using
this notation.

    (cn=Babs Jensen)
    (!(cn=Tim Howes))
    (&(objectClass=Person)(|(sn=Jensen)(cn=Babs J*)))
    (o=univ*of*mich*)

The following examples illustrate the use of the escaping mechanism.

    (o=Parens R Us \28for all your parenthetical needs\29)
    (cn=*\2A*)
    (filename=C:\5cMyFile)
    (bin=\00\00\00\04)
    (sn=Lu\c4\8di\c4\87)

The first example shows the use of the escaping mechanism to
represent parenthesis characters. The second shows how to represent a
"*" in a value, preventing it from being interpreted as a substring
indicator. The third illustrates the escaping of the backslash
character.

The fourth example shows a filter searching for the four-byte value
0x00000004, illustrating the use of the escaping mechanism to
represent arbitrary data, including NUL characters.

The final example illustrates the use of the escaping mechanism to
represent various non-ASCII UTF-8 characters.
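
The escaped values in these examples can be decoded back to octets with a short routine such as the one below. This is a sketch only: the name `unescape_value` is hypothetical, and rejecting a backslash that is not followed by two hex digits is an assumption consistent with the `escaped = ESC HEX HEX` rule.

```python
import re

_ESCAPE = re.compile(rb"\\([0-9a-fA-F]{2})")
_BAD_ESCAPE = re.compile(rb"\\(?![0-9a-fA-F]{2})")


def unescape_value(value):
    """Decode an escaped assertion value into its raw octets."""
    raw = value.encode("utf-8")
    if _BAD_ESCAPE.search(raw):
        raise ValueError("backslash must be followed by two hex digits")
    # Replace each \XX sequence with the single octet it denotes.
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


print(unescape_value(r"C:\5cMyFile"))
print(unescape_value(r"\00\00\00\04"))
print(unescape_value(r"Lu\c4\8di\c4\87").decode("utf-8"))
```

Returning bytes rather than text keeps the fourth example usable, since an arbitrary octet sequence is not guaranteed to be valid UTF-8.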