Major overhaul of the AST infrastructure.
High-level summary of the most important changes:

- Restructure the main AST pass loop. Instead of running passes from all
  plugins interleaved at the same time, we now run plugins sequentially;
  specifically first Spicy, then HILTI. After each plugin's run, we have
  a fully resolved AST that's then transformed into the starting AST for
  the subsequent plugin.

  Note that the Spicy plugin still calls into the HILTI plugin to
  resolve its AST. The two representations share much of their
  semantics, so the Spicy plugin can delegate a good part of its
  resolution work to the HILTI plugin; it's the same logic.

  As part of this, we regroup the visitors: the
  `{resolver,operator-resolver,id-resolver}.cc` are mostly merged into a
  single `resolver.cc`, plus a new `normalizer.cc` that takes over the
  code enforcing some constraints on the AST before anything else
  runs. `visitors/apply-coercions.cc` moved to `visitors/coercer.cc`,
  and the former `coercer.cc` moved out of the visitors directory one
  level up into `coercion.cc` because it's not an AST pass.

- New AST structure that doesn't compute anything dynamically anymore;
  all information is stored in AST nodes. We now almost exclusively pass
  around references to nodes, no longer copies. We got some new helper
  classes for that: `hilti::optional_ref<T>`, `hilti::node::Set<T>`,
  `hilti::node::Range<T>`.

  This especially includes replacing all dynamically computed types with
  AST nodes, getting rid of `type::effectiveType()`.

  No more static setters; we modify nodes directly now. However, when
  using typed nodes, we pass them around as `const`. Modifying a node
  therefore usually involves a cast: `p.node.as<T>().setXYZ()`. In a
  few cases, it needs a `const_cast` of a typed reference.

  No more (public) setters returning `Node` references. New methods
  with names ending in `*Ref` return `NodeRef` instead.

  For more on the new structure and conventions, also see the Wiki:
  https://github.com/zeek/spicy/wiki/Notes-on-the-new-internal-HILTI-Spicy-AST-structure

- Declarations now get globally unique "canonical IDs".

- Struct fields are now declarations.

- Functions have a parent type that for methods will point to their
  struct/unit type.

- `typeID()` and `cxxID()` moved up a level to become properties of
  `Type` (vs. being properties of the encapsulated specific type
  instantiations).

- Using singletons for `type::Auto` and `type::Void`.

- Remove "original nodes" and also most uses of "preserved nodes".

- Remove AST-internal node caching.

- AST dumps include node addresses to track identity.

- We pass compiler `Unit`s around through `shared_ptr`, because they have a
  clear identity while being stored at various places.
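
The sequential pass structure described in the first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual driver code: `Ast`, `Plugin`, `runPlugins`, and `examplePlugins` are hypothetical names, and "resolving" is reduced to counting down a number of unresolved nodes.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real HILTI/Spicy classes are far richer.
struct Ast {
    std::string dialect;           // which plugin's representation the tree is in
    int unresolved = 0;            // number of nodes still awaiting resolution
    std::vector<std::string> log;  // trace of what happened, for illustration
};

struct Plugin {
    std::string name;
    // One resolver pass; returns true if it modified the AST.
    std::function<bool(Ast&)> resolve_once;
    // Turns this plugin's fully resolved AST into the next plugin's input.
    std::function<void(Ast&)> transform;
};

// Runs the plugins one after the other. Each plugin repeats its pass until
// nothing changes anymore, and only then hands the AST to the next plugin.
inline void runPlugins(const std::vector<Plugin>& plugins, Ast& ast) {
    for ( const auto& p : plugins ) {
        while ( p.resolve_once(ast) )
            ; // iterate to a fixpoint

        ast.log.push_back(p.name + ": resolved");

        if ( p.transform )
            p.transform(ast);
    }
}

// Two toy plugins in the order named in the commit message: Spicy, then HILTI.
inline std::vector<Plugin> examplePlugins() {
    auto resolve = [](Ast& ast) {
        if ( ast.unresolved == 0 )
            return false;

        --ast.unresolved;
        return true;
    };

    Plugin spicy{"spicy", resolve, [](Ast& ast) {
                     ast.dialect = "hilti"; // lowered representation
                     ast.unresolved = 2;    // lowering introduces new work
                 }};
    Plugin hilti{"hilti", resolve, nullptr};
    return {spicy, hilti};
}
```

Calling `runPlugins(examplePlugins(), ast)` on an `Ast` in the `"spicy"` dialect leaves it in the `"hilti"` dialect with no unresolved nodes, and the log records that each plugin finished before the next one started.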
rsmmr committed Sep 21, 2021
1 parent 446992c commit b098367
Showing 408 changed files with 11,788 additions and 9,132 deletions.
6 changes: 6 additions & 0 deletions CHANGES
@@ -1,3 +1,9 @@
+
+1.3.0-dev.101 | 2021-09-21 12:10:25 +0200
+
+* Major internal overhaul of the AST infrastructure. (Robin Sommer,
+Corelight)
+
1.3.0-dev.100 | 2021-09-17 13:18:50 +0200

* Fix lints in rst files. (Benjamin Bannier, Corelight)
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-1.3.0-dev.100
+1.3.0-dev.101
29 changes: 21 additions & 8 deletions doc/autogen/spicy-types.spicy
@@ -6,7 +6,7 @@ Specifies an address' IP family.

.. spicy-code::

-type AddressFamily = {
+type AddressFamily = enum {
IPv4, # IP4 address
IPv6 # IPv6 address
};
@@ -25,7 +25,7 @@ Specifies the bit order for individual bit ranges inside a bitfield.

.. spicy-code::

-type BitOrder = {
+type BitOrder = enum {
LSB0, # bits are interpreted as lowest-significant-bit coming first
MSB0 # bits are interpreted as most-significant-bit coming first
};
@@ -38,13 +38,26 @@ Specifies byte order for data operations.

.. spicy-code::

-type ByteOrder = {
+type ByteOrder = enum {
Little, # data is in little-endian byte order
Big, # data is in big-endian byte order
Network, # data is in network byte order (same a big endian)
Host # data is in byte order of the host we are executing on
};

+.. _spicy_charset:
+
+.. rubric:: ``spicy::Charset``
+
+Specifies the character set for bytes encoding/decoding.
+
+.. spicy-code::
+
+type Charset = enum {
+ASCII,
+UTF8
+};
+
.. _spicy_matchstate:

.. rubric:: ``spicy::MatchState``
@@ -59,7 +72,7 @@ Specifies a transport-layer protocol.

.. spicy-code::

-type Protocol = {
+type Protocol = enum {
TCP,
UDP,
ICMP
@@ -73,7 +86,7 @@ Specifies the type of a real value.

.. spicy-code::

-type RealType = {
+type RealType = enum {
IEEE754_Single, # single precision in IEEE754 format
IEEE754_Double # double precision in IEEE754 format
};
@@ -86,7 +99,7 @@ Specifies the policy for a sink's reassembler when encountering overlapping data

.. spicy-code::

-type ReassemblerPolicy = {
+type ReassemblerPolicy = enum {
First # take the original data & discard the new data
};

@@ -98,7 +111,7 @@ Specifies a side an operation should operate on.

.. spicy-code::

-type Side = {
+type Side = enum {
Left, # operate on left side
Right, # operate on right side
Both # operate on both sides
@@ -112,7 +125,7 @@ Specifies direction of a search.

.. spicy-code::

-type Direction = {
+type Direction = enum {
Forward, # search forward
Backward, # search backward
};
2 changes: 1 addition & 1 deletion doc/autogen/types/bytes.rst
@@ -164,7 +164,7 @@
Returns the number of bytes the value contains.

-.. spicy:operator:: bytes::Sum const~bytes t:bytes <sp> op:+ <sp> t:bytes
+.. spicy:operator:: bytes::Sum bytes t:bytes <sp> op:+ <sp> t:bytes
Returns the concatenation of two bytes values.

2 changes: 1 addition & 1 deletion doc/autogen/types/sink.rst
@@ -32,7 +32,7 @@
sink. If data has already been written when a filter is added, an
error is triggered.

-.. spicy:method:: sink::connect_mime_type sink connect_mime_type False void (inout mt: bytes)
+.. spicy:method:: sink::connect_mime_type sink connect_mime_type False void (mt: bytes)
Connects parsing units to a sink for all parsers that support a given
MIME type. All subsequent write operations to the sink will pass their
4 changes: 4 additions & 0 deletions doc/autogen/types/tuple.rst
@@ -1,5 +1,9 @@
.. rubric:: Operators

+.. spicy:operator:: tuple::CustomAssign <tuple> t:(x,~...,~y) = t:<tuple>
+Assigns element-wise to the left-hand-side tuple
+
.. spicy:operator:: tuple::Equal bool t:tuple <sp> op:== <sp> t:tuple
Compares two tuples element-wise.
26 changes: 26 additions & 0 deletions doc/autogen/types/unit.rst
@@ -91,3 +91,29 @@
Usage of this method requires the unit to be declared with the
``%random-access`` property.

+.. rubric:: Operators
+
+.. spicy:operator:: unit::HasMember bool t:unit <sp> op:?. <sp> t:<field>
+Returns true if the unit's field has a value assigned (not counting
+any ``&default``).
+
+.. spicy:operator:: unit::Member <field~type> t:unit <sp> op:. <sp> t:<field>
+Retrieves the value of a unit's field. If the field does not have a
+value assigned, it returns its ``&default`` expression if that has
+been defined; otherwise it triggers an exception.
+
+.. spicy:operator:: unit::TryMember <field~type> t:unit <sp> op:.? <sp> t:<field>
+Retrieves the value of a unit's field. If the field does not have a
+value assigned, it returns its ``&default`` expression if that has
+been defined; otherwise it signals a special non-error exception to
+the host application (which will normally still lead to aborting
+execution, similar to the standard dereference operator, unless the
+host application specifically handles this exception differently).
+
+.. spicy:operator:: unit::Unset void unset <sp> t:unit.<field>
+Clears an optional field.
+
30 changes: 0 additions & 30 deletions doc/programming/language/types.rst
@@ -503,36 +503,6 @@ Unit

.. include:: /autogen/types/unit.rst

-.. These are copied and adapted from the corresponding struct
-.. operators. We have to hardcode them as we currently have no
-.. way to pull the struct operators over into the unit type
-.. automatically.
-.. rubric:: Operators
-
-.. spicy:operator:: unit::HasMember bool t:unit <sp> op:?. <sp> t:<field>
-Returns true if the unit's field has a value assigned (not counting
-any ``&default``).
-
-.. spicy:operator:: unit::Member <field~type> t:unit <sp> op:. <sp> t:<field>
-Retrieves the value of a unit's field. If the field does not yet have
-a value assigned, it returns its ``&default`` expression if that has
-been defined; otherwise it triggers an exception.
-
-.. spicy:operator:: unit::TryMember <field~type> t:unit <sp> op:.? <sp> t:<field>
-Retrieves the value of a unit's field. If the field does not yet have
-a value assigned, it returns its ``&default`` expression if that has
-been defined. Otherwise it triggers an exception, unless used in a
-context that specifically allows for that situation (such as,
-inside the Zeek plugin's `evt` files).
-
-.. spicy:operator:: unit::Unset void unset <sp> t:unit.<field>
-Resets a field back to its original uninitialized state.
-
.. _type_vector:

Vector
7 changes: 4 additions & 3 deletions doc/programming/parsing.rst
@@ -491,9 +491,10 @@ The most commonly used hooks are:

``on <field name> { ... }`` (field hook)
Executes just after the given unit field has been parsed. The
-parsed value is accessible through the ``$$`` identifier. It will
-also have been assigned to the field already, potentially with any
-relevant type conversion applied (see :ref:`attribute_convert`).
+parsed value is accessible through the ``$$``, potentially with
+any relevant type conversion applied (see
+:ref:`attribute_convert`). The same will also have been assigned
+to the field already.

.. _foreach:

14 changes: 13 additions & 1 deletion doc/scripts/autogen-spicy-lib
@@ -25,7 +25,19 @@ awk -v "target=$1" -v "ns=$2" '
printf(".. rubric:: ``%s::%s``\n\n", ns, label);
printf("%s\n\n", comment);
printf(".. spicy-code::\n\n");
-printf(" type %s = {\n", $3);
+printf(" type %s = enum {\n", $3);
+}
+comment = "";
+next;
+}
+# Struct
+/public type .* = struct { *$/ {
+if ( target == "types" ) {
+printf(".. _spicy_%s:\n\n", tolower($3));
+printf(".. rubric:: ``%s::%s``\n\n", ns, $3);
+printf("%s\n\n", comment);
}
comment = "";
9 changes: 7 additions & 2 deletions doc/scripts/spicy-doc-to-rst
@@ -76,6 +76,11 @@ Operators = {
TypedType.sub("\\1", op.operands[1].rst(
in_operator=True, markup=False)),
op.operands[0].rst(in_operator=True, markup=False)),
+"CustomAssign": lambda op: "{} = {}".format(
+op.operands[0].rst(in_operator=True),
+op.operands[1].rst(in_operator=True)
+),
+
"Delete": lambda op: "delete <sp> {}[{}]".format(
op.operands[0].rst(in_operator=True),
op.operands[1].rst(in_operator=True, markup=False)
@@ -137,8 +142,8 @@ NamespaceMappings = {
}

TypeMappings = {
-"hilti::rt::regexp::MatchState": "spicy::MatchState",
-"hilti::rt::bytes::Side": "spicy::Side",
+"::hilti::rt::regexp::MatchState": "spicy::MatchState",
+"::hilti::rt::bytes::Side": "spicy::Side",
}

LibraryType = re.compile(r'__library_type\("(.*)"\)')
34 changes: 17 additions & 17 deletions hilti/lib/hilti.hlt
@@ -1,38 +1,38 @@

module hilti {

-public type BitOrder = enum { LSB0, MSB0 } &cxxname="hilti::rt::integer::BitOrder";
-public type ByteOrder = enum { Little, Big, Network, Host } &cxxname="hilti::rt::ByteOrder";
-public type Side = enum { Left, Right, Both } &cxxname="hilti::rt::bytes::Side";
-public type AddressFamily = enum { IPv4, IPv6 } &cxxname="hilti::rt::AddressFamily";
-public type RealType = enum { IEEE754_Single, IEEE754_Double } &cxxname="hilti::rt::real::Type";
-public type Protocol = enum { TCP, UDP, ICMP } &cxxname="hilti::rt::Protocol";
-public type Charset = enum { ASCII, UTF8} &cxxname="hilti::rt::bytes::Charset";
+public type BitOrder = enum { LSB0, MSB0 } &cxxname="::hilti::rt::integer::BitOrder";
+public type ByteOrder = enum { Little, Big, Network, Host } &cxxname="::hilti::rt::ByteOrder";
+public type Side = enum { Left, Right, Both } &cxxname="::hilti::rt::bytes::Side";
+public type AddressFamily = enum { IPv4, IPv6 } &cxxname="::hilti::rt::AddressFamily";
+public type RealType = enum { IEEE754_Single, IEEE754_Double } &cxxname="::hilti::rt::real::Type";
+public type Protocol = enum { TCP, UDP, ICMP } &cxxname="::hilti::rt::Protocol";
+public type Charset = enum { ASCII, UTF8 } &cxxname="::hilti::rt::bytes::Charset";
public type Captures = vector<bytes>;

public type MatchState = struct {
method Captures captures(stream data);
-} &cxxname="hilti::rt::regexp::MatchState";
+} &cxxname="::hilti::rt::regexp::MatchState";

-declare public void print(any obj, bool newline = True) &cxxname="hilti::rt::print" &have_prototype;
-declare public void printValues(tuple<*> t, bool newline = True) &cxxname="hilti::rt::printValues" &have_prototype;
+declare public void print(any obj, bool newline = True) &cxxname="::hilti::rt::print" &have_prototype;
+declare public void printValues(tuple<*> t, bool newline = True) &cxxname="::hilti::rt::printValues" &have_prototype;

declare public void debug(string dbg_stream, any obj) &cxxname="HILTI_RT_DEBUG" &have_prototype;
-declare public void debugIndent(string dbg_stream) &cxxname="hilti::rt::debug::indent" &have_prototype;
-declare public void debugDedent(string dbg_stream) &cxxname="hilti::rt::debug::dedent" &have_prototype;
+declare public void debugIndent(string dbg_stream) &cxxname="::hilti::rt::debug::indent" &have_prototype;
+declare public void debugDedent(string dbg_stream) &cxxname="::hilti::rt::debug::dedent" &have_prototype;

-declare public time current_time() &cxxname="hilti::rt::time::current_time" &have_prototype;
-declare public time mktime(uint<64> y, uint<64> m, uint<64> d, uint<64> H, uint<64> M, uint<64> S) &cxxname="hilti::rt::time::mktime" &have_prototype;
+declare public time current_time() &cxxname="::hilti::rt::time::current_time" &have_prototype;
+declare public time mktime(uint<64> y, uint<64> m, uint<64> d, uint<64> H, uint<64> M, uint<64> S) &cxxname="::hilti::rt::time::mktime" &have_prototype;

-declare public void abort() &cxxname="hilti::rt::abort_with_backtrace" &have_prototype;
+declare public void abort() &cxxname="::hilti::rt::abort_with_backtrace" &have_prototype;

declare public string linker_scope() &cxxname="hilti::rt::linker_scope" &have_prototype;

# Base type for all exceptions.
-public type Exception = exception &cxxname="hilti::rt::Exception";
+public type Exception = exception &cxxname="::hilti::rt::Exception";

# Base type for all exception generated by the runtime system. Catching
# this allows to continue after operations triggering runtime errors.
-public type RuntimeError = exception &cxxname="hilti::rt::RuntimeError";
+public type RuntimeError = exception &cxxname="::hilti::rt::RuntimeError";

}
7 changes: 7 additions & 0 deletions hilti/runtime/include/types/reference.h
@@ -172,6 +172,13 @@ class ValueReference {
*/
T* operator->() { return _safeGet(); }

+/**
+* Implicitly converts to the contained type.
+*
+* @throws NullReference if the instance does not refer to a valid value
+*/
+operator const T&() const { return *_safeGet(); }
+
/**
* Compares the values of two references.
*
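
The conversion operator added in this hunk lets a reference be passed wherever a `const T&` is expected. The sketch below shows the pattern on a heavily simplified wrapper; `ValueRef`, its `shared_ptr` storage, and `length()` are assumptions made for the example and do not mirror the real `ValueReference` class. Only the operator itself and the `NullReference` name are taken from the diff.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-in for the runtime's exception type.
struct NullReference : std::runtime_error {
    NullReference() : std::runtime_error("attempt to access null reference") {}
};

// Hypothetical, minimal wrapper; the real ValueReference does much more.
template<typename T>
class ValueRef {
public:
    ValueRef() = default; // refers to nothing
    explicit ValueRef(T value) : _ptr(std::make_shared<T>(std::move(value))) {}

    const T* operator->() const { return _safeGet(); }

    // Implicit conversion to the contained type; throws on a null reference.
    operator const T&() const { return *_safeGet(); }

private:
    const T* _safeGet() const {
        if ( ! _ptr )
            throw NullReference();

        return _ptr.get();
    }

    std::shared_ptr<T> _ptr;
};

// A function that knows nothing about ValueRef.
inline std::size_t length(const std::string& s) { return s.size(); }
```

With this in place, `length(ref)` compiles for a `ValueRef<std::string>` without an explicit dereference, and a default-constructed reference raises `NullReference` at the point of conversion.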
3 changes: 1 addition & 2 deletions hilti/runtime/src/init.cc
@@ -1,7 +1,5 @@
// Copyright (c) 2020-2021 by the Zeek Project. See LICENSE for details.

-#include "hilti/rt/init.h"
-
#include <sys/resource.h>
#include <unistd.h>

@@ -11,6 +9,7 @@
#include <hilti/rt/configuration.h>
#include <hilti/rt/context.h>
#include <hilti/rt/global-state.h>
+#include <hilti/rt/init.h>
#include <hilti/rt/logging.h>

using namespace hilti::rt;
5 changes: 2 additions & 3 deletions hilti/runtime/src/library.cc
@@ -1,7 +1,5 @@
// Copyright (c) 2020-2021 by the Zeek Project. See LICENSE for details.

-#include "hilti/rt/library.h"
-
#include <dlfcn.h>

#include <utility>
@@ -10,6 +8,7 @@
#include <hilti/rt/exception.h>
#include <hilti/rt/fmt.h>
#include <hilti/rt/json.h>
+#include <hilti/rt/library.h>
#include <hilti/rt/logging.h>

using namespace hilti::rt;
@@ -103,7 +102,7 @@ hilti::rt::Result<void*> hilti::rt::Library::symbol(std::string_view name) const

auto* symbol = ::dlsym(_handle, name.data());

-if ( auto error = ::dlerror() )
+if ( ::dlerror() )
return result::Error(fmt("symbol '%s' not found", name));

return symbol;
3 changes: 1 addition & 2 deletions hilti/runtime/src/logging.cc
@@ -1,8 +1,7 @@
// Copyright (c) 2020-2021 by the Zeek Project. See LICENSE for details.

-#include "hilti/rt/logging.h"
-
#include <hilti/rt/debug-logger.h>
+#include <hilti/rt/logging.h>
#include <hilti/rt/util.h>

using namespace hilti::rt;
2 changes: 0 additions & 2 deletions hilti/runtime/src/type-info.cc
@@ -1,7 +1,5 @@
// Copyright (c) 2020-2021 by the Zeek Project. See LICENSE for details.

-#include "hilti/rt/type-info.h"
-
#include <cinttypes>

#include <hilti/rt/type-info.h>
3 changes: 1 addition & 2 deletions hilti/runtime/src/types/bytes.cc
@@ -1,7 +1,6 @@
// Copyright (c) 2020-2021 by the Zeek Project. See LICENSE for details.

-#include "hilti/rt/types/bytes.h"
-
+#include <hilti/rt/types/bytes.h>
#include <hilti/rt/types/integer.h>
#include <hilti/rt/types/regexp.h>
#include <hilti/rt/types/stream.h>