Releases: zeek/spicy
v1.12.0
New Functionality
- We now support `if` around a block of unit items:

      type X = unit {
          x: uint8;

          if ( self.x == 1 ) {
              a1: bytes &size=2;
              a2: bytes &size=2;
          };
      };

  One can also add an `else` block:

      type X = unit {
          x: uint8;

          if ( self.x == 1 ) {
              a1: bytes &size=2;
              a2: bytes &size=2;
          }
          else {
              b1: bytes &size=2;
              b2: bytes &size=2;
          };
      };
- We now support attaching an `%error` handler to an individual field:

      type Test = unit {
          a: b"A";
          b: b"B" %error { print "field B %error", self; }
          c: b"C";
      };

  With input `AxC`, that handler will trigger, whereas with `ABx` it won't. If the unit had a unit-wide `%error` handler as well, that one would trigger in both cases (i.e., for `b`, in addition to its field-local handler).

  The handler can also be provided separately from the field:

      on b %error { ... }

  In that separate version, one can also receive the error message by declaring a corresponding string parameter:

      on b(msg: string) %error { ... }

  This works externally, from outside the unit, as well:

      on Test::b(msg: string) %error { ... }
- GH-1856: We added support for specifying a dedicated error message for `requires` failures.

  This now allows creating custom error messages when a `&requires` condition fails. Example:

      type Foo = unit {
          x: uint8 &requires=($$ == 1 : error"Deep trouble!'");

          # or, shorter:
          y: uint8 &requires=($$ == 1 : "Deep trouble!'");
      };

  This is powered by a new condition test expression `COND : ERROR`.

- We reworked C++ code generation so that many parsers should now compile faster. This is accomplished both by improved dependency tracking when emitting C++ code for a module and by a couple of new peephole optimization passes that further reduce the emitted code.
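The interplay between a field-local and a unit-wide `%error` handler described above can be modeled outside of Spicy. The following Python sketch is only an illustrative model, not Spicy's implementation: `parse_test`, the handler labels, and the order in which the two handlers are recorded are assumptions made for this example.

```python
# Illustrative model only (not Spicy code): three literal fields "A", "B", "C",
# where field b has its own error handler and the unit has a unit-wide one.
FIELDS = [("a", b"A"), ("b", b"B"), ("c", b"C")]
FIELD_HANDLERS = {"b"}  # hypothetical: only field b carries a field-local %error


def parse_test(data: bytes, unit_wide_handler: bool = True) -> list:
    """Return the labels of the handlers that would fire for the given input."""
    fired = []
    for pos, (name, literal) in enumerate(FIELDS):
        if data[pos:pos + 1] != literal:
            # Parse error at this field: record its local handler, if any ...
            if name in FIELD_HANDLERS:
                fired.append(f"{name} %error")
            # ... and the unit-wide handler, if the unit has one.
            if unit_wide_handler:
                fired.append("unit %error")
            break  # parsing of the unit stops at the first error
    return fired


print(parse_test(b"AxC"))  # field b fails: field-local and unit-wide handlers
print(parse_test(b"ABx"))  # field c fails: only the unit-wide handler
```

With `AxC` the failure happens at `b`, so both handlers are recorded; with `ABx` the failure happens at `c`, which has no handler of its own.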
Changed Functionality
- Add `CMAKE_CXX_FLAGS` to `HILTI_CONFIG_RUNTIME_LD_FLAGS`.
- Speed up compilation of many parsers by streamlining generated C++ code.
- Add `starts_with`, `split`, `split1`, `lower` and `upper` methods to `string`.
- GH-1874: Add new library function `spicy::bytes_to_mac`.
- Optimize `spicy::bytes_to_hexstring` and `spicy::bytes_to_mac`.
- Improve validation of attributes so incompatible or invalid attributes should be rejected more reliably.
- Optimize parsing for `bytes` of fixed size as well as literals.
- Add a couple of peephole optimizations to reduce emitted C++ code.
- GH-1790: Provide proper error message when trying to access an unknown unit field.
- GH-1792: Prioritize error message reporting unknown field.
- GH-1803: Fix namespacing of `hilti` IDs in Spicy-side diagnostic output.
- GH-1895: No longer escape backslashes when printing strings or bytes.
- GH-1857: Support `&requires` for individual vector items.
- GH-1859: Improve error message when a unit parameter is used as a field.
- GH-1898: Disallow attributes on "type aliases".
- GH-1938: Deprecate `&count` attribute.
Bug fixes
- GH-1815: Disallow expanding limited `View`s again with `limit`.
- Fix `to_uint(ByteOrder)` for empty byte ranges.
- Fix undefined shifts of 32-bit integer in `toInt()`.
- GH-1817: Prevent null ptr dereference when looking on nodes without `Scope`.
- Fix use of moved-from variable.
- GH-1823: Don't qualify magic linker symbols with C++ namespace.
- Fix diagnostics seen when compiling with GCC.
- GH-1852: Fix `skip` with units.
- GH-1832: Fail for vectors with bytes but no stop.
- GH-1860: Fix parsing for vectors of literals.
- GH-1847: Fix resynchronization issue with trimmed input.
- GH-1844: Fix nested look-ahead parsing.
- GH-1842: Fix when input redirection becomes visible.
- GH-1846: Fix bug with capture groups.
- GH-1875: Fix potential nullptr dereference when comparing streams.
- GH-1867: Fix infinite loops with recursive types.
- GH-1868: Associate source code locations with current fiber instead of current thread.
- GH-1871: Fix `&max-size` on unit containing a `switch`.
- GH-1791: Fix usage of `&convert` with units requiring parameters.
- GH-1858: Fix the literals parsers not following coercions.
- GH-1893: Encompass child node's location in parent.
- GH-1919: Validate that sets are sortable.
- GH-1918: Fix potential segfault with stream iterators.
- GH-1856: Disallow dereferencing a `result<void>` value.
- Fix issue with type inference for `result` constructor.
Documentation
v1.11.3
Bug fixes
- GH-1846: Fix bug with capture groups.

  When extracting the data matching capture groups we'd take it from the beginning of the stream, not the beginning of the current view, even though the latter is what we are matching against.

- Add missing trim after matching a regular expression.

- GH-1875: Fix potential nullptr dereference when comparing streams.

  Because we are operating on unsafe iterators, we need to catch when one goes out of bounds.

- GH-1842: Fix when input redirection becomes visible.

  With `&parse-at` and `&parse-from`, we were updating the internal state on our current position immediately, meaning the changes were already visible when evaluating other attributes on the same field afterwards, which is unexpected.

- GH-1844: Fix nested look-ahead parsing.

  When parsing nested vectors all using look-ahead, we need to return control back to the upper level when an inner look-ahead isn't found.

  This may change the error message for "normal" look-ahead parsing (see test baseline), but the new one seems fine and potentially even better.
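The capture-group problem fixed in GH-1846 comes down to applying view-relative offsets to the wrong buffer. The following Python sketch illustrates that class of bug with made-up data; the names `stream`, `view` and `view_start` are assumptions for this example and do not correspond to Spicy's runtime types.

```python
import re

# Hypothetical input: the stream holds everything received so far, while the
# current view starts 4 bytes into it (the first 4 bytes were already parsed).
stream = b"HDR:abc123"
view_start = 4
view = stream[view_start:]

# The regular expression is matched against the view, so group offsets are
# relative to the beginning of the view.
match = re.match(rb"([a-z]+)([0-9]+)", view)
start, end = match.span(1)

wrong = stream[start:end]  # offsets applied to the stream: unrelated data
right = view[start:end]    # offsets applied to the view: the matched group

print(wrong, right)
```

Both slices use the same offsets; only the one taken from the view returns the data the first group actually matched.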
v1.11.2
Bug fixes
- GH-1860: Fix parsing for vectors of literals.

  This was broken in two ways:

  - with the `(LITERAL)[]` syntax, the parser would not recognize literals using type constructors
  - with the syntax `LITERAL[]`, we'd try to store the parsed value into a vector

- GH-1847: Fix resynchronization issue with trimmed input.

  When input had been trimmed, `View::advanceToNextData` could end up returning a view starting ahead of the valid area.

- GH-1852: Fix `skip` with units.

  For unit parsing with `skip`, we would create a temporary instance but wouldn't properly initialize it, meaning for example that parameters weren't available. We now generally fully initialize any destination, even if temporary.
v1.11.1
Bug fixes
- GH-1831: Fix optimizer regression.

  We were no longer marking types as used that are referenced through a type name.

- GH-1823: Don't qualify magic linker symbols with C++ namespace.

  We need them at their original values because that's what the runtime library is hard-coded to expect.

- Fix use of moved-from variable.

  Function parameters still shadow members in C++. This is a fixup of c3abbbe.

- Fix undefined shifts of 32-bit integer in `toInt()`.

  `1U` is 32-bit on a 64-bit system and shifting it by more than 31 bits is undefined. The following does currently produce `-4294967296` instead of `-1`:

      b"\xff\xff\xff\xff".to_int(spicy::ByteOrder::Big)
- Fix `to_uint(ByteOrder)` for empty byte ranges.

  `to_uint()` and `to_int()` for empty byte ranges throw when attempting to convert printable decimals to integers. Do the same for the byte order versions. The assumption is that it is really an error when the user calls `to_int()` or `to_uint()` on an empty byte range.

- GH-1817: Prevent null ptr dereference when looking on nodes without `Scope`.

- GH-1815: Disallow expanding limited `View`s again with `limit`.

  The documented semantics of `View::limit` are that it creates a new view with equal or smaller size. In contrast to that, we would still have allowed expanding views with further calls to `limit`.

  This patch changes the implementation of `View::limit` so it can only ever make a `View` smaller.

  We also tweak the implementation of the check for consumed `&size` when used together with `&eod`: if the `&size` was already nested in a limited view, a larger `&size` value could previously extend the view so that the `&eod` was effectively ignored. Since we no longer extend the `View`, we now activate the check for consumed `&size` only if `&eod` was not specified, since with `&eod` the user has communicated that they are fine with consuming less data.
GH-1810: Fix nested look-ahead switches.
-
Remember normalized paths when checking for duplicate files in driver.
While we ignore duplicate files it was still possible to erroneously add the same file multiple times to a compilation. Catch this trivial case.
-
GH-1462: Remember files processed by the driver.
We did this previously but stopped doing it with #1462.
-
Remove a few value copies.
-
GH-1813: Fix equality implementation of module
UID
.We already computed a
unique
ID
value for each module to allow declaring the sameID
name multiple times; we however did not consistently use that value in the implementation ofmodule::UID
equality and hash operators which is addressed by this patch.
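The expected result of the `to_int(spicy::ByteOrder::Big)` call from the `toInt()` fix above can be checked with a few lines of Python. This is a sketch of the arithmetic only, not the runtime's C++ code; the function name `to_int_big_endian` is made up for this example, and raising on empty input mirrors the behavior the notes describe for empty byte ranges.

```python
def to_int_big_endian(data: bytes) -> int:
    """Interpret `data` as a big-endian two's-complement signed integer."""
    if not data:
        raise ValueError("cannot convert empty byte range")

    bits = 8 * len(data)
    value = int.from_bytes(data, "big")  # unsigned interpretation

    # If the sign bit is set, subtract 2**bits. Python integers have arbitrary
    # precision, so `1 << bits` is well-defined for any width; in C++ the
    # shifted operand must be wide enough for the shift count.
    if value & (1 << (bits - 1)):
        value -= 1 << bits

    return value


print(to_int_big_endian(b"\xff\xff\xff\xff"))  # -1
```

The result agrees with Python's built-in `int.from_bytes(data, "big", signed=True)`.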
v1.11.0
New Functionality
- GH-3779: Add `%sync_advance` hook.

  This adds support for a new unit hook:

      on %sync_advance(offset: uint64) { ... }

  This hook is called regularly during error recovery when synchronization skips over data or gaps while searching for a valid synchronization point. It can be used to check in on the synchronization to, e.g., abort further processing if it just keeps failing. `offset` is the current position inside the input stream that synchronization just skipped to.

  By default, "called regularly" means that it's called every 4KB of input skipped over while searching for a synchronization point. That value can be changed by setting a unit property `%sync-advance-block-size = <number of bytes>`.

  As an additional minor tweak, this also changes the name of what used to be the `__gap__` profiler to now be called `__sync_advance` because it's profiling the time spent in skipping data, not just gaps.

- Add unit method `stream()` to access current input stream, and stream method `statistics()` to retrieve input statistics.

  This returns a struct of the following type, reflecting the input seen so far:

      type StreamStatistics = struct {
          num_data_bytes: uint64;     ## number of data bytes processed
          num_data_chunks: uint64;    ## number of data chunks processed, excluding empty chunks
          num_gap_bytes: uint64;      ## number of gap bytes processed
          num_gap_chunks: uint64;     ## number of gap chunks processed, excluding empty chunks
      };
- GH-1750: Add `to_real` method to `bytes`. This interprets the data as representing an ASCII-encoded floating point number and converts that into a `real`. The data can be in either decimal or hexadecimal format. If it cannot be parsed as either, throws an `InvalidValue` exception.

- GH-1608: Add `get_optional` method to maps.

  This returns an `optional` value either containing the map's element for the given key if that entry exists, or an unset `optional` if it does not.

- GH-90/GH-1733: Add `result` and `spicy::Error` types to Spicy to facilitate error handling.
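The semantics described for the map method `get_optional` above can be sketched in Python. This is a model of the described behavior, not Spicy code: the free function `get_optional` and the use of `None` to stand in for an unset `optional` are assumptions of this example.

```python
from typing import Dict, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def get_optional(m: Dict[K, V], key: K) -> Optional[V]:
    """Model of a lookup that never throws: the value if present, else unset."""
    # Assumption for this sketch: `None` stands in for an unset optional.
    return m[key] if key in m else None


ports = {"http": 80, "ssh": 22}
print(get_optional(ports, "http"))  # 80
print(get_optional(ports, "dns"))   # None
```

Unlike a plain index lookup, a missing key is reported through the return value rather than through an error.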
Changed Functionality
- The Spicy compiler has become a bit more strict and is now rejecting some ill-defined code constructs that previous versions ended up letting through. Specifically, the following cases will need updating in existing code:

  - Identifiers from the (internal) `hilti::` namespace are no longer accessible. Usually you can just scope them with `spicy::` instead.
  - Previous versions did not always enforce constness as they should have. In particular, function parameters could end up being mutable even when they weren't declared as `inout`. Now `inout` is required for supporting any mutable operations on a parameter, so make sure to add it where needed.
  - When using unit parameters, the type of any `inout` parameters now must be a unit itself. To pass other types into a unit so that they can be modified by the unit, use a reference instead of `inout`. For example, use `type Foo = unit(s: sink&)` instead of `type Foo = unit(inout s: sink)`. See https://docs.zeek.org/projects/spicy/en/latest/programming/parsing.html#unit-parameters for more.
- The Spicy compiler now uses a more streamlined storage and access scheme to represent source code. This speeds up work up until C++ source translation (e.g., faster time to first error message during development).
- `spicyc` options `-c` and `-l` no longer support compiling multiple Spicy source files to C++ code individually to then build them all together. This was a rarely used feature and actually already broken in some situations. Instead, use `spicyc -x` to produce the C++ code for all needed Spicy source files at once. `-c` and `-l` remain available for debugging purposes.
- The `spicyc` option `-P` now requires a prefix argument that sets the C++ namespace, just like `-x <prefix>` does. This is so that the prototypes match the actual code generated by `-x`. To get the same identifiers as before, use an empty prefix (`-P ""`).
- GH-1763: Restrict initialization of `const` values to literals. This means that, e.g., `const` values cannot be initialized from other `const` values or function calls anymore.
- `result` and `network` are now keywords and cannot be used anymore as user-specified identifiers.
- GH-1661: Deprecate usage of `&convert` with `&chunked`.
- GH-1657: Reduce data copying when passing data to the driver.
- GH-1501: Improve some error messages for runtime parse errors.
- GH-1655: Reject joint usage of filters and look-ahead.
- GH-1675: Extend runtime profiling to measure parser input volume.
- GH-1624: Enable optimizations when running `spicy-build`.
Bug fixes
- GH-1759: Fix `if`-condition with `switch` parsing.
- Fix Spicy's support for `network` type.
- GH-1598: Enforce that the argument to `new` is either a type or a ctor.
- GH-1742, GH-1760: Unroll constructors of big containers in generated code. We previously would generate code which would be expensive to compile for some compilers. We now generate more friendly code.
- GH-1745: Fix C++ initialization of global constants through global functions.
- GH-1743: Use a checked cast for `map`'s `in` operator.
- GH-1664: Fix `&convert` typing issue with bit ranges.
- GH-1724: Fix skipping in size-constrained units. We previously could skip too much data if `skip` was used in a unit with a global `&size`.
- Fix incremental skipping. We previously would incorrectly compute the amount of data to skip, which could have potentially led to the parser consuming more data than available.
- GH-1586: Make skip productions behave like the production they are wrapping.
- GH-1711: Fix forwarding of a reference unit parameter to a non-reference parameter.
- GH-1599: Fix integer increment/decrement operators to require mutable arguments.
- GH-1493: Support/fix public type aliases to units.
Documentation
- Add new section with guidelines and best practices. This focuses on performance for now, but may be extended with other areas later. Much of the content was contributed by Corelight Labs.
- Fix documented type mapping for integers.
- Document generic operators.
v1.10.1
- Update CI setups.

- Fix repeated evaluations of `&parse-at` expression.
v1.9.1
- Drop `;` after `#pragma`.

- Update CI setups.

- Fix repeated evaluations of `&parse-at` expression.

- Fix stray Python escape sequence.

- Drop freebsd-12 from CI.

- GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.

  We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.

  With this patch we now handle these attributes, regardless of how the unit appears.
v1.8.4
- Drop `;` after `#pragma`.

- Update CI setups.

- Fix repeated evaluations of `&parse-at` expression.

- Fix stray Python escape sequence.

- Fix skipping of literal fields with condition.

- Fix type of generated code for `string::size`.

  While we defined `string`'s size operator to return a `uint64` and documented that it returns the length in codepoints, not bytes, we still generated C++ code which worked on the underlying bytes (i.e., it directly invoked `std::string::size` instead of using `hilti::rt::string::size`).
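The difference between a length in codepoints and a length in bytes, which is at the heart of the `string::size` fix above, shows up as soon as a string contains a non-ASCII character. The Python sketch below demonstrates the distinction under the assumption that the string is stored as UTF-8; it says nothing about how the Spicy runtime implements either count.

```python
text = "Z\u00fcrich"  # "Zürich": the "ü" takes two bytes in UTF-8

codepoints = len(text)                  # number of Unicode codepoints
utf8_bytes = len(text.encode("utf-8"))  # number of bytes in the UTF-8 encoding

print(codepoints, utf8_bytes)  # 6 7
```

For pure ASCII strings the two counts coincide, which is why a byte-based implementation can go unnoticed for a long time.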
v1.10.0
Changed Functionality
- Numerous changes to improve throughput of generated parsers.

  For this release we have revisited the code typically generated for parsers and the runtime libraries they use with the goal of improving throughput of parsers at runtime. Coarsely summarized, this work was centered around

  - reduction of allocations during parsing
  - reduction of data copies during parsing
  - use of dedicated, hand-checked implementations for automatically generated code to avoid overhead from safety checks in the runtime libraries

  With these changes we see throughput improvements of some parsers in the range of 20-30%. This work consisted of numerous incremental changes; see `CHANGES` for the full list of changes.
GH-1667: Always advance input before attempting resynchronization.
When we enter resynchronization after hitting a parse error we previously would have left the input alone, even though we know it fails to parse. We then relied fully on resynchronization to advance the input.
With this patch we always forcibly advance the input to the next non-gap position. This has no effect for synchronization on literals, but allows it to happen earlier for regular expressions.
-
GH-1659: Lift requirement that
bytes
forwarded from filter be mutable. -
GH-1489: Deprecate &bit-order on bit ranges.
This had no effect and allowing it may be confusing to users. Deprecate it with the idea of eventual removal.
-
Extend location printing to include single-line ranges.
For a location of, e.g., "line 1, column 5 to 10", we now print
1:5-1:10
, whereas we used to print it as only1:5
, hence dropping information. -
GH-1500: Add
+=
operator forstring
.This allows appending to a
string
without having to allocate a new string. This might perform better most of the time. -
GH-1640: Implement skipping for any field with known size.
This patch adds
skip
support for fields with&size
attribute or of builtin type with known size. If a unit has a known size and it is specified in a&size
attribute this also allows to skip over unit fields.
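The new output format for single-line ranges mentioned above can be sketched as a small formatting function. This Python version is a guess at the rule based on the one example given (`1:5-1:10`); the function `format_location` and its handling of a location without an end column are assumptions, not the compiler's actual code.

```python
from typing import Optional


def format_location(line: int, col_from: int, col_to: Optional[int] = None) -> str:
    """Render a single-line location; include the end column when known."""
    start = f"{line}:{col_from}"
    if col_to is None or col_to == col_from:
        return start  # a single position, nothing to add
    return f"{start}-{line}:{col_to}"


print(format_location(1, 5, 10))  # 1:5-1:10
print(format_location(1, 5))      # 1:5
```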
Bug fixes
- GH-1605: Allow for unresolved types for set `in` operator.

- GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.

  We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.

  We now handle these attributes, regardless of how the unit appears.

- GH-1585: Put closing of unit sinks behind feature guard.

  This code gets emitted, regardless of whether a sink was actually connected or not. Put it behind a feature guard so it does not enable the feature on its own.

- GH-1652: Fix filters consuming too much data.

  We would previously assume that a filter would consume all available data. This only holds if the filter is attached to a top-level unit, but in general not if some sub-unit uses a filter. With this patch we explicitly compute how much data is consumed.

- GH-1668: Fix incorrect data consumption for `&max-size`.

  We would previously handle `&size` and `&max-size` almost identically, with the only difference being that `&max-size` sets up a slightly larger view to accommodate a sentinel. In particular, we also used identical code to set up the position where parsing should resume after such a field.

  This was incorrect as it is in general impossible to tell where parsing continues after a field with `&max-size` since it does not signify a fixed view like `&size`. We now compute the next position for a `&max-size` field by inspecting the limited view to detect how much data was extracted.
GH-1522: Drop overzealous validator.
A validator was intended to reject a pattern of incorrect parsing of vectors, but instead ending up rejecting all vector parsing if the vector elements itself produced vectors. We dropped this validation.
-
GH-1632: Fix regex processing using
{n,m}
repeat syntax being off by one -
GH-1648: Provide meaningful unit
__begin
value when parsing starts.We previously would not provide
__begin
when starting the initial parse. This meant that e.g.,offset()
was not usable if nothing ever got parsed.We now provide a meaningful value.
-
Fix skipping of literal fields with condition.
-
GH-1645: Fix
&size
check.The current parsing offset could legitimately end up just beyond the
&size
amount. -
GH-1634: Fix infinite loop in regular expression parsing.
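For reference, the conventional meaning of the `{n,m}` repeat syntax from GH-1632 above is "at least n and at most m repetitions". The sketch below uses Python's `re` module to show those bounds; it does not reproduce the Spicy bug, and the notes do not say in which direction the count was off.

```python
import re

# a{2,3}: between two and three repetitions of "a", both bounds inclusive.
pattern = re.compile(rb"a{2,3}")

for candidate in (b"a", b"aa", b"aaa", b"aaaa"):
    matched = pattern.fullmatch(candidate) is not None
    print(candidate, matched)
```

Only the two- and three-character inputs match in full; one or four repetitions fall outside the range.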
Documentation
- Update documentation of `offset()`.

- Fix docs namespace for symbols from `filter` module.

  We previously would document these symbols to be in `spicy` even though they are in `filter`.

- Add bitfield examples.
v1.8.3
- GH-1645: Fix `&size` check.

  The current parsing offset could legitimately end up just beyond the `&size` amount.

- GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.

  We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.

  With this patch we now handle these attributes, regardless of how the unit appears.