From f68df152a13a302966617a9a0cb99f8dea634e56 Mon Sep 17 00:00:00 2001 From: Rerumu <25379555+Rerumu@users.noreply.github.com> Date: Sun, 6 Nov 2022 00:21:09 -0400 Subject: [PATCH 01/18] Create type-byte-array.md --- rfcs/type-byte-array.md | 51 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) create mode 100644 rfcs/type-byte-array.md diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md new file mode 100644 index 000000000..55b90e3f0 --- /dev/null +++ b/rfcs/type-byte-array.md @@ -0,0 +1,51 @@ +# Byte Array Type + +## Summary + +A new constructed type which serves as a mutable array of bytes. Ideally, it would expose an API that would allow for building, reading, and writing to the internal buffer. A particularly good example type that this could be derived from is [this Java class](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/nio/ByteBuffer.html). + +## Motivation + +With this type, we ideally solve the use cases for binary format encoding and decoding, compression, and active/compact memory. This opens the door for developers to work with file formats that might've been too large to represent with tables or to write to strings. It also allows for writing algorithms that deal with raw data often, such as compression or hashing. Web services that exchange data in packed formats could also benefit from this. + +## Design + +While an API as extensive as the example might be good, it's only really necessary to have methods for treating this block of data as a random access collections of bytes with potentially resizable backing. This could ideally be exposed as: + +* A constructor with predefined size and endianness. + +* A method for resizing the existing object while retaining data in bounds. + +* Methods for reading... + + * ... a string given an offset and size. + + * ... signed and unsigned integers of 8, 16, and 32 bits given an offset. + + * ... floating point values of 32 and 64 bits given an offset. 
+ + * Methods for writing... + + * ... a string given an offset, size, and value. + + * ... signed and unsigned integers of 8, 16, and 32 bits given an offset and value. + + * ... floating point values of 32 and 64 bits given an offset and value. + +As Luau can't represent 64 bit integers in user code, the methods for these could be omitted or simply serve as a shortcut for writing what can be represented already without needing 2 calls to the 32 bit methods. + +## Drawbacks + +Depending on implementation this could increase the complexity of the VM and related code. If this is to be implemented as a built-in, optimized type it might need specialized fast paths for all relevant opcodes. Additionally, this type would have to come in with methods and some sort of constructor to be useful, which does not have precedent in the open source Luau distribution (even `vector` lacks an exposed constructor or methods). + +## Alternatives + +The workarounds without this feature are significantly inefficient: + +* Tables can, at most, represent 64 bits per slot using expensive `vector` packing. + +* Tables with or without packing severely bloat memory, as each array entry is subject to Luau value size and alignment. + +* Strings are immutable and can’t be used to efficiently construct binary data without exponential allocations. + +* Built in `string.pack` and `string.unpack` can’t cover more complex schemas on their own or formats which are edited mid-creation. 
From 23bc89f62a11869d9129504257253a0d0c45d474 Mon Sep 17 00:00:00 2001 From: Rerumu <25379555+Rerumu@users.noreply.github.com> Date: Mon, 11 Sep 2023 14:35:55 -0400 Subject: [PATCH 02/18] Update proposal --- rfcs/type-byte-array.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index 55b90e3f0..ed450f8a5 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -2,7 +2,7 @@ ## Summary -A new constructed type which serves as a mutable array of bytes. Ideally, it would expose an API that would allow for building, reading, and writing to the internal buffer. A particularly good example type that this could be derived from is [this Java class](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/nio/ByteBuffer.html). +A new built in type to serve as an array of bytes, with a library for reading and writing to the internal buffer. A particularly good example type that this could be derived from is [this Java class](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/nio/ByteBuffer.html). ## Motivation @@ -10,33 +10,33 @@ With this type, we ideally solve the use cases for binary format encoding and de ## Design -While an API as extensive as the example might be good, it's only really necessary to have methods for treating this block of data as a random access collections of bytes with potentially resizable backing. This could ideally be exposed as: +This could ideally be exposed as a new Luau library similar to that of `string` or `table`. It would contain functions for: -* A constructor with predefined size and endianness. +* Instantiating a new byte array object with an initial size. -* A method for resizing the existing object while retaining data in bounds. +* Resizing an existing object while keeping previously assigned values. -* Methods for reading... +* Reading... - * ... a string given an offset and size. + * ... 
signed and unsigned integers of 8, 16, and 32 bits given an offset and endianness. - * ... signed and unsigned integers of 8, 16, and 32 bits given an offset. + * ... floating point values of 32 and 64 bits given an offset and endianness. - * ... floating point values of 32 and 64 bits given an offset. + * ... a string given an offset and size. - * Methods for writing... + * Writing... - * ... a string given an offset, size, and value. + * ... signed and unsigned integers of 8, 16, and 32 bits given an offset, value, and endianness. - * ... signed and unsigned integers of 8, 16, and 32 bits given an offset and value. + * ... floating point values of 32 and 64 bits given an offset, value, and endianness. - * ... floating point values of 32 and 64 bits given an offset and value. + * ... a string given an offset, size, and value. -As Luau can't represent 64 bit integers in user code, the methods for these could be omitted or simply serve as a shortcut for writing what can be represented already without needing 2 calls to the 32 bit methods. +As Luau can't represent 64 bit integers in user code, the functions for these could be omitted or simply serve as a shortcut for writing what can be represented already without needing 2 calls to the 32 bit functions. ## Drawbacks -Depending on implementation this could increase the complexity of the VM and related code. If this is to be implemented as a built-in, optimized type it might need specialized fast paths for all relevant opcodes. Additionally, this type would have to come in with methods and some sort of constructor to be useful, which does not have precedent in the open source Luau distribution (even `vector` lacks an exposed constructor or methods). +Depending on implementation this could increase the complexity of the VM and related code. If this is to be implemented as a built-in, optimized type, it might need specialized fast paths for all relevant opcodes. 
Additionally, this type would have to come in with a whole new library table as part of the global environment, which could cause name collisions in older code. ## Alternatives From cf884f437d9444b53902146c20c3abe013c7f3d6 Mon Sep 17 00:00:00 2001 From: Rerumu <25379555+Rerumu@users.noreply.github.com> Date: Mon, 11 Sep 2023 21:18:34 -0400 Subject: [PATCH 03/18] Update proposal --- rfcs/type-byte-array.md | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index ed450f8a5..4f7ff0085 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -6,33 +6,37 @@ A new built in type to serve as an array of bytes, with a library for reading an ## Motivation -With this type, we ideally solve the use cases for binary format encoding and decoding, compression, and active/compact memory. This opens the door for developers to work with file formats that might've been too large to represent with tables or to write to strings. It also allows for writing algorithms that deal with raw data often, such as compression or hashing. Web services that exchange data in packed formats could also benefit from this. +With this type, we solve the use cases for binary format encoding and decoding, compression, and active/compact memory. This opens the door for developers to work with file formats that might've been too large to represent with tables or to write to strings. It also allows for writing algorithms that deal with raw data often, such as compression or hashing. Web services that exchange data in packed formats could also benefit from this. ## Design -This could ideally be exposed as a new Luau library similar to that of `string` or `table`. It would contain functions for: +This would be exposed as a new Luau library similar to that of `string` or `table`, with functions for: -* Instantiating a new byte array object with an initial size. +* Instantiating the object with a fixed size. 
-* Resizing an existing object while keeping previously assigned values. +* Fetching the size of the object. + +* Copying a range of data from one object to another. * Reading... - * ... signed and unsigned integers of 8, 16, and 32 bits given an offset and endianness. + * ... signed and unsigned integers of 8, 16, and 32 bits given an offset. - * ... floating point values of 32 and 64 bits given an offset and endianness. + * ... floating point values of 32 and 64 bits given an offset. * ... a string given an offset and size. * Writing... - * ... signed and unsigned integers of 8, 16, and 32 bits given an offset, value, and endianness. + * ... signed and unsigned integers of 8, 16, and 32 bits given an offset and value. - * ... floating point values of 32 and 64 bits given an offset, value, and endianness. + * ... floating point values of 32 and 64 bits given an offset and value. * ... a string given an offset, size, and value. -As Luau can't represent 64 bit integers in user code, the functions for these could be omitted or simply serve as a shortcut for writing what can be represented already without needing 2 calls to the 32 bit functions. +Read and write operations for relevant types are little endian as it is the most common use case, and conversion is often trivial to do manually. + +Additionally, unaligned offsets in all operations are valid and behave as expected. 
## Drawbacks From 232a209ef3fa0a193de41fa8dc136450f1fde16b Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Thu, 5 Oct 2023 10:44:14 -0700 Subject: [PATCH 04/18] Propose concrete functions and a global library --- rfcs/type-byte-array.md | 110 +++++++++++++++++++++++++++++++++++----- 1 file changed, 97 insertions(+), 13 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index 4f7ff0085..af446d2ce 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -10,37 +10,121 @@ With this type, we solve the use cases for binary format encoding and decoding, ## Design -This would be exposed as a new Luau library similar to that of `string` or `table`, with functions for: +This type will be called 'buffer' and will be implemented using `userdata` with a new reserved tag. -* Instantiating the object with a fixed size. +Operations on this type will be exposed through a new Luau library called 'buffer`, with the following functions: -* Fetching the size of the object. +`buffer.create(size: number): buffer` -* Copying a range of data from one object to another. +Instantiates the object with a fixed size. +Each byte is initialized to 0. -* Reading... +'size' has to be an integer and it cannot be negative. Maximum size is defined by implementation, but it at least matches the maximum string size. - * ... signed and unsigned integers of 8, 16, and 32 bits given an offset. +`buffer.fromstring(str: string): buffer` - * ... floating point values of 32 and 64 bits given an offset. +Instantiates the object from a string. +Size of the buffer is fixed and equals to the length of the string. - * ... a string given an offset and size. +`buffer.tostring(): string` - * Writing... +Returns the buffer data as a string. - * ... signed and unsigned integers of 8, 16, and 32 bits given an offset and value. +`buffer.len(b: buffer): number` - * ... floating point values of 32 and 64 bits given an offset and value. 
+Returns the size of the buffer. - * ... a string given an offset, size, and value. +'__len' metamethod is not proposed at this time. + +`buffer.copy(target_buffer: buffer, target_offset: number, source_buffer: buffer, source_offset: number, count: number) -> ()` + +Copy 'count' bytes from 'source_buffer' starting at offset 'source_offset' into the 'target_buffer' at 'target_offset'. + +Offsets and 'count' have to be numbers, each number is cast to an integer in an implementation-defined way. + +`buffer.readi8(b: buffer, offset: number): number` + +`buffer.readu8(b: buffer, offset: number): number` + +`buffer.readi16(b: buffer, offset: number): number` + +`buffer.readu16(b: buffer, offset: number): number` + +`buffer.readi32(b: buffer, offset: number): number` + +`buffer.readu32(b: buffer, offset: number): number` + +`buffer.readf32(b: buffer, offset: number): number` + +`buffer.readf64(b: buffer, offset: number): number` + +Used to read the data from the buffer by reinterpreting bytes at the offset as the type in the argument and converting it into a number. + +`buffer.writei8(b: buffer, offset: number, value: number): ()` + +`buffer.writeu8(b: buffer, offset: number, value: number): ()` + +`buffer.writei16(b: buffer, offset: number, value: number): ()` + +`buffer.writeu16(b: buffer, offset: number, value: number): ()` + +`buffer.writei32(b: buffer, offset: number, value: number): ()` + +`buffer.writeu32(b: buffer, offset: number, value: number): ()` + +`buffer.writef32(b: buffer, offset: number, value: number): ()` + +`buffer.writef64(b: buffer, offset: number, value: number): ()` + +Used to write data to the buffer by converting the number into the type specified by the argument and reinterpreting it as individual bytes. + +Conversion to unsigned numbers uses `bit32` library semantics. + +`buffer.readstring(b: buffer, offset: number, count: number): string` + +Used to read a string of length 'count' from the buffer at specified offset. 
+ +`buffer.writestring(b: buffer, offset: number, value: string, count: number?): ()` + +Used to write data from a string into the buffer at specified offset. + +If an optional 'count' is specified, only 'count' bytes are taken from the string. 'count' cannot be larger that the string length. + +--- + +All offsets start at 0. Read and write operations for relevant types are little endian as it is the most common use case, and conversion is often trivial to do manually. Additionally, unaligned offsets in all operations are valid and behave as expected. +Unless otherwise specified, if a read or write operation would cause an access outside the data in the buffer, an error is thrown. + ## Drawbacks -Depending on implementation this could increase the complexity of the VM and related code. If this is to be implemented as a built-in, optimized type, it might need specialized fast paths for all relevant opcodes. Additionally, this type would have to come in with a whole new library table as part of the global environment, which could cause name collisions in older code. +This introduces 'buffer' as a class type in global typing context and adds new global 'buffer' table. +While class type might intersect with user-defined 'buffer' type, such type redefinitions ares already allowed in Luau, so this should not cause new type errors. +Same goes for the global table, users can already override globals like 'string', so additional of a new global is backwards-compatible, but new table will not be accessible in such a case. + +Depending on implementation this could increase the complexity of the VM and related code. If this is to be implemented as a built-in, optimized type, it might need specialized fast paths for all relevant opcodes. + +## Extensions + +To support additional use cases, we can provide a set of `pushTYPE` and `takeTYPE` library functions and extend the type to have an internal cursor. 
+This will make it easy to write/read data from a buffer as one would from a file, without having to track the current offset manually. +Additional functions like `pos` and `setpos` can be added to access this internal cursor. + +This extension can be made by changing the internal representation without affecting older code. + +One drawback here might be that the cursor is attached to the data and raises a question if the value is preserved when object is serialized over the network. + +--- + +Additional possibility will be to make the buffer change size automatically by `pushTYPE` interface. (explicit resize can be implemented with the existing interface). +This can also be changed almost transparently for older code. +One difference will be that `pushTYPE` will not throw when reaching the end of the data. Unless it is decided that other write operations could also resize implicitly. + +Implementation can have a performance impact however as data will be read through a pointer redirection. ## Alternatives From f09b7770703d28a745e63e5087a9fb7c891ef66b Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Thu, 5 Oct 2023 14:37:56 -0700 Subject: [PATCH 05/18] Updates to specify int conversion, floating point representation and the content of the metatable --- rfcs/type-byte-array.md | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index af446d2ce..cffde5b95 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -34,8 +34,6 @@ Returns the buffer data as a string. Returns the size of the buffer. -'__len' metamethod is not proposed at this time. - `buffer.copy(target_buffer: buffer, target_offset: number, source_buffer: buffer, source_offset: number, count: number) -> ()` Copy 'count' bytes from 'source_buffer' starting at offset 'source_offset' into the 'target_buffer' at 'target_offset'. 
@@ -60,6 +58,9 @@ Offsets and 'count' have to be numbers, each number is cast to an integer in an Used to read the data from the buffer by reinterpreting bytes at the offset as the type in the argument and converting it into a number. +Floating-point numbers are read from a format specified by IEEE 754. +When reading the value of any NaN representation, implementation can (but not required to) replace it with a different quiet NaN representation. + `buffer.writei8(b: buffer, offset: number, value: number): ()` `buffer.writeu8(b: buffer, offset: number, value: number): ()` @@ -78,8 +79,11 @@ Used to read the data from the buffer by reinterpreting bytes at the offset as t Used to write data to the buffer by converting the number into the type specified by the argument and reinterpreting it as individual bytes. +Conversion to integer numbers performs a truncation of the number value. Results of converting special number values (inf/nan) is platform-specific. Conversion to unsigned numbers uses `bit32` library semantics. +Floating-point numbers are stored in a format specified by IEEE 754. + `buffer.readstring(b: buffer, offset: number, count: number): string` Used to read a string of length 'count' from the buffer at specified offset. @@ -100,6 +104,20 @@ Additionally, unaligned offsets in all operations are valid and behave as expect Unless otherwise specified, if a read or write operation would cause an access outside the data in the buffer, an error is thrown. +### Metatable + +`buffer` also has a metatable, inside this metatable: +* '__type' is defined to return 'buffer'. `type()` will return 'userdata' +* '__eq' is defined to compare buffers, by comparing sizes first, followed by content comparison +* metatable is locked + +No other metamethod is defined, naming a few specific onces: +* '__len' is not proposed at this time +* '__index' is not defined, so there is no `b[1] = a` interface to write bytes. 
Neither can you call library functions as methods like `b:writei16(10, 12)` +* '__iter' is not defined +* '__tostring' is not defined, generic userdata behavior remains, returning 'buffer: 0xpointer' +* ordering is not defined + ## Drawbacks This introduces 'buffer' as a class type in global typing context and adds new global 'buffer' table. From 49455dfa09407a80781bc8cf7e5bfb9688a8fcaa Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Thu, 5 Oct 2023 14:40:52 -0700 Subject: [PATCH 06/18] Better specification for buffer.copy --- rfcs/type-byte-array.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index cffde5b95..5b569c57c 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -38,6 +38,9 @@ Returns the size of the buffer. Copy 'count' bytes from 'source_buffer' starting at offset 'source_offset' into the 'target_buffer' at 'target_offset'. +It is possible for 'source_buffer' and 'target_buffer' to be the same. +Copying an overlapping region inside the same buffer acts as if the source region is copied into a temporary buffer and then that buffer is copied over to the target. + Offsets and 'count' have to be numbers, each number is cast to an integer in an implementation-defined way. 
`buffer.readi8(b: buffer, offset: number): number` From 9c492923db21bde24ccf6e1e5cc184e3da54ec36 Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Fri, 6 Oct 2023 13:05:43 -0700 Subject: [PATCH 07/18] Remove '__eq' --- rfcs/type-byte-array.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index 5b569c57c..bf0f6ef4e 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -111,7 +111,6 @@ Unless otherwise specified, if a read or write operation would cause an access o `buffer` also has a metatable, inside this metatable: * '__type' is defined to return 'buffer'. `type()` will return 'userdata' -* '__eq' is defined to compare buffers, by comparing sizes first, followed by content comparison * metatable is locked No other metamethod is defined, naming a few specific onces: @@ -119,7 +118,7 @@ No other metamethod is defined, naming a few specific onces: * '__index' is not defined, so there is no `b[1] = a` interface to write bytes. Neither can you call library functions as methods like `b:writei16(10, 12)` * '__iter' is not defined * '__tostring' is not defined, generic userdata behavior remains, returning 'buffer: 0xpointer' -* ordering is not defined +* '__eq'/'__lt'/'__le' are not defined ## Drawbacks From b6947765ef050d5bee2128d763ff82ada65342f4 Mon Sep 17 00:00:00 2001 From: Arseny Kapoulkine Date: Wed, 11 Oct 2023 08:43:12 -0700 Subject: [PATCH 08/18] Update type-byte-array.md Rewrite design/motivation sections. --- rfcs/type-byte-array.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index bf0f6ef4e..016f8c07c 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -2,11 +2,19 @@ ## Summary -A new built in type to serve as an array of bytes, with a library for reading and writing to the internal buffer. 
A particularly good example type that this could be derived from is [this Java class](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/nio/ByteBuffer.html). +A new built in type to serve as a mutable array of bytes, with a library for reading and writing the contents. ## Motivation -With this type, we solve the use cases for binary format encoding and decoding, compression, and active/compact memory. This opens the door for developers to work with file formats that might've been too large to represent with tables or to write to strings. It also allows for writing algorithms that deal with raw data often, such as compression or hashing. Web services that exchange data in packed formats could also benefit from this. +Existing mechanisms for representing binary data in Luau can be insufficient for performance-oriented use cases. + +A binary blob may be represented as an array of numbers 0-255 (idiomatic and reasonably performant, but very space-inefficient: each element takes 16 bytes, and it's difficult to work with data that is wider than bytes) or a string (only works for read-only cases, data extraction is possible via `string.unpack` but not very efficient). Neither of the two options are optimal, especially when the use case is data encoding (as opposed to decoding). + +While the host can provide custom data types that close this gap using `userdata` with overridden `__index`/`__newindex` that provide byte storage, the resulting type would be memory-efficient but not performance-efficient due to the cost of metamethod dispatch for every access. Additionally, since every host has a different API, this would make it difficult to write portable Luau algorithms that require efficient binary access. + +With this type, we solve the use cases for binary format encoding and decoding. This opens the door for developers to work with file formats that might've been too large to represent with tables or to write to strings. 
It also allows for writing algorithms that deal with raw data often, such as compression or hashing. Web services that exchange data in packed formats could also benefit from this. The new type can also serve as a more efficient internal representation for libraries that provide higher level objects like images or geometry data. + +Other high level languages support similar data structures, for example [Java ByteBuffer](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/nio/ByteBuffer.html) or [JavaScript ArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer). ## Design From 6a4f1200a5c768459bfa5f3bb02e924dd9f2d09a Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Wed, 11 Oct 2023 09:55:35 -0700 Subject: [PATCH 09/18] buffer is now a built-in type and a C API is defined --- rfcs/type-byte-array.md | 45 ++++++++++++++++++++++++++++------------- 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index 016f8c07c..2712a15f6 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -18,7 +18,9 @@ Other high level languages support similar data structures, for example [Java By ## Design -This type will be called 'buffer' and will be implemented using `userdata` with a new reserved tag. +This type will be called 'buffer' and will be implemented using a new built-in type (GCObject with new tag). + +By default, metatable is not set for this type and can only be modified using `lua_setmetatable` C API. Operations on this type will be exposed through a new Luau library called 'buffer`, with the following functions: @@ -107,7 +109,9 @@ If an optional 'count' is specified, only 'count' bytes are taken from the strin --- -All offsets start at 0. +All offsets start at 0 (not to be confused with indices that start at 1 in Luau tables). 
+This choice is made for both performance reasons (no need to subtract 1) and for compatibility with data formats that often describe field positions using offsets. +While there is a way to solve the performance problem using luajit trick where table array part is allocated from index 0, this would mean that data in the buffer has 1 extra byte and this complicates the bounds checking. Read and write operations for relevant types are little endian as it is the most common use case, and conversion is often trivial to do manually. @@ -115,26 +119,39 @@ Additionally, unaligned offsets in all operations are valid and behave as expect Unless otherwise specified, if a read or write operation would cause an access outside the data in the buffer, an error is thrown. -### Metatable +### Public C API + +`void* lua_tobuffer(lua_State* L, int idx, size_t* len);` + +Used to fetch buffer data pointer and buffer size at specified location. + +If there is no buffer at the location, `NULL` is returned and `len` is not modified. + +`void* lua_newbuffer(lua_State* L, size_t l);` + +Pushes new buffer of size `l` onto the stack. -`buffer` also has a metatable, inside this metatable: -* '__type' is defined to return 'buffer'. `type()` will return 'userdata' -* metatable is locked +`void* luaL_checkbuffer(lua_State* L, int narg, size_t* len);` -No other metamethod is defined, naming a few specific onces: -* '__len' is not proposed at this time -* '__index' is not defined, so there is no `b[1] = a` interface to write bytes. Neither can you call library functions as methods like `b:writei16(10, 12)` -* '__iter' is not defined -* '__tostring' is not defined, generic userdata behavior remains, returning 'buffer: 0xpointer' -* '__eq'/'__lt'/'__le' are not defined +Similar to `lua_tobuffer`, but throws a tag error if there is no buffer at specified location. + +`int luaopen_buffer(lua_State* L);` + +Registers the 'buffer' library. If `luaL_openlibs` is used, that includes the 'buffer' library. 
+ +`LUA_BUFFERLIBNAME` + +Macro containing the 'buffer' library name. ## Drawbacks This introduces 'buffer' as a class type in global typing context and adds new global 'buffer' table. -While class type might intersect with user-defined 'buffer' type, such type redefinitions ares already allowed in Luau, so this should not cause new type errors. +While class type might intersect with user-defined 'buffer' type, such type redefinitions are already allowed in Luau, so this should not cause new type errors. Same goes for the global table, users can already override globals like 'string', so additional of a new global is backwards-compatible, but new table will not be accessible in such a case. -Depending on implementation this could increase the complexity of the VM and related code. If this is to be implemented as a built-in, optimized type, it might need specialized fast paths for all relevant opcodes. +This increases the complexity of the VM a little bit, since support for new tagged type is required in interpreter loop and GC. + +There is also a string buffer C API; by having functions talk about 'buffer' (like `luaL_extendbuffer`) and use `luaL_Buffer`, it might be a point of confusion for C API users. ## Extensions From 21a72c19ca3a5295041c3bbaa742581dff69c31f Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Fri, 13 Oct 2023 09:45:35 -0700 Subject: [PATCH 10/18] Update type-byte-array.md --- rfcs/type-byte-array.md | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index 2712a15f6..76e65782a 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -44,14 +44,23 @@ Returns the buffer data as a string. Returns the size of the buffer. 
-`buffer.copy(target_buffer: buffer, target_offset: number, source_buffer: buffer, source_offset: number, count: number) -> ()` +`buffer.copy(target_buffer: buffer, target_offset: number, source_buffer: buffer, source_offset: number, count: number?): ()` Copy 'count' bytes from 'source_buffer' starting at offset 'source_offset' into the 'target_buffer' at 'target_offset'. It is possible for 'source_buffer' and 'target_buffer' to be the same. Copying an overlapping region inside the same buffer acts as if the source region is copied into a temporary buffer and then that buffer is copied over to the target. -Offsets and 'count' have to be numbers, each number is cast to an integer in an implementation-defined way. +If 'source_offset' is nil or is omitted, it defaults to 0. +If 'count' is 'nil' or is omitted, the whole 'source_buffer' data starting from 'source_offset' is taken. + +`buffer.fill(b: buffer, offset: number, value: number, count: number?): ()` + +Set 'count' bytes in the buffer starting from specified offset to 'value'. + +'value' is converted to unsigned integer using `bit32` library semantics, lower 8 bits are taken from the resulting integer to use as the byte value. + +If 'count' is 'nil' or is omitted, all bytes after the specified offset are set. `buffer.readi8(b: buffer, offset: number): number` @@ -92,7 +101,7 @@ When reading the value of any NaN representation, implementation can (but not re Used to write data to the buffer by converting the number into the type specified by the argument and reinterpreting it as individual bytes. -Conversion to integer numbers performs a truncation of the number value. Results of converting special number values (inf/nan) is platform-specific. +Conversion to integer numbers performs a truncation of the number value. Results of converting special number values (inf/nan) are platform-specific. Conversion to unsigned numbers uses `bit32` library semantics. 
Floating-point numbers are stored in a format specified by IEEE 754. @@ -105,7 +114,7 @@ Used to read a string of length 'count' from the buffer at specified offset. Used to write data from a string into the buffer at specified offset. -If an optional 'count' is specified, only 'count' bytes are taken from the string. 'count' cannot be larger that the string length. +If an optional 'count' is specified, only 'count' bytes are taken from the string. 'count' cannot be larger than the string length. --- @@ -113,6 +122,8 @@ All offsets start at 0 (not to be confused with indices that start at 1 in Luau This choice is made for both performance reasons (no need to subtract 1) and for compatibility with data formats that often describe field positions using offsets. While there is a way to solve the performance problem using luajit trick where table array part is allocated from index 0, this would mean that data in the buffer has 1 extra byte and this complicates the bounds checking. +Offsets and 'count' numbers are cast to an integer in an implementation-defined way. + Read and write operations for relevant types are little endian as it is the most common use case, and conversion is often trivial to do manually. Additionally, unaligned offsets in all operations are valid and behave as expected. From e7dc69adf1d4313fb34608f79b881a9aa386a98d Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Fri, 13 Oct 2023 09:46:18 -0700 Subject: [PATCH 11/18] Update type-byte-array.md --- rfcs/type-byte-array.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index 76e65782a..a5de7b6fc 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -44,7 +44,7 @@ Returns the buffer data as a string. Returns the size of the buffer. 
-`buffer.copy(target_buffer: buffer, target_offset: number, source_buffer: buffer, source_offset: number, count: number?): ()` +`buffer.copy(target_buffer: buffer, target_offset: number, source_buffer: buffer, source_offset: number?, count: number?): ()` Copy 'count' bytes from 'source_buffer' starting at offset 'source_offset' into the 'target_buffer' at 'target_offset'. From cbe95e8dd8756a0c9a3408494e77413ba117cff2 Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Mon, 16 Oct 2023 04:32:34 -0700 Subject: [PATCH 12/18] Update rfcs/type-byte-array.md Co-authored-by: Micah --- rfcs/type-byte-array.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index a5de7b6fc..d34626fe9 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -176,7 +176,7 @@ One drawback here might be that the cursor is attached to the data and raises a --- -Additional possibility will be to make the buffer change size automatically by `pushTYPE` interface. (explicit resize can be implemented with the existing interface). +An additional possibility would be to make the buffer change size automatically by `pushTYPE` interface (explicit resize can be implemented with the existing interface). This can also be changed almost transparently for older code. One difference will be that `pushTYPE` will not throw when reaching the end of the data. Unless it is decided that other write operations could also resize implicitly. 
From ad1afb9cfe62a0477c6809b7539491e7bb8267c8 Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Mon, 16 Oct 2023 04:41:51 -0700 Subject: [PATCH 13/18] Update type-byte-array.md --- rfcs/type-byte-array.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index d34626fe9..52e42cf86 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -80,7 +80,6 @@ If 'count' is 'nil' or is omitted, all bytes after the specified offset are set. Used to read the data from the buffer by reinterpreting bytes at the offset as the type in the argument and converting it into a number. -Floating-point numbers are read from a format specified by IEEE 754. When reading the value of any NaN representation, implementation can (but not required to) replace it with a different quiet NaN representation. `buffer.writei8(b: buffer, offset: number, value: number): ()` @@ -104,8 +103,6 @@ Used to write data to the buffer by converting the number into the type specifie Conversion to integer numbers performs a truncation of the number value. Results of converting special number values (inf/nan) are platform-specific. Conversion to unsigned numbers uses `bit32` library semantics. -Floating-point numbers are stored in a format specified by IEEE 754. - `buffer.readstring(b: buffer, offset: number, count: number): string` Used to read a string of length 'count' from the buffer at specified offset. @@ -126,6 +123,10 @@ Offsets and 'count' numbers are cast to an integer in an implementation-defined Read and write operations for relevant types are little endian as it is the most common use case, and conversion is often trivial to do manually. +Integer numbers are read and written using two's complement representation. + +Floating-point numbers are read and written using a format specified by IEEE 754. 
+ Additionally, unaligned offsets in all operations are valid and behave as expected. Unless otherwise specified, if a read or write operation would cause an access outside the data in the buffer, an error is thrown. @@ -164,7 +165,7 @@ This increases the complexity of the VM a little bit, since support for new tagg There is also a string buffer C API; by having functions talk about 'buffer' (like `luaL_extendbuffer`) and use `luaL_Buffer`, it might be a point of confusion for C API users. -## Extensions +## Alternatives To support additional use cases, we can provide a set of `pushTYPE` and `takeTYPE` library functions and extend the type to have an internal cursor. This will make it easy to write/read data from a buffer as one would from a file, without having to track the current offset manually. From 7435a829d52b195c156e5b213811425604e2f984 Mon Sep 17 00:00:00 2001 From: Arseny Kapoulkine Date: Mon, 16 Oct 2023 08:57:25 -0700 Subject: [PATCH 14/18] Update type-byte-array.md Simplified Alternatives section --- rfcs/type-byte-array.md | 25 ++++--------------------- 1 file changed, 4 insertions(+), 21 deletions(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index 52e42cf86..a1278b494 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -167,30 +167,13 @@ There is also a string buffer C API; by having functions talk about 'buffer' (li ## Alternatives -To support additional use cases, we can provide a set of `pushTYPE` and `takeTYPE` library functions and extend the type to have an internal cursor. -This will make it easy to write/read data from a buffer as one would from a file, without having to track the current offset manually. -Additional functions like `pos` and `setpos` can be added to access this internal cursor. - -This extension can be made by changing the internal representation without affecting older code. 
- -One drawback here might be that the cursor is attached to the data and raises a question if the value is preserved when object is serialized over the network. - ---- - -An additional possibility would be to make the buffer change size automatically by `pushTYPE` interface (explicit resize can be implemented with the existing interface). -This can also be changed almost transparently for older code. -One difference will be that `pushTYPE` will not throw when reaching the end of the data. Unless it is decided that other write operations could also resize implicitly. - -Implementation can have a performance impact however as data will be read through a pointer redirection. - -## Alternatives - The workarounds without this feature are significantly inefficient: * Tables can, at most, represent 64 bits per slot using expensive `vector` packing. - * Tables with or without packing severely bloat memory, as each array entry is subject to Luau value size and alignment. - * Strings are immutable and can’t be used to efficiently construct binary data without exponential allocations. - * Built in `string.pack` and `string.unpack` can’t cover more complex schemas on their own or formats which are edited mid-creation. + +The proposed buffer object has no cursor/position as part of its state; while it would be possible to implement this along with a separate set of APIs like `pushTYPE` and `takeTYPE`, this addition can be possible to implement later and it makes the buffer structure more complicated; additionally, external offset management might be easier to optimize and is more orthogonal as we do not need to duplicate stateful and stateless functions. 
+ +The proposed buffer object is not resizeable; this is possible to implement later using explicit `buffer.resize` call, however this may result in a performance impact for native implemenation as the data will be read through a pointer redirection and will be more difficult to optimize; thus this version of the RFC only proposes fixed length buffers. That said, if resizeable buffers are desired in the future, we would plan to enhance the current buffer type instead of making a parallel resizeable buffer type to reduce complexity. From 2bc8de17eee6c78a3e3046451a90d8196702b111 Mon Sep 17 00:00:00 2001 From: Arseny Kapoulkine Date: Mon, 16 Oct 2023 11:15:38 -0700 Subject: [PATCH 15/18] Update type-byte-array.md Minor wording tweak --- rfcs/type-byte-array.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-array.md index a1278b494..853164ea6 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-array.md @@ -174,6 +174,6 @@ The workarounds without this feature are significantly inefficient: * Strings are immutable and can’t be used to efficiently construct binary data without exponential allocations. * Built in `string.pack` and `string.unpack` can’t cover more complex schemas on their own or formats which are edited mid-creation. -The proposed buffer object has no cursor/position as part of its state; while it would be possible to implement this along with a separate set of APIs like `pushTYPE` and `takeTYPE`, this addition can be possible to implement later and it makes the buffer structure more complicated; additionally, external offset management might be easier to optimize and is more orthogonal as we do not need to duplicate stateful and stateless functions. 
+The proposed buffer object has no cursor/position as part of its state; while it would be possible to implement this along with a separate set of APIs like `pushTYPE` and `takeTYPE`, this addition is always possible to implement later and it makes the buffer structure more complicated; additionally, external offset management might be easier to optimize and is more orthogonal as we do not need to duplicate stateful and stateless functions. The proposed buffer object is not resizeable; this is possible to implement later using explicit `buffer.resize` call, however this may result in a performance impact for native implemenation as the data will be read through a pointer redirection and will be more difficult to optimize; thus this version of the RFC only proposes fixed length buffers. That said, if resizeable buffers are desired in the future, we would plan to enhance the current buffer type instead of making a parallel resizeable buffer type to reduce complexity. From c45e1d0181ded694b748b592d17f5d7e698cf017 Mon Sep 17 00:00:00 2001 From: Arseny Kapoulkine Date: Mon, 16 Oct 2023 11:16:31 -0700 Subject: [PATCH 16/18] Update and rename type-byte-array.md to type-byte-buffer.md --- rfcs/{type-byte-array.md => type-byte-buffer.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename rfcs/{type-byte-array.md => type-byte-buffer.md} (99%) diff --git a/rfcs/type-byte-array.md b/rfcs/type-byte-buffer.md similarity index 99% rename from rfcs/type-byte-array.md rename to rfcs/type-byte-buffer.md index 853164ea6..1180fdbb4 100644 --- a/rfcs/type-byte-array.md +++ b/rfcs/type-byte-buffer.md @@ -1,4 +1,4 @@ -# Byte Array Type +# Byte buffer type ## Summary From 740aa8d730095e4ee475ac27d702ef5c9e64c7c0 Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Wed, 18 Oct 2023 06:04:01 -0700 Subject: [PATCH 17/18] Update type-byte-buffer.md --- rfcs/type-byte-buffer.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git 
a/rfcs/type-byte-buffer.md b/rfcs/type-byte-buffer.md index 1180fdbb4..e4e086e04 100644 --- a/rfcs/type-byte-buffer.md +++ b/rfcs/type-byte-buffer.md @@ -143,6 +143,12 @@ If there is no buffer at the location, `NULL` is returned and `len` is not modif Pushes new buffer of size `l` onto the stack. +`lua_isbuffer(L, n)` + +C macro helper to check if value at the specified location is a buffer. + +Similar to `lua_istable`/`lua_isvector`/`lua_isthread`, it's a simple wrapper over `lua_type` call and doesn't require internal coercions/internal field access like `lua_isnumber`/`lua_iscfunction`. + `void* luaL_checkbuffer(lua_State* L, int narg, size_t* len);` Similar to `lua_tobuffer`, but throws a tag error if there is no buffer at specified location. From d100d463573f385d837b75975afd593738b30fe6 Mon Sep 17 00:00:00 2001 From: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com> Date: Thu, 19 Oct 2023 01:13:16 -0700 Subject: [PATCH 18/18] Update type-byte-buffer.md --- rfcs/type-byte-buffer.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/rfcs/type-byte-buffer.md b/rfcs/type-byte-buffer.md index e4e086e04..b01ac00c2 100644 --- a/rfcs/type-byte-buffer.md +++ b/rfcs/type-byte-buffer.md @@ -2,11 +2,11 @@ ## Summary -A new built in type to serve as a mutable array of bytes, with a library for reading and writing the contents. +A new built-in type to serve as a mutable array of bytes, with a library for reading and writing the contents. ## Motivation -Existing mechanisms for representing binary data in Luau can be insufficient for performance-oriented use cases. +The existing mechanisms for representing binary data in Luau can be insufficient for performance-oriented use cases.
A binary blob may be represented as an array of numbers 0-255 (idiomatic and reasonably performant, but very space-inefficient: each element takes 16 bytes, and it's difficult to work with data that is wider than bytes) or a string (only works for read-only cases, data extraction is possible via `string.unpack` but not very efficient). Neither of the two options are optimal, especially when the use case is data encoding (as opposed to decoding). @@ -14,7 +14,7 @@ While the host can provide custom data types that close this gap using `userdata With this type, we solve the use cases for binary format encoding and decoding. This opens the door for developers to work with file formats that might've been too large to represent with tables or to write to strings. It also allows for writing algorithms that deal with raw data often, such as compression or hashing. Web services that exchange data in packed formats could also benefit from this. The new type can also serve as a more efficient internal representation for libraries that provide higher level objects like images or geometry data. -Other high level languages support similar data structures, for example [Java ByteByffer](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/nio/ByteBuffer.html) or [JavaScript ArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer). +Other high-level languages support similar data structures, for example [Java ByteBuffer](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/nio/ByteBuffer.html) or [JavaScript ArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer). ## Design @@ -34,7 +34,7 @@ Each byte is initialized to 0. `buffer.fromstring(str: string): buffer` Instantiates the object from a string. -Size of the buffer is fixed and equals to the length of the string. +The size of the buffer is fixed and equal to the length of the string.
`buffer.tostring(): string` @@ -165,7 +165,7 @@ Macro containing the 'buffer' library name. This introduces 'buffer' as a class type in global typing context and adds new global 'buffer' table. While class type might intersect with user-defined 'buffer' type, such type redefinitions are already allowed in Luau, so this should not cause new type errors. -Same goes for the global table, users can already override globals like 'string', so additional of a new global is backwards-compatible, but new table will not be accessible in such a case. +The same goes for the global table: users can already override globals like 'string', so the addition of a new global is backwards-compatible, but the new table will not be accessible in such a case. This increases the complexity of the VM a little bit, since support for new tagged type is required in interpreter loop and GC. @@ -182,4 +182,4 @@ The workarounds without this feature are significantly inefficient: The proposed buffer object has no cursor/position as part of its state; while it would be possible to implement this along with a separate set of APIs like `pushTYPE` and `takeTYPE`, this addition is always possible to implement later and it makes the buffer structure more complicated; additionally, external offset management might be easier to optimize and is more orthogonal as we do not need to duplicate stateful and stateless functions. -The proposed buffer object is not resizeable; this is possible to implement later using explicit `buffer.resize` call, however this may result in a performance impact for native implemenation as the data will be read through a pointer redirection and will be more difficult to optimize; thus this version of the RFC only proposes fixed length buffers. That said, if resizeable buffers are desired in the future, we would plan to enhance the current buffer type instead of making a parallel resizeable buffer type to reduce complexity.
+The proposed buffer object is not resizeable; this is possible to implement later using explicit `buffer.resize` call, however this may result in a performance impact for native implementation as the data will be read through a pointer redirection and will be more difficult to optimize; thus, this version of the RFC only proposes fixed length buffers. That said, if resizeable buffers are desired in the future, we would plan to enhance the current buffer type instead of making a parallel resizeable buffer type to reduce complexity.
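
As a reading aid for the series as a whole, the library described in the final revision of the RFC might be exercised roughly as follows. This is a minimal sketch, assuming a Luau runtime in which the proposed `buffer` library is available; the buffer size, field offsets, and values are illustrative choices and do not come from the RFC.

```lua
-- Sketch only: assumes a Luau runtime that provides the `buffer` library from this RFC.
-- Layout below is hypothetical: u32 at offset 0, i16 at offset 4, 4-byte string at offset 6.
local b = buffer.create(12) -- fixed size; every byte starts as 0

buffer.writeu32(b, 0, 0xDEADBEEF)
buffer.writei16(b, 4, -2)
buffer.writestring(b, 6, "luau")

-- Offsets are 0-based and multi-byte values are little endian,
-- so the lowest byte of the u32 is stored first.
assert(buffer.readu8(b, 0) == 0xEF)
assert(buffer.readu8(b, 3) == 0xDE)

-- Integers use two's complement: -2 written as i16 reads back as 0xFFFE when viewed as u16.
assert(buffer.readi16(b, 4) == -2)
assert(buffer.readu16(b, 4) == 0xFFFE)

-- Strings are read back by offset and byte count.
assert(buffer.readstring(b, 6, 4) == "luau")

-- fill sets a byte range; with 'count' omitted it runs to the end of the buffer.
buffer.fill(b, 10, 0xFF)
assert(buffer.readu16(b, 10) == 0xFFFF)

-- copy with 'source_offset' and 'count' omitted takes the whole source buffer.
local c = buffer.create(buffer.len(b))
buffer.copy(c, 0, b)
assert(buffer.readstring(c, 0, buffer.len(c)) == buffer.readstring(b, 0, buffer.len(b)))

-- An access outside the buffer data throws instead of resizing.
assert(not pcall(buffer.readu8, b, 12))
```

Keeping the offset in the caller's hands, as above, matches the rationale in the Alternatives section: the buffer carries no cursor state, so a reader or writer that wants sequential access tracks its own position.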