Commit d29ecf7 (1 parent: c7a1fb7). Showing 28 changed files with 677 additions and 170 deletions.
```diff
@@ -1,4 +1,4 @@
-name: Main
+name: Build

 on:
   workflow_dispatch:
```
@@ -0,0 +1,96 @@
# Simple, Versatile Format

[![Build Status](https://github.com/tempname11/svf/workflows/main/badge.svg)](https://github.com/tempname11/svf/actions?workflow=main)
![Alpha](https://img.shields.io/badge/alpha-blue)

**SVF** is a format for structured data that aims to be both machine-friendly and human-friendly, in that order! It is currently in **alpha**: most of the core functionality works, but there is still a lot of work to be done (see [Roadmap](#roadmap)).

More precisely, this project currently includes:
- A small text language to describe data schemas.
- A CLI tool to work with the schemas.
- C and C++ libraries for working with data at runtime.

### Machine-friendly

One of the design goals for the format is simplicity, and "simplicity" includes how many things the machine has to do before it can access the data. The vast majority of modern CPUs (x86-64 and ARM) are 64-bit and little-endian, so it makes sense to optimize for them. The format is binary and, [versioning hurdles](#data-evolution) aside, reading it essentially involves repeatedly viewing regions of memory as known structures (i.e. C/C++ structs, with caveats). Writing is similarly straightforward: in typical scenarios, it involves just copying structures into buffers, plus some lightweight bookkeeping.
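
As a rough illustration of the reading and writing described above, here is a minimal C sketch. It assumes that writer and reader share the same native layout (little-endian, same padding), and the names `WorldFixed`, `Buffer`, `write_world` and `read_world` are hypothetical: this is not the API of the SVF runtime libraries. It copies bytes with `memcpy` instead of casting pointers, which is the portable way to reinterpret memory in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size subset of the README's `World` struct.
   NOT the layout the SVF code generator emits; it only illustrates
   reading and writing by copying known structures to/from bytes. */
typedef struct {
  uint64_t population;
  float gravitational_constant;
  int16_t current_year;
} WorldFixed;

static_assert(sizeof(float) == 4, "F32 is assumed to be a 32-bit float");

/* Hypothetical output buffer; `used` is the lightweight bookkeeping. */
typedef struct {
  uint8_t *data;
  size_t capacity;
  size_t used; /* invariant: used <= capacity */
} Buffer;

/* Appends a copy of *w and returns the offset it was written at,
   or (size_t)-1 if the buffer has no room left. */
static size_t write_world(Buffer *b, const WorldFixed *w) {
  if (b->capacity - b->used < sizeof *w) return (size_t)-1;
  size_t at = b->used;
  memcpy(b->data + at, w, sizeof *w);
  b->used += sizeof *w;
  return at;
}

/* Copies a WorldFixed out of `buf` at `offset`. Returns 1 on success,
   0 if the read would go past the end of the `size`-byte buffer. */
static int read_world(const uint8_t *buf, size_t size, size_t offset,
                      WorldFixed *out) {
  /* Written this way to avoid overflow in `offset + sizeof`. */
  if (offset > size || size - offset < sizeof *out) return 0;
  memcpy(out, buf + offset, sizeof *out);
  return 1;
}
```

On x86-64 and ARM, compilers typically lower such fixed-size `memcpy` calls to plain loads and stores. A reader that can guarantee alignment could hand out a pointer into the buffer instead of copying; the bounds check is the kind of guardrail that untrusted input requires either way.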

If the data you are reading is "binary-compatible" with the schema you expect, the overhead should be small. If not, a pre-processing step is involved. However, such cases should be either rare or entirely avoidable with a little planning.
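
The exact definition of "binary-compatible" is not spelled out here, so the following C sketch uses an assumed rule purely to illustrate the kind of check involved: every field the reader expects must exist in the writer's struct under the same name, with the same type and at the same offset. The `Field`/`StructSchema` types and the `binary_compatible` function are hypothetical and far simpler than a real schema (no choices, arrays, or nested structs).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified schema description: a struct is a list of
   named fields, each with a primitive type and a byte offset. */
typedef enum { T_U8, T_I16, T_U64, T_F32 } FieldType;

typedef struct {
  const char *name;
  FieldType type;
  size_t offset;
} Field;

typedef struct {
  const Field *fields;
  size_t count;
  size_t size; /* total size of the struct in bytes */
} StructSchema;

/* Assumed rule, for illustration only: data written with `writer` can
   be viewed in place as `reader` if every field the reader expects is
   present in the writer's struct with the same name, type and offset,
   and the writer's struct is at least as large. */
static int binary_compatible(const StructSchema *reader,
                             const StructSchema *writer) {
  assert(reader != NULL && writer != NULL);
  if (writer->size < reader->size) return 0;
  for (size_t i = 0; i < reader->count; i++) {
    const Field *want = &reader->fields[i];
    int found = 0;
    for (size_t j = 0; j < writer->count; j++) {
      const Field *have = &writer->fields[j];
      if (strcmp(want->name, have->name) == 0) {
        found = want->type == have->type && want->offset == have->offset;
        break;
      }
    }
    if (!found) return 0; /* missing or moved field: needs conversion */
  }
  return 1;
}
```

Under this rule, an older reader can view data from a newer writer that only appended fields, while a newer reader that expects an extra field falls back to the pre-processing step.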

### Human-friendly

The recent proliferation of text-based data formats like JSON seems to suggest that they are superior, at least in terms of being better suited for viewing and manipulation by people. That may be true (it's not easy to read binary blobs, let alone modify them by hand!), but a large part of the problem is simply solved by good tooling, and the remaining drawbacks are outweighed by the benefits. This is an assumption that needs proving, of course.

A key part of this approach is a good GUI viewer/editor for the format. See [Roadmap](#roadmap).

### Data Schemas

The format is structured, meaning that all data has an associated schema. Here is a silly example of a schema:
```rs
#name Hello

World: struct {
  population: U64; // 64-bit unsigned integer.
  gravitational_constant: F32; // 32-bit floating-point.
  current_year: I16; // 16-bit integer (signed).
  mechanics: Mechanics;
  name: String;
};

Mechanics: choice {
  classical;
  quantum;
  custom: String;
};

String: struct {
  utf8: U8[];
};
```

Note that the concept of a "string" is not built in, but it can easily be expressed in a schema. In this instance, UTF-8 is the chosen encoding, and it is expressed as you would expect: as an array of 8-bit code units (`U8[]`).
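
To show how such a schema might map onto C, here is a hypothetical sketch of `String` and `Mechanics`. The layout is an assumption (an array as an offset/count pair pointing into the message's data bytes, a choice as a tag plus a union) and may well differ from what the SVF code generator actually emits; `string_bytes` is likewise an illustrative helper, not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of `U8[]`: elements live elsewhere in
   the message, referenced by a byte offset and an element count. */
typedef struct {
  uint32_t offset; /* byte offset of the first element in the data */
  uint32_t count;  /* number of U8 elements */
} Array_U8;

typedef struct {
  Array_U8 utf8;
} String;

/* Hypothetical representation of the `Mechanics` choice. */
typedef enum {
  Mechanics_classical,
  Mechanics_quantum,
  Mechanics_custom,
} Mechanics_tag;

typedef struct {
  Mechanics_tag tag; /* which option of the choice is present */
  union {
    String custom; /* valid only when tag == Mechanics_custom */
  } payload;
} Mechanics;

/* Resolves a String against the message's data bytes. Returns a
   pointer to the UTF-8 bytes, or NULL if the array would reach outside
   the data (e.g. a corrupted or adversarial message). */
static const uint8_t *string_bytes(const uint8_t *data, size_t data_size,
                                   String s) {
  assert(data != NULL);
  if (s.utf8.offset > data_size ||
      data_size - s.utf8.offset < s.utf8.count) {
    return NULL;
  }
  return data + s.utf8.offset;
}
```

The returned bytes are not NUL-terminated; a caller would use `count` as the length.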

### Data Evolution

The schema above may resemble `struct` declarations in C/C++. In fact, the CLI code generator outputs structures very much like what you would expect. So what's the benefit over writing these C/C++ declarations by hand? Programs change over time, and so do their data requirements. There are two typical scenarios:

- Data is written to a file by a program. Later, a newer version of that program reads it back. If the schema was modified in between, this requires **backward compatibility**.
- Data is sent over the network, say, from a server to a client. The server is updated, so the client is now older than the server, but it still needs to process the messages. If the server's schema was modified, this requires **forward compatibility**.

Both of these scenarios are covered by:
- Checking schema compatibility before reading the message.
- Converting the data to the reader's schema, if needed. This is only necessary if the schemas are not "binary-compatible".
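
When the schemas are not binary-compatible, the conversion step could look roughly like this C sketch: fields are matched by name and copied into the reader's layout, fields only the reader knows are zero-filled (the backward-compatible case), and fields only the writer knows are dropped (the forward-compatible case). Matching by name, zero as the default value, and the `FieldDesc`/`convert_struct` names are all assumptions made for illustration; SVF's actual conversion rules may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field descriptor. Types are reduced to sizes for
   brevity; a real converter would also compare types and recurse into
   nested structs, arrays and choices. */
typedef struct {
  const char *name;
  size_t size;
  size_t offset;
} FieldDesc;

/* Converts one struct instance from the writer's layout (wf, wn, src)
   to the reader's layout (rf, rn, dst). Shared fields are copied,
   reader-only fields stay zero, writer-only fields are ignored.
   Returns 0 if a shared field changed size (unhandled here), else 1. */
static int convert_struct(const FieldDesc *wf, size_t wn, const void *src,
                          const FieldDesc *rf, size_t rn, void *dst,
                          size_t dst_size) {
  assert(src != NULL && dst != NULL);
  memset(dst, 0, dst_size); /* default value for missing fields */
  for (size_t i = 0; i < rn; i++) {
    for (size_t j = 0; j < wn; j++) {
      if (strcmp(rf[i].name, wf[j].name) != 0) continue;
      if (rf[i].size != wf[j].size) return 0;
      memcpy((char *)dst + rf[i].offset,
             (const char *)src + wf[j].offset, rf[i].size);
      break;
    }
  }
  return 1;
}
```

For example, with a hypothetical `V1 { population, year }` and `V2 { year, gravity, population }`, converting V1 data to V2 copies both shared fields and leaves `gravity` at zero, while converting back simply drops `gravity`.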

### Usage

See:
- ["Hello" example schema](svf_tools/schema/Hello.txt), the same as above.
- ["Hello" written to file in C](svf_tools/src/test/hello_write.c)
- ["Hello" read from bytes in C++](svf_tools/src/test/hello_read.cpp)
- ["JSON" example schema](svf_tools/schema/JSON.txt), as a small exercise to encode JSON.
- ["JSON" written to file in C++](svf_tools/src/test/json_write.cpp)
- ["JSON" read from bytes in C](svf_tools/src/test/json_read.c)

### Roadmap

There are a number of things left to do before the project can be given **beta** status, which would signify that it is tentatively ready for "serious" use. This list is not exhaustive.

- Adversarial inputs need to be handled well in the runtime libraries. Since the format is binary and the code is written in C, there is a lot of potential for exploits. Some guardrails are in place already, and in principle nothing prevents the libraries from being safe, but there has been no big, focused effort in this direction yet.
- Better error handling, with readable error messages where applicable (e.g. `svfc` output). Currently, a lot of code simply sets an error flag that gets checked later, and that's about it.
- More extensive testing. Some basic tests are in place, but something more thorough is needed to catch corner cases, perhaps a fuzzing-based test case generator.
- Clearly defined platform support, covering operating systems, CPU architectures, compilers, and language standards.
- Schema-less messages, and more focus on optimizing how schemas are handled at runtime in general. Right now, the full schema is always included in the message, which is simple and easy to debug. But in network scenarios, for example, where the data may be much smaller than the schema and is transferred often, excluding the schema would be critical. The user's code would then need some other way to obtain the schema. Also, checking schema compatibility is not free, and could be skipped in some cases.
- Once the data format stabilizes, an unambiguous specification, formal or informal, needs to be written down. The same is true for the schema text language, although it is less important there.
- A GUI tool for working with data, which is essential for debugging as well as for general use. It needs to be easy to use and able to run on most developer machines.

### Alternatives

Here are some established formats you might want to use instead. They have their own drawbacks, of course, which is why this project was started.

- Protocol Buffers: https://protobuf.dev/
- Cap'n Proto: https://capnproto.org/
- FlatBuffers: https://flatbuffers.dev/
- Text-based formats, like JSON, YAML, or XML.

### License

[MIT](./LICENSE.txt).