Add (optional at compile-time) G3Frame JSON output #69

Draft
wants to merge 65 commits into
base: master
from

Conversation

cozzyd
Contributor

@cozzyd cozzyd commented Dec 17, 2021

I envision it being useful to access data in .g3 format from web
applications for monitoring purposes. For that reason, it would
be useful to have a way to convert any data to JSON (one possible alternative
would be a .g3 binary file reader in javascript, but that is much more
work).

Fortunately, cereal supports JSON as an archive format, so adding
JSON output is nearly trivial. We mostly just need to make sure
the JSONOutputArchive versions of all the serializations are compiled,
and write a saveJSON() method for G3Frame. An asJSON() (as_json() in
Python) method is also provided that returns a string.

There are a few small differences from binary output:

  • Don't bother emitting crc sums
  • Don't FLAC encode, ever
  • Output the character instead of the number for frametype

Currently, this is only enabled by a new compile-time option (the cmake
variable ENABLE_JSON_OUTPUT), though in the future it will probably be
enabled by default if it doesn't break anything. It does add moderately
to binary size and compile time, but hopefully that's not a huge deal.
The asJSON/as_json methods still exist without JSON support, but return
an error in JSON format.
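
As a rough illustration of the fallback described above, here is a minimal sketch, not the code in this PR. `DemoFrame`, its `type` member and the wording of the error message are hypothetical stand-ins; only the `ENABLE_JSON_OUTPUT` name and the `asJSON()` method name come from the description. The real implementation would drive a cereal JSON archive where the stub builds a string by hand.

```cpp
#include <string>

// Hypothetical stand-in for a frame; this is not the real G3Frame class.
struct DemoFrame {
    char type = 'S';  // frame type, emitted as a character rather than a number

    std::string asJSON() const {
#ifdef ENABLE_JSON_OUTPUT
        // JSON support compiled in: the real code would serialize through a
        // cereal JSON archive here. This stub only emits the frame type.
        return std::string("{\"type\": \"") + type + "\"}";
#else
        // JSON support compiled out: the method still exists, and it reports
        // the problem as JSON so that callers can always parse the result.
        // The message text is an assumption made for this sketch.
        return "{\"error\": \"compiled without JSON support\"}";
#endif
    }
};
```

Building the same file with and without `-DENABLE_JSON_OUTPUT` selects one branch or the other, so code calling `asJSON()` does not need its own `#ifdef`.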

Also included is a new script, spt3g-jsonify, that will read in a
.g3[.gz] file and output a json stream as a proof of concept.

There are a few places where some other code had to be modified, because
cereal's text and binary archive formats have different APIs for binary
data. The affected code will actually never be run for JSON output, but it
still gets generated and therefore must compile.
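
The "never run, but must compile" situation can be sketched roughly as follows. `BinaryArchive`, `TextArchive`, `writeBlob` and `save` are hypothetical names invented for this illustration and are not cereal classes or functions from this PR. The `"compressed:"` prefix is a placeholder for a real encoder. The point is only that a templated function is instantiated for every archive type, so every branch has to be valid for each of them.

```cpp
#include <cstddef>
#include <string>

// Hypothetical binary-style archive: takes raw bytes.
struct BinaryArchive {
    std::string out;
    void writeRaw(const void *data, std::size_t n) {
        out.append(static_cast<const char *>(data), n);
    }
};

// Hypothetical text-style archive: takes named values, has no writeRaw().
struct TextArchive {
    std::string out;
    void writeNamed(const char *name, const std::string &value) {
        out += std::string("\"") + name + "\": \"" + value + "\"";
    }
};

// One overload per archive type hides the API difference.
inline void writeBlob(BinaryArchive &ar, const std::string &blob) {
    ar.writeRaw(blob.data(), blob.size());
}
inline void writeBlob(TextArchive &ar, const std::string &blob) {
    ar.writeNamed("blob", blob);
}

template <class Archive>
void save(Archive &ar, const std::string &samples, bool use_compression) {
    if (use_compression) {
        // A text archive may never take this branch at run time, but the
        // compiler still instantiates it for TextArchive. Calling
        // ar.writeRaw() directly here would fail to compile for that type.
        writeBlob(ar, "compressed:" + samples);  // placeholder, not real compression
    } else {
        writeBlob(ar, samples);
    }
}
```

Overloads are just one way to do this; a compile-time branch on the archive type would serve the same purpose.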

Still TODO:

  • docs
  • tests
  • example HTTP endpoint (likely using boost::beast or equivalent to produce gzipped json).

By using the PUBLIC target, it affects both the compilation of core and
anything compiled against core.
@arahlin
Member

arahlin commented Dec 18, 2021

By the way, I cherry-picked your cmake action fix onto master. Thanks for fixing that!

cozzyd and others added 12 commits April 22, 2022 09:34
@arahlin
Member

arahlin commented Feb 27, 2024

I think it probably makes some sense to move the python GIL / threading context machinery to a separate PR, since it's used in a few different places (G3PipelineInfo, G3Reader, G3Writer, G3EventBuilder...) and is not specific to this particular feature.

arahlin and others added 10 commits February 27, 2024 23:57
This PR creates a new class that simplifies initialization of python threads, as
well as acquiring / releasing the Python global interpreter lock in various
contexts.

Use cases include:

1. Ensuring that Py_Initialize() is properly called at the beginning of a
   program that is expected to interact with the python interpreter, and also
   that Py_Finalize() is called when the program is finished.
2. Ensuring that the current thread state is saved and the GIL released as
   necessary, e.g. for IO operations, and then the thread state is restored on
   completion.
3. Ensuring that the GIL is acquired for one-off interaction with the python
   interpreter, and released when complete.

A G3PythonContext object is used throughout the library code for cases 2 and 3.
If the python interpreter has not been initialized (i.e. the compiled program is
expected to be purely in C++), then these context objects are essentially no-op.
If the python interpreter is initialized (e.g. inside a python program or
command-line interface), then these context objects will handle the GIL
appropriately.

See the examples/cppexample.cxx C++ program for a simple implementation of the
above behavior.
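
For readers unfamiliar with the pattern, case 3 above (acquire the GIL for a one-off interaction, release it when done, do nothing if there is no interpreter) can be sketched as a scoped guard. This is not the G3PythonContext implementation: `FakeInterpreter` and `ScopedGil` are hypothetical stand-ins, and the sketch tracks the lock with plain booleans instead of calling the CPython C API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for interpreter state. Real code would query and
// manipulate the interpreter through the CPython C API instead.
struct FakeInterpreter {
    bool initialized = false;      // has the interpreter been started?
    bool gil_held = false;         // does the current thread hold the lock?
    std::vector<std::string> log;  // records acquire/release events
};

// Scoped guard: acquires the (fake) GIL on construction and releases it on
// destruction. It is a no-op when the interpreter was never initialized or
// when the lock is already held.
class ScopedGil {
public:
    explicit ScopedGil(FakeInterpreter &interp)
        : interp_(interp), active_(interp.initialized && !interp.gil_held) {
        if (active_) {
            interp_.gil_held = true;
            interp_.log.push_back("acquire");
        }
    }
    ~ScopedGil() {
        if (active_) {
            interp_.gil_held = false;
            interp_.log.push_back("release");
        }
    }
    // Non-copyable, so the lock is released exactly once.
    ScopedGil(const ScopedGil &) = delete;
    ScopedGil &operator=(const ScopedGil &) = delete;

private:
    FakeInterpreter &interp_;
    bool active_;
};
```

Because the release happens in the destructor, the lock is dropped on every exit path from the enclosing scope, including exceptions. In a pure-C++ program the guard leaves the state untouched.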

This PR also adds logic throughout the G3PipelineInfo and G3ModuleConfig
class definitions to enable them to serialize appropriately in a pure-C++
program.
These are python objects, and if we allow them to be deleted otherwise,
bad things happen. This fixes at least most of the concurrency problems
I have with reading files that have G3PipelineInfo in them?
Use a G3MapFrameObject storage structure for the module arguments, rather than a
map of python objects.  Since the serialization process requires a call to
repr() for non-G3FrameObjects anyway, do this step in the python shim that
creates the config in the first place.

Also ensure that simple scalar values are serialized as frame objects.

Adds a new ``spt3g.core.to_g3frameobject`` function for converting
python objects to G3FrameObjects.
@arahlin
Member

arahlin commented Mar 1, 2024

The context PR is merged, but you probably need #148 (also merged, should fix your latest build failure) and #147 (pending review) to make this work.

@cozzyd
Contributor Author

cozzyd commented Mar 1, 2024

Hmm, I wonder why it worked for me without updating the c++ standard...

@arahlin
Member

arahlin commented Mar 3, 2024

Ok, you should be able to merge with master, so that just your json changes would be part of this PR now!
