Consider orjson support #271

Closed
Tinche opened this issue Jul 4, 2020 · 11 comments

Comments

@Tinche
Contributor

Tinche commented Jul 4, 2020

Hi,

orjson is a new contender among optimized JSON libraries and claims to be the fastest. I'm wondering whether it can be made to work with structlog.

orjson only outputs bytes (UTF-8-encoded strings). Decoding this output back into a string would probably lose some of the advantages of using orjson in the first place, so I'm interested in whether we can integrate it without decoding.

I'm mostly interested in writing to stdout/stderr in a server environment, where these optimizations are most useful. A custom logger class could write the bytes directly to stdout using sys.stdout.buffer, I guess.
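
The idea of writing bytes straight to `sys.stdout.buffer` could look roughly like the sketch below. It is only an illustration, not structlog code: `write_event` is a hypothetical helper, and stdlib `json.dumps(...).encode()` stands in for `orjson.dumps`, which already returns bytes.

```python
import io
import json
import sys
import threading

# One lock so that concurrent writers do not interleave their lines.
_lock = threading.Lock()


def write_event(event, stream=None):
    """Serialize *event* to UTF-8 bytes and write one line to a binary stream.

    Hypothetical helper for illustration; defaults to sys.stdout.buffer.
    """
    if stream is None:
        stream = sys.stdout.buffer
    # Stand-in for orjson.dumps(event), which would yield bytes directly.
    payload = json.dumps(event).encode("utf-8")
    with _lock:
        stream.write(payload + b"\n")
        stream.flush()


# Demonstrate against an in-memory binary stream instead of real stdout.
buf = io.BytesIO()
write_event({"event": "user.login", "user_id": 42}, buf)
```

The point is that no `str` is ever handed to a text-mode stream, so nothing has to be decoded and re-encoded on the way out.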

structlog doesn't like the final output of the processor chain to be bytes, right?

@hynek
Owner

hynek commented Jul 5, 2020

structlog doesn't care at all; people use it to send dicts to servers. All that'd be needed is a BytesLogger or something in https://github.com/hynek/structlog/blob/master/src/structlog/_loggers.py

@Tinche
Contributor Author

Tinche commented Jul 5, 2020

@hynek
Owner

hynek commented Jul 5, 2020

Returning strings is just a shortcut for ((string,),{}) IIRC.
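
A simplified illustration of that convention, assuming the final processor returns either a string or an `(args, kwargs)` tuple. `to_call_args` and `fake_log_method` are hypothetical names made up for this sketch; this is not structlog's actual implementation.

```python
def to_call_args(result):
    """Normalize a final processor result into (args, kwargs).

    Hypothetical helper: a bare string is treated as shorthand for
    ((string,), {}); an (args, kwargs) tuple is passed through.
    """
    if isinstance(result, str):
        return (result,), {}
    if isinstance(result, tuple):
        args, kwargs = result
        return tuple(args), dict(kwargs)
    raise TypeError(f"unsupported processor result: {type(result)!r}")


def fake_log_method(*args, **kwargs):
    """Stand-in for the wrapped logger's method; echoes what it received."""
    return args, kwargs


# A string and its explicit tuple form end up as the same call.
args, kwargs = to_call_args("hello")
from_string = fake_log_method(*args, **kwargs)

args, kwargs = to_call_args((("hello",), {}))
from_tuple = fake_log_method(*args, **kwargs)

# The tuple form can carry anything the wrapped logger accepts, e.g. bytes.
args, kwargs = to_call_args(((b'{"event": "x"}',), {}))
from_bytes = fake_log_method(*args, **kwargs)
```

In other words, a processor that wants to hand over bytes can use the explicit tuple form, provided the wrapped logger knows what to do with them.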

@Tinche
Contributor Author

Tinche commented Jul 5, 2020

Ah I see, so there's nothing stopping me from playing with this. Thanks!

@mklokocka

So, do I understand correctly that to make this work at the moment without decoding the bytes to a string (which does lose some performance, according to a small benchmark I ran), I need to do something like the following:

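import structlog
# Private helper used by structlog's own PrintLogger at the time of this thread
# (assumed import path; it may not exist in later releases).
from structlog._utils import until_not_interrupted


class BytesLoggerFactory(structlog.PrintLoggerFactory):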
    """Produce `BytesLogger`."""
    def __init__(self, file=None):
        super().__init__(file)

    def __call__(self, *args):
        return BytesLogger(self._file)


class BytesLogger(structlog.PrintLogger):
    """Print events as bytes into a file."""
    def __init__(self, file=None):
        super().__init__(file)

        self._write = self._file.buffer.write
        self._flush = self._file.buffer.flush

    def __setstate__(self, state):
        super().__setstate__(state)

        self._write = self._file.buffer.write
        self._flush = self._file.buffer.flush

    def msg(self, message):
        """
        Print *message*.
        """
        with self._lock:
            until_not_interrupted(self._write, message + b"\n")
            until_not_interrupted(self._flush)

    log = debug = info = warn = warning = msg
    fatal = failure = err = error = critical = exception = msg

and for serialization use something like

import orjson


def serializer(event_dict, **kwargs):
    """Serialize with orjson.

    Args:
        event_dict: Context.
        **kwargs: Serialization keyword arguments.

    """
    return (orjson.dumps(event_dict, option=orjson.OPT_NON_STR_KEYS, **kwargs),), {}

to avoid exceptions from the structlog.stdlib.BoundLogger? :)

@hynek
Owner

hynek commented Jul 29, 2020

You tell us, if you tested/benchmarked it. :D

Do you have any numbers on orjson vs. stdlib json and simplejson? Please use https://pypi.org/project/pyperf/

We might fold it into the docs if it's significant enough.

@mklokocka

I only did a basic benchmark of orjson (with and without decoding of bytes to string) vs stdlib json and rapidjson on an "average event dict" that I send to structlog. I also compared orjson vs stdlib on dumps and loads on a fairly large (~500 KB) JSON file. The numbers on my machine are promising, but I would have to redo the benchmark to make it presentable. :)

There are quite a few results on the orjson page though.

However, I am not really sure whether or how the integration of orjson and structlog could be benchmarked.

@hynek
Owner

hynek commented Aug 14, 2020

Yeah, I know orjson is very fast; the question is whether it makes any difference in the context of small dicts that go through I/O.
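
One rough way to look at that question is to time serialization of a small dict on its own and then together with a write and flush. The sketch below uses stdlib `timeit` and stdlib `json` as stand-ins, not pyperf or orjson; `EVENT`, `serialize_only`, `serialize_and_write`, and `measure` are hypothetical names, and a temporary file is only an approximation of a real stdout pipe.

```python
import json
import tempfile
import timeit

# A small, made-up event dict of the kind a server might log.
EVENT = {
    "event": "request.finished",
    "status": 200,
    "duration_ms": 12.5,
    "path": "/api/items",
}


def serialize_only():
    """Serialize the event to bytes without touching any stream."""
    return json.dumps(EVENT).encode("utf-8")


def serialize_and_write(stream):
    """Serialize the event, then write and flush one line to *stream*."""
    stream.write(json.dumps(EVENT).encode("utf-8") + b"\n")
    stream.flush()


def measure(number=10_000):
    """Return total seconds for *number* runs of each variant."""
    with tempfile.TemporaryFile() as sink:
        t_serialize = timeit.timeit(serialize_only, number=number)
        t_both = timeit.timeit(lambda: serialize_and_write(sink), number=number)
    return {"serialize_s": t_serialize, "serialize_and_write_s": t_both}


results = measure(number=1_000)
```

Comparing the two numbers gives a feel for how much of the per-event cost is serialization at all; swapping a faster serializer into `serialize_only` and `serialize_and_write` shows how much of that is recoverable. For numbers worth publishing, pyperf is the better tool.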

hynek closed this as completed in 45ce4e0 Nov 8, 2020
@hynek
Owner

hynek commented Nov 8, 2020

I believe https://www.structlog.org/en/latest/api.html#structlog.BytesLogger does what you asked for. LMK if not.

I'll try to add basic log levels before releasing too.

@Tinche
Contributor Author

Tinche commented Nov 8, 2020

Excellent, thanks! Will give this a try.

hynek added a commit that referenced this issue Nov 12, 2020
@hynek
Owner

hynek commented Nov 12, 2020

1baf23f was the missing piece that allows processors to return bytes.
