Jil (WIP)

A fast JSON serializer, built on Sigil with a number of somewhat crazy optimization tricks.

While usable in its current state, Jil is far from finished. It should be treated as a Work In Progress; don't use it for anything serious just yet.

Preliminary releases are available on NuGet in addition to this repository.

Usage

    using System.IO;
    using Jil;

    // Serialize an anonymous object to any TextWriter.
    using (var output = new StringWriter())
    {
        JSON.Serialize(
            new
            {
                MyInt = 1,
                MyString = "hello world",
                // etc.
            },
            output
        );
    }

The first time Jil is used to serialize a given configuration and type pair, it will spend extra time building the serializer. Subsequent invocations will be much faster, so if a consistently fast runtime is necessary in your code you may want to "prime the pump" with an earlier "throw away" serialization.

The suggested way to use Jil is with the generic JSON.Serialize method; however, a slightly slower JSON.SerializeDynamic method is also available which does not require types to be known at compile time. SerializeDynamic always does a few extra lookups and branches when compared to Serialize, and the first invocation for a given type will do a small amount of additional code generation.

Note, at this time Jil does not include a JSON deserializer.

Supported Types

Jil will only serialize types that can be reasonably represented as JSON.

The following types (and any user defined types composed of them) are supported:

  • Strings (including char)
  • Booleans
  • Integer numbers (int, long, byte, etc.)
  • Floating point numbers (float, double, and decimal)
  • DateTimes
    • See Configuration for further details
  • Nullable types
  • Enumerations
  • Guids
  • IList<T> implementations
  • IDictionary<TKey, TValue> implementations where TKey is a string or enumeration

Jil serializes public fields and properties; the order in which they are serialized is not defined (it is unlikely to be in declaration order).

Configuration

Jil's JSON.Serialize method takes an optional Options parameter which controls:

  • The format of serialized DateTimes, one of
    • NewtonsoftStyleMillisecondsSinceUnixEpoch, a string, i.e. "/Date(##...##)/"
    • MillisecondsSinceUnixEpoch, a number, which can be passed directly to JavaScript's Date() constructor
    • SecondsSinceUnixEpoch, a number, commonly referred to as unix time
    • ISO8601, a string, i.e. "2011-07-14T19:43:37Z"
  • Whether or not to exclude null values when serializing dictionaries and object members
  • Whether or not to "pretty print" while serializing, which adds extra linebreaks and whitespace for presentation's sake
  • Whether or not the JSON will be used as JSONP (which requires slightly more work be done w.r.t. escaping)
  • Whether or not to include inherited members when serializing
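To make the four DateTime formats concrete, here is a minimal sketch that renders the same UTC instant in each style. It does not call Jil; the class and method names are hypothetical, and the output simply follows the shapes listed above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Jil: shows what each format looks like.
public static class DateFormatSketch
{
    private static readonly DateTime UnixEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // MillisecondsSinceUnixEpoch: a bare JSON number.
    public static long MillisecondsSinceUnixEpoch(DateTime utc)
    {
        return (utc.Ticks - UnixEpoch.Ticks) / TimeSpan.TicksPerMillisecond;
    }

    // SecondsSinceUnixEpoch: a bare JSON number (unix time).
    public static long SecondsSinceUnixEpoch(DateTime utc)
    {
        return (utc.Ticks - UnixEpoch.Ticks) / TimeSpan.TicksPerSecond;
    }

    // NewtonsoftStyleMillisecondsSinceUnixEpoch: a JSON string wrapping the millisecond count.
    public static string NewtonsoftStyle(DateTime utc)
    {
        string ms = MillisecondsSinceUnixEpoch(utc).ToString(CultureInfo.InvariantCulture);
        return "\"/Date(" + ms + ")/\"";
    }

    // ISO8601: a JSON string with second precision and a literal Z suffix.
    public static string Iso8601(DateTime utc)
    {
        return "\"" + utc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture) + "\"";
    }
}
```

For the instant 2011-07-14T19:43:37Z, the second count is 1310672617 and the millisecond count is 1310672617000.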

Benchmarks

Jil aims to be the fastest general purpose JSON serializer for .NET. Flexibility and "nice to have" features are explicitly discounted in the pursuit of speed.

For comparison, here's how Jil stacks up against other popular .NET serializers in a synthetic benchmark:

All three libraries are in use at Stack Exchange in various production roles.

Numbers can be found in this Google Document.

The Question, Answer, and User types are taken from the Stack Exchange API.

Data for each type is randomly generated from a fixed seed. Random text is biased towards ASCII*, but includes all Unicode.

To sanity check these results, a serializer benchmark from theburningmonk was forked to include Jil. Source available on GitHub.

Numbers can be found in this Google Document. Note that times are in milliseconds in this benchmark, and in microseconds in the preceding one. Also be aware that the following serializers in theburningmonk benchmark are not JSON serializers: protobuf-net, MongoDB Driver BSON, and Json.Net BSON.

These benchmarks were run on a machine with the following specs:

  • Operating System: Windows 8 Enterprise 64-bit (6.2, Build 9200) (9200.win8_gdr.130531-1504)
  • System Manufacturer: Apple Inc.
  • System Model: MacBookPro8,2
  • Processor: Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz (8 CPUs), ~2.5GHz
  • Memory: 8192MB RAM
    • DDR3
    • Dual Channel
    • 665.2 MHZ

As with all benchmarks, take these with a grain of salt.

*This is meant to simulate typical content from the Stack Exchange API.

Tricks

Jil has a lot of tricks to make it fast. These may be interesting, even if Jil itself is too limited for your use.

Sigil

Jil does a lot of IL generation to produce tight, focused code. While this is possible with ILGenerator, Jil instead uses the Sigil library. Sigil automatically does a lot of the busy work you'd normally have to do manually to produce ideal IL. Using Sigil also makes hacking on Jil much more productive, as debugging IL generation without it is pretty slow going.

Trade Memory For Speed

Jil's internal serializers are (in the absence of recursive types) monolithic and per-type, avoiding extra runtime lookups and giving .NET's JIT more context when generating machine code.

The serializers Jil creates also do no Options checking at serialization time; Options are baked in at first use. This means that Jil may create up to 32 different serializers for a single type (though in practice, many fewer).
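The idea of one serializer per type and options pair, built on first use, can be sketched with a static generic class. This is only an illustration: SketchOptions and SerializerCache are hypothetical names, and a plain delegate stands in for the IL that Jil actually emits.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical options flags; each combination gets its own serializer.
[Flags]
public enum SketchOptions { None = 0, PrettyPrint = 1, ExcludeNulls = 2, Jsonp = 4 }

// A static generic class gives per-type storage without a dictionary lookup on T.
public static class SerializerCache<T>
{
    private static readonly Action<TextWriter, T>[] Slots = new Action<TextWriter, T>[8];

    // Exposed only so the sketch can show that building happens once per slot.
    public static int BuildCount;

    public static Action<TextWriter, T> Get(SketchOptions options)
    {
        int index = (int)options;
        Action<TextWriter, T> cached = Volatile.Read(ref Slots[index]);
        if (cached != null) return cached;

        Action<TextWriter, T> built = Build(options);
        // If another thread published first, use its delegate instead.
        return Interlocked.CompareExchange(ref Slots[index], built, null) ?? built;
    }

    private static Action<TextWriter, T> Build(SketchOptions options)
    {
        Interlocked.Increment(ref BuildCount);
        // The options are inspected once, here. The returned delegate never checks them again.
        if ((options & SketchOptions.PrettyPrint) != 0)
        {
            return (writer, value) => { writer.Write(value); writer.Write('\n'); };
        }
        return (writer, value) => writer.Write(value);
    }
}
```

The cost is memory: every options combination that is actually used keeps its own delegate alive for each type.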

Optimizing Member Access Order

Perhaps the most arcane code in Jil determines the preferred order to access members, so the CPU doesn't stall waiting for values from memory.

Members are divided up into 4 groups:

  • Simple
    • primitive ValueTypes such as int, double, etc.
  • Nullable Types
  • Recursive Types
  • Everything Else

Members within each group are ordered by the offset of the fields backing them (properties are decompiled to determine fields they use).

This is a fairly naive implementation of this idea; there's almost certainly more that could be squeezed out, especially with regards to consistency of gains.
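A rough sketch of the ordering step, assuming the group and backing-field offset of each member are already known. The types here are hypothetical, and emitting the groups in the order listed above is an assumption, not something the text confirms.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical grouping, mirroring the four groups described above.
public enum MemberGroup { Simple = 0, Nullable = 1, Recursive = 2, EverythingElse = 3 }

// Hypothetical member descriptor; the offset would come from inspecting the type.
public sealed class MemberSketch
{
    public string Name;
    public MemberGroup Group;
    public int FieldOffset;
}

public static class MemberOrderSketch
{
    // Sort by group first (assumed order), then by the offset of the backing field.
    public static List<string> Order(IEnumerable<MemberSketch> members)
    {
        return members
            .OrderBy(m => (int)m.Group)
            .ThenBy(m => m.FieldOffset)
            .Select(m => m.Name)
            .ToList();
    }
}
```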

Don't Allocate If You Can Avoid It

.NET's GC is excellent, but no-GC is still faster than any-GC.

Jil tries to avoid allocating any reference types, with the following exceptions:

Escaping Tricks

JSON has escaping rules for \, ", and control characters. These can be kind of time consuming to deal with, so Jil avoids as much of that work as possible in two ways.

First, all known key names are escaped once and baked into the generated delegates. Known keys are member names and enumeration values.

Second, rather than look up encoded characters in a dictionary or a long series of branches, Jil does explicit checks for " and \ and turns the rest into a subtraction and jump table lookup. This comes out to ~three branches (with mostly consistently taken paths, good for branch prediction in theory) per character.

This works because control characters in .NET strings (basically UTF-16, but might as well be ASCII for this trick) are sequential, being [0,31].

JSONP also requires escaping of line separator (\u2028) and paragraph separator (\u2029) characters. When configured to serialize JSONP, Jil escapes them in the same manner as \ and ".
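The per-character logic can be sketched as follows. This is an approximation rather than Jil's generated code: an array indexed by the character value stands in for the IL jump table, and EscapeSketch is a hypothetical name.

```csharp
using System.IO;

public static class EscapeSketch
{
    // One entry per control character in [0,31]; built once.
    private static readonly string[] ControlEscapes = BuildControlEscapes();

    private static string[] BuildControlEscapes()
    {
        var table = new string[32];
        for (int i = 0; i < table.Length; i++)
        {
            table[i] = "\\u" + i.ToString("x4"); // generic \u00XX form
        }
        // Short forms defined by JSON.
        table[8] = "\\b";
        table[9] = "\\t";
        table[10] = "\\n";
        table[12] = "\\f";
        table[13] = "\\r";
        return table;
    }

    public static void WriteEscaped(TextWriter writer, string value, bool jsonp)
    {
        writer.Write('"');
        foreach (char c in value)
        {
            // Explicit checks for the two printable characters that need escaping.
            if (c == '"') { writer.Write("\\\""); continue; }
            if (c == '\\') { writer.Write("\\\\"); continue; }
            // JSONP additionally escapes the line and paragraph separators.
            if (jsonp && c == '\u2028') { writer.Write("\\u2028"); continue; }
            if (jsonp && c == '\u2029') { writer.Write("\\u2029"); continue; }
            // Control characters are contiguous, so one range check plus an index suffices.
            if (c < 32) { writer.Write(ControlEscapes[c]); continue; }
            writer.Write(c);
        }
        writer.Write('"');
    }
}
```

For typical text nearly every character falls through to the final Write, so the branches above are almost always taken the same way.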

Custom Number Formatting

While number formatting in .NET is pretty fast, it has a lot of baggage to handle custom number formatting.

Since JSON has a strict definition of a number, a Write() implementation without configuration is noticeably faster. To go the extra mile, Jil contains separate implementations for int, uint, ulong, and long.

Jil does not include custom decimal, double, or single Write() implementations, as despite my best efforts I haven't been able to beat the ones built into .NET. If you think you're up to the challenge, I'd be really interested in seeing code that is faster than the included implementations.
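For the integer case, a configuration-free writer can be quite small. The following is a minimal sketch for int only, assuming the caller supplies a reusable scratch buffer of at least 11 chars; IntWriterSketch is a hypothetical name and this is not Jil's implementation.

```csharp
using System.IO;

public static class IntWriterSketch
{
    // Writes value in the plain decimal form JSON expects: no culture, no format string.
    // buffer must hold at least 11 chars and can be reused across calls.
    public static void WriteInt(TextWriter writer, int value, char[] buffer)
    {
        if (value == 0)
        {
            writer.Write('0');
            return;
        }

        uint magnitude;
        if (value < 0)
        {
            writer.Write('-');
            // Widen before negating so int.MinValue does not overflow.
            magnitude = (uint)(-(long)value);
        }
        else
        {
            magnitude = (uint)value;
        }

        // Fill the buffer from the end, least significant digit first.
        int pos = buffer.Length;
        while (magnitude != 0)
        {
            buffer[--pos] = (char)('0' + (magnitude % 10));
            magnitude /= 10;
        }
        writer.Write(buffer, pos, buffer.Length - pos);
    }
}
```

Reusing the scratch buffer keeps the method from allocating on each call.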

Custom Date Formatting

Similarly to numbers, each of Jil's date formats has a custom Write() implementation.

Custom Guid Formatting

Noticing a pattern?

Jil has a custom Guid writer (which is one of the reasons Jil only supports the D format).

Fun fact about this method: I tested a more branch-heavy version (which removed the byte lookup), and it turned out to be considerably slower than the built-in method due to branch prediction failures. Type 4 Guids being random makes for something quite close to the worst case for branch prediction.
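A lookup-based D format writer might look like the sketch below. It is not Jil's method: GuidWriterSketch is a hypothetical name, and it calls Guid.ToByteArray(), which allocates, purely to keep the example short. The point is that each hex digit comes from a table index rather than a data-dependent branch.

```csharp
using System;
using System.IO;

public static class GuidWriterSketch
{
    private const string Hex = "0123456789abcdef";

    // Guid.ToByteArray() stores the first three fields little-endian,
    // so the D format reads the bytes in this order.
    private static readonly int[] ByteOrder =
        { 3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15 };

    // buffer must hold at least 36 chars (32 hex digits plus 4 dashes).
    public static void WriteGuid(TextWriter writer, Guid guid, char[] buffer)
    {
        byte[] bytes = guid.ToByteArray();
        int pos = 0;
        for (int i = 0; i < 16; i++)
        {
            // Dash positions depend only on i, never on the Guid's value.
            if (i == 4 || i == 6 || i == 8 || i == 10) buffer[pos++] = '-';
            byte b = bytes[ByteOrder[i]];
            buffer[pos++] = Hex[b >> 4];
            buffer[pos++] = Hex[b & 0xF];
        }
        writer.Write(buffer, 0, pos);
    }
}
```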

Different Code For Arrays

Although arrays implement IList<T>, the JIT generates much better code if you give it array-ish IL to chew on, so Jil does so.

Special Casing Enumerations With Sequential Values

Many enums end up having sequential values; Jil will exploit this if possible and generate a subtraction and jump table lookup. Non-sequential enumerations are handled with a long series of branches.
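The sequential case can be sketched as a subtraction followed by an index into precomputed, already-quoted names. An array stands in for the IL jump table here, and SequentialEnumWriter and SketchColor are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical enum with sequential values that do not start at zero.
public enum SketchColor { Red = 3, Green = 4, Blue = 5 }

public sealed class SequentialEnumWriter
{
    private readonly long min;
    private readonly string[] quotedNames;

    public SequentialEnumWriter(Type enumType)
    {
        long[] values = Enum.GetValues(enumType).Cast<object>()
            .Select(v => Convert.ToInt64(v))
            .Distinct()
            .OrderBy(v => v)
            .ToArray();
        if (values.Length == 0) throw new ArgumentException("Enum has no values.");

        // Sequential means each value is exactly one more than the previous.
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] != values[0] + i) throw new ArgumentException("Enum is not sequential.");
        }

        min = values[0];
        // Quote each name once, up front, so writing is a single lookup.
        quotedNames = values
            .Select(v => "\"" + Enum.GetName(enumType, Enum.ToObject(enumType, v)) + "\"")
            .ToArray();
    }

    public void Write(TextWriter writer, long value)
    {
        long index = value - min; // the subtraction
        if ((ulong)index >= (ulong)quotedNames.Length) throw new ArgumentOutOfRangeException(nameof(value));
        writer.Write(quotedNames[index]); // the table lookup
    }
}
```

A non-sequential enum would need a different strategy, such as the series of branches mentioned above.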