Can I assume that pandoc-api-version always appears first? #3211
The JSON format defines objects as (emphasis added):
So no, it's not guaranteed that a certain name (like |
Sure, it's unordered, but at the same time Pandoc controls the implementation, which is why I'm asking. For instance, I haven't had a single instance of 't' appearing after 'c', and I relied on that to speed up the JSON decoding significantly. Other advantages are:
I don't know whether writing deterministic JSON is trivial on the Haskell side or not. If it's a big task and would mess up the codebase, then the clear choice would be to rewrite parts of panflute to be order-agnostic, even at the cost of speed. If it's completely trivial, then it might even make it easier for Pandoc when debugging. On a related note, JSON feels quite verbose compared to the native format, perhaps something like MessagePack, or taking out the |
No, you can't assume it. Pandoc uses the aeson library, which doesn't guarantee it. In our tests, we saw the order change under Windows, but not under Linux/OSX. In any case, since it's not part of the JSON spec, we can't expect it of other output. |
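For readers who want to see what order-agnostic decoding looks like on the consumer side, here is a minimal Python sketch. It only assumes the key names mentioned in this thread ("t", "c", "pandoc-api-version", "meta", "blocks"); the helper name `read_element` and the sample values are hypothetical, and this is not how panflute is actually implemented.

```python
import json


def read_element(obj):
    """Return (tag, content) for an element, whatever the key order was."""
    # Look the fields up by name, never by position.
    # "c" may be missing, so fall back to None instead of raising.
    return obj["t"], obj.get("c")


# The same element serialized with its keys in two different orders.
a = json.loads('{"t": "Str", "c": "hello"}')
b = json.loads('{"c": "hello", "t": "Str"}')
assert read_element(a) == read_element(b)

# The top level is handled the same way: index by key, not by position.
# The version number below is a made-up sample value.
doc = json.loads('{"blocks": [], "pandoc-api-version": [1, 17], "meta": {}}')
version = doc["pandoc-api-version"]
blocks = doc["blocks"]
```

Because the parsed objects are plain dicts, the lookups cost the same regardless of the order in which the producer wrote the keys.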
Sounds reasonable |
We could use aeson-pretty to make the order deterministic: https://hackage.haskell.org/package/aeson-pretty-0.8.2/docs/Data-Aeson-Encode-Pretty.html If we set indent to 0, we could get the deterministic order. Not sure it's worth it though. Anyone who consumes |
|
Agreed. If we're discussing the JSON format already however (and apologies if this was discussed somewhere before, maybe along @jkr's rewrite of the JSON handling...?): Has a more self-contained and more self-explanatory JSON serialization of the pandoc AST been considered? For example instead of:
something like this:
With this format we could easily add more fields to elements. And filters that don't know those fields would simply ignore them without breaking. |
+++ Mauro Bieg [Nov 07 16 01:14 ]:
That's a great idea, and it's a shame we didn't consider it. @jkr, what do you think? |
Definitely, and AFAIK everyone is doing that. However, in order to maintain the order of the loaded JSON, Python allows objects to be passed as tuples instead of dicts. But I agree that depending on features not in the JSON standard would probably cause headaches down the line. About Mauro's suggestion: I agree that it makes reading and debugging the JSON way easier. |
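The Python feature referred to here is presumably the `object_pairs_hook` argument of the standard `json` module, which hands each decoded object to a callback as a list of (key, value) tuples in document order. A small sketch, with a hypothetical hook name `as_pairs` and a made-up sample element:

```python
import json


def as_pairs(pairs):
    # json calls this hook once per JSON object, passing a list of
    # (key, value) tuples in the order they appear in the text.
    return tuple(pairs)


decoded = json.loads('{"c": "hello", "t": "Str"}', object_pairs_hook=as_pairs)
# decoded == (("c", "hello"), ("t", "Str"))
```

This preserves whatever order the producer happened to emit, so code that relies on it still depends on behavior the JSON format does not guarantee.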
Looks clearer to me. The only issue I have is that it breaks the "t"/"c" model. I'm not sure if that's a big deal, but if we have "text" as a key here, why not have "text" as a key in "Str" too? But I would like to replace an array with an object wherever possible. I guess this seems more consistent to me:
|
Well, I'd use an array when you have a list of things and an object when you have key-value pairs...
Yeah, that was not really thought through. The idea is to keep the JSON as simple as possible and only nest when you have child elements in the document tree (usually With that in mind, here is a fleshed-out version (I'm using
|
Yeah, I wasn't clear. I prefer objects to arrays when you're modeling structure: keys are better than having to figure out what something is by its position in a list. (It used to be that meta was json[0] and blocks was json[1], for example). I'd be curious to see what sort of slowdown typing out "text" and "content" produces on large files running through filters. On a few of my tests, removing the empty lists ( |
A large slowdown would be my guess. Removing the empty lists reduced the size of some of my test files from ~1.8 MB to 200 kB, and extending the names of common fields would do the opposite. |
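To get a rough feel for the size side of this trade-off, one can compare the serialized length of the same elements under short and long key names. The sketch below is an illustration only: the long names "type" and "text" are hypothetical stand-ins for the proposal, the data is synthetic, and it measures bytes, not filter run time.

```python
import json


def serialized_size(elements):
    # Compact separators, so only key and value lengths contribute.
    return len(json.dumps(elements, separators=(",", ":")))


words = ["word"] * 1000
short = [{"t": "Str", "c": w} for w in words]
long_ = [{"type": "Str", "text": w} for w in words]  # hypothetical key names

print(serialized_size(short), serialized_size(long_))
```

Each element grows by the extra characters in its key names, so the overhead scales linearly with the number of elements in the document.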
As of Pandoc 2.7.1, is the JSON AST output deterministic? We are thinking of prettifying the JSON and tracking it using git. Therefore, deterministic ordering would be greatly helpful for having clean diffs. We could always sort by keys, but I imagine there is actually some intelligible order to the JSON that is best to preserve if possible. |
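The "sort by keys" fallback mentioned above can be sketched in a few lines of Python using the standard `json` module. The function name `canonical` and the sample input are hypothetical; the point is only that two serializations differing in key order normalize to the same text, which is what keeps diffs clean.

```python
import json


def canonical(text):
    """Re-serialize JSON text with sorted keys and fixed indentation."""
    data = json.loads(text)
    # sort_keys reorders object members only; array order is left untouched.
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


a = canonical('{"t": "Str", "c": "hello"}')
b = canonical('{"c": "hello", "t": "Str"}')
assert a == b
```

As noted, this discards whatever order the producer used, so it trades the original layout for stability.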
I don't think the output is completely deterministic,
because of maps in YAML metadata. (JSON serialization
of hash maps is not deterministic.)
Daniel Himmelstein writes:
…> We could use aeson-pretty to make the order deterministic and also make the JSON human-readable in other ways.
If we set indent to 0, we could get the deterministic order without the extra whitespace.
|
On my system (and on the TravisCI systems), Pandoc outputs a predictable JSON:
However, I've got some reports that this is not always the case:
In both cases we had the latest Pandoc version:
Is this a bug, or is there no guarantee about the order of the elements in a map? I monkey-patched panflute to work around different orders at the first level, but doing the same for all the "t" and "c" items would be a bit messy.
Thanks,
S