Rework node struct #326

nwellnhof · 2020-01-19T17:34:15Z

Fixes #309.

jgm · 2020-01-20T17:20:41Z

src/node.h

  int marker_offset;
  int padding;
  int start;
-  cmark_delim_type delimiter;
+  unsigned char list_type;


I'm just curious (as a C ignoramus) why this is needed. Does the compiler default to using more than one byte for a cmark_delim_type?

nwellnhof · 2020-01-20

The C standard leaves the underlying integer type of an enum implementation-defined. Compilers typically use int.

jgm · 2020-01-20

Oh, that's good to know.
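To make the size difference concrete, here is a minimal sketch. The names `demo_delim_type`, `demo_list_enum`, and `demo_list_byte` are made up for illustration and are not cmark's definitions; the sizes printed depend on the compiler and ABI.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an enum such as cmark_delim_type. */
typedef enum {
  DEMO_NO_DELIM,
  DEMO_PERIOD_DELIM,
  DEMO_PAREN_DELIM
} demo_delim_type;

/* Layout that stores the enum directly (illustrative only). */
struct demo_list_enum {
  int marker_offset;
  int padding;
  int start;
  demo_delim_type delimiter; /* often sizeof(int) bytes */
  unsigned char bullet_char;
};

/* Layout that stores the same value in a single byte. */
struct demo_list_byte {
  int marker_offset;
  int padding;
  int start;
  unsigned char delimiter; /* always exactly 1 byte */
  unsigned char bullet_char;
};

/* Print the sizes so they can be compared on the current platform. */
void demo_print_sizes(void) {
  printf("sizeof(demo_delim_type)       = %zu\n", sizeof(demo_delim_type));
  printf("sizeof(struct demo_list_enum) = %zu\n", sizeof(struct demo_list_enum));
  printf("sizeof(struct demo_list_byte) = %zu\n", sizeof(struct demo_list_byte));
}
```

Calling `demo_print_sizes()` on a platform where enums are stored as int should show the byte-sized variant packing more tightly, since the two single-byte members can share one aligned slot.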

jgm · 2020-01-20T17:21:52Z

This looks good to me!

Query: does anything in this affect the public API?

nwellnhof · 2020-01-20T17:31:54Z

> Query: does anything in this affect the public API?

Yes, the addition of malloc in cmark_mem in the last commit. But this only affects people who use custom allocators, which seems like a rarely used feature.

jgm · 2020-01-20T17:48:01Z

OK, can you put something very prominent in the commit message so I'll remember to highlight this API change in the next release?

Use zero-terminated C strings instead of cmark_chunks without storing the length. The length of code literals will be re-added in a later commit. strlen overhead for code info should be negligible. Reduces size of struct cmark_node by 8 bytes.

Use zero-terminated C strings and a separate length field instead of cmark_chunks. Literal inline text will now be copied from the parent block's content buffer, slowing the benchmark down by 10-15%. The node struct never references memory of other nodes now, fixing commonmark#309. Node accessors don't have to check for delayed creation of C strings, so parsing and iterating all literals using the public API should actually be faster than before.
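The ownership change described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `demo_literal`, `demo_literal_set`, and `demo_literal_clear` are hypothetical names, and plain `malloc`/`free` stand in for whatever allocator the library uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a literal; not cmark's node struct. */
typedef struct {
  unsigned char *data; /* owned, zero-terminated copy */
  size_t len;          /* length in bytes, excluding the terminator */
} demo_literal;

/* Copy `len` bytes starting at `src` (for example a slice of the parent
 * block's content buffer) into memory owned by `lit`.
 * Returns 0 on success, -1 if allocation fails. */
int demo_literal_set(demo_literal *lit, const unsigned char *src, size_t len) {
  unsigned char *copy = (unsigned char *)malloc(len + 1);
  if (copy == NULL)
    return -1;
  if (len > 0)
    memcpy(copy, src, len);
  copy[len] = '\0'; /* usable as a C string without a delayed conversion */
  free(lit->data);  /* release any previous value */
  lit->data = copy;
  lit->len = len;
  return 0;
}

/* Release the owned copy. */
void demo_literal_clear(demo_literal *lit) {
  free(lit->data);
  lit->data = NULL;
  lit->len = 0;
}
```

Because the literal owns its bytes, the source buffer can be freed or reused afterwards without leaving a dangling pointer; the price is the extra copy mentioned in the commit message.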

Introduce multi-purpose data/len members in struct cmark_node. This is mainly used to store literal text for inlines, code and HTML blocks. Move the content strbuf for blocks from cmark_node to cmark_parser. When finalizing nodes that allow inlines (paragraphs and headings), detach the strbuf and store the block content in the node's data/len members. Free the block content after processing inlines. Reduces size of struct cmark_node by 8 bytes.
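The detach step described above might look something like the sketch below. All names (`demo_buf`, `demo_node`, `demo_finalize`, and so on) are hypothetical, and the buffer is a simplified stand-in for a strbuf rather than cmark's implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, loosely modelled on the idea of a strbuf. */
typedef struct {
  unsigned char *ptr;
  size_t size;  /* bytes used, excluding the terminator */
  size_t asize; /* bytes allocated */
} demo_buf;

/* Hypothetical node with multi-purpose data/len members. */
typedef struct {
  unsigned char *data;
  size_t len;
} demo_node;

/* Append a C string to the buffer. Returns 0 on success, -1 on failure. */
int demo_buf_put(demo_buf *buf, const char *s) {
  size_t n = strlen(s);
  size_t needed = buf->size + n + 1;
  if (needed > buf->asize) {
    unsigned char *p = (unsigned char *)realloc(buf->ptr, needed);
    if (p == NULL)
      return -1;
    buf->ptr = p;
    buf->asize = needed;
  }
  memcpy(buf->ptr + buf->size, s, n + 1); /* copies the terminator too */
  buf->size += n;
  return 0;
}

/* Hand the accumulated bytes to the node and reset the buffer for reuse. */
void demo_finalize(demo_node *node, demo_buf *buf) {
  node->data = buf->ptr;
  node->len = buf->size;
  buf->ptr = NULL;
  buf->size = 0;
  buf->asize = 0;
}

/* Free the block content once inline parsing no longer needs it. */
void demo_free_content(demo_node *node) {
  free(node->data);
  node->data = NULL;
  node->len = 0;
}
```

In this sketch the buffer lives with the parser-side state and only the finished bytes travel with the node, so the node itself needs just a pointer and a length.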

nwellnhof · 2020-01-23T11:46:34Z

I changed the pull request to use realloc instead of malloc so now there aren't any changes to the public API.
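One way realloc can stand in for malloc is sketched below. This assumes an allocator table that exposes calloc-, realloc-, and free-style hooks; the struct `demo_mem`, its member names, and `demo_alloc_uninit` are hypothetical and not taken from the pull request.

```c
#include <stdlib.h>

/* Hypothetical allocator table (assumed shape, not cmark's definition). */
typedef struct {
  void *(*calloc_fn)(size_t, size_t);
  void *(*realloc_fn)(void *, size_t);
  void (*free_fn)(void *);
} demo_mem;

/* Default table backed by the C library. */
static const demo_mem demo_default_mem = {calloc, realloc, free};

/* Allocate `size` uninitialized bytes without needing a malloc hook:
 * the C standard specifies that realloc(NULL, size) behaves like
 * malloc(size). */
void *demo_alloc_uninit(const demo_mem *mem, size_t size) {
  return mem->realloc_fn(NULL, size);
}
```

A caller would use `demo_alloc_uninit(&demo_default_mem, n)` where it would otherwise call malloc, so existing custom allocator tables keep working without a new member.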

jgm · 2020-01-23T16:26:05Z

Excellent, thanks.

jgm · 2020-01-24T18:44:16Z

We're getting a new fuzzing error:

  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==1==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000 (pc 0x081024b4 bp 0xff98d2e8 sp 0xff98ceb0 T0)
  ==1==The signal is caused by a READ memory access.
  ==1==Hint: address points to the zero page.
  #0 0x81024b4 in __interceptor_strcmp /src/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:443:25
  #1 0x82478b1 in is_autolink cmark/src/commonmark.c:149:10
  #2 0x82478b1 in S_render_node cmark/src/commonmark.c:420:9
  #3 0x824ff5e in cmark_render cmark/src/render.c:172:10
  #4 0x8244a94 in cmark_render_commonmark cmark/src/commonmark.c:479:10
  #5 0x819b608 in LLVMFuzzerTestOneInput cmark/test/cmark-fuzz.c:24:10
  #6 0x809fe66 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned int) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:556:15
  #7 0x808c313 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned int) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:292:6
  #8 0x8091a18 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned int)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:774:9
  #9 0x80b6827 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
  #10 0xf7cd5636 in __libc_start_main
  #11 0x80672e8 in _start

Since it just popped up, I suspect it has to do with these changes.
Can you investigate?
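The trace shows a read from address zero inside strcmp, which is what happens when one of its arguments is a null pointer. As a generic illustration of guarding against that (not the fix that was eventually applied, which this thread does not show), a null-tolerant comparison could be sketched like this; `demo_str_equal` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two strings, treating NULL like "".
 * Returns 1 if they are equal, 0 otherwise. */
int demo_str_equal(const char *a, const char *b) {
  if (a == NULL)
    a = "";
  if (b == NULL)
    b = "";
  return strcmp(a, b) == 0;
}
```

Whether a missing string should compare equal to an empty one is a design choice; the alternative is to ensure the field is never NULL in the first place.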

nwellnhof · 2020-01-25T10:03:47Z

Should be fixed with #329.

If it's OK, could you add me to the OSS-Fuzz auto_ccs?

jgm · 2020-01-25T17:48:22Z

I'd be happy to add you, but I can't figure out how!

nwellnhof · 2020-01-25T18:04:47Z

You'd have to submit a pull request to https://github.com/google/oss-fuzz, adding me to auto_ccs here: https://github.com/google/oss-fuzz/blob/master/projects/cmark/project.yaml. But I can submit the pull request myself and mention this thread.

jgm · 2020-01-25T19:28:40Z

Sure, go ahead!

jgm reviewed Jan 20, 2020

nwellnhof added 8 commits January 23, 2020 12:37

Helper function to set C strings in nodes

990bd94

Use C string instead of chunk for link URL and title

d52d52f

Use zero-terminated C strings instead of cmark_chunks without storing the length. This introduces a few additional strlen computations, but overhead should be low. Allows reducing the size of struct cmark_node later.

Use C string instead of chunk for custom block contents

accc7e9

Reduces size of struct cmark_node by 8 bytes.

Use C string instead of chunk in renderer

0f61fdb

Fix another place where an "allocated" cmark_chunk was used.

Improve packing of struct cmark_list

5bb0931

Allows reducing the size of struct cmark_node later.

nwellnhof force-pushed the rework-node-struct branch from 42ff47c to 30c3095 Compare January 23, 2020 11:42

jgm merged commit f3f50b2 into commonmark:master Jan 23, 2020

nwellnhof added a commit to nwellnhof/oss-fuzz that referenced this pull request Jan 27, 2020

[cmark] Add myself to auto_ccs

673f6c4

Approved by @jgm here: commonmark/cmark#326 (comment)

nwellnhof mentioned this pull request Jan 27, 2020

[cmark] Add myself to auto_ccs google/oss-fuzz#3296

Merged

jonathanmetzman pushed a commit to google/oss-fuzz that referenced this pull request Jan 27, 2020

[cmark] Add myself to auto_ccs (#3296)

01d2f67

Approved by @jgm here: commonmark/cmark#326 (comment)

nwellnhof deleted the rework-node-struct branch August 24, 2020 15:49

kivikakk mentioned this pull request Mar 17, 2021

Make AST node content public outside of the crate kivikakk/comrak#175

Closed

ee7 mentioned this pull request Jun 30, 2022

Extremely high memory consumption #303

Closed

QuietMisdreavus pushed a commit to swiftlang/swift-cmark that referenced this pull request Apr 6, 2023

Merge pull request commonmark#326 from kevinbackhouse/fuzz-quadratic-…

ef63acf

…brackets-overflow Fix bug in fuzz harness
