Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework node struct #326

Merged
merged 8 commits into from
Jan 23, 2020
Merged

Rework node struct #326

merged 8 commits into from
Jan 23, 2020

Conversation

nwellnhof
Copy link
Contributor

Fixes #309.

int marker_offset;
int padding;
int start;
cmark_delim_type delimiter;
unsigned char list_type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious (as a C ignoramus) why this is needed. Does the compiler default to using more than one byte for a cmark_delim_type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C standard mandates a single implementation-defined type for enums. Compilers typically use int.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that's good to know.

@jgm
Copy link
Member

jgm commented Jan 20, 2020

This looks good to me!

Query: does anything in this affect the public API?

@nwellnhof
Copy link
Contributor Author

Query: does anything in this affect the public API?

Yes, the addition of malloc in cmark_mem in the last commit. But this only affects people who use custom allocators which seems like a rarely used feature.

@jgm
Copy link
Member

jgm commented Jan 20, 2020

OK, can you put something very prominent in the commit message so I'll remember to highlight this API change in the next release?

Use zero-terminated C strings instead of cmark_chunks without storing
the length. The length of code literals will be readded in a later
commit. strlen overhead for code info should be negligible.

Reduces size of struct cmark_node by 8 bytes.
Use zero-terminated C strings instead of cmark_chunks without storing
the length. This introduces a few additional strlen computations,
but overhead should be low.

Allows to reduce size of struct cmark_node later.
Reduces size of struct cmark_node by 8 bytes.
Use zero-terminated C strings and a separate length field instead of
cmark_chunks. Literal inline text will now be copied from the parent
block's content buffer, slowing the benchmark down by 10-15%.

The node struct never references memory of other nodes now, fixing commonmark#309.
Node accessors don't have to check for delayed creation of C strings,
so parsing and iterating all literals using the public API should
actually be faster than before.
Fix another place where an "allocated" cmark_chunk was used.
Allows to reduce size of struct cmark_node later.
Introduce multi-purpose data/len members in struct cmark_node. This
is mainly used to store literal text for inlines, code and HTML blocks.

Move the content strbuf for blocks from cmark_node to cmark_parser.
When finalizing nodes that allow inlines (paragraphs and headings),
detach the strbuf and store the block content in the node's data/len
members. Free the block content after processing inlines.

Reduces size of struct cmark_node by 8 bytes.
@nwellnhof
Copy link
Contributor Author

I changed the pull request to use realloc instead of malloc so now there aren't any changes to the public API.

@jgm jgm merged commit f3f50b2 into commonmark:master Jan 23, 2020
@jgm
Copy link
Member

jgm commented Jan 23, 2020

Excellent, thanks.

@jgm
Copy link
Member

jgm commented Jan 24, 2020

We're getting a new fuzzing error:

  | AddressSanitizer:DEADLYSIGNAL
-- | --
  | =================================================================
  | ==1==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000 (pc 0x081024b4 bp 0xff98d2e8 sp 0xff98ceb0 T0)
  | ==1==The signal is caused by a READ memory access.
  | ==1==Hint: address points to the zero page.
  | #0 0x81024b4 in __interceptor_strcmp /src/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:443:25
  | #1 0x82478b1 in is_autolink cmark/src/commonmark.c:149:10
  | #2 0x82478b1 in S_render_node cmark/src/commonmark.c:420:9
  | #3 0x824ff5e in cmark_render cmark/src/render.c:172:10
  | #4 0x8244a94 in cmark_render_commonmark cmark/src/commonmark.c:479:10
  | #5 0x819b608 in LLVMFuzzerTestOneInput cmark/test/cmark-fuzz.c:24:10
  | #6 0x809fe66 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned int) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:556:15
  | #7 0x808c313 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned int) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:292:6
  | #8 0x8091a18 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned int)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:774:9
  | #9 0x80b6827 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
  | #10 0xf7cd5636 in __libc_start_main
  | #11 0x80672e8 in _start

Since it just popped up, I suspect it has to do with these changes.
Can you investigate?

@nwellnhof
Copy link
Contributor Author

Should be fixed with #329.

If it's OK, could you add me to the OSS-Fuzz auto_ccs?

@jgm
Copy link
Member

jgm commented Jan 25, 2020

I'd be happy to add you, but I can't figure out how!

@nwellnhof
Copy link
Contributor Author

You'd have to submit a pull request to https://github.com/google/oss-fuzz, adding me to auto_ccs here: https://github.com/google/oss-fuzz/blob/master/projects/cmark/project.yaml. But I can submit the pull request myself and mention this thread.

@jgm
Copy link
Member

jgm commented Jan 25, 2020

Sure, go ahead!

nwellnhof added a commit to nwellnhof/oss-fuzz that referenced this pull request Jan 27, 2020
jonathanmetzman pushed a commit to google/oss-fuzz that referenced this pull request Jan 27, 2020
@nwellnhof nwellnhof deleted the rework-node-struct branch August 24, 2020 15:49
QuietMisdreavus pushed a commit to swiftlang/swift-cmark that referenced this pull request Apr 6, 2023
…brackets-overflow

Fix bug in fuzz harness
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inline nodes can reference text data of parent block
2 participants