-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cnode improvements and fixes #8
base: master
Are you sure you want to change the base?
Conversation
- Refactor tree building to not be recursive, so it doesn't segfault because of running out of stack space on really big files - Fix Cnode crashing on `<!---->` due to myhtml_node_text returning a null pointer - Fix Cnode benchmarks Cnode benchmarks on master: ``` benchmark name iterations average time wikipedia_hyperlink.html 97k 200 7817.54 µs/op w3c_html5.html 131k 100 13872.61 µs/op github_trending_js.html 341k 50 30911.76 µs/op ``` This branch: ``` benchmark name iterations average time wikipedia_hyperlink.html 97k 500 7045.92 µs/op w3c_html5.html 131k 200 10438.43 µs/op github_trending_js.html 341k 100 23686.21 µs/op ```
c_src/myhtml_worker.c
Outdated
if (tag_ns != MyHTML_NAMESPACE_HTML) | ||
{ | ||
// tag_ns_name_ptr is unmodifyable, copy it in our tag_ns_buffer to make it modifyable. | ||
tag_ns_buffer = malloc(tag_ns_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're in a new scope, so you can use a VLA. You also want to add a second byte for the nul terminator.
char tag_ns_buffer[tag_ns_len + tag_name_len + 2];
snprintf(tag_ns_buffer, sizeof tag_ns_buffer, "%s:%s", tag_ns_name_ptr, tag_name);
lowercase(tag_ns_buffer);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also want to add a second byte for the nul terminator.
Erlang binary strings are not null-terminated. Still had to add a second byte because snprintf
always inserts a null-byte in the end, please tell if there is a way to make it not do that/if I should just use multiple strcpy
s that were there before
Hello everybody, thank you for submitting the patches and engaging in the conversation. I will review those up until the end of this week. In general, I am very happy to see those improvements, and yes, my C is sadly not where it should be. |
would you be interested in further refactoring? |
while we are here, we should replace all the |
7a6372a
to
395d292
Compare
Done |
Not sure if this is directed at me or at @Overbryd , but I am up. |
@rinpatch highly appreciate the efforts. Would you be up for a call sometime this or next week? You can reach me at my Email address for that purpose, I would like to go over the changes and do a full review once you are done. I think its faster and we can have a better discussion that way. Other people are invited to join the call too of course. |
To be honest, calls stress me out, I would much rather communicate over text. You can reach me over IMs if you want faster discussion though, I am |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully endorse the main two changes here:
- using a stack to build the tree, rather than recursive function calls
I now understand this should be safer and faster by resizing the stack for 30 ETERMS at once.
- Prevent empty comments from blowing up
I kindly ask for one cosmetic change, to split headers/implementation from tstack.h
.
ETERM* tag; | ||
ETERM* attrs; | ||
tstack stack; | ||
tstack_init(&stack, 30); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through this change with a fellow programmer of mine, and now after following through and understanding, I think its wayyy smarter to use a stack here, than my previous implementation.
Makes perfect sense and should be much safer.
size_t size; | ||
} tstack; | ||
|
||
void tstack_init(tstack *stack, size_t initial_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the implementation of each function to c_src/tstack.c
and keep the headers in here?
@@ -1,4 +1,8 @@ | |||
defmodule Myhtmlex.SafeTest do | |||
use MyhtmlexSharedTests, module: Myhtmlex.Safe | |||
|
|||
test "doesn't segfault when <!----> is encountered" do | |||
assert {"html", _attrs, _children} = Myhtmlex.decode("<div> <!----> </div>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important fix, thank you.
because of running out of stack space on really big files
<!---->
due to myhtml_node_text returning anull pointer
Cnode benchmarks on master:
This branch:
Apologies for stuffing all of this into one commit, I can split it into
separate patches if you don't wish some of it merged.