[refac] rewrite parser based on events, rewrite filtering
This is a giant rewrite, carried out over several months.

[wip] separating the filter code to a different class
wip
wip
wip
filter single quoted is working
refactor to filter processor wip
double quoted wip
double quoted wip
double quoted wip
double quoted wip
double quoted wip
double quoted wip
double quoted wip
double quoted seems to be working
double quoted wip
double quoted wip
double quoted wip
double quoted wip
double quoted wip
double quoted wip
double quoted working!
filter plain scalar wip
wip
filter plain scalar wip
wip
test filter processors
fix write in inplace::translate_esc
block literal wip
block literal wip
block literal wip
block literal wip
block literal wip
block literal wip
block literal working!
filter block folded wip
filter block folded wip
cleanup filter
filter locations are needed only for double quoted scalars
add FilterResult to encapsulate validity
prepare filter for using in parser
in-parser filtering wip
filter empty block literals
filter block folded ok
all filters working
moving filters to parse wip
fix block_folded
fixing block folded WIP
new filter: all tests passing!
fix sanitizer issues
refactor: harmonize parser filtering function names
wip ci fixes
coverage wip
filter arena no longer needed
double quoted filter wip
fix wip
fix wip
fix wip
wip: inplace mid-extending vs end-extending
all tests ok
wip
wip
wip2
wip
wip
wip doc
wip doc
wip anchor
fix newlines in emit of docs
wip ref
wip new parser
wip new parser
wip new parser
fix
wip new parser
wip new parser
wip new parser
wip new parser
wip new parser: tag directives
wip new parser: tag resolving
wip new parser: more sink edge cases
wip new parser: key containers working in the sink
prepare event sink stack
tree parse wip
cleanup event sink
tree parse wip
tree parse wip
tree parse wip
tree parse wip: now parsing simple flow seqs!
new parser wip: flow seqs: added anchor/ref parsing
new parser wip: seq flow goes on while there is a seq flow
new parser wip: seqimap events
new parser wip: seqimap parsing
new parser wip: now parsing flow maps!
wip
wip
new parser wip: block seqs wip
new parser wip: block maps wip
wip
wip
wip
map anchors ok
tags wip
anchors and tags now working
add tests for container keys
structure wip
key containers: working in events from yaml!
wip
wip
docs wip
qmrk wip
qmrk seq blck
qmrk wip
fix seqimap again
qmrk with tags
doc wip
doc wip
doc wip
doc wip
doc wip
doc wip
remove old parsing functions
fix
wip buffered events for container keys
ditto
ditto
ditto
ditto
container keys seem to be working
report error for container keys
flow key containers inside qmrk
remove unused functions
remove more unused functions
comments
wip
comments wip
wip
wip
wip
wip
most tests working
fix more tests
wip: refactor parser to not depend on tree
ditto
remove include dependencies
parser: do not use tree directly
fixes
fix annotations when starting child maps
more fixes
more fixes
more fixes
more fixes
block scalars
block scalars
fixes to scalars
wip
wip
wip
wip
add error location checks
wip
wip
sudden docs
sudden docs wip
sudden docs in block map/seq
first test cases for simple seq are working!
fixing test cases WIP
mark doc only on explicit docs or stream children
more progress
wip
wip
fixing indentless seqs wip
simple seqs are working!
nested_seqx2 working!
disable all un-refactored tests
fix empty_seq
fix empty map/file
empty scalar wip
fix empty scalars
fix test number
fix null vals and empty scalars
fix nested seq
map wip
map wip
fix maps!
fix nested maps!
fix map of seq
fix seq of map
fix sets
explicit key WIP
explicit key WIP
explicit key WIP
explicit key WIP
explicit keys working!
fix regressions
fix generic map seq tests
docs WIP
docs + indentation wip
remove unused functions
fix regressions
rename test_new_parser to test_parser_engine
docs working!
fix json
fix scalar names
anchors wip
anchors wip
anchors wip
anchors mostly working
anchors WIP
anchors/refs working!
move test lib files to a separate folder
tags wip
simple seq
simple seq
tag wip
tags working!
rename TestCase->TestCaseNode, into separate files
remove empty var
fix indentation
fix github_issues
fix github issues
single quoted wip
single quoted wip
single quoted is working!
double quoted wip
double quoted wip
fix plain scalar emit
literal scalar wip
literal scalar wip
literal scalar wip
literal scalar wip
literal scalar wip
move tags to separate source files
minor cleanup
block literal wip
block literal wip
add json parser
update benchmarks
improve json
fix compilation in clang
fix bm_emit
block literal wip
block literal wip
block literal wip
reference resolver
block literal wip
block literal working!
fix regressions
block folded wip
block folded wip
block folded wip
block folded wip
block folded wip
block folded wip
block folded wip
block folded wip: indented blocks
block folded wip
block folded wip
block folded wip
block folded working!
plain scalar wip
plain scalar wip
plain scalar working!
style wip
style wip
style wip
style wip
style WIP
scalar style wip
scalar style ok
fix regression of scalar plain
fix regression of double quoted wip
block literal wip (old)
double quoted wip
fix regression in double quoted
fix merge
add tests for merge
fix merge wip
fix vs compilation wip
parse overloads wip
parse overloads wip
parse overloads
fix merge for styles
fixes to quickstart wip
enable serialize test
improve test merge
fix test serialize
test tree wip
fix locations
test tree wip
test parser wip
fix test for yaml events (from tree)
refactor yaml event tests to use parameterized tests
event tests: use the scalar style information from the tree
event tests: use the container style information from the tree
event tests: working both from parser and tree
improve tag errors
fix tags wip
fix tags
fix bm
fix bm
fix test parser
fix tree wip
fix quickstart wip
fix test tree wip
fix some valgrind warnings
fix quickstart wip
fix tree & quickstart wip
fix docmaps with keyref as the first child
fix parsing into existing nodes
fix quickstart!
more fixes (~regressions from quickstart)
fix tool tests
fix test suite wip
fix test suite wip @215/1633
fix test suite wip @152/1633 91%
disable tests with container keys: 96/1633  94%
test suite wip
test suite parse: update missing errors
fix parsing of scalars starting with ?
fix skipping of whitespace in flow mode 47/1633 97%
fix missing anchor 45/1633 97%
fix neutral tag resolve 43/1633 97%
fix parse of yaml events 39/1633 98%
fix tags normalization 50/1633 97%
fix tags normalization 38/1633 98%
fix scalar with trailing colon : 36/1633 98%
exempt more missing errors. 32/1633 98%
30/1633 98%
22/1633 99%
18/1633 99%
backspace in dquo. 16/1633 99%
8/1633 99%
7/1633 99%
6/1633 99%
3/1633 99%
100% pass!
adding events parser to test suite and events tool
sneaky block container keys WIP
cleanup yaml-events
fix warning
wip
fix block key containers
test suite: fix event emitting WIP
100% tests pass!
fix missing doc UKK6
test suite: add tests comparing reference events and emitted events WIP
test suite: fix comparison of emitted events
100% test pass
enable tests for key containers. 100% pass!
enable error tests for event emitter. 100% pass!
update test suite exclusions
biojppm committed Mar 26, 2024
1 parent 508f4c5 commit 6ebef32
Showing 105 changed files with 31,824 additions and 13,416 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -33,6 +33,7 @@ build/
 install/
 .python-version
 compile_commands.json
+launch.json
 
 # test files
 /Testing/
12 changes: 12 additions & 0 deletions CMakeLists.txt
@@ -36,17 +36,27 @@ c4_add_library(ryml
 c4/yml/common.cpp
 c4/yml/emit.def.hpp
 c4/yml/emit.hpp
+c4/yml/event_handler_tree.hpp
+c4/yml/filter_processor.hpp
 c4/yml/export.hpp
 c4/yml/node.hpp
 c4/yml/node.cpp
+c4/yml/node_type.hpp
+c4/yml/node_type.cpp
 c4/yml/parse.hpp
 c4/yml/parse.cpp
+c4/yml/parse_engine.hpp
+c4/yml/parse_engine.def.hpp
 c4/yml/preprocess.hpp
 c4/yml/preprocess.cpp
+c4/yml/reference_resolver.hpp
+c4/yml/reference_resolver.cpp
 c4/yml/std/map.hpp
 c4/yml/std/std.hpp
 c4/yml/std/string.hpp
 c4/yml/std/vector.hpp
+c4/yml/tag.hpp
+c4/yml/tag.cpp
 c4/yml/tree.hpp
 c4/yml/tree.cpp
 c4/yml/writer.hpp
@@ -60,6 +70,8 @@ c4_add_library(ryml
 INCORPORATE c4core
 )
 
+set_property(TARGET ryml PROPERTY CXX_STANDARD 17) # TO BE REMOVED!!!!!!!
+
 if(RYML_WITH_TAB_TOKENS)
 target_compile_definitions(ryml PUBLIC RYML_WITH_TAB_TOKENS)
 endif()
1 change: 0 additions & 1 deletion README.md
@@ -1008,7 +1008,6 @@ following situations:
 requirement exists because checking for tabs introduces branching
 into the parser's hot code and in some cases costs as much as 10%
 in parsing time.
-* Anchor names must not end with a terminating colon: eg `&anchor: key: val`.
 * Non-unique map keys are allowed. Enforcing key uniqueness in the
 parser or in the tree would cause log-linear parsing complexity (for
 root children on a mostly flat tree), and would increase code size
2 changes: 1 addition & 1 deletion bm/CMakeLists.txt
@@ -132,7 +132,7 @@ function(ryml_add_bm_comparison_case target name case_file)
 get_filename_component(case "${case_file}" NAME_WE) # case identifier
 get_filename_component(ext "${case_file}" EXT) # prevent json readers from reading yml data
 if(NOT ("${ext}" STREQUAL ".json"))
-set(filter_json "yml|yaml")
+set(filter_json "ryml_yaml|yaml")
 endif()
 c4_add_target_benchmark(${target} ${case}
 FILTER "${filter_json}"
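One possible reading of the filter change above, offered as an assumption rather than the author's stated rationale: if `FILTER` ends up as a regular expression that is searched (not anchored) against benchmark names, then the old pattern `yml|yaml` also selects the new `bm_ryml_json_*` benchmarks, because `ryml` contains `yml`. The sketch below checks that with `std::regex_search`. The helper name `selected_by` is hypothetical; the benchmark names come from this commit.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `name`.
// This models the ASSUMED (unanchored search) behavior of the benchmark filter.
bool selected_by(const std::string& pattern, const std::string& name)
{
    return std::regex_search(name, std::regex(pattern));
}
```

Under that assumption, `selected_by("yml|yaml", "bm_ryml_json_arena")` is true while `selected_by("ryml_yaml|yaml", "bm_ryml_json_arena")` is false, so only the new pattern keeps the JSON-only parse benchmarks away from non-JSON case files.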
3 changes: 2 additions & 1 deletion bm/bm_common.hpp
@@ -107,7 +107,8 @@ struct BmCase
 c4::csubstr filename;
 std::vector<char> src;
 std::vector<char> in_place;
-ryml::Parser ryml_parser;
+ryml::EventHandlerTree ryml_evt_handler;
+ryml::Parser ryml_parser{&ryml_evt_handler};
 ryml::Tree ryml_tree;
 bool is_json;
 rapidjson::Document rapidjson_doc;
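The hunk above wires the parser to a separate event handler (`ryml::Parser ryml_parser{&ryml_evt_handler};`), in line with the commit message entries about making the parser not depend on the tree. As a rough illustration of that general pattern only, here is a self-contained toy sketch. Every name in it (`ToyParser`, `ToyTreeHandler`, `ToyEventLog`, and their member functions) is hypothetical and is not ryml's API; the real handler interface is not shown in this part of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler #1: collects scalars into a flat "tree".
struct ToyTreeHandler
{
    std::vector<std::string> items;
    void begin_seq() { items.clear(); }
    void scalar(const std::string& s) { items.push_back(s); }
    void end_seq() {}
};

// Hypothetical handler #2: records the events as text, building no tree.
struct ToyEventLog
{
    std::string log;
    void begin_seq() { log += "+SEQ\n"; }
    void scalar(const std::string& s) { log += "=VAL :" + s + "\n"; }
    void end_seq() { log += "-SEQ\n"; }
};

// Toy parser for a flat flow sequence such as "[a, b, c]".
// It only talks to the handler interface, never to a concrete tree type.
template<class Handler>
struct ToyParser
{
    Handler* handler;
    explicit ToyParser(Handler* h) : handler(h) {}

    // Returns false on malformed input (missing brackets).
    bool parse(const std::string& src)
    {
        if(src.size() < 2 || src.front() != '[' || src.back() != ']')
            return false;
        handler->begin_seq();
        std::size_t pos = 1;                    // first char after '['
        const std::size_t end = src.size() - 1; // index of ']'
        while(pos < end)
        {
            std::size_t comma = src.find(',', pos);
            if(comma == std::string::npos || comma > end)
                comma = end;                    // last item runs up to ']'
            std::size_t first = pos, last = comma;
            while(first < last && src[first] == ' ') ++first;   // trim left
            while(last > first && src[last - 1] == ' ') --last; // trim right
            if(first < last)
                handler->scalar(src.substr(first, last - first));
            pos = comma + 1;
        }
        handler->end_seq();
        return true;
    }
};
```

In this sketch the same `ToyParser` can fill a `ToyTreeHandler` or a `ToyEventLog` without any change to the parsing code, which is the kind of decoupling the handler pointer in `BmCase` suggests.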
54 changes: 48 additions & 6 deletions bm/bm_emit.cpp
@@ -177,7 +177,7 @@ void bm_yamlcpp(bm::State& st)
 s_bm_case->report(st);
 }
 
-void bm_ryml_ostream(bm::State& st)
+void bm_ryml_yaml_ostream(bm::State& st)
 {
 c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
 ryml::Tree tree = ryml::parse_in_arena(s_bm_case->filename, src);
@@ -191,7 +191,21 @@ void bm_ryml_ostream(bm::State& st)
 s_bm_case->report(st);
 }
 
-void bm_ryml_str(bm::State& st)
+void bm_ryml_yaml_json_ostream(bm::State& st)
+{
+c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
+ryml::Tree tree = ryml::parse_in_arena(s_bm_case->filename, src);
+std::string str;
+std::ostringstream os;
+for(auto _ : st)
+{
+os << ryml::as_json(tree);
+str = os.str();
+}
+s_bm_case->report(st);
+}
+
+void bm_ryml_yaml_str(bm::State& st)
 {
 c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
 ryml::Tree tree = ryml::parse_in_arena(s_bm_case->filename, src);
@@ -203,7 +217,19 @@ void bm_ryml_str(bm::State& st)
 s_bm_case->report(st);
 }
 
-void bm_ryml_str_reserve(bm::State& st)
+void bm_ryml_yaml_json_str(bm::State& st)
+{
+c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
+ryml::Tree tree = ryml::parse_in_arena(s_bm_case->filename, src);
+std::string str;
+for(auto _ : st)
+{
+emitrs_json(tree, &str);
+}
+s_bm_case->report(st);
+}
+
+void bm_ryml_yaml_str_reserve(bm::State& st)
 {
 c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
 std::string str;
@@ -216,9 +242,25 @@ void bm_ryml_str_reserve(bm::State& st)
 s_bm_case->report(st);
 }
 
-BENCHMARK(bm_ryml_str_reserve);
-BENCHMARK(bm_ryml_str);
-BENCHMARK(bm_ryml_ostream);
+void bm_ryml_yaml_json_str_reserve(bm::State& st)
+{
+c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
+std::string str;
+str.resize(2 * src.size());
+ryml::Tree tree = ryml::parse_in_arena(s_bm_case->filename, src);
+for(auto _ : st)
+{
+emitrs_json(tree, &str);
+}
+s_bm_case->report(st);
+}
+
+BENCHMARK(bm_ryml_yaml_str_reserve);
+BENCHMARK(bm_ryml_yaml_json_str_reserve);
+BENCHMARK(bm_ryml_yaml_str);
+BENCHMARK(bm_ryml_yaml_json_str);
+BENCHMARK(bm_ryml_yaml_ostream);
+BENCHMARK(bm_ryml_yaml_json_ostream);
 #ifdef RYML_HAVE_LIBFYAML
 BENCHMARK(bm_fyaml_str_reserve);
 BENCHMARK(bm_fyaml_str);
71 changes: 61 additions & 10 deletions bm/bm_parse.cpp
@@ -150,7 +150,7 @@ void bm_libfyaml_arena(bm::State& st)
 }
 #endif
 
-void bm_ryml_arena(bm::State& st)
+void bm_ryml_yaml_arena(bm::State& st)
 {
 c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
 for(auto _ : st)
@@ -160,7 +160,18 @@ void bm_ryml_arena(bm::State& st)
 s_bm_case->report(st);
 }
 
-void bm_ryml_inplace(bm::State& st)
+void bm_ryml_json_arena(bm::State& st)
+{
+c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
+for(auto _ : st)
+{
+ONLY_FOR_JSON;
+ryml::Tree tree = ryml::parse_json_in_arena(s_bm_case->filename, src);
+}
+s_bm_case->report(st);
+}
+
+void bm_ryml_yaml_inplace(bm::State& st)
 {
 c4::substr src = c4::to_substr(s_bm_case->in_place).trimr('\0');
 for(auto _ : st)
@@ -171,32 +182,72 @@ void bm_ryml_inplace(bm::State& st)
 s_bm_case->report(st);
 }
 
-void bm_ryml_arena_reuse(bm::State& st)
+void bm_ryml_json_inplace(bm::State& st)
+{
+c4::substr src = c4::to_substr(s_bm_case->in_place).trimr('\0');
+for(auto _ : st)
+{
+ONLY_FOR_JSON;
+s_bm_case->prepare(st, kResetInPlace);
+ryml::Tree tree = ryml::parse_json_in_place(s_bm_case->filename, src);
+}
+s_bm_case->report(st);
+}
+
+void bm_ryml_yaml_arena_reuse(bm::State& st)
 {
 c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
 for(auto _ : st)
 {
 s_bm_case->prepare(st, kClearTree|kClearTreeArena);
-s_bm_case->ryml_parser.parse_in_arena(s_bm_case->filename, src, &s_bm_case->ryml_tree);
+parse_in_arena(&s_bm_case->ryml_parser, s_bm_case->filename, src, &s_bm_case->ryml_tree);
 }
 s_bm_case->report(st);
 }
 
-void bm_ryml_inplace_reuse(bm::State& st)
+void bm_ryml_json_arena_reuse(bm::State& st)
+{
+c4::csubstr src = c4::to_csubstr(s_bm_case->src).trimr('\0');
+for(auto _ : st)
+{
+ONLY_FOR_JSON;
+s_bm_case->prepare(st, kClearTree|kClearTreeArena);
+parse_json_in_arena(&s_bm_case->ryml_parser, s_bm_case->filename, src, &s_bm_case->ryml_tree);
+}
+s_bm_case->report(st);
+}
+
+void bm_ryml_yaml_inplace_reuse(bm::State& st)
 {
 c4::substr src = c4::to_substr(s_bm_case->in_place).trimr('\0');
 for(auto _ : st)
 {
 s_bm_case->prepare(st, kResetInPlace|kClearTree|kClearTreeArena);
-s_bm_case->ryml_parser.parse_in_place(s_bm_case->filename, src, &s_bm_case->ryml_tree);
+parse_in_place(&s_bm_case->ryml_parser, s_bm_case->filename, src, &s_bm_case->ryml_tree);
 }
 s_bm_case->report(st);
 }
 
+void bm_ryml_json_inplace_reuse(bm::State& st)
+{
+c4::substr src = c4::to_substr(s_bm_case->in_place).trimr('\0');
+for(auto _ : st)
+{
+ONLY_FOR_JSON;
+s_bm_case->prepare(st, kResetInPlace|kClearTree|kClearTreeArena);
+parse_json_in_place(&s_bm_case->ryml_parser, s_bm_case->filename, src, &s_bm_case->ryml_tree);
+}
+s_bm_case->report(st);
+}
+
-BENCHMARK(bm_ryml_inplace_reuse);
-BENCHMARK(bm_ryml_arena_reuse);
-BENCHMARK(bm_ryml_inplace);
-BENCHMARK(bm_ryml_arena);
+BENCHMARK(bm_ryml_yaml_inplace_reuse);
+BENCHMARK(bm_ryml_json_inplace_reuse);
+BENCHMARK(bm_ryml_yaml_arena_reuse);
+BENCHMARK(bm_ryml_json_arena_reuse);
+BENCHMARK(bm_ryml_yaml_inplace);
+BENCHMARK(bm_ryml_json_inplace);
+BENCHMARK(bm_ryml_yaml_arena);
+BENCHMARK(bm_ryml_json_arena);
 BENCHMARK(bm_libyaml_arena);
 BENCHMARK(bm_libyaml_arena_reuse);
 #ifdef RYML_HAVE_LIBFYAML
4 changes: 4 additions & 0 deletions bm/cases/bm-cases.txt
@@ -12,10 +12,14 @@ scalar_block_literal_singleline.yml
 scalar_block_folded_multiline.yml
 scalar_block_folded_singleline.yml
 style_maps_blck_outer1000_inner10.yml
+style_maps_flow_outer1000_inner10_json.yml
 style_maps_blck_outer1000_inner100.yml
+style_maps_flow_outer1000_inner100_json.yml
 style_maps_flow_outer1000_inner10.yml
 style_maps_flow_outer1000_inner100.yml
 style_seqs_blck_outer1000_inner10.yml
 style_seqs_blck_outer1000_inner100.yml
 style_seqs_flow_outer1000_inner10.yml
+style_seqs_flow_outer1000_inner10_json.json
 style_seqs_flow_outer1000_inner100.yml
+style_seqs_flow_outer1000_inner100_json.json
2 changes: 1 addition & 1 deletion ext/c4core