Improve performance of inlines parsing of large content #107

nfejzic · 2023-09-24T19:14:43Z

When constructing TokenResolver, we collect all tokens produced by the inline lexer into a single Vec. For large content, this results in a very large memory allocation when parsing inlines.

unimarkup-rs/inline/src/lexer/resolver/mod.rs

Lines 75 to 84 in bcdd1ef

    pub(crate) fn new(iter: TokenIterator<'token>) -> Self {
        let mut new = Self {
            curr_scope: 0,
            interrupted: Vec::default(),
            tokens: iter.map(RawToken::new).collect(),
        };
        new.resolve();
        new
    }
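The cost comes from `collect()`: it drives the lexer to completion before even the first token can be resolved, so every token is held in memory at once. The following sketch illustrates this with a simulated lexer. The counting iterator is a hypothetical stand-in, not the real `TokenIterator`. It compares how many tokens have been produced by the time the first token is available when collecting eagerly versus pulling lazily.

```rust
use std::cell::Cell;

/// Number of tokens the simulated lexer has produced when the first token
/// becomes available, if the whole stream is collected up front.
fn pulled_before_first_eager(token_count: u32) -> usize {
    let pulled = Cell::new(0usize);
    // Hypothetical lexer: every call to `next` "lexes" one token and bumps the counter.
    let lexer = (0..token_count).inspect(|_| pulled.set(pulled.get() + 1));
    // Mirrors `iter.map(RawToken::new).collect()`: the entire stream is buffered.
    let tokens: Vec<u32> = lexer.collect();
    let _first = tokens.first();
    pulled.get()
}

/// Same measurement, but pulling tokens lazily one at a time.
fn pulled_before_first_lazy(token_count: u32) -> usize {
    let pulled = Cell::new(0usize);
    let mut lexer = (0..token_count).inspect(|_| pulled.set(pulled.get() + 1));
    let _first = lexer.next();
    pulled.get()
}

fn main() {
    println!("eager: {}", pulled_before_first_eager(10_000)); // prints "eager: 10000"
    println!("lazy:  {}", pulled_before_first_lazy(10_000)); // prints "lazy:  1"
}
```

The eager version's buffer grows linearly with paragraph size, which is exactly what hurts for paragraphs with hundreds of lines.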

This should not happen very often (we're talking about a paragraph with hundreds of lines), but it should be solved regardless.

One possible solution is to use a tape-like data structure. Instead of working directly on a vector of all tokens, use something like a VecDeque and consume only as many tokens as are needed to resolve the next token that the TokenIterator should return.
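The tape idea can be sketched as follows, under a drastically simplified token model in which the only delimiter is `*` and everything else is text. `TapeResolver`, `Token` and `Resolved` are hypothetical names, not the unimarkup-rs types, and this is not necessarily how #112 implements it. The resolver keeps a VecDeque as the tape and pulls tokens from the lexer only until the token at the front can be resolved. For an opening `*`, that means scanning ahead only as far as the next `*`.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified lexer output.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Star, // may open or close emphasis
    Text(char),
}

/// Hypothetical resolver output: a `*` is either matched or demoted to text.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Resolved {
    Open,
    Close,
    Text(char),
}

struct TapeResolver<I> {
    lexer: I,
    tape: VecDeque<Token>, // only the tokens needed to resolve the front one
    inside: bool,          // currently inside an opened emphasis
    max_tape_len: usize,   // peak buffer size, for illustration only
}

impl<I: Iterator<Item = Token>> TapeResolver<I> {
    fn new(lexer: I) -> Self {
        Self { lexer, tape: VecDeque::new(), inside: false, max_tape_len: 0 }
    }

    /// Pulls tokens lazily until the tape holds `len` tokens; false if the lexer runs dry.
    fn fill_to(&mut self, len: usize) -> bool {
        while self.tape.len() < len {
            match self.lexer.next() {
                Some(token) => {
                    self.tape.push_back(token);
                    self.max_tape_len = self.max_tape_len.max(self.tape.len());
                }
                None => return false,
            }
        }
        true
    }

    /// Scans ahead of the front token, pulling more input only as needed, for a closing `*`.
    fn has_closer(&mut self) -> bool {
        let mut i = 1;
        while self.fill_to(i + 1) {
            if self.tape[i] == Token::Star {
                return true;
            }
            i += 1;
        }
        false
    }
}

impl<I: Iterator<Item = Token>> Iterator for TapeResolver<I> {
    type Item = Resolved;

    fn next(&mut self) -> Option<Resolved> {
        if !self.fill_to(1) {
            return None;
        }
        let front = self.tape[0];
        let resolved = match front {
            Token::Text(c) => Resolved::Text(c),
            Token::Star => {
                if self.inside {
                    self.inside = false;
                    Resolved::Close
                } else if self.has_closer() {
                    self.inside = true;
                    Resolved::Open
                } else {
                    Resolved::Text('*') // unmatched delimiter becomes plain text
                }
            }
        };
        self.tape.pop_front();
        Some(resolved)
    }
}

/// Hypothetical lexer: one token per character.
fn lex(s: &str) -> impl Iterator<Item = Token> + '_ {
    s.chars().map(|c| if c == '*' { Token::Star } else { Token::Text(c) })
}

fn main() {
    let mut resolver = TapeResolver::new(lex("a*bc*d"));
    let out: Vec<Resolved> = resolver.by_ref().collect();
    println!("{:?}", out);
    // prints "peak tape length: 4" (out of 6 tokens)
    println!("peak tape length: {}", resolver.max_tape_len);
}
```

The tape only ever holds the span between the front token and the token that decides it, so memory is bounded by the distance to the closing delimiter rather than by the size of the whole paragraph.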


nfejzic self-assigned this Sep 24, 2023

nfejzic added the enhancement label Sep 24, 2023

mhatzl removed the enhancement label Oct 2, 2023

nfejzic mentioned this issue Oct 8, 2023

fix: avoid collecting iterator during token resolving #112

Merged

mhatzl closed this as completed in #112 Oct 23, 2023
