Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to graphql-parser #138

Open
tailhook opened this issue Feb 5, 2018 · 15 comments
Open

Switch to graphql-parser #138

tailhook opened this issue Feb 5, 2018 · 15 comments

Comments

@tailhook
Copy link
Contributor

tailhook commented Feb 5, 2018

Hi,

Since there is no activity in #32 I've decided to write a separate crate for graphql-parser.

It's not based on code here but built using excellent combine crate (making the grammar more readable in my opinion). So in a weekend project I could build the parser for the whole graphql grammar. Which means it should be on par with this crate for features (I've not checked every feature, but we also have subscriptions and triple-quoted strings already implemented).

What do you think about integrating with the graphql-parser crate instead of keeping custom parser?

@Keats
Copy link

Keats commented Feb 6, 2018

That looks much better than my half-baked Pest parser (although I still prefer pest to combine but that's not important)!

It would be really neat for the Rust ecosystem to standardise on a single parser so at least every crate wanting to do something with GraphQL doesn't have to rewrite it and it has many eyes on it.

@theduke
Copy link
Member

theduke commented Feb 6, 2018

@tailhook I'll look over the code, but in principle, as long as the AST produced is decent and can easily be converted (without much cloning) I'd definitely be in favour.

This also concers https://github.com/nrc/graphql and @nrc.
We've already talked about sharing fundamental infrastructure like the parser.

We'd also need the parser to handle parsing the GraphQL schema definition language.

@theduke
Copy link
Member

theduke commented Feb 6, 2018

@tailhook also, it's important to have great error reporting for parsing errors and to be able to map each ast node to the query location.

@tailhook
Copy link
Contributor Author

tailhook commented Feb 6, 2018

This also concers https://github.com/nrc/graphql and @nrc.
We've already talked about sharing fundamental infrastructure like the parser.

Sure, here is the issue: nrc/graphql#36

it's important to have great error reporting for parsing errors

I think it's good enough. Of course, we have positions of all the errors. And "unexpected A, expected B, C". I'm not sure what else to say. Perhaps, there aren't much test for error reporting yet.

Currently, we don't have decent API for the error besides: format me a string, but it should be easy to add just need to figure out how it might look like.

and to be able to map each ast node to the query location.

We have positions for all important elements. The approach is different that one here, but we can discuss details.

We'd also need the parser to handle parsing the GraphQL schema definition language.

Surely, on a todo list. Does seem like another weekend project in it's size. The good thing that tokenizer may be reused entirely.

@theduke
Copy link
Member

theduke commented Feb 6, 2018

@tailhook all sounds great, I'll look over the code!

Would you be interested in moving the parser into the graphql-rust organization?

It would be cool to have all GraphQL related stuff under one hood, and it would ensure that someone else can maintain the parser if you need a break or whatever.

That's not in any way a requirement for me, though.

ps: I've written a demo parser in combine too, I think it's a good choice as in the the current parser combinator ecosystem.

@tailhook
Copy link
Contributor Author

tailhook commented Feb 6, 2018

Would you be interested in moving the parser into the graphql-rust organization?

I'm fine with that. Should it be done now?

@tailhook
Copy link
Contributor Author

SDL is implemented and pushed to crates.io as v0.2.0

@Keats
Copy link

Keats commented Feb 26, 2018

Is there anything blocking that?

@theduke
Copy link
Member

theduke commented Feb 26, 2018

@Keats I haven't had time to look over the code.

We will probably want to have a custom-tailored ast for Juniper.
I'm perfectly fine with getting an AST from the parser and mapping over it to get the custom one.
It should be as efficient as possible to do so, though.

Especially wrt tokio, where sharing the AST between possible lots of resolvers (futures) should be as cheap as possible.

@mhallin had some concerns, so we probably want his input here.

@mhallin
Copy link
Member

mhallin commented Feb 26, 2018

Hello, and thanks for bringing this up to the table!

While the parser part looks good, I've got some reservations for giving up control over the AST data structures to a general-purpose library. The issue I see is that we won't be able to model the data the way that suits a server application best.

For example, our AST of today uses Cow<&str> references everywhere into the original query source to avoid allocating tons of very small strings. This works today because the main execute function executes the entire query and returns when everything has been resolved. To support async code, where the AST needs to outlive the scope of the execute call, our probable AST of tomorrow has a special string type that works like a &str with a reference counted parent string.

I believe you could switch over graphql-parser to use this code, or build even better versions of all of this, but my reluctance comes from the fact that we now can do that without having to think about other use cases than ours.

@tailhook
Copy link
Contributor Author

Well, AST in graphql-parser is currently owned, so works fine for async code right now.

Then we can try to benchmark and optimize things. Usually, rust/jemalloc is very good, so allocations themselves are rarely problem. Given that request processing does lots of validation and probably has a bunch of hashmaps the overhead of allocations may be amortized. Also if serde_json can afford owned strings I think graphql library can too.

Another point of view is that size of Spanning<SharedStr> is 72 bytes. I would rather try some kind of "inline string" library instead of that, given that most strings like type names and field names are much smaller than that. (But I'm not sure there is established "inline string" library in the community)

@theduke
Copy link
Member

theduke commented Feb 26, 2018

IMO it's fine if we start from what's there and optimize later, especially if @tailhook is fine with moving the repo into the graphql-rust org.

I also agree that the AST is probably a minuscule part of the work that happens when executing a resolver.

But:
Let's consider this relatively straight-forward query, executed with a traditional relational database under the hood, and something similar to dataloader:

query {
    users(page: 2, pagesize: 100) {
        id
        username
        posts(max: 20) {
          name
          comments(limit: 5) {
                id
                title
          }
        }
        likes(max: 50) {
            post {
                id
                title
            }
            time
        }
    }
}

Let's assume that the posts, posts.comments, likes, likes.posts queries are run as futures.

That means:
users: 1 + 100 clones
posts: 1 + 100
posts.comments: 100 * 20
likes: 1 + 100
likes.post: 100 * 50

Now let's think about 100 of those queries running concurrently...
That's a lot of cloning of at least nested parts of the AST.
Execution of the futures can be batched quite easily, but the creation of the futures much less so, and those futures will need to know the query.

So I think this can become a significant overhead.

There are a lot of possible optimizations, like just unsafely passing around references to the full AST, using an Arc of the initial query string like mhallin has tried, using some kind of per-query interner, ...

But it's something that we'll need to think about.

@tailhook
Copy link
Contributor Author

Execution of the futures can be batched quite easily, but the creation of the futures much less so, and those futures will need to know the query.

I expect this to work by wrapping an Arc around a part of query. I think current SelectionSet which is basically a Vec<Selection> will be more like Arc<FnvHashSet<Field>> where fragments are already populated. So each resolver doesn't need to repeat fragment logic, and subselections can be referred to on their own.

I.e. the AST after validation/optimization is a bit different from what we have right after parsing anyway.

@theduke
Copy link
Member

theduke commented Apr 9, 2019

Some progress: graphql-rust/graphql-parser#19

@theduke theduke changed the title Use graphql-parser for parsing query language Switch to graphql-parser May 16, 2019
@tosti007
Copy link

@tailhook considering this issue was opened more than 2 years ago. Would the progress on graphql-parser be sufficient to fully replace the the current parser? I.e. what is the current status of this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants