Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Swagger #390

Closed
qstokkink opened this issue Dec 20, 2018 · 19 comments
Closed

Use Swagger #390

qstokkink opened this issue Dec 20, 2018 · 19 comments
Labels
priority: low Should be addressed at some point in the future

Comments

@qstokkink
Copy link
Collaborator

qstokkink commented Dec 20, 2018

We should use Swagger for our REST endpoints. See below for the original exploration of this idea.

Old

Nobody wants to write documentation. Instead, we should automatically generate this. Also, nobody likes to do precondition and postcondition checks. We can do both as follows.

First we can scan all the distinct endpoints available by instantiating a RootEndpoint and inspecting its children. We can then automatically generate documentation from a .restapi variable tied to the REST API handler functions (render_POST, render_GET, etc.).

We can attach and generate the .restapi variable to these functions statically by using annotations/wrapper functions. Specifically we should expose:

  • RESTInput, which takes an option, a jsontype and a description argument.
  • RESTOutput, which takes a conditional_lambda and a output_format argument.

RESTInput

The first input a RESTInput takes is the option. This option specifies what option in the request URI is being described. The jsontype then specifies the JSON datatype, which can be a combination of:

  • number/boolean/null
  • string with specified encoding, for example STR_TYPE["UTF-8"] or STR_TYPE["ASCII"]
  • array/object
  • tuple of (jsontype, description) which adds additional documentation to an inner construction

Lastly, each input should have a description. This is not very useful when using RESTInput for runtime precondition checking, but mostly for humans reading generated documentation.

RESTOutput

The RESTOutput takes a conditional_lambda, which is a lambda function taking the input and returning True if its output format should be applied. I have created a function for converting lambda bytecode to human readable form already, so this is a multi-purpose argument (both for checking postconditions programmatically and for generating documentation). Lastly the output_format specifies what the output matches to (or should match to). The output_format follows the same rules as the jsontype for the RESTInput. The description is once again to explain in human terms what the output is used for.

Example

class NetworkEndpoint(resource.Resource):
    """
    This endpoint is responsible for handing all requests regarding the state of the network.
    """

    def __init__(self, session):
        resource.Resource.__init__(self)
        self.session = session

    # Takes no RESTInput
    @RESTOutput(lambda input: input is None,
                {
                    "peers": {
                        (STR_TYPE["BASE64"], "The sha1 of the peer's public key."): {
                            "ip": STR_TYPE["BASE64"],
                            "port": NUMBER_TYPE,
                            "public_key": STR_TYPE["BASE64"],
                            "services": [(STR_TYPE["BASE64"], "The sha1 of the Community's public key.")]
                        }
                    }
                },
                "All of the known peers and their services.")
    def render_GET(self, request):
        return json.dumps({"peers": self.render_peers()})

Would generate the following documentation:



NetworkEndpoint

This endpoint is responsible for handing all requests regarding the state of the network.

Input

OptionTypeDescription

Output

GivenOutputAnnotation
input is None{
    "peers": {
        string (base64):
            "ip": string (base64),
            "port": number,
            "public_key": string (base64),
            "services": [
                string (base64)
            ]
        }
    }
}

The sha1 of the peer's public key.




The sha1 of the Community's public key.





@qstokkink qstokkink added the priority: medium Enhancements, features or exotic bugs label Dec 20, 2018
@ichorid
Copy link
Contributor

ichorid commented Dec 20, 2018

Automating development procedures is always a great idea, especially if it reduces code complexity. However, I wonder how would that fit in our current development process? At this moment, we have REST endpoints documentation and examples embedded in the source files. If we opt to use this, we'll have to remove the stuff from the code. In addition, each pull request will have to get an additional hook for documentation generation. And the documentation will have to be kept out of the main source tree, lest we face the problem of patching pull requests 'on the go'.

How do you see the real place of this in our development stack?

@qstokkink
Copy link
Collaborator Author

At this moment, we have REST endpoints documentation and examples embedded in the source files. If we opt to use this, we'll have to remove the stuff from the code.

We can keep the documentation, but right now that documentation is just for humans. With this new system I hope that we can use the same specification for both (a) generating human documentation and (b) runtime pre- and postcondition violations. The former, generating human documentation, would gain nothing over the status quo for programmers (except in IPv8, which has virtually no documentation on its REST endpoints). An added benefit is that we can export the documentation to others (god forbid, like Javascript developers) without shipping the source code of the service though. Most importantly the latter, being able to check preconditions and postconditions at runtime, can be a great help in maturing our code. Right now any violations of our own REST specification will remain unnoticed by the service itself, any violations will have to be caught by a higher layer.

In addition, each pull request will have to get an additional hook for documentation generation.

Yep.

How do you see the real place of this in our development stack?

Most of this is based on the whole theory of provable execution/security/software engineering. We know that precondition/postcondition specification programming is fallible. How this will work out in practice, I do not know. However, I'm bored, on vacation and in the mood for something adventurous. So, I'll implement this and report back once I know more about the practical side of this and whether or not we should want it at all 😃.

@ichorid
Copy link
Contributor

ichorid commented Dec 21, 2018

Being a great fan of pre- post- condition checking, I totally support this one. Though, we need @xoriole's opinion on how to best integrate it in our development process

@qstokkink
Copy link
Collaborator Author

After reflecting on my own answer a bit, I realized I basically want the REST API to guarantee the following two things at runtime:

  • (Any valid input) implies ((valid output) or (an error))
  • (Any invalid input) implies (an error)

Of course validity in this context only applies to the typechecking of the data.

@qstokkink
Copy link
Collaborator Author

Well, the NetworkEndpoint prototype for runtime precondition and postcondition checking is working: qstokkink@d057bf4#diff-963a914f2490b6181d59564ab191dbc1

I still need to finish the documentation generator. Smooth sailing so far.

@qstokkink

This comment has been minimized.

@qstokkink

This comment has been minimized.

@qstokkink
Copy link
Collaborator Author

I added a specification to the OverlayStatisticsEndpoint:

@RESTOutput(lambda input: True,
                ({
                     "statistics": [
                         {
                             (STR_TYPE["ASCII"], "The Community name."): {
                                 (STR_TYPE["ASCII"], "The message name."): {
                                     "identifier": (NUMBER_TYPE, "The message number."),
                                     "num_up": NUMBER_TYPE,
                                     "num_down": NUMBER_TYPE,
                                     "bytes_up": NUMBER_TYPE,
                                     "bytes_down": NUMBER_TYPE,
                                     "first_measured_up": NUMBER_TYPE,
                                     "first_measured_down": NUMBER_TYPE,
                                     "last_measured_up": NUMBER_TYPE,
                                     "last_measured_down": NUMBER_TYPE
                                 }
                             }
                         }
                     ]
                 },
                 "The statistics per message per community, if available."))
    def render_GET(self, _):
        ...

    @RESTInput("enable", (BOOLEAN_TYPE, "Whether to enable or disable the statistics."))
    @RESTInput("overlay_name", (STR_TYPE["ASCII"], "Class name of the overlay."))
    @RESTInput("all", (BOOLEAN_TYPE, "Update applies to all overlays."))
    @RESTOutput(lambda input: True,
                {
                    "success": (BOOLEAN_TYPE, "Whether the request succeeded, see error."),
                    "error": (STR_TYPE["ASCII"], "The available information in case of no success.")
                })
    def render_POST(self, request):
        ...

Here's what the generated OverlayStatisticsEndpoint documentation looks like (generated from the webpage with wkhtmltopdf):

OverlayStatisticsEndpoint.pdf

@qstokkink qstokkink self-assigned this Dec 23, 2018
@qstokkink

This comment has been minimized.

@qstokkink

This comment has been minimized.

@qstokkink
Copy link
Collaborator Author

And here is the complete, corrected and auto-generated IPv8 REST API specification:

IPv8 REST API.pdf

@qstokkink
Copy link
Collaborator Author

qstokkink commented Jan 20, 2019

Lessons learned:

  • Our current API is not beautiful enough.
  • If we rework it anyway and we no longer require custom REST API generator code to deal with our own interesting constructions, we might as well switch to a Python Swagger library.

We will have to check out the sizable list of Python Swagger libraries and see what works best for our Twisted-based stack (https://github.com/gangverk/flask-swagger seems like a good contender).

I'll close #397 in favor of this plan.

@qstokkink qstokkink removed their assignment Jan 20, 2019
@qstokkink qstokkink changed the title Automatically generate REST API specification Use Swagger Jan 21, 2019
@DanGraur
Copy link
Contributor

DanGraur commented Jul 22, 2019

After taking a long look into the world of web frameworks for Python I have reached a few conclusions. Moreover, I've also written 4 prototypes to exemplify how some candidate frameworks would work in a real application. There is a prototype for each of the following frameworks: Flask, Klein, Quart, and Quart-Trio (with Trio as the asynchronous framework). These prototypes are in a public repository. Each of the prototypes has a module called server.py, which holds the handlers for HTTP requests, implemented with the help of the aforementioned frameworks. Bar Flask (which is purely synchronous), all prototypes handle 5 types of requests: echo (both sync and async), a simple calculator (both sync and async), a random sentence generator (only async). Flask implements the echo and the calculator handlers, but only in a synchronous fashion. There is a README for each of the prototypes which details the dependencies, as well as instructions on how to start up the demos. It should be mentioned that each demo has a caller.py module, which sends requests to the server programmatically (although it is equally possible to forward requests manually via a browser). Where possible, this also starts up the server itself, and lets it run indefinitely. Where this is not possible, another module is employed: start.py. Regardless, the following are my conclusions following this survey:

  • Flask: is a synchronous framework which can only be hooked up to Twisted via an WSGI (more specifically the WSGIResource in Twisted). Since this framework is synchronous, its request handlers cannot make use of asynchronous code. Thus, considering the importance of asynchronicity for the Tribler ecosystem, I wouldn't recommend we go with this, since it is unsuitable for our needs.

  • Klein: is an asynchronous web framework built on top of Twisted. It is in fact maintained by the Twisted devs, and located in the Twisted repo. Since there is a high degree of integration between the two frameworks, changing our current API to Klein would not require a large amount of effort. In fact, the endpoints can largely remain the same, although there will be no need of Twisted's Resource class. Hence, the code will be simplified. However, the decision of using this framework is largely based on one important factor: do we intend to keep using Twisted, or do we intend to switch to some other asynchronous framework? If the answer is yes, we do intend to keep Twisted for the foreseeable future then using Klein might be worthwhile. If the answer is no, we intend to change to something other than Twisted then this would be wasted effort, and using this framework would not be advisable.

  • Quart: is an asynchronous web framework based on Asyncio (as it uses the Asyncio event loop). Quart is essentially designed to be Flask for asynchronous handlers, so using it to set up handlers is a straight forward task. The asynchronous code inside the handlers is based on the async/await constructs (introduced in Python 3.5) and Asyncio (it makes no use of Twisted), so this will intuitively mean that all our current handlers will have to be switched to Asyncio. However, as I have seen on the Technology Roadmap Ticket there is an intention to do so quite soon. This would suggest that Quart is the way to go in case the decision to move to Asyncio is certain (Klein would not be viable anyway since Twisted will be phased out).

  • Quart-Trio: I've seen a lot of interest for Trio both online, and in some of the discussions here in the Tribler discussion boards on GitHub. As a short intro, Trio is an alternative to both Asyncio and Twisted. It's gained a large following in the Python community for the straight-forward nature in which it allows its users to implement some types of asynchronous tasks. Since Trio is not really a web-framework, I employed Quart-Trio, which is a small extension of Quart so that it makes use of the Trio event loop, rather than the Asyncio event loop (as a side note, the two event loops are incompatible, hence, Trio and Asyncio cannot be used together). Quart-Trio is meant to work like Quart, and in fact, it still requires that it is installed, since it uses some of its components (such as Blueprints, which are required in order to enable multi-module applications). Besides some framework specific constructs, such as nurseries in the case of Trio, there are no major differences in the handlers for Quart-Trio and Quart (in both cases async and await are used as one would expect). I also used the asks framework here to forward asynchronous requests, since this framework is implemented based on the Trio event loop. I should however mention, that Quart-Trio requires Python 3.7. This framework should be used if the Trio framework is chosen as the next asynchronous engine of the Tribler ecosystem, otherwise I don't think it's a good time investment. Moreover, Quart-Trio is a relatively young project (7 months old), and has little following currently, but it is maintained by the same dev that maintains regular Quart, so that's a plus.

I should also mention that the initial goal of this task was to look through the stack of Swagger integrations, and identify those which provide support for the current asynchronous nature of the Tribler ecosystem, and also seamlessly add support for the OpenAPI Specification. Unfortunately, there are no such integrations available. And the Swagger part will have to be developed in-house.

As a conclusion, I think the decision of which web framework to use depends entirely on which asynchronous framework is to be used. If Twisted is here to stay, then Klein is the choice. If a move is made towards Asyncio, then Quart is probably our best way forward. Finally, if Trio is favored, then Quart-Trio is the way to go. I do encourage those interested in getting a glimpse of how the endpoints and frameworks (both the web and the aysnc ones) are used, and generally how these multi-module applications should be wired up for each of the discussed frameworks, to look into the prototype repository.

@ichorid
Copy link
Contributor

ichorid commented Jul 22, 2019

BTW, quart uses decorators for routes the same way klein and flask do. @DanGraur , I wonder why didn't you use the decorators in quart and quart-trio examples?

@ichorid
Copy link
Contributor

ichorid commented Jul 22, 2019

Indeed a googling for swagger quart netted some promising results! Apparently, we don't have to build the swagger+quart integration completely on our own.

@DanGraur
Copy link
Contributor

True, decorators are an option, but the decorators are essentially a shorthand to add_url_rule (see the Flask source code), so using either has the same effect.

I essentially didn't use decorators in those cases, since I instantiated the Blueprint objects as fields rather than static variables. Since they're fields I need to use them with the self. prefix, which doesn't work with decorators. If, however, static variables are used, then decorators are a possibility (since no self is required). I did this in the klein example, but not in the others (I think my intention was to have a bit o diversity in how these frameworks can be used, but I wrote that quite a while back, so I'm not sure if this was really my intention 🙂).

@qstokkink qstokkink added the priority: low Should be addressed at some point in the future label Jul 31, 2019
@qstokkink
Copy link
Collaborator Author

This will coincide with the move to Python 3, as such I will put it on hold.

@ichorid
Copy link
Contributor

ichorid commented Sep 9, 2019

At the moment, @egbertbouman almost completed porting Tribler to asyncio. He used aiohttp to reimplement the endpoints because it requires almost no other dependencies.

@qstokkink
Copy link
Collaborator Author

With the merge of #612, we now have Swagger docs.

@qstokkink qstokkink removed priority: medium Enhancements, features or exotic bugs on hold labels Nov 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: low Should be addressed at some point in the future
Development

No branches or pull requests

3 participants