Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Any plans to give an easy way to make DSL compatible with elasticsearch_async #556

Closed
pfreixes opened this issue Jan 12, 2017 · 8 comments
Closed

Comments

@pfreixes
Copy link

Hi,

I've seen that you have been working on a project [1] to proxy the official client in aims to achieve a pure asynchronous behavior fully compatible with asyncio. I was wondering if there is any plan to do the same for the DSL project.

Even though is possible to use an asynchronous connection, as the following snippet shows, the principal interface implementation does not allow the developer to use the await statement to get the response. The intermediate object Response is not designed to do so.

client = AsyncElasticsearch(hosts=['localhost'])
async def test():
    s = Search(using=client, index="my-index") \
        .filter("term", category="search") \
        .query("match", title="python")   \
        .query(~Q("match", description="beta"))

    response = await s.execute()

Is there an plan to implement all of this stuff necessary to make the library compatible with asynchronous environments ?

[1] https://github.com/elastic/elasticsearch-py-async

@a-urth
Copy link

a-urth commented Jan 12, 2017

Some thoughts - actually it mighty be possible without breaking api. if there was a way that search/request object won't use client/connection inside it and be operated just for query construction and not execution, then it would be possible to use it with any other client, for example async one. And we could just wrap response from client in the same manner Search object does it. Of course open question is scan and some other features, but it seems to be easier and faster than just forking and reimplementing api in async manner.

@honzakral
Copy link
Contributor

I currently have no plan to implement this purely for time- and expertise- constraints. I would be very happy to help review and merge a solution if someone came up with it.

I think it should just be a matter of implementing a subclass of Response that will be able to work with the future returned by AsyncElasticsearch.search. You can, even now, override the class used on a per request basis by calling the response_class method (0).

I think the proper way to implement this would be to create AsyncSearch class that would default to such AsyncResponse and perhaps add corresponding async_search methods to DocType and Index classes. My only concern (from my lack of expertise in asyncio) is how easy it would be to make this python 2 compatible since I am not ready (though I would love to! :) ) to make elasticsearch-dsl python 3 only.

0 - http://elasticsearch-dsl.readthedocs.io/en/latest/api.html#elasticsearch_dsl.Search.response_class

@pfreixes
Copy link
Author

The compatibility can reach more or less easily. As an example have a look to this MR [1].

As you said the idea is work with an adhoc class for the response_class in aims to deal with the asynchronous client, handling either Futures - just thinking in elasticsearch_async - or coroutines.

Maybe the tricky thing here is how to get support for that without replace the Search for a new class that it might be annoying for developers.

Let me think a bit and I will try to come back with a solution.

[1] https://github.com/rkern/line_profiler/pull/75/files

@pfreixes
Copy link
Author

I have a proof of concept that mixes asyncronous and syncronous behaviour using an intermediate Future when the object returned by the client is awaitable. Just as a resume the execute function might get the following shape:

    def execute(self, ignore_cache=False):
        """
        Execute the search and return an instance of ``Response`` wrapping all
        the data.

        :arg response_class: optional subclass of ``Response`` to use instead.
        """
        if ignore_cache or not hasattr(self, '_response'):
            es = connections.get_connection(self._using)
            response = es.search(
                index=self._index,
                doc_type=self._doc_type,
                body=self.to_dict(),
                **self._params
            )

            if inspect.isawaitable(response):
                f = Future()
                def _build_response(task):
                    self._response = self._response_class(
                        self,
                        response.result()
                    )
                    f.set_result(self._response)
                response.add_done_callback(_build_response)
                return f
            else:
                self._response = self._response_class(
                    self,
                    response
                )
        return self._response

Leaving aside readability u other stuff this code allows the library to keep the interface without the need of create derivated classes that will be used ad-hoc for asynchronous environments.

Then what ? Refine a bit the code and cover all of these places where there is an interaction with the client to test if the object returned is awaitable or not. Worth mentioning that the awaitable function is available from Python 3.5, since this version asyncio API almost didn't change, in fact in Python 3.6 has been marked as unbreakable and Python community is gonna keep the contract, something that didn't happen with older versions.

My suggestion here is kept this implementation only for Python 3.5, previous versions since 2.X to 3.4 will work with the regular and synchronous implementation only. It can be achieved with a simple test of the Python version and introducing a if statement to decide if the asynchronous version is supported or not.

And last but not least, I should be able to prepare a proper MR the following days.

Cheers,

@pfreixes
Copy link
Author

pfreixes commented Jan 15, 2017

@honzakral I have a proposal, follow this link [1].

This commit only enables the support for asynchronous drivers for the Search.execute method, later when we will come up with an agreement the whole library can be covered to get support for asynchronous clients.

Basically, the commit incorporates the following ideas:

  • A new constant called ASYNC_SUPPORTED created
  • Different function implementation having in mind if the library is running in a ASYNC_SUPPORTED or not.
  • The use of none compatible statements such as async, await is forbidden in the most upper layer. If these are needed, hide the use of them to another file and import if ASYNC_SUPPORTED is True
  • All asynchronous tests are placed at _test_async.py to avoid be gathered by the runner.
  • Tests from _test_async are incorporated to each module if and only if ASYNC_SUPPORTED is True

Comments will be wellcomed.

[1] https://github.com/Skyscanner/elasticsearch-dsl-py/commit/814fff7d953362b3d16915b986e7eba2cf574721

@pfreixes
Copy link
Author

This last commit [1] improves a bit the current implementation, right now there is no need to implement two different versions of each ES-Dsl function that needs to deal with both client implementations. As a result, maintainability becomes easier than before.

[1] https://github.com/Skyscanner/elasticsearch-dsl-py/commit/be83cb2b2c15c64c4bcd45fc14bdf0705f0350ad

@pfreixes
Copy link
Author

pfreixes commented Jan 22, 2017

I have a couple of red flags. Not a big deal but they must be resolved.

First of all about the helpers, such as scan one implemented by the elasticsearch-py synchronous implementation and not implemented by elasticsearch_async. I can see that ES-dsl publishes an interface that uses this method. Seeing that, what I can do here is just publish a new PR to the elasticsearch_async project to get support for the asynchronous version of this method. Even thoguth won't be a bad idea implement all helpers in asynchronous flavor, but it will take time and can be done later in other MR.

About integration tests, first of all, how can I run them? Once say that, will be nice implement a minimum set of integration tests using an asynchronous client which they will cover the interfaces that are compatible with asynchronous clients, explicitly using elasticsearch_async client. I guess that some of the dependencies to - again - helpers can be implemented in that integration tests.

Any thoughts?

@sethmlarson
Copy link
Contributor

Closing this as a duplicate of #1355

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants