Support --boxed on Windows #270
Comments
Yeah, you hit the main reason why pytest performs the collection on each worker as opposed to performing the collection on the master node and then distributing the items (see the bottom of OVERVIEW for a more detailed explanation). Possibly a solution would be to override
Hi @nicoddemus,
Also, it is worth mentioning pytest-mp; although it is in its early stages, it might be closer to what you need.
Another note
Another general note: currently, pytest items are structured in a way that makes it impossible to correctly de-serialize them with a simple serialization scheme; major internal refactorings in pytest would be needed to even enable it.
@AlexanderTyrrell Did you try with

If you are looking into multiprocessing and complex object serialization, you might want to have a look at osBrain (disclaimer: I am the main author). It is like multiprocessing but has integrated and configurable dill/cloudpickle/pickle/json/raw serialization. It integrates Pyro4 as well, so for simple tasks it is very easy to configure and access the agents (processes) remotely to retrieve results from the workers. For more complex architectures it uses pyzmq for message passing between agents. It should work on Windows, although I would recommend the latest development version for that.
@Peque thanks for sharing. While it would allow us to pickle function objects, the main problem still remains: we need to send fixture and config objects between master and workers and keep them synchronized; in other words, changes to a fixture or config object made in the master or a worker must be reflected back to the master and all other workers.
@nicoddemus In osBrain we also implemented what we call "channels" (advanced or more specific communication patterns implemented with basic ZMQ sockets), although they will not be documented until the next release (probably during February). One of them is a synchronized PUB-SUB pattern between a master and many workers, in which the master shares an object/state with the workers, sending updates on any change (and workers can notify the master of any change for it to publish the update back to all the other workers).

If you are thinking about using "shared memory", that can also be achieved easily, as agents are by default guaranteed to be accessed remotely by only one other agent at a time (that is easy when you use message passing everywhere). So you do not have to worry about locks, race conditions, etc.

But still, I do not know the details, nor do I probably understand the problem well; maybe it is more complex. If I find some spare time I might look at it, as it sounds like a problem that could be easier to implement with osBrain than with bare Python.
@Peque thanks for the explanation, what you have in osBrain is certainly interesting! However, we should probably prototype this somewhere to see whether the synchronization between workers and master improves performance in this case; it would also require profound changes inside xdist, which must be taken into consideration.
@Peque execnet is already handling the multi-process part in a similar manner, explicitly using limited serialization, because pretending that complex remote objects can just work like normal objects is a leaky abstraction that easily topples over ^^ - in particular when latency or contention comes into play.
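To make that concrete, here is a minimal sketch of execnet's model (illustrative only, not xdist's actual worker code): code is shipped to the worker as source text, and only simple built-in values travel through channels, so a complex object such as a pytest Item cannot be sent across.

```python
import execnet

# Spawn a local worker interpreter; execnet sends the code below as source text.
gw = execnet.makegateway()
channel = gw.remote_exec("""
    nodeid = channel.receive()             # only simple built-in values can arrive here
    channel.send("would run " + nodeid)    # likewise, only simple values can be sent back
""")

# A plain string round-trips fine; a pytest Item would not.
channel.send("test_module.py::test_example")
print(channel.receive())
gw.exit()
```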
@RonnyPfannschmidt Yeah, the remote-objects-like-normal-objects part (the Pyro4 part of osBrain) is optional and just simplifies configuration and prototyping. For the most part, and for more complex architectures, it is all message passing with ZMQ under the hood, which still simplifies things as it is very flexible (i.e., multiple communication patterns, and you can very easily change the transport layer so remote and local processes can communicate in the same way). I will have a look at execnet to better understand how it works, thanks! I may find some useful ideas there. 😊
For the record: osBrain 0.6.0 is out with some documentation on channels (and in particular, on synced pub-sub channels). Just in case someone wants to try that approach.
I am developing a C/C++ module for Python (3.6, 64-bit) under Windows, and would like to use the pytest + xdist framework to write tests for it. Each test should run in its own process, in case one of the tests crashes. xdist supports this with the --boxed option, but unfortunately this does not work under Windows (as xdist uses os.fork, provided by _process/forkedfunc.py, to spawn subprocesses).
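For illustration, a minimal sketch of the fork-based boxing idea in general (not xdist's actual implementation): the child process runs the test and the parent only inspects its exit status, but os.fork simply does not exist on Windows.

```python
import os

def run_boxed(test_func):
    """Run test_func in a child process so a hard crash cannot kill the parent."""
    pid = os.fork()                        # AttributeError on Windows: os.fork is POSIX-only
    if pid == 0:                           # child: run the test, report via exit code
        try:
            test_func()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)         # parent: wait for the child to finish
    return os.WEXITSTATUS(status) == 0     # True if the test passed
```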
I tried extending this using the multiprocessing package, but without success so far. It requires serializing the pytest.Item object (multiprocessing uses pickle to exchange functions and objects between processes), which seems far from simple given its complexity. My idea is to convert pytest.Item into a serializable object (e.g. SerializableItem), which can be used to generate a proper pytest.Item in the subprocess. I got as far as reaching the pytest runner with the generated Item, but I am stuck on generating pytest.Item._request.
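Roughly, that kind of approach looks like the following (a trimmed-down sketch for illustration only; the helper name is made up and no test reports are produced), and the pickle.dumps call is exactly where it breaks down:

```python
# conftest.py -- illustrative sketch only
import multiprocessing
import pickle

def _run_unpickled_item(payload):
    item = pickle.loads(payload)    # would need a complete Item, including _request
    item.runtest()

def pytest_runtest_protocol(item, nextitem):
    try:
        payload = pickle.dumps(item)        # raises: the Item drags in config, session, fixtures...
    except Exception as exc:
        print(f"cannot serialize {item.nodeid}: {exc}")
        return None                         # fall back to the normal in-process protocol
    proc = multiprocessing.Process(target=_run_unpickled_item, args=(payload,))
    proc.start()
    proc.join()
    return True                             # tell pytest the protocol was handled
```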
Has anyone tried a similar approach? Maybe it is the wrong way to go given the complexity of the Item class.
I also tried to serialize pytest.Item using dill, which got much further than pickle, but the whole object could not be serialized (some part of pytest.Config, as far as I can tell).
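A small probe like the following (assuming item is a collected pytest.Item, e.g. available inside a hook) can help narrow down which attribute is the blocker:

```python
import dill

def find_unpicklable_attrs(item):
    """Try to serialize each attribute of the item individually with dill."""
    for name, value in vars(item).items():
        try:
            dill.dumps(value)
        except Exception as exc:
            print(f"{name}: not serializable ({type(exc).__name__}: {exc})")
```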
Any help with this would be very welcome :-)