-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CyNetDiff Submission #175
Comments
Editor in Chief checksHi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins. Please check our Python packaging guide for more information on the elements below.
Editor commentsI think the documentation is a little bare currently to get started with the review. I'd like to see the information you wrote in this submission issue in the project repository itself. For example:
This is a nice elevator pitch to rework and include in the README. I'd like to see more content in the documentation as well. The API section is great, but the basic example lacks context and does not really show case what CyNetDiff is great for. I want to share with you this resource: Diátaxis framework.
|
Thank you for the feedback! I think those changes are reasonable and will significantly enhance the package, I'll try and expand the documentation by Monday so we can get the ball rolling on the review before too long 🚀 |
@Batalex I have just published a release with the following changes:
|
Thanks, this looks great. From what I can tell, we are good to go. I'll get started on finding the perfect editor for this review 🐈⬛ |
@Batalex I know we don't have an editor yet, but is it too early to begin suggesting reviewers? We've already found someone that might be a good candidate for this. |
That's fantastic! |
For sure! I'll be sure to email this person as well, but this is @jeremyzhangsq |
Shiqi has agreed to review and filled out the form! |
@Batalex just wanted to check back in and see if there was anything we could do to help find a suitable editor for this. We're not in a huge rush or anything, but we want to submit this package to a conference before too long. |
Hello @eliotwrobson, |
@Batalex Thank you for your detailed response. We don't quite have a target conference in mind, but we're starting more work on a longer version of what will be our paper for this. I mainly asked because we may be able to help search for an editor as well, depending on what the preferred qualifications are. For example, Shiqi has published papers related to this, and might be able to serve, depending on how heavy the workload is, and how difficult it would be to find two other reviewers. Regarding this general area, anyone who has worked on projects related to influence maximization or even just general graph theory might know enough to get the ball rolling. |
I can edit this, although I just was the editor for the |
@sneakers-the-rat we would greatly appreciate you serving as editor 🙂 |
Happy to do it, i'll be able to get started searching for reviewers tmrw :). (i'll let the EiC do the formal assignment) |
Thanks a lot @sneakers-the-rat and thanks to @Batalex who did the hard work as EIC here! |
Awesome! @sneakers-the-rat just to fill you in, we've already identified a reviewer (@/jeremyzhangsq) who has papers published in this area. So the other reviewer doesn't necessarily need to be a subject area expert, and I think it would be appropriate to have someone with more knowledge of Cython packaging / best practices, since that was the part I was most unfamiliar with when writing this package. |
hi there team! i'm just checking in on this review to see the status. it looks like there's been no activity for almost 3 months. Is there anything that i can do to support moving things forward? |
@sneakers-the-rat please don't apologize! It's so understandable! If you can take this over that would be wonderful!!! Thank you Jonny! 😊 |
Thanks all for getting the ball rolling! I'm not sure how much @/jeremyzhangsq checks GitHub, so I'll go ahead and ping him again via email. |
A quick couple of questions @lwasser /@sneakers-the-rat:
|
@Kai-Striega 👋🏻 i just saw your discord dm's! So sorry it took me a few days to respond. I answered you there, please do go through the entire template. I'm happy to either provide a mentor session OR to get someone else in our community to do so as well. You're also in great hands here with @sneakers-the-rat so i don't want to step on any toes more than I already have!! so please just let me know how I can best support everyone here 🚀 |
@eliotwrobson please see my review below. I'm happy to expand/talk about any of the points I have made. @sneakers-the-rat / @lwasser I haven't received a mentor yet, but had the available time to do it today. Please let me know if I am required to change/update anything. Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Readme file requirements
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
UsabilityReviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Functionality
For packages also submitting to JOSS
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted. The package contains a
Final approval (post-review)
Estimated hours spent reviewing: 8 Review CommentsI have several review comments below. These are ordered by importance. Requires a C++ compiler to build/lack of wheelsWheels are provided only CPython 3.11 on Linux. This requires all other possible installers to compile the project from source. This is fairly simple, however it provides an extra step/requirements which is possible to avoid. My recommendation is to provide wheels for all officially supported Python versions on all major platforms. Cython code generationThe project currently caches the file
This is echoed by the official Cython user guide
Pseudo random number generationThe package currently uses C's from libc.stdlib cimport rand, RAND_MAX
cdef double RAND_SCALE = 1.0 / RAND_MAX
cdef inline double next_rand() nogil:
return rand() * RAND_SCALE There are several issues with
My recommendation is to adopt a more suitable random number generation engine. In the Cython world there are two ways I can recommend of doing this:
Note that (2) would also allow you to be compliant with SPEC-7. Minor Cython related improvementsNote these points are minor and should not be considered blocking. Use of
|
@Kai-Striega thank you for your detailed comments! I have a couple of questions, but I'll try to get started addressing some of these items soon.
Do you know of any documentation that has discussion of how to do this? I agree it is much more convenient to not need a compiler installed, but I wasn't sure how to set this up.
This makes sense to me, I could swear I read the opposite somewhere but I must have been mistaken.
Thank you for the references! I had actually thought about whether there was a standard way to handle the random number generation. I think I will follow option (2) and add the SPEC-7 compliance.
Is there a recommended type annotation that expresses what works with the Cython memoryview? It would be nice to add this annotation along with support for numpy arrays. |
You're welcome. Feel free to message if you have any more questions. I'll try answer your questions below.
AFAIK, there isn't much. You could look at how SciPy builds their wheels for nightly builds. This should include all the info you need. I'm happy to help out with this, but might be a bit slow at the moment. Note that SciPy's CI is far more complex than I would expect for this project.
This isn't cut and dry. In the past, it was advised to do the opposite. This has changed over the last decade or so. Some people still prefer to include the
You're welcome :)
Unfortunately, not that I've come across. The problem is that Cython memoryviews support anything that supports the Buffer Protocol this could include custom objects. My usual tactic is to use ArrayLike with a Python level wrapper, check if the value is scalar and then convert it to an array. This isn't great. An alternative option would be to create your own Protocol with all the buffer protocol methods. |
@Kai-Striega I've been able to implement most of the above changes without too many issues. I'm starting to take a look at the spec7 documentation and the examples you linked. Would it be better to provide client code with a wrapper class like the one you linked here that then gets passed to the model class that executes simulations? Or should client code directly pass in a numpy generator? From the timing code in the notebook you linked, it seems like it would be much faster to expose this wrapper class to client code, rather than having a numpy generator passed in. |
I'm at work right now, and can't respond in the length I'd like. I'll edit this over the next few days to expand on my thoughts. TLDR: Neither. SPEC-7 allows for a couple of different options. Here is the officially recommended way from collections.abc import Sequence
import numpy as np
SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator
def my_func(*, rng: RNGLike | SeedLike | None = None):
"""My function summary.
Parameters
----------
rng : `numpy.random.Generator`, optional
Pseudorandom number generator state. When `rng` is None, a new
`numpy.random.Generator` is created using entropy from the
operating system. Types other than `numpy.random.Generator` are
passed to `numpy.random.default_rng` to instantiate a `Generator`.
"""
rng = np.random.default_rng(rng)
... I would use this as a wrapper for your Cython functions. |
Short update that I've addressed most of the items from the review comments in a new release (v0.1.14) besides the rng seeding that is still being discussed and adding more builds. Not quite ready for a re-review till we address the above. Also emailed shiqi, and looks like he is traveling now but will be back to review in early November 👍🏽 |
Catching up - here are the things i see raised in the review: making a checklist mostly for tidyiness sake, not trying to list out requirements - tell me which of these are taken care of/still outstanding
Thanks for the detailed review :) I learned some things about cython here too. If it would be helpful, you might consider opening up the remaining items as issues that link back to this one just to keep track of what's left todo/use issues as a cheap form of threading. Re: building for multiple platforms, this sounds like something we would want to have in the packaging guide, so if you figure something out, maybe we put instructions there? |
@sneakers-the-rat thanks for writing that out. I've removed all the instances of name mangling I could find, and you are correct that removing generated artifacts is done. For the statement of need, would you like something written that can go on the pyos site, or that also to be added to the library documentation? A simple description is that this package is a faster version of ndlib. |
While this is noble, I think a unified approach to building will be hard, especially when extension modules are involved. It will depend on which languages are used, which platforms you build for, how complicated you want to get, and so on... In particular, I feel that Poetry doesn't handle extension modules particularly well. There is SPIN which much of the scientific Python ecosystem seems to be standardising on, however this would require a complete rewrite of CyNetDiff's infrastructure.
If you're going to make claims that it's faster, it would be great to show it. I think this should be required if you make performance claims. |
Fair! i was thinking of it as a starting point for a more general guide, but let's leave that aside so we don't muddy this review :) Moved that over here: pyOpenSci/python-package-guide#422
Agreed that we should be able to show a benchmark to claim this. @eliotwrobson to answer your earlier question, yes this would be for the pyos website, where ideally the goal is that we build out the website so that it's possible to see a package and its 'neighbors,' "the family of tools that do things like x," and much like how open reviews on eLife allow readers to know the context of a paper and how people in that field think of it, it would be great (but not required) to provide that same context for software. "why this and not that? etc." If benchmarks are not in reach at the moment, a simple "see also: ndlib" would be fine |
For the benchmark, some light benchmarks are included in this example notebook: https://github.com/eliotwrobson/CyNetDiff/blob/main/examples/visualizations.ipynb I'm not sure if there's a more standard way of doing this? We also include some discussion of these benchmarks in our paper: https://www.vldb.org/pvldb/vol17/p4409-umrawal.pdf Regarding building, although changing / documenting something to work with Poetry is probably not the most productive given what others have said, I think having some example related to this in the packaging guide would be extremely useful. I had a very hard time even figuring out how to get started writing a package that used Cython, and this would be extremely useful to me on another project I want to get started on before too long. |
If you have published benchmarks already, then that's good enough for me for a claim like that - we can just cite them :) Also am i reading this right, that an identical operation took 14s with ndlib and 0.067s with cynetdiff? that's.... certainly a perf improvement. well done. Also i forgot to say earlier, i was asking for a context statement for the website but i also generally (i.e. outside of the purposes of the review) think they are good to have in documentation. If you think you've improved on something, I don't think it's disrespectful to compare yourself to it esp on something objective like 'the time it takes to perform an identical operation,' and it's helpful to everyone to have better linking between what's out there. One example i love is how the first two sections of the dask dataframes 'best practices' page are "use pandas" (and then clarifications about when you would want to use dask) |
I'm happy with the benchmarks as-is, for the purpose of a formal review, I think there is room for improvement.
There are tools out there to do this for you. The one that seems to be favoured by NumPy/SciPy (where I do most of my FOSS stuff) is air speed velocity. There are several other tools also available, but I don't know too much about them.
I've tried building Cython projects with Poetry and actually gave up. I found it very hard/confusing, so just getting your package to build should be considered an achievement. My go to for building Cython projects is now SPIN. This uses Meson which has an inbuilt Cython Module. |
@sneakers-the-rat I think that even this simple recommendation alone would be a useful thing to add to the packaging guide 👍🏽 Regarding the remaining review items, I need to refactor another feature for this package to implement something for a paper I'm working on. I think the other reviewer is going to be able to take a look soon, so I'll wait for his comments before attacking the rest of these items. |
Great - if it would be possible to open issues for each of the requested changes linking back to this issue (either of you), then I (or you) can make a list in the OP of completed/outstanding work to do so that the second reviewer can know whats already been raised/is being addressed and they can focus their review elsewhere, or comment on those ongoing issues! |
@sneakers-the-rat just made associated issues and linked them back here (they are the top 4 in the current issues list). |
@eliotwrobson check this out:D Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Readme file requirements
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: [see this example](https://github.com/ropensci/drake). Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
UsabilityReviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Functionality
For packages also submitting to JOSS
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted. The package contains a
Final approval (post-review)
Estimated hours spent reviewing: 6 hours Review CommentsI have to admit that this project really surprised me. Previously, the IM community mainly used C++ for implementation for the sake of efficiency. However, it may cause inconvenience for real-world development and deployment (from my own experience). Thus, I believe this project will help our community a lot. Thanks to other reviewers’ comments, I can quickly catch up with the progress. Overall, this project meets my standards except two minors to improve.
An example of the LT variant that requires customized thresholds and weights is this:
|
Thanks to both reviewers for the feedback! Have a few more urgent things to take care of but will try to get through some of these changes over the upcoming holiday 👍🏽 EDIT: A little behind schedule (whoops), but will get to everything mentioned here before too long. |
Finally had a chance to go over things again, and I managed to get a PR for the most significant item so far (SPEC7 compliance): eliotwrobson/CyNetDiff#28 @Kai-Striega if you have time, could you glance at this PR briefly? No need to spend too much time looking through the internals, I just want to double check that we're doing everything as expected in the spec. As for some of the other comments, I think that we have a sufficient amount of customization in the model creation for now (edge weights are accepted as arguments by both models, and adding custom thresholds for the linear threshold model won't give a performance boost over pure-python unless we adopt a custom model of randomly generating thresholds in each simulation). So the only remaining items are the following
|
@eliotwrobson I'll take a look, hopefully today.
I think having wheels for multiple platforms is important. If you don't supply them, the user has to figure out how to compile your project on their own. i.e. you're moving the burden of having to figure out how to compile it on their platform from yourself to each individual user. I'm happy to help with this a bit if you need it, basically for NumPy/SciPy we build it as a CI job. You can see how NumPy does it by looking in the .github/workflows/ directory. |
@Kai-Striega Thank you! This would be greatly appreciated 🙏🏽 🙏🏽 I'd be open to switching the build system as well if that is necessary, I'm just currently not sure how to set up automated publishing with respect to this. |
Submitting Author: Name (@eliotwrobson)
All current maintainers: (@eliotwrobson, @abhishekumrawal)
Package Name: CyNetDiff
One-Line Description of Package: A performance-focused library implementing algorithms for simulating network diffusion processes, written in Cython.
Repository Link: https://github.com/eliotwrobson/CyNetDiff
Version submitted: v0.1.13
EIC: @Batalex
Editor: @sneakers-the-rat, @lwasser
Reviewer 1: @Kai-Striega
Reviewer 2: @jeremyzhangsq
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD
Code of Conduct & Commitment to Maintain Package
Description
Network diffusion processes aim to model the spread of information through social networks, represented using graphs. Experimental work involving these models usually involves simulating these processes many times over large graphs, which can be computationally very expensive. At the same time, being able to conduct experiments using a high-level language like Python is helpful to researchers, as this gives greater flexibility in developing research software. To address both of these concerns, CyNetDiff is a Cython module implementing the independent cascade and linear threshold models, two of the most popular network diffusion models. Development has been focused on performance, while still giving an intuitive, high-level interface to assist in research tasks.
Scope
Please indicate which category or categories.
Check out our package scope page to learn more about our
scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific
Community Partnerships
If your package is associated with an
existing community please check below:
For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
This is aimed at researchers working in areas related to network diffusion and influence maximization, and specifically at optimizing the most computationally expensive part of this process. This should enable researchers to conduct experiments on larger graphs than would be possible with a pure-Python package. For a recent work doing experiments that fit the use cases of this package, see https://arxiv.org/abs/2207.08937
There is a previous package filling a similar use case called ndlib: https://github.com/GiulioRossetti/ndlib
Our package differs as it was developed with a focus on performance, and with lesser emphasis on visualization
and flexibility (for example, we do not have a way of defining custom models). Using code compiled with Cython
allows our package to handle much larger graphs than are possible with a pure-Python package like ndlib.
@tag
the editor you contacted:#165, @Batalex
Continuing off of the discussion there, the main goals of the review for me are to finalize the interface and add related features / utility code to aid usability. In terms of concrete features, the algorithms already implemented are enough for a full release.
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication Options
Is it possible to add the paper submission after this review is completed? Would like the content of the paper to incorporate any functionality that gets added during the review process.
JOSS Checks
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Confirm each of the following by checking the box.
Please fill out our survey
submission and improve our peer review process. We will also ask our reviewers
and editors to fill this out.
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.
Review issues
Footnotes
Please fill out a pre-submission inquiry before submitting a data visualization package. ↩
The text was updated successfully, but these errors were encountered: