CyNetDiff Submission #175

eliotwrobson · 2024-04-22T03:28:38Z

Submitting Author: Name (@eliotwrobson)
All current maintainers: (@eliotwrobson, @abhishekumrawal)
Package Name: CyNetDiff
One-Line Description of Package: A performance-focused library implementing algorithms for simulating network diffusion processes, written in Cython.
Repository Link: https://github.com/eliotwrobson/CyNetDiff
Version submitted: v0.1.13
EIC: @Batalex
Editor: @sneakers-the-rat, @lwasser
Reviewer 1: @Kai-Striega
Reviewer 2: @jeremyzhangsq
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD

Code of Conduct & Commitment to Maintain Package

I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package after should it be accepted.
I have read and will commit to package maintenance after the review as per the pyOpenSci Policies Guidelines.

Description

Include a brief paragraph describing what your package does:

Network diffusion processes aim to model the spread of information through social networks, represented using graphs. Experimental work involving these models usually involves simulating these processes many times over large graphs, which can be computationally very expensive. At the same time, being able to conduct experiments using a high-level language like Python is helpful to researchers, as this gives greater flexibility in developing research software. To address both of these concerns, CyNetDiff is a Cython module implementing the independent cascade and linear threshold models, two of the most popular network diffusion models. Development has been focused on performance, while still giving an intuitive, high-level interface to assist in research tasks.

Scope

Please indicate which category or categories.
Check out our package scope page to learn more about our
scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
- Data retrieval
- Data extraction
- Data processing/munging
- Data deposition
- Data validation and testing
- Data visualization¹
- Workflow automation
- Citation management and bibliometrics
- Scientific software wrappers
- Database interoperability

Domain Specific

Geospatial
Education

Community Partnerships

If your package is associated with an
existing community please check below:

Astropy:My package adheres to Astropy community standards
Pangeo: My package adheres to the Pangeo standards listed in the pyOpenSci peer review guidebook

For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
- Who is the target audience and what are scientific applications of this package?

This is aimed at researchers working in areas related to network diffusion and influence maximization, and specifically at optimizing the most computationally expensive part of this process. This should enable researchers to conduct experiments on larger graphs than would be possible with a pure-Python package. For a recent work doing experiments that fit the use cases of this package, see https://arxiv.org/abs/2207.08937

Are there other Python packages that accomplish the same thing? If so, how does yours differ?

There is a previous package filling a similar use case called ndlib: https://github.com/GiulioRossetti/ndlib
Our package differs as it was developed with a focus on performance, and with lesser emphasis on visualization
and flexibility (for example, we do not have a way of defining custom models). Using code compiled with Cython
allows our package to handle much larger graphs than are possible with a pure-Python package like ndlib.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

#165, @Batalex

Continuing off of the discussion there, the main goals of the review for me are to finalize the interface and add related features / utility code to aid usability. In terms of concrete features, the algorithms already implemented are enough for a full release.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
uses an OSI approved license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a tutorial with examples of its essential functions and uses.
has a test suite.
has continuous integration setup, such as GitHub Actions CircleCI, and/or others.

Publication Options

Is it possible to add the paper submission after this review is completed? Would like the content of the paper to incorporate any functionality that gets added during the review process.

Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Checks

The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
The package is deposited in a long-term repository with the DOI:

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

I have read the author guide.
I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

Last but not least please fill out our pre-review survey. This helps us track
submission and improve our peer review process. We will also ask our reviewers
and editors to fill this out.

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Review issues

Please fill out a pre-submission inquiry before submitting a data visualization package. ↩

The text was updated successfully, but these errors were encountered:

Batalex · 2024-04-26T18:49:30Z

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.

Initial onboarding survey was filled out
We appreciate each maintainer of the package filling out this survey individually. 🙌
Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌

Editor comments

I think the documentation is a little bare currently to get started with the review. I'd like to see the information you wrote in this submission issue in the project repository itself. For example:

Network diffusion processes aim to model the spread of information through social networks, represented using graphs. Experimental work involving these models usually involves simulating these processes many times over large graphs, which can be computationally very expensive. At the same time, being able to conduct experiments using a high-level language like Python is helpful to researchers, as this gives greater flexibility in developing research software. To address both of these concerns, CyNetDiff is a Cython module implementing the independent cascade and linear threshold models, two of the most popular network diffusion models. Development has been focused on performance, while still giving an intuitive, high-level interface to assist in research tasks.

This is a nice elevator pitch to rework and include in the README.

I'd like to see more content in the documentation as well. The API section is great, but the basic example lacks context and does not really show case what CyNetDiff is great for. I want to share with you this resource: Diátaxis framework.
I am not asking that you follow the full framework to the letter, and it would be a huge task to do so, but I think there is some virtue in thinking about the kind of documentation you aim to showcase.
The documentation is the first thing a potential user will take a look at. As such, it needs to quickly guide the reader to:

what does it do? (= should I try it?)
how does it work? (= ok, I think this is the tool I need, how do I proceed?)

eliotwrobson · 2024-04-26T21:31:36Z

Thank you for the feedback! I think those changes are reasonable and will significantly enhance the package, I'll try and expand the documentation by Monday so we can get the ball rolling on the review before too long 🚀

eliotwrobson · 2024-04-29T06:58:09Z

@Batalex I have just published a release with the following changes:

Updated the elevator pitch and example in the README
Added two longer extended examples to the documentation site

Batalex · 2024-05-02T16:49:07Z

Thanks, this looks great. From what I can tell, we are good to go. I'll get started on finding the perfect editor for this review 🐈‍⬛

eliotwrobson · 2024-05-10T23:39:00Z

@Batalex I know we don't have an editor yet, but is it too early to begin suggesting reviewers? We've already found someone that might be a good candidate for this.

Batalex · 2024-05-11T12:09:34Z

@Batalex I know we don't have an editor yet, but is it too early to begin suggesting reviewers? We've already found someone that might be a good candidate for this.

That's fantastic!
Could you please orient them to our signups form? https://forms.gle/GHfxvmS47nQFDcBM6

eliotwrobson · 2024-05-11T15:55:00Z

For sure! I'll be sure to email this person as well, but this is @jeremyzhangsq

eliotwrobson · 2024-05-14T16:46:20Z

Shiqi has agreed to review and filled out the form!

eliotwrobson · 2024-06-28T18:23:07Z

@Batalex just wanted to check back in and see if there was anything we could do to help find a suitable editor for this. We're not in a huge rush or anything, but we want to submit this package to a conference before too long.

Batalex · 2024-06-29T17:17:07Z

Hello @eliotwrobson,
Unfortunately, I do not have an editor at the moment. There are two reasons for that: first, we are so successful our current editors have their hands full currently (let me brag a little). Second, the domain is more niche than others, so we (or more specifically I) cannot rely on personal connections or previous involvements to find someone with a relevant expertise. What would be your target conference? I'll see what I can do to help accelerate the process.

eliotwrobson · 2024-06-29T20:46:32Z

@Batalex Thank you for your detailed response. We don't quite have a target conference in mind, but we're starting more work on a longer version of what will be our paper for this. I mainly asked because we may be able to help search for an editor as well, depending on what the preferred qualifications are. For example, Shiqi has published papers related to this, and might be able to serve, depending on how heavy the workload is, and how difficult it would be to find two other reviewers.

Regarding this general area, anyone who has worked on projects related to influence maximization or even just general graph theory might know enough to get the ball rolling.

sneakers-the-rat · 2024-06-30T01:16:45Z

I can edit this, although I just was the editor for the automata package and maybe the authors want a break from me ;)

eliotwrobson · 2024-06-30T01:29:07Z

@sneakers-the-rat we would greatly appreciate you serving as editor 🙂

sneakers-the-rat · 2024-07-02T22:44:29Z

Happy to do it, i'll be able to get started searching for reviewers tmrw :). (i'll let the EiC do the formal assignment)

cmarmo · 2024-07-03T20:27:41Z

i'll let the EiC do the formal assignment

Thanks a lot @sneakers-the-rat and thanks to @Batalex who did the hard work as EIC here!
Happy review to all!

eliotwrobson · 2024-07-04T17:44:22Z

Awesome! @sneakers-the-rat just to fill you in, we've already identified a reviewer (@/jeremyzhangsq) who has papers published in this area. So the other reviewer doesn't necessarily need to be a subject area expert, and I think it would be appropriate to have someone with more knowledge of Cython packaging / best practices, since that was the part I was most unfamiliar with when writing this package.

lwasser · 2024-10-01T17:35:41Z

hi there team! i'm just checking in on this review to see the status. it looks like there's been no activity for almost 3 months. Is there anything that i can do to support moving things forward?

lwasser · 2024-10-19T16:38:56Z

@sneakers-the-rat please don't apologize! It's so understandable!

If you can take this over that would be wonderful!!! Thank you Jonny! 😊

eliotwrobson · 2024-10-19T17:56:43Z

Thanks all for getting the ball rolling! I'm not sure how much @/jeremyzhangsq checks GitHub, so I'll go ahead and ping him again via email.

Kai-Striega · 2024-10-20T02:15:07Z

A quick couple of questions @lwasser /@sneakers-the-rat:

as previously discussed, my review has focused primarily on the Cython aspects of the project. Do you require me to review the remaining points on the review template too? I'm happy to do this, but I need to be aware that it's required.
I've look through the package and written plenty of notes but haven't typed them into the review template yet. As this is my first time reviewing, would we be able to organise a mentoring session to ensure I do a great job?

lwasser · 2024-10-20T18:55:41Z

@Kai-Striega 👋🏻 i just saw your discord dm's! So sorry it took me a few days to respond. I answered you there, please do go through the entire template. I'm happy to either provide a mentor session OR to get someone else in our community to do so as well. You're also in great hands here with @sneakers-the-rat so i don't want to step on any toes more than I already have!! so please just let me know how I can best support everyone here 🚀

Kai-Striega · 2024-10-26T06:38:43Z

@eliotwrobson please see my review below. I'm happy to expand/talk about any of the points I have made.

@sneakers-the-rat / @lwasser I haven't received a mentor yet, but had the available time to do it today. Please let me know if I am required to change/update anything.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README.
Installation instructions: for the development version of the package and any non-standard dependencies in README.
- Instructions provided, however I had difficulty following them. See comments below.
Vignette(s) demonstrating major functionality that runs successfully locally.
- Vignette in README and examples/ directory
Function Documentation: for all user-facing functions.
Examples for all user-facing functions.
- No examples for user facing documentation page
- No examples in models.pyi file with rest of documentation
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
Badges for:
- Continuous integration and test coverage,
- Docs building (if you have a documentation website),
- A repostatus.org badge,
- Python versions supported,
- Current package version (on PyPI / Conda).
  - PyPi supported and badged
  - conda not yet available, however this package seems appropriate for conda

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Short description of package goals.
Package installation instructions
Any additional setup required to use the package (authentication tokens, etc.)
Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- Link to examples/ directory missing in README
Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
- Vignettes/examples are sufficient
Link to your documentation website.
If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
- Not provided, have not done a lit review to check if this is required.
Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

Package documentation is clear and easy to find and use.
The need for the package is clear
All functions have documentation and associated examples for use
- Functions/methods have documentation, but not examples in the documentation.
The package is easy to install
- See comments in reviewer comments about wheels and required C++ compiler

Functionality

For packages also submitting to JOSS

The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software
Authors: A list of authors with their affiliations
A statement of need clearly stating problems the software is designed to solve and its target audience.
References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8

Review Comments

I have several review comments below. These are ordered by importance.

Requires a C++ compiler to build/lack of wheels

Wheels are provided only CPython 3.11 on Linux. This requires all other possible installers to compile the project from source. This is fairly simple, however it provides an extra step/requirements which is possible to avoid. My recommendation is to provide wheels for all officially supported Python versions on all major platforms.

Cython code generation

The project currently caches the file src/cynetdiff/models.cpp explicitly. I usually recommended not to ship the .c/.cpp files directly. There are several problems with shipping .c/.cpp files:

They are not guaranteed to be forward compatible with future CPython C-API or Cython versions. The CPython C-API seems to be changing fairly frequently at the moment.
It's possible that Cython generated code will not work on certain platforms e.g. arm if built for x86

This is echoed by the official Cython user guide

it is not recommended to include generated files in source distributions. Instead, require Cython at build-time to generate the C/C++ files

Pseudo random number generation

The package currently uses C's rand function. From models.pyx:

from libc.stdlib cimport rand, RAND_MAX
cdef double RAND_SCALE = 1.0 / RAND_MAX

cdef inline double next_rand() nogil:
    return rand() * RAND_SCALE

There are several issues with rand:

There are no guarantees about the algorithm, and therefore the quality of the randomness, used to generate random numbers
It uses global state which can lead to problems with multithreaded applications. While the current project doesn't use any parallelism, this could be significant if CyNetDiff-1 is implemented.

My recommendation is to adopt a more suitable random number generation engine. In the Cython world there are two ways I can recommend of doing this:

C++'s <random> header via the Cython bindings
Adopt NumPy's random number generation. This is officially documented and an example can be seen here

Note that (2) would also allow you to be compliant with SPEC-7.

Minor Cython related improvements

Note these points are minor and should not be considered blocking.

Use of `array.array`

The package uses array.array both for type hints and Cython typing in several places. Consider using Cython's memoryview instead. memoryviews provide the advantage of being available to anything that provides the Buffer Protocol, such as NumPy arrays, which I suspect are used significantly more frequently than array.arrays are by the packages target audience.

Missing potential bounds checks when running cythonize

[1/1] Cythonizing src/cynetdiff/models.pyx
warning: src/cynetdiff/models.pyx:97:51: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:107:35: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:135:39: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:137:38: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:141:34: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:261:39: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:263:45: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:264:34: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:268:38: Use boundscheck(False) for faster access
warning: src/cynetdiff/models.pyx:284:52: Use boundscheck(False) for faster access

It may be possible to improve performance by disabling bounds checking in these situations.
I have not verified if this actually improves performance in these particular circumstances

Review resizing `array.array`

    for i in range(m):
        influence_arr.append(1.0 / in_degrees[self.edges[i]])

This computations resizes influence_arr multiple times, leading to multiple memory allocations. It would be more efficient to use .extend rather than .append.

Use of name mangling

Starting a name with __ introduces name mangling in Python. In C/C++ these are reserved for the compilers internal keywords. Name mangling can make life difficult. CyNetDiff uses name mangling at several points (e.g. __activation_suceeds and __advance_model). I'd be careful with these.

eliotwrobson · 2024-10-27T06:03:19Z

@Kai-Striega thank you for your detailed comments! I have a couple of questions, but I'll try to get started addressing some of these items soon.

Requires a C++ compiler to build/lack of wheels

Do you know of any documentation that has discussion of how to do this? I agree it is much more convenient to not need a compiler installed, but I wasn't sure how to set this up.

Cython code generation

This makes sense to me, I could swear I read the opposite somewhere but I must have been mistaken.

Pseudo random number generation

Thank you for the references! I had actually thought about whether there was a standard way to handle the random number generation. I think I will follow option (2) and add the SPEC-7 compliance.

Use of array.array

Is there a recommended type annotation that expresses what works with the Cython memoryview? It would be nice to add this annotation along with support for numpy arrays.

Kai-Striega · 2024-10-27T06:22:53Z

@Kai-Striega thank you for your detailed comments! I have a couple of questions, but I'll try to get started addressing some of these items soon.

You're welcome.

Feel free to message if you have any more questions. I'll try answer your questions below.

Do you know of any documentation that has discussion of how to do this? I agree it is much more convenient to not need a compiler installed, but I wasn't sure how to set this up.

AFAIK, there isn't much. You could look at how SciPy builds their wheels for nightly builds. This should include all the info you need. I'm happy to help out with this, but might be a bit slow at the moment.

Note that SciPy's CI is far more complex than I would expect for this project.

This makes sense to me, I could swear I read the opposite somewhere but I must have been mistaken.

This isn't cut and dry. In the past, it was advised to do the opposite. This has changed over the last decade or so. Some people still prefer to include the .c/cpp files. However, this isn't recommended anymore.

Thank you for the references! I had actually thought about whether there was a standard way to handle the random number generation. I think I will follow option (2) and add the SPEC-7 compliance.

You're welcome :)

Is there a recommended type annotation that expresses what works with the Cython memoryview? It would be nice to add this annotation along with support for numpy arrays.

Unfortunately, not that I've come across. The problem is that Cython memoryviews support anything that supports the Buffer Protocol this could include custom objects.

My usual tactic is to use ArrayLike with a Python level wrapper, check if the value is scalar and then convert it to an array. This isn't great.

An alternative option would be to create your own Protocol with all the buffer protocol methods.

eliotwrobson · 2024-10-28T00:11:18Z

@Kai-Striega I've been able to implement most of the above changes without too many issues. I'm starting to take a look at the spec7 documentation and the examples you linked. Would it be better to provide client code with a wrapper class like the one you linked here that then gets passed to the model class that executes simulations? Or should client code directly pass in a numpy generator? From the timing code in the notebook you linked, it seems like it would be much faster to expose this wrapper class to client code, rather than having a numpy generator passed in.

Kai-Striega · 2024-10-28T00:20:33Z

I'm at work right now, and can't respond in the length I'd like. I'll edit this over the next few days to expand on my thoughts.

TLDR: Neither. SPEC-7 allows for a couple of different options.

Here is the officially recommended way

from collections.abc import Sequence
import numpy as np


SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


def my_func(*, rng: RNGLike | SeedLike | None = None):
    """My function summary.

    Parameters
    ----------
    rng : `numpy.random.Generator`, optional
        Pseudorandom number generator state. When `rng` is None, a new
        `numpy.random.Generator` is created using entropy from the
        operating system. Types other than `numpy.random.Generator` are
        passed to `numpy.random.default_rng` to instantiate a `Generator`.
    """
    rng = np.random.default_rng(rng)

    ...

I would use this as a wrapper for your Cython functions.

eliotwrobson · 2024-10-29T03:40:46Z

Short update that I've addressed most of the items from the review comments in a new release (v0.1.14) besides the rng seeding that is still being discussed and adding more builds. Not quite ready for a re-review till we address the above. Also emailed shiqi, and looks like he is traveling now but will be back to review in early November 👍🏽

sneakers-the-rat · 2024-10-29T04:15:13Z

Catching up - here are the things i see raised in the review:

making a checklist mostly for tidyiness sake, not trying to list out requirements - tell me which of these are taken care of/still outstanding

docs: examples
statement of need: we don't have a hard statement of need requirement, and i don't like novelty tests so it's not that, but it would be nice to have some statement for the pyos site that puts this package in context with other packages - "similar to x except y, etc." kind of stuff
packaging: wheels for multiple platforms (looks ongoing)
packaging: versioning generated artifacts (looks like this is done?)
function: use of c's rand function (looks like this is ongoing)
function: name mangling

Thanks for the detailed review :) I learned some things about cython here too.

If it would be helpful, you might consider opening up the remaining items as issues that link back to this one just to keep track of what's left todo/use issues as a cheap form of threading.

Re: building for multiple platforms, this sounds like something we would want to have in the packaging guide, so if you figure something out, maybe we put instructions there?

eliotwrobson · 2024-10-29T18:14:23Z

@sneakers-the-rat thanks for writing that out. I've removed all the instances of name mangling I could find, and you are correct that removing generated artifacts is done. For the statement of need, would you like something written that can go on the pyos site, or that also to be added to the library documentation? A simple description is that this package is a faster version of ndlib.

Kai-Striega · 2024-10-29T22:12:56Z

Re: building for multiple platforms, this sounds like something we would want to have in the packaging guide, so if you figure something out, maybe we put instructions there?

While this is noble, I think a unified approach to building will be hard, especially when extension modules are involved. It will depend on which languages are used, which platforms you build for, how complicated you want to get, and so on...

In particular, I feel that Poetry doesn't handle extension modules particularly well. There is SPIN which much of the scientific Python ecosystem seems to be standardising on, however this would require a complete rewrite of CyNetDiff's infrastructure.

A simple description is that this package is a faster version of ndlib.

If you're going to make claims that it's faster, it would be great to show it. I think this should be required if you make performance claims.

sneakers-the-rat · 2024-10-30T00:08:40Z

While this is noble, I think a unified approach to building will be hard

Fair! i was thinking of it as a starting point for a more general guide, but let's leave that aside so we don't muddy this review :) Moved that over here: pyOpenSci/python-package-guide#422

I think this should be required if you make performance claims.

Agreed that we should be able to show a benchmark to claim this. @eliotwrobson to answer your earlier question, yes this would be for the pyos website, where ideally the goal is that we build out the website so that it's possible to see a package and its 'neighbors,' "the family of tools that do things like x," and much like how open reviews on eLife allow readers to know the context of a paper and how people in that field think of it, it would be great (but not required) to provide that same context for software. "why this and not that? etc." If benchmarks are not in reach at the moment, a simple "see also: ndlib" would be fine

eliotwrobson · 2024-10-30T05:07:46Z

For the benchmark, some light benchmarks are included in this example notebook: https://github.com/eliotwrobson/CyNetDiff/blob/main/examples/visualizations.ipynb

I'm not sure if there's a more standard way of doing this? We also include some discussion of these benchmarks in our paper: https://www.vldb.org/pvldb/vol17/p4409-umrawal.pdf

Regarding building, although changing / documenting something to work with Poetry is probably not the most productive given what others have said, I think having some example related to this in the packaging guide would be extremely useful. I had a very hard time even figuring out how to get started writing a package that used Cython, and this would be extremely useful to me on another project I want to get started on before too long.

sneakers-the-rat · 2024-10-30T06:15:55Z

If you have published benchmarks already, then that's good enough for me for a claim like that - we can just cite them :)

Also am i reading this right, that an identical operation took 14s with ndlib and 0.067s with cynetdiff? that's.... certainly a perf improvement. well done.

Also i forgot to say earlier, i was asking for a context statement for the website but i also generally (i.e. outside of the purposes of the review) think they are good to have in documentation. If you think you've improved on something, I don't think it's disrespectful to compare yourself to it esp on something objective like 'the time it takes to perform an identical operation,' and it's helpful to everyone to have better linking between what's out there. One example i love is how the first two sections of the dask dataframes 'best practices' page are "use pandas" (and then clarifications about when you would want to use dask)

Kai-Striega · 2024-11-01T23:00:52Z

@eliotwrobson

For the benchmark, some light benchmarks are included in this example notebook: https://github.com/eliotwrobson/CyNetDiff/blob/main/examples/visualizations.ipynb

I'm happy with the benchmarks as-is, for the purpose of a formal review, I think there is room for improvement.

I'm not sure if there's a more standard way of doing this? We also include some discussion of these benchmarks in our paper: https://www.vldb.org/pvldb/vol17/p4409-umrawal.pdf

There are tools out there to do this for you. The one that seems to be favoured by NumPy/SciPy (where I do most of my FOSS stuff) is air speed velocity. There are several other tools also available, but I don't know too much about them.

Regarding building, although changing / documenting something to work with Poetry is probably not the most productive given what others have said, I think having some example related to this in the packaging guide would be extremely useful. I had a very hard time even figuring out how to get started writing a package that used Cython, and this would be extremely useful to me on another project I want to get started on before too long.

I've tried building Cython projects with Poetry and actually gave up. I found it very hard/confusing, so just getting your package to build should be considered an achievement. My go to for building Cython projects is now SPIN. This uses Meson which has an inbuilt Cython Module.

eliotwrobson · 2024-11-04T21:38:00Z

I've tried building Cython projects with Poetry and actually gave up. I found it very hard/confusing, so just getting your package to build should be considered an achievement. My go to for building Cython projects is now SPIN. This uses Meson which has an inbuilt Cython Module.

@sneakers-the-rat I think that even this simple recommendation alone would be a useful thing to add to the packaging guide 👍🏽

Regarding the remaining review items, I need to refactor another feature for this package to implement something for a paper I'm working on. I think the other reviewer is going to be able to take a look soon, so I'll wait for his comments before attacking the rest of these items.

sneakers-the-rat · 2024-11-04T23:20:47Z

Great - if it would be possible to open issues for each of the requested changes linking back to this issue (either of you), then I (or you) can make a list in the OP of completed/outstanding work to do so that the second reviewer can know whats already been raised/is being addressed and they can focus their review elsewhere, or comment on those ongoing issues!

eliotwrobson · 2024-11-05T19:22:10Z

@sneakers-the-rat just made associated issues and linked them back here (they are the top 4 in the current issues list).

jeremyzhangsq · 2024-11-16T15:27:39Z

@eliotwrobson check this out:D

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README.
Installation instructions: for the development version of the package and any non-standard dependencies in README.
Vignette(s) demonstrating major functionality that runs successfully locally.
Function Documentation: for all user-facing functions.
Examples for all user-facing functions.
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

Package has a [README.md](http://readme.md/) file in the root directory.

The README should include, from top to bottom:

The package name
Badges for:
- Continuous integration and test coverage,
- Docs building (if you have a documentation website),
- A [repostatus.org](https://www.repostatus.org/) badge,
- Python versions supported,
- Current package version (on PyPI / Conda).
  - Yes for PyPI

NOTE: If the README has many more badges, you might want to consider using a table for badges: [see this example](https://github.com/ropensci/drake). Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Short description of package goals.
Package installation instructions
Any additional setup required to use the package (authentication tokens, etc.)
Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the [README.md](http://readme.md/) file.
- Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
Link to your documentation website.
If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
- The authors have compared this package with NDlib, a wide-accepted library for network diffusion. This package is more efficient (See the PVLDB paper).
Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

Package documentation is clear and easy to find and use.
The need for the package is clear
All functions have documentation and associated examples for use
The package is easy to install

Functionality

For packages also submitting to JOSS

The package has an obvious research application according to JOSS's definition in their [submission requirements](http://joss.theoj.org/about#submission_requirements).

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching [JOSS's requirements](http://joss.theoj.org/about#paper_structure) with:

A short summary describing the high-level functionality of the software
Authors: A list of authors with their affiliations
A statement of need clearly stating problems the software is designed to solve and its target audience.
References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6 hours

Review Comments

I have to admit that this project really surprised me. Previously, the IM community mainly used C++ for implementation for the sake of efficiency. However, it may cause inconvenience for real-world development and deployment (from my own experience). Thus, I believe this project will help our community a lot. Thanks to other reviewers’ comments, I can quickly catch up with the progress. Overall, this project meets my standards except two minors to improve.

I suggest showing some key evaluation results in the README, such as how fast CyNetDiff can be compared to NDlib.
To adapt other variants of IC and LT models in the future, I suggest to provide args about initial node thresholds and edge weights in

def linear_threshold(
    G: t.Union[nx.Graph, nx.DiGraph], seeds: t.Iterable[int], steps: int = 0
) -> t.List[t.List[int]]:

An example of the LT variant that requires customized thresholds and weights is this:

Nicola Barbieri and Francesco Bonchi. 2014. Influence maximization with viral
product design. In ICDM.

eliotwrobson · 2024-11-21T03:15:02Z

Thanks to both reviewers for the feedback! Have a few more urgent things to take care of but will try to get through some of these changes over the upcoming holiday 👍🏽

EDIT: A little behind schedule (whoops), but will get to everything mentioned here before too long.

eliotwrobson · 2024-12-14T20:20:37Z

Finally had a chance to go over things again, and I managed to get a PR for the most significant item so far (SPEC7 compliance): eliotwrobson/CyNetDiff#28

@Kai-Striega if you have time, could you glance at this PR briefly? No need to spend too much time looking through the internals, I just want to double check that we're doing everything as expected in the spec.

As for some of the other comments, I think that we have a sufficient amount of customization in the model creation for now (edge weights are accepted as arguments by both models, and adding custom thresholds for the linear threshold model won't give a performance boost over pure-python unless we adopt a custom model of randomly generating thresholds in each simulation). So the only remaining items are the following

More examples in the documentation.
ASV / something showing more current benchmarks.
Releasing wheels on multiple platforms (I probably won't do this because I can't figure out how).

Kai-Striega · 2024-12-14T21:29:34Z

@eliotwrobson I'll take a look, hopefully today.

Releasing wheels on multiple platforms (I probably won't do this because I can't figure out how).

I think having wheels for multiple platforms is important. If you don't supply them, the user has to figure out how to compile your project on their own. i.e. you're moving the burden of having to figure out how to compile it on their platform from yourself to each individual user.

I'm happy to help with this a bit if you need it, basically for NumPy/SciPy we build it as a CI job. You can see how NumPy does it by looking in the .github/workflows/ directory.

eliotwrobson · 2024-12-14T21:41:09Z

I'm happy to help with this a bit if you need it, basically for NumPy/SciPy we build it as a CI job. You can see how NumPy does it by looking in the .github/workflows/ directory.

@Kai-Striega Thank you! This would be greatly appreciated 🙏🏽 🙏🏽 I'd be open to switching the build system as well if that is necessary, I'm just currently not sure how to set up automated publishing with respect to this.

eliotwrobson added 0/pre-review-checks New Submission! labels Apr 22, 2024

github-project-automation bot added this to peer-review-status Apr 22, 2024

Batalex mentioned this issue Apr 24, 2024

Presubmission inquiry: CyNetDiff #165

Closed

14 tasks

Batalex removed the New Submission! label May 4, 2024

Batalex self-assigned this May 20, 2024

Batalex moved this to seeking-editor in peer-review-status May 31, 2024

Batalex added 0/seeking-editor and removed 0/pre-review-checks labels Jun 4, 2024

sneakers-the-rat self-assigned this Jul 2, 2024

cmarmo added 1/editor-assigned and removed 0/seeking-editor labels Jul 3, 2024

lwasser moved this from seeking-editor to under-review in peer-review-status Jul 3, 2024

This was referenced Oct 27, 2024

Responses to review comments eliotwrobson/CyNetDiff#22

Merged

Adopt SPEC-7 Standard eliotwrobson/CyNetDiff#21

Open

This was referenced Nov 5, 2024

Add examples for user facing functions eliotwrobson/CyNetDiff#25

Open

Add benchmarks eliotwrobson/CyNetDiff#26

Open

Provide wheels for other platforms eliotwrobson/CyNetDiff#27

Open

CyNetDiff Submission #175

CyNetDiff Submission #175

Comments

eliotwrobson commented Apr 22, 2024 • edited by sneakers-the-rat Loading

Code of Conduct & Commitment to Maintain Package

Description

Scope

Domain Specific

Community Partnerships

Technical checks

Publication Options

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

Please fill out our survey

Editor and Review Templates

Review issues

Footnotes

Batalex commented Apr 26, 2024 • edited Loading

Editor in Chief checks

Editor comments

eliotwrobson commented Apr 26, 2024

eliotwrobson commented Apr 29, 2024

Batalex commented May 2, 2024

eliotwrobson commented May 10, 2024

Batalex commented May 11, 2024

eliotwrobson commented May 11, 2024

eliotwrobson commented May 14, 2024

eliotwrobson commented Jun 28, 2024

Batalex commented Jun 29, 2024

eliotwrobson commented Jun 29, 2024

sneakers-the-rat commented Jun 30, 2024

eliotwrobson commented Jun 30, 2024

sneakers-the-rat commented Jul 2, 2024 • edited Loading

cmarmo commented Jul 3, 2024

eliotwrobson commented Jul 4, 2024

lwasser commented Oct 1, 2024

lwasser commented Oct 19, 2024

eliotwrobson commented Oct 19, 2024

Kai-Striega commented Oct 20, 2024 • edited Loading

lwasser commented Oct 20, 2024

Kai-Striega commented Oct 26, 2024 • edited Loading

Package Review

Documentation

Usability

Functionality

For packages also submitting to JOSS

Final approval (post-review)

Review Comments

Requires a C++ compiler to build/lack of wheels

Cython code generation

Pseudo random number generation

Minor Cython related improvements

Use of array.array

Missing potential bounds checks when running cythonize

Review resizing array.array

Use of name mangling

eliotwrobson commented Oct 27, 2024

Kai-Striega commented Oct 27, 2024

eliotwrobson commented Oct 28, 2024

Kai-Striega commented Oct 28, 2024

eliotwrobson commented Oct 29, 2024

sneakers-the-rat commented Oct 29, 2024

eliotwrobson commented Oct 29, 2024

Kai-Striega commented Oct 29, 2024 • edited Loading

sneakers-the-rat commented Oct 30, 2024

eliotwrobson commented Oct 30, 2024 • edited Loading

sneakers-the-rat commented Oct 30, 2024

Kai-Striega commented Nov 1, 2024 • edited Loading

eliotwrobson commented Nov 4, 2024

sneakers-the-rat commented Nov 4, 2024

eliotwrobson commented Nov 5, 2024

jeremyzhangsq commented Nov 16, 2024

Package Review

Documentation

Usability

Functionality

For packages also submitting to JOSS

Final approval (post-review)

Review Comments

eliotwrobson commented Nov 21, 2024 • edited Loading

eliotwrobson commented Dec 14, 2024

eliotwrobson commented Apr 22, 2024 •

edited by sneakers-the-rat

Loading

Batalex commented Apr 26, 2024 •

edited

Loading

sneakers-the-rat commented Jul 2, 2024 •

edited

Loading

Kai-Striega commented Oct 20, 2024 •

edited

Loading

Kai-Striega commented Oct 26, 2024 •

edited

Loading

Use of `array.array`

Review resizing `array.array`

Kai-Striega commented Oct 29, 2024 •

edited

Loading

eliotwrobson commented Oct 30, 2024 •

edited

Loading

Kai-Striega commented Nov 1, 2024 •

edited

Loading

eliotwrobson commented Nov 21, 2024 •

edited

Loading