With respect to parallelization, this is a great suggestion; however, at present the R package depends on a while loop to page through the set of results (using offset/limit values).
Because of this, we send off smaller calls to get_sites(), requesting 50 sites at a time, until a result set comes back empty (the case when the offset exceeds the size of the result set).
The Neotoma API does not, at present, provide a method for determining the size of the result set, so there is no simple way to solve this problem across the set of queries passed to the API servers.
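A minimal sketch of the offset/limit paging loop described above may make the constraint clearer: the client cannot know the result-set size up front, so it must keep requesting pages until one comes back empty. The names here (fake_api(), fetch_all()) are illustrative stand-ins, not the package's internals; fake_api() simulates one API call.

```r
all_records <- seq_len(120)   # stand-in for the server-side result set

# Stand-in for one API call returning up to `limit` records at `offset`
fake_api <- function(limit, offset) {
  if (offset >= length(all_records)) return(integer(0))
  all_records[(offset + 1):min(offset + limit, length(all_records))]
}

# Page through the results 50 at a time until a page comes back empty
fetch_all <- function(limit = 50) {
  offset <- 0
  pages <- list()
  repeat {
    page <- fake_api(limit, offset)
    if (length(page) == 0) break   # empty page: we are past the end
    pages[[length(pages) + 1]] <- page
    offset <- offset + limit
  }
  unlist(pages)
}
```

Because the loop's stopping condition is only discovered by making the next call, the individual page requests within one query are inherently sequential.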
My sense is that there are ways around this (possibly using the future package?) but at present I don't think there's a simple solution.
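As a rough illustration of the future-based direction floated above (a sketch, not the package's implementation): independent queries could run in parallel even while paging stays serial inside each call. The use of future.apply and the stand-in fetch_one() are assumptions; in practice fetch_one() would wrap a real get_sites() call.

```r
library(future.apply)            # assumption: future.apply is installed

plan(multisession, workers = 2)  # run independent queries in parallel

# Stand-in for a single get_sites() query; the real call would hit the
# API and page through its own results serially.
fetch_one <- function(sitename) {
  data.frame(sitename = sitename, n = nchar(sitename))
}

site_names <- c("Site A", "Site B", "Site C")  # hypothetical query list
results <- do.call(rbind, future_lapply(site_names, fetch_one))

plan(sequential)                 # restore the default sequential plan
```

This parallelizes across queries rather than across pages, so it sidesteps the unknown-result-set-size problem at the cost of only helping users who issue multiple independent requests.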
That said, it might be easier to do this for the build_sites() methods (for example), and we'll look into it, but I'd like to consider this option beyond the scope of the current JOSS review.
The get_sites() function is used to retrieve site-specific information, and appears to call an API function. It can take very long to fetch multiple sites at once, especially if the user provides an age filter.
The authors could consider adding parallelization here, since this is purely a fetch function. It could be as simple as the following:
```r
library(parallel)
library(data.table)

# Create a cluster using the available cores
cluster <- makeCluster(detectCores() - 1)

# Vector of site identifiers to fetch (placeholders)
list_of_sites <- c(A, B, C)

# Fetch each site in parallel, then bind the results into one table
processed_sites <- parLapply(cluster, list_of_sites, get_sites)
processed_sites <- rbindlist(processed_sites)

stopCluster(cluster)
```
The above is a crude example, but it can work well. Again, this is just an optional suggestion.
openjournals/joss-reviews#5561