Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gripes about LagrangianParticles: construction, analysis, useability #3919

Open
glwagner opened this issue Nov 12, 2024 · 11 comments
Open

Gripes about LagrangianParticles: construction, analysis, useability #3919

glwagner opened this issue Nov 12, 2024 · 11 comments

Comments

@glwagner
Copy link
Member

I'm wondering if anyone thinks that there are some big improvements that can be made to LagrangianParticles. Here is a list of issues that come to mind:

  • it is inconvenient to initialize the locations x, y, z independently to instantiate LagrangianParticles(; x, y, z). This is error-prone; for example x, y, z have to all be the same length. I think I would rather have a utility that initializes some number of particles, and then set! the coordinates of all or selected particles.

  • I shouldn't have to provide the locations of particles in Flat directions -- it makes no sense. But right now I do have to write something like z = zeros(Np) for Np particles in a two-dimensional simulation...

  • Output is not friendly. First of all there is no utility like FieldTimeSeries. Second of all, the first thing I think of to do is extract a particular trajectory. But the way the output is constructed, its a little convoluted to extract the trajectory of a single particle. All of the positions at a particular moment in time are saved in a NamedTuple of arrays. So when I load one snapshot, I get all of the particles. To construct the entire trajectory of just one particle, I have to load all of the data! This seems problematic for huge datasets especially. Note that analysis is typically done per-trajectory. So if trying to batch an analysis of a huge dataset of trajectories, I would want to be able to load some subset of trajectories at a time. This is precluded by the current design.

Does anyone else have these issues or am I the only one? Are there issues that I missed?

@glwagner
Copy link
Member Author

Here's an example of the kind of convoluted code required to get a trajectory (2D):

trajectoriesfilename = "trajectories.jld2"
file = jldopen(trajectoriesfilename)
saveiters = parse.(Int, keys(file["timeseries/t"]))
data = [file["timeseries/particles/$iter"] for iter in saveiters]
close(file)

Nt = length(data)
function trajectory(data, p)
    Nt = length(data)
    x = [data[n].x[p] for n = 1:Nt]
    y = [data[n].y[p] for n = 1:Nt]
    return x, y
end

xp1, yp1 = trajectory(data, 1)

Maybe there is a better way though.

@ali-ramadhan
Copy link
Member

ali-ramadhan commented Nov 12, 2024

You're not alone!

I think I would rather have a utility that initializes some number of particles, and then set! the coordinates of all or selected particles.

I like this idea. It would be nice if it was also easy to add and remove particles. Right now it's actually easy to add particles with append! but removing particles on the GPU is non-trivial. Maybe it'll always require a copy of some kind but here's an example callback I use:

function remove_particles!(simulation)
    particles = simulation.model.particles
    properties = particles.properties

    z_mixed_layer = -50.0
    p_to_keep = properties.z .>= z_mixed_layer

    # Get the underlying arrays
    arrays = components(properties)

    # Compact each array
    for (key, arr) in pairs(arrays)
        kept_values = arr[p_to_keep]
        CUDA.resize!(arr, length(kept_values))
        copyto!(arr, kept_values)
    end

    return nothing
end

I shouldn't have to provide the locations of particles in Flat directions

True. I don't find it too annoying but it is not consistent with the rest of the user interface. Also not the most efficient use of memory.

Output is not friendly.

Also agree here.

I think NetCDF output for particles is easier to work with since it's a multi-dimensional array, so pulling out a trajectory is just indexing. Not sure how it's chunked (if at all) though so huge data might be a problem.

But it shouldn't be hard to add something nice for JLD2. ParticleTimeSeries?

@ali-ramadhan
Copy link
Member

ali-ramadhan commented Nov 12, 2024

On that note, OceanBioME.jl has a set! for particles: https://github.com/OceanBioME/OceanBioME.jl/blob/main/src/Particles/set.jl

@jagoosw
Copy link
Collaborator

jagoosw commented Nov 12, 2024

I agree that particles aren't the most usable. As well as set! that Ali mentioned, OceanBioME particles also default initialise themselves to have x, y, and z set to zero(n) and the user writes BiogeochemicalParticles(n, grid; ...) BiogeochemicalParticles(n; grid, ...) (will change all of the ;grids to grid; in OceanBioME soon).

So you just write e.g. :

n = 10

particles = BiogeochemicalParticles(n; grid, biogeochemistry = SomeParticleBGC())

set!(particles, x = [i for i in 1:n])

For output reading, I agree there is room for a much better interface. It might be less storage efficient but it might be easier to comprehend files if they were structured:

particle1/
    timeseries/
        x
        y
        z
        some_field
particle2/
    timeseries/
        x
        y
        z
        some_field
etc.

And then we could do something like FieldDataSet(path, particle_number)?

@glwagner
Copy link
Member Author

So if I just use particles = BiogeochemicalParticles(n; grid) with no biogeochemistry then this is better than Oceananigans' LagrangianParticles?

@jagoosw
Copy link
Collaborator

jagoosw commented Nov 13, 2024

So if I just use particles = BiogeochemicalParticles(n; grid) with no biogeochemistry then this is better than Oceananigans' LagrangianParticles?

I think that wouldn't work although I've never tried

@ali-ramadhan
Copy link
Member

Something I just thought about but I can see there being use cases for advecting multiple groups of particles (each for a different purpose).

One group of particles could be inactive Lagrangian particles just to visualize the flow, while another group could be biogeochemical particles from OceanBioME.jl (assuming OceanBioME/OceanBioME.jl#231 happens).

@glwagner
Copy link
Member Author

I feel like the fact that its worth it to implement a completely independent BiogeochemicalParticles with its own user interface indicates there are even deeper problems with LagrangianParticles. Why can't we always use LagrangianParticles?

@jagoosw
Copy link
Collaborator

jagoosw commented Nov 14, 2024

Something I just thought about but I can see there being use cases for advecting multiple groups of particles (each for a different purpose).

One group of particles could be inactive Lagrangian particles just to visualize the flow, while another group could be biogeochemical particles from OceanBioME.jl (assuming OceanBioME/OceanBioME.jl#231 happens).

I have thought before that the current particle interface (in both Oceananigans and OceanBioME) does make it quite challenging to have multiple different types of particles which I could also see being useful.

@jagoosw
Copy link
Collaborator

jagoosw commented Nov 14, 2024

I feel like the fact that its worth it to implement a completely independent BiogeochemicalParticles with its own user interface indicates there are even deeper problems with LagrangianParticles. Why can't we always use LagrangianParticles?

I think the reason I initially did this was to try and make the coupling between the particle and tracer bgcs work, but I think there are definitely other ways it could be done and the BiogeochemicalParticles go in the normal particles slot

@glwagner
Copy link
Member Author

The issue in the long run from an engineering perspective is that it encourages fragmentation of the code. So if some other package besides OceanBioME needs a similar kind of particle, they may have to reimplement it. Then if that project gets a bunch of funding and software developers, they could advance their own implementation, and OceanBioME doesn't reap the benefits of that. That to me is the idea of open source is to allow people to collaborate where there is a common interest. It does require some degree of centralization of design to support the collaboration though. And it can require more investment up front, vs getting something working quick-and-dirty.

That's actually the approach we took with biogeochemistry overall. It is possible to implement biogeochemistry as a forcing, and there's no need to have a property attatched to the models. But the purpose of adding that slot was to create an environment of collaboration between different implementations of biogeochemistry. So considering all of that work, the generation of an independent particles functionality indicates that this effort was only partially a success --- we did not succeed in motivating all parties to continue along the path of building shared capabilities in Oceananigans.

I'm wondering if we need to write something about this, or find a way to make sure that the vision is more clear. Do we think there is a need for this? Or perhaps it is too ambitious, and things like this are bound to happen no matter how much effort we put into trying to prevent it?

I think there is also scope to pursue quick prototyping rather than taking a more linear comprehensive path from the beginning, which is supported by Julia's design. But it seems like the danger of this is that the second step (comprehensive future-proof design) doesn't actually occur, because when the prototype is demonstrated, work does on, and technical debt is inherited. So a desire to do quick prototypes also must be coupled with strong discipline to also implement the long-term plan with the prototype has finished serving its purpose (ideally I think prototypes should be worked on for no more than a week or so).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants