
mongodb file size limitations #26

Closed
thempel opened this issue Mar 20, 2017 · 3 comments · Fixed by #35

@thempel
Member

thempel commented Mar 20, 2017

I just came across a pymongo.errors.DocumentTooLarge error on a very small dataset, probably because I accidentally produced a huge transition matrix. We might need to solve this problem sooner or later, especially because the discrete trajectories become larger and larger with time...
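For context (editor's note, not from the thread): MongoDB caps a single BSON document at 16 MB, which a dense float64 transition matrix can exceed surprisingly quickly. A rough back-of-the-envelope sketch, with illustrative numbers:

```python
# Hypothetical sketch: estimate when a dense float64 transition matrix
# (n_states x n_states, 8 bytes per entry) outgrows MongoDB's 16 MB
# per-document BSON limit. Numbers are illustrative, not from the thread.

MONGO_DOC_LIMIT = 16 * 1024 * 1024  # 16 MiB BSON document cap

def matrix_bytes(n_states: int, itemsize: int = 8) -> int:
    """Raw payload size of a dense n x n matrix of `itemsize`-byte floats."""
    return n_states * n_states * itemsize

def fits_in_document(n_states: int) -> bool:
    """True if the raw matrix payload alone stays under the BSON cap
    (the real encoded document is larger still due to BSON overhead)."""
    return matrix_bytes(n_states) < MONGO_DOC_LIMIT

print(fits_in_document(1000))  # 8 MB payload -> fits
print(fits_in_document(2000))  # 32 MB payload -> DocumentTooLarge territory
```

So anything beyond roughly 1400 states already breaks the single-document approach, even before BSON encoding overhead.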

@jhprinz
Contributor

jhprinz commented Mar 21, 2017

Agreed, I need to think about the best way to do that.

  1. Store the complete file as-is, but then you cannot search it or do the cool stuff you can do with the other objects.
  2. Break the large object down into separate parts. That already works, but you need to know how to use subobjects in the Model you return. This also allows easy access to subparts. Example: we create a DiscretizedTrajectory object and, instead of writing one array of n_traj x length, we write n_traj separate objects and only store references. The parts could still be too large...
  3. Run the picking of frames on the cluster and only return the new frames. That should also already be possible, but you would need to write a function that does it.
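Option 2 could look roughly like the following sketch (editor's illustration, not code from the project; a plain dict stands in for the MongoDB collection and all names are hypothetical):

```python
# Hypothetical sketch of option 2: store each trajectory as its own
# document and keep only a list of references in the parent document.
# A plain dict stands in for a MongoDB collection; with pymongo you
# would use collection.insert_one / find_one and real ObjectIds.
import uuid

fake_collection = {}  # stand-in for a MongoDB collection

def store_discretized_trajectory(dtrajs):
    """Write n_traj child documents plus one parent holding references."""
    refs = []
    for traj in dtrajs:
        child_id = str(uuid.uuid4())
        fake_collection[child_id] = {"_id": child_id, "states": list(traj)}
        refs.append(child_id)
    parent_id = str(uuid.uuid4())
    fake_collection[parent_id] = {"_id": parent_id,
                                  "type": "DiscretizedTrajectory",
                                  "trajectory_refs": refs}
    return parent_id

def load_discretized_trajectory(parent_id):
    """Resolve the references back into a list of trajectories."""
    parent = fake_collection[parent_id]
    return [fake_collection[ref]["states"] for ref in parent["trajectory_refs"]]

pid = store_discretized_trajectory([[0, 1, 1], [2, 0]])
print(load_discretized_trajectory(pid))  # [[0, 1, 1], [2, 0]]
```

Each child document then stays small enough to search individually, at the cost of one extra lookup per trajectory when loading.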

@thempel
Member Author

thempel commented Mar 21, 2017

Hmm. What about option 1, but additionally copying the file into the working directory on the user's machine? I assume it wouldn't be usable within the DB, but it could be loaded into the user's script/notebook as a numpy array.
About option 2, I'm a bit skeptical because, in my experience, there will be a lot of (potentially useless...) MSMs that we don't really need to store. So chopping everything up and storing it in the DB might just artificially blow things up.

@jhprinz
Contributor

jhprinz commented Mar 21, 2017

> What about option 1, but additionally copying the file into the working directory on the user's machine?

That should be no problem, I guess. Good idea. It will require some thought about the implementation; you might not even have to write it to disk.

Actually I just checked. This is really super simple...
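One straightforward route for storing whole files in MongoDB (editor's note; the thread doesn't name a mechanism) is GridFS, which chunks arbitrarily large blobs and so sidesteps the 16 MB per-document limit. The sketch below only runs the serialization round-trip; the `gridfs` calls in the comments assume a live `pymongo` connection:

```python
# Hypothetical sketch: serialize a large object to bytes, the form in
# which it could be handed to GridFS. Only the pickle round-trip runs
# here; the GridFS lines are indicative comments, not tested code.
import pickle

def to_blob(obj) -> bytes:
    """Serialize an arbitrary Python object (e.g. a transition matrix)."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

def from_blob(blob: bytes):
    """Inverse of to_blob."""
    return pickle.loads(blob)

# With a live connection it would be roughly:
#   import gridfs
#   fs = gridfs.GridFS(db)
#   file_id = fs.put(to_blob(matrix), filename="transition_matrix")
#   matrix = from_blob(fs.get(file_id).read())

matrix = [[0.9, 0.1], [0.2, 0.8]]
assert from_blob(to_blob(matrix)) == matrix
```

The same bytes could also be written to the user's working directory, which covers the "copy into the working directory" idea from the comment above.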
