This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

jsipfs add -Qr <directory-containing_1206_files> doesn't work #2310

Closed
hhfeng opened this issue Jul 30, 2019 · 12 comments · Fixed by #2372
Labels
exp/expert (Having worked on the specific codebase is important) · kind/bug (A bug in existing code, including security flaws) · P0 (Critical: Tackled by core team ASAP) · status/ready (Ready to be worked)

Comments

@hhfeng

hhfeng commented Jul 30, 2019

Version: 0.35.0
Platform: Linux ip-XXX-aws #42-Ubuntu SMP Fri May 17 13:47:10 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Type: Bug

Severity: High

Description: I have a folder containing 1200+ files that takes 4.1 GB of disk space. When I ran

jsipfs add -Qr <folder_name>

It returns a hash: QmedigS67VJr71NZs7U1DBhrWXmBFwEHRVTWHQ6f624PMm. But
jsipfs ls /ipfs/QmedigS67VJr71NZs7U1DBhrWXmBFwEHRVTWHQ6f624PMm
doesn't return anything (I expect it to return a list of the 1200+ files).
The local gateway doesn't work either.

Steps to reproduce the error:

See above

@alanshaw added the kind/bug, exp/expert, P0, and status/ready labels Jul 30, 2019
@alanshaw
Member

@hhfeng are you able to add directories with fewer files or are you seeing this issue for all directories you try to add?

The latest JS IPFS version is 0.37.0-rc.0 - would you mind installing it to verify whether the problem still exists in this new version?

@hhfeng
Author

hhfeng commented Jul 30, 2019 via email

@alanshaw
Member

No, 0.37 is not out yet, but please try the RC!

@hhfeng
Author

hhfeng commented Jul 30, 2019 via email

@hhfeng
Author

hhfeng commented Aug 1, 2019

This is a js-ipfs-only issue. I tested using go-ipfs, and it worked fine.

@alanshaw
Member

alanshaw commented Aug 1, 2019

@hhfeng npm i ipfs@0.37.0-rc.0

@achingbrain any chance you can look into this one?

@achingbrain
Member

I can't replicate this. I've tried adding:

  • A folder with 1206 empty files
  • A folder with 1206 small files with contents from /dev/random
  • A folder with 1206 files with contents from /dev/random totalling about 5GB

All work as expected, producing the same hashes as go-ipfs, and I can list their contents.
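For anyone wanting to replicate, a directory like the ones above can be generated with a short shell sketch (the directory name, file count, and sizes are illustrative; bump `bs`/`count` for the ~5 GB case):

```shell
# Create 1206 files of ~4 KB random content each
mkdir -p testdir
for i in $(seq 1 1206); do
  dd if=/dev/urandom of="testdir/file-$i" bs=4096 count=1 2>/dev/null
done
```

Running `jsipfs add -Qr testdir` and then `jsipfs ls` on the returned hash should reproduce the report if the bug is present.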

@hhfeng can you tell us more about your setup? Are you running a daemon? Is there anything out of the ordinary about the files you are adding - contents, names, etc?

I can't get the contents of QmedigS67VJr71NZs7U1DBhrWXmBFwEHRVTWHQ6f624PMm, I'm guessing there's nothing serving it?

@hhfeng
Author

hhfeng commented Aug 6, 2019 via email

@alanshaw
Member

alanshaw commented Aug 6, 2019

@hhfeng JS IPFS 0.37 is out now, so installing should be as simple as npm i ipfs@0.37.0. I'd really like to get to the bottom of this, so any more information you can provide would be really beneficial. Thank you!

@hhfeng
Author

hhfeng commented Aug 7, 2019 via email

@achingbrain
Member

Linux sometimes displays Killed when the kernel has killed a process that's consuming too much memory - what kind of machine are you running this on?
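If the process is being OOM-killed, the kernel log will usually say so. A quick check (the exact log wording varies by kernel version, and `dmesg` may need root, hence the guards):

```shell
# Look for OOM-killer activity in the kernel log
dmesg 2>/dev/null | grep -iE 'out of memory|oom-killer|killed process' || true

# Check how much RAM and swap the machine actually has
free -h 2>/dev/null || true
```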

@hhfeng
Author

hhfeng commented Aug 7, 2019 via email

achingbrain added a commit that referenced this issue Aug 20, 2019
Given a `CID`, the `dag._getRecursive` method returns a list of all
descendants of the node with the passed `CID`.  This can cause
enormous memory usage when importing large datasets.

Where this method is invoked, the results are either a) discarded or
b) used to calculate the `CID`s of the nodes, which is bad for memory
*and* CPU usage.

This PR removes the buffering and `CID` recalculation for a nice speedup
when adding large datasets.

fixes #2310
alanshaw pushed a commit that referenced this issue Aug 21, 2019
Given a `CID`, the `dag._getRecursive` method returns a list of all descendants of the node with the passed `CID`.  This can cause enormous memory usage when importing large datasets.

Where this method is invoked, the results are either a) discarded or b) used to calculate the `CID`s of the nodes, which is bad for memory *and* CPU usage.

This PR removes the buffering and `CID` recalculation for a nice speedup when adding large datasets.

In my (non-representative, may need all the other unfinished async/iterator stuff) testing, importing a folder of 4 MB files totalling about 5 GB, with content from `/dev/urandom`, into a fresh repo with a daemon running in the background is now:
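The difference can be sketched in a few lines of plain JavaScript (illustrative names and a toy graph, not the actual js-ipfs internals): the buffered version holds every descendant in one array, while the generator version yields nodes one at a time, so callers that only need side effects never hold the full list.

```javascript
// Toy DAG: each node has a cid and links to child cids
const graph = new Map([
  ['root', { cid: 'root', links: ['a', 'b'] }],
  ['a', { cid: 'a', links: [] }],
  ['b', { cid: 'b', links: [] }]
])

// Buffered: O(descendants) memory held at once
function getRecursiveBuffered (graph, cid, out = []) {
  const node = graph.get(cid)
  out.push(node)
  for (const link of node.links) {
    getRecursiveBuffered(graph, link, out)
  }
  return out
}

// Streaming: yields one node at a time, nothing accumulates
function * getRecursiveStreaming (graph, cid) {
  const node = graph.get(cid)
  yield node
  for (const link of node.links) {
    yield * getRecursiveStreaming(graph, link)
  }
}

for (const node of getRecursiveStreaming(graph, 'root')) {
  // process each node without buffering the whole traversal
  console.log(node.cid)
}
```

Both visit the same nodes in the same order; only the peak memory footprint differs.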

```
go-ipfs
real	3m43.741s
user	0m31.955s
sys	0m31.959s
```

```
js-ipfs
real	3m40.725s
user	0m7.352s
sys	0m4.489s
```

Which is nice.

fixes #2310