[Dataset] IGB-HOM dataset wrong number of edges #55
Comments
I will update the edge file on the S3 bucket with the local copy soon. It should have the right number of edges (3995777033). Thanks for bringing this to our attention.
Thank you. Once the dataset is updated, will I be able to download it through the original link?
Hi, if it's urgent, please use this file as a temporary solution. It contains the last 268,681,203 edges. I will upload the
Are the edges you are updating simply the result of removing self-edges and then adding a self-edge for each node?
No, these should be edges between different nodes. You can run this
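The script referred to in the reply above is not preserved in this thread. As a rough, hypothetical sketch of how one might check that claim independently, the function below counts self-edges (rows whose two node IDs are equal) in an edge array of shape (N, 2). The name `count_self_edges` and the `chunk_rows` parameter are illustrative assumptions, not part of the IGB tooling. It processes the array in chunks so that it also works on a memory-mapped file too large to fit in RAM.

```python
import numpy as np

def count_self_edges(edges, chunk_rows=10_000_000):
    """Count rows of an (N, 2) edge array whose two node IDs are equal.

    Hypothetical helper, not the maintainers' script. `edges` may be a
    regular array or a read-only memmap returned by np.load(..., mmap_mode='r').
    """
    total = 0
    for start in range(0, edges.shape[0], chunk_rows):
        # Slicing a memmap reads only this chunk from disk.
        chunk = edges[start:start + chunk_rows]
        total += int(np.count_nonzero(chunk[:, 0] == chunk[:, 1]))
    return total
```

A result of 0 on the supplementary edge file would be consistent with the statement that the added edges connect different nodes.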
Thank you. Also, I assume the het dataset has different paper__cites__paper edges from hom? Will you also update the IGB-HET paper__cites__paper file?
Both datasets have the same paper nodes and paper edges. The het dataset just has more types of nodes and edges. You can reuse the same edges for both datasets.
Also, I don't know if this is a coincidence, but if you try to run
I believe the number
Thanks for pointing it out so I could take a second look. You shouldn't need to use the extra edges, as they shouldn't be part of the final dataset. Please use the
Describe the bug
The paper reports 3995777033 edges for IGB-HOM, but the edge file I downloaded contains only 3727095830 edges.
To Reproduce
Below is the download command:
wget https://igb-public-awsopen.s3.amazonaws.com/IGBH/processed/paper__cites__paper/edge_index.npy
Then load the downloaded file with "arr = np.load("/path/to/dataset", mmap_mode='r+')" and check "arr.shape".
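The check described above can be sketched as a small script. This is a minimal example, assuming the file has already been downloaded; the helper names `edge_index_shape` and `check_edge_count` are hypothetical, and `mmap_mode="r"` (read-only) is used here instead of `'r+'` so the file cannot be modified by accident.

```python
import numpy as np

EXPECTED_EDGES = 3_995_777_033  # edge count reported in the paper

def edge_index_shape(path):
    """Return the shape of a .npy edge file without loading it into RAM."""
    # Memory-mapping reads the header and maps the data lazily.
    edges = np.load(path, mmap_mode="r")
    return edges.shape

def check_edge_count(path, expected=EXPECTED_EDGES):
    """Return (actual, missing): rows in the file and shortfall vs. expected."""
    actual = edge_index_shape(path)[0]
    return actual, expected - actual
```

For a file with 3727095830 rows, the shortfall against the paper's figure is 3995777033 - 3727095830 = 268681203 edges.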
Expected behavior
The shape should be (3995777033, 2), matching the edge count reported in the paper. The observed shape is (3727095830, 2).
This is the link to the paper: https://arxiv.org/pdf/2302.13522
Screenshots
This is the IGB-HOM info table:
Software information: