-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IndexError: list index out of range (running current master
branch) when running poppunk_assign
#273
Comments
Our error message isn't working properly here and should be giving you the list of missing samples, which we need to fix. But ultimately this is being caused by a mismatch in samples between clusters, network and database. Can you check the number in each of these inputs? You can also use |
Thanks for getting back to me John. The output of
There is 42483 lines in There is 42483 lines in And according to the above error output of PopPUNK, it seems the network has 42482 samples as well:
As I did not create this version of the database, I am wondering is there a way to deduce whether it is a database issue, or a general PopPUNK bug? |
Thanks for checking Harry.
It's hard to say I think – of course in general assignment works in this version (it's in the CI) – I suspect it's a database/files issue, but that's not to say that PopPUNK isn't at fault for picking up a bad default or being unhelpful with the error. Any idea where the 42209 in the original error is coming from? How many lines does the |
It seems you have a good hunch on the root cause. The line count of I have checked with Steph and she ran the command (no
It seems in this case, Should I ask her to rerun the command as
Or is there a simpler fix? If I understand the issue correctly, I am wondering why |
I think the command to try should be:
Let's see if that works. I think our expected use is that everything for the DB is in one folder and not versioned like this, so that when you point to the DB directory it has everything in it that's needed. The easier thing to do is just copy the fit from v6 over to the v8 directory, but I appreciate which files are needed isn't clear. If this is indeed the issue I will try and update the documentation to be clearer. |
Thank you John! I misunderstood the usage of I can confirm the following command can be completed successfully
So if I want to have everything in a single database directory and forgo the need to use If we want to maintain versioned database, the best way forward seems to be copy the database directory, and in the newly created directory run the below command?
On another note, I notice apart from the files listed in https://poppunk.readthedocs.io/en/latest/query_assignment.html#downloading-a-database, there are a few extra files in the database directory that are not mentioned in the doc:
Are they neccessary? As they double the size of the database directory (there are non-
|
@johnlees |
It would be the
That's one option. You could also output to a new directory without overwrite
I think as long as you keep the |
Yes I think this will be most of the cost. Have you looked at the |
Thanks for the info! I will try to consolidate the files into a single directory and/or removing unnecessary files in the published database for assignment, then report back to you. |
It seems the full description of I think it is fine for now, I would just run PopPUNK for all samples at once and split the results. Will only need to reconsider this approach if someone complains about the stability of PopPUNK (for now, it seems |
@johnlees Based on
I tried to replace the original I notice that when I point to
This might be related to when
subsequently, the replaced
therefore, simply replacing two I would be grateful if you could suggest what I should do next. As by replacing files one by one, I am afraid I might revert everything back to |
I think I need to do two things here (in #279):
|
Sounds good to me. Thanks John! |
Sorry, there was a difficult issue with one of the dependencies which took a while to resolve. I think i273 should be ok, but I wanted to run a final test before merging it. I won't have time to do that for a while, so I would go ahead and use that branch, it should fix your issue (and if it doesn't then I'll know what to further change!) |
Sure. I will test it out and let you know how it goes. Thanks! |
Sadly, it is still not working properly. I have noticed there is another line blocking the However, even with that fix in place, Still leads to the following error:
It also seems to render the partially overwritten database unusable. If I only remove the But then original issue remains, without Please let me know if you have any idea on this @johnlees. Thanks! |
I think the solution should be the following:
@nickjcroucher does this seem reasonable to you? |
That sounds great John, if you and Nick agree on that and push out an update, I am more than happy to test again. |
@johnlees - yes, I agree, that behaviour makes sense - should we accompany the database updating with some kind of versioning as well? Or have some other way of identifying/archiving the "parent" database, and replacing it with the latest one? Just thinking about continuous updating as a backend to online systems, but I'm sure there are better thought through approaches that already exist. |
@johnlees I am just wondering if you have an estimation on when the new update process will be available? Thanks for all the amazing work so far! |
Realistically end of this year at best. We could talk about this (in person) if we need it urgently |
Thanks for the update! I will check with my team regarding its urgency. |
This was fixed in #290 |
Versions
Current master branch e555990 / 2.6.1
Command used and output returned
python3 PopPUNK/poppunk_assign-runner.py --db ../../sl28/PopPUNK2/GPS_v8 --query qfile.txt --external-clustering ../../sl28/PopPUNK2/GPS_v8_external_clusters.csv --output output --threads 32
Describe the bug
Crash after network loading
The text was updated successfully, but these errors were encountered: