Missing results of tasks at the end of a run #1159

Closed
gaow opened this issue Jan 5, 2019 · 12 comments
gaow commented Jan 5, 2019

At the end of #1156, it seems my tasks were all re-evaluated, but the pipeline stopped with an error message claiming that about 6K tasks "cannot be found". The error message is attached:

failed_to_get_results.txt

Not sure what this is about. The status of these tasks is missing if you check on the cluster. One possibility is that I recently added a stop_if like:

[step]
input: ...
...
stop_if(is_empty(_input))

task: ...

and I do have a few thousand empty files. But those should be skipped before job submission, shouldn't they? Other than that I'm not sure why things are missing, or exactly which jobs it is checking.

In any case the consequence is that the execution stopped, leaving behind a scary message about a large number of missing tasks. No single failure is reported, though (all my R computations finished without an error, at least according to SoS).

BoPeng commented Jan 5, 2019

The message suggests that some tasks from the master tasks were not returned for some reason. I will try to reproduce it.

BoPeng commented Jan 5, 2019

There are 6283 failed IDs; does that roughly match the number of empty files?

gaow commented Jan 5, 2019

There are 6283 failed IDs; does that roughly match the number of empty files?

I'm not sure. But I assume we can create an MWE to test it? I'll do it after dinner.

gaow commented Jan 5, 2019

Indeed, here is an MWE:

[1]
output: [f'{x+1}.txt' for x in range(5)]
for i in range(5):
  name = f'{i+1}.txt'
  if i not in [0,1,2]:
    path(name).touch()
  else:
    with open(name, 'w') as f:
      f.write('test it')

[2]
input: group_by = 1
output: f'{_input:n}.out'
stop_if(_input.stat().st_size==0)
#task:
task: trunk_size = 80
bash: expand = True
  cat {_input} > {_output}

the outcome:

INFO: Running 1: 
INFO: output:   1.txt 2.txt... (5 items)
INFO: Running 2: 
ERROR: [2]: Failed to get results for tasks 355a8b8ff29f1aeb, c30ee3b182bf6851, 6b2b5f7015c811fd

The problem is still with trunk_size: if you remove it, there is no issue.
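A hypothetical sketch of the failure mode (this simulates the bookkeeping, it is not SoS internals): with trunk_size, the five substeps above are bundled into a single master task, but stop_if prevents the empty-input substeps from ever running, so the trunk returns fewer results than the IDs the manager expects back. The function and task names below are made up for illustration.

```python
# Simulated trunk bookkeeping (illustrative only, not SoS code): a trunk of
# 5 substeps, of which two are skipped by stop_if because their input is empty.
def collect_trunk_results(task_ids, skipped):
    """Return results only for substeps that actually ran, plus the missing IDs."""
    results = {tid: f"result-{tid}" for tid in task_ids if tid not in skipped}
    missing = [tid for tid in task_ids if tid not in results]
    return results, missing

task_ids = [f"task-{i}" for i in range(1, 6)]
results, missing = collect_trunk_results(task_ids, skipped={"task-4", "task-5"})
print(missing)  # ['task-4', 'task-5'] -- the "holes" reported as missing results
```

The manager then fails with "Failed to get results for tasks ..." for exactly those skipped IDs.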

BoPeng commented Jan 5, 2019

So basically stop_if leaves a bunch of holes in the trunk manager....

BoPeng pushed a commit that referenced this issue Jan 5, 2019
BoPeng closed this as completed Jan 5, 2019
gaow commented Jan 5, 2019

Sorry, let me reopen this ticket with an additional step added to the MWE above, to demonstrate that there are still some "holes" somewhere:

[1]
output: [f'{x+1}.txt' for x in range(5)]
for i in range(5):
  name = f'{i+1}.txt'
  if i not in [0,1,2]:
    path(name).touch()
  else:
    with open(name, 'w') as f:
      f.write('test it')

[2]
input: group_by = 1
output: f'{_input:n}.out'
stop_if(_input.stat().st_size==0)
bash: expand = True
  cat {_input} > {_output}

[3]
output: f'{_input:n}.csv'
_output.touch()

Notice that here I did not use task. The error message:

[GW] sos run test.sos 
INFO: Running 1: 
INFO: output:   1.txt 2.txt... (5 items)
INFO: Running 2: 
INFO: output:   1.out 2.out... (3 items in 5 groups)
INFO: Running 3: 
ERROR: [3]: Failed to process step output (f'{_input:n}.csv'): Output .csv from substep 4 overlaps with output from a previous substep

I'm on the current master.

gaow reopened this Jan 5, 2019
BoPeng commented Jan 5, 2019

This is because there are two empty groups, so you end up with two .csv outputs with the same name.
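The name collision can be sketched like this (a simplification of the assumed behavior: `{_input:n}` formats the input path without its extension, so an empty `_input` from a stopped substep yields the bare name `.csv`):

```python
# Sketch of the overlap: format each group's input as f'{_input:n}.csv'.
# os.path.splitext stands in for the ':n' (no-extension) path format.
import os

def output_name(input_path):
    stem, _ext = os.path.splitext(input_path)
    return f"{stem}.csv"

# The last two groups were emptied by stop_if, leaving '' as their input.
groups = ["1.out", "2.out", "3.out", "", ""]
names = [output_name(g) for g in groups]
print(names)  # ['1.csv', '2.csv', '3.csv', '.csv', '.csv'] -- duplicate '.csv'
```

Two empty groups therefore both claim the output `.csv`, which triggers the "overlaps with output from a previous substep" error.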

gaow commented Jan 5, 2019

Yes, I understand that ... this is another type of hole that we should take care of.

BoPeng commented Jan 5, 2019

This is the consequence of stop_if resetting _output. We should either add something like skip_if, or do something like:

[3]
stop_if(not _input)

gaow commented Jan 5, 2019

So we are back to discussing #1132 ?

BoPeng commented Jan 5, 2019

I guess so, because in any case collapsing step_output (and thus changing the number of groups) does not sound like a good idea.

gaow commented Jan 5, 2019

I added this line to my workflow for now:

input: [x for x in output_from(-1) if x], group_by = 1

It helps. I will close this ticket and add more info at #1132.
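The workaround boils down to filtering out the empty (falsy) paths that stop_if leaves behind before regrouping. In plain Python terms (a plain list stands in for what output_from(-1) returns in SoS):

```python
# Minimal sketch of the workaround: drop falsy entries (the emptied groups)
# before they are regrouped with group_by = 1.
previous_output = ["1.out", "2.out", "3.out", "", ""]  # '' = group emptied by stop_if
filtered = [x for x in previous_output if x]
print(filtered)  # ['1.out', '2.out', '3.out']
```

With the empty entries gone, every remaining group produces a distinct output name and no overlap is reported.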
