Missing results of tasks at the end of a run #1159

Closed
gaow opened this issue Jan 5, 2019 · 12 comments
gaow commented Jan 5, 2019

At the end of #1156, it seems my tasks were all re-evaluated, but the pipeline stopped with an error message claiming that about 6K tasks "cannot be found". The error message is attached:

failed_to_get_results.txt

Not sure what this is about. The status of these tasks is missing if you check on the cluster. One possibility is that I recently added a stop_if like:

[step]
input: ...
...
stop_if(is_empty(_input))

task: ...

and I do have a few thousand empty files. But those should be skipped before job submission, shouldn't they? Other than that I'm not sure why things are missing, or exactly which jobs it is checking.

In any case the consequence is that the execution stopped, leaving behind a scary message about a large number of missing tasks. No single failure is reported, though (all my R computations finished without an error, at least according to SoS).

BoPeng commented Jan 5, 2019

The message suggests that some tasks from the master tasks were not returned for some reason. I will try to reproduce it.

BoPeng commented Jan 5, 2019

There are 6283 failed IDs; does that roughly match the number of empty files?

gaow commented Jan 5, 2019

There are 6283 failed IDs; does that roughly match the number of empty files?

I'm not sure. But I assume we can create an MWE to test it? I'll do it after dinner.

gaow commented Jan 5, 2019

Indeed, here is an MWE:

[1]
output: [f'{x+1}.txt' for x in range(5)]
for i in range(5):
  name = f'{i+1}.txt'
  if i not in [0,1,2]:
    path(name).touch()
  else:
    with open(name, 'w') as f:
      f.write('test it')

[2]
input: group_by = 1
output: f'{_input:n}.out'
stop_if(_input.stat().st_size==0)
#task:
task: trunk_size = 80
bash: expand = True
  cat {_input} > {_output}

the outcome:

INFO: Running 1: 
INFO: output:   1.txt 2.txt... (5 items)
INFO: Running 2: 
ERROR: [2]: Failed to get results for tasks 355a8b8ff29f1aeb, c30ee3b182bf6851, 6b2b5f7015c811fd

The problem is still with trunk_size: if you remove it, there is no issue.
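A hypothetical sketch of the failure mode (this simulates the bookkeeping, it is not SoS internals): with trunk_size, the five substeps above are bundled into a single master task, but stop_if prevents the empty-input substeps from ever running, so the trunk returns fewer results than the IDs the manager expects back. The function and task names below are made up for illustration.

```python
# Simulated trunk bookkeeping (illustrative only, not SoS code): a trunk of
# 5 substeps, of which two are skipped by stop_if because their input is empty.
def collect_trunk_results(task_ids, skipped):
    """Return results only for substeps that actually ran, plus the missing IDs."""
    results = {tid: f"result-{tid}" for tid in task_ids if tid not in skipped}
    missing = [tid for tid in task_ids if tid not in results]
    return results, missing

task_ids = [f"task-{i}" for i in range(1, 6)]
results, missing = collect_trunk_results(task_ids, skipped={"task-4", "task-5"})
print(missing)  # ['task-4', 'task-5'] -- the "holes" reported as missing results
```

The manager then fails with "Failed to get results for tasks ..." for exactly those skipped IDs.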

BoPeng commented Jan 5, 2019

So basically stop_if leaves a bunch of holes in the trunk manager....

BoPeng pushed a commit that referenced this issue Jan 5, 2019
BoPeng closed this as completed Jan 5, 2019
gaow commented Jan 5, 2019

Sorry, let me reopen this ticket with an additional step added to the MWE above, to demonstrate that there are still some "holes" somewhere:

[1]
output: [f'{x+1}.txt' for x in range(5)]
for i in range(5):
  name = f'{i+1}.txt'
  if i not in [0,1,2]:
    path(name).touch()
  else:
    with open(name, 'w') as f:
      f.write('test it')

[2]
input: group_by = 1
output: f'{_input:n}.out'
stop_if(_input.stat().st_size==0)
bash: expand = True
  cat {_input} > {_output}

[3]
output: f'{_input:n}.csv'
_output.touch()

Notice that here I did not use task. The error message:

[GW] sos run test.sos 
INFO: Running 1: 
INFO: output:   1.txt 2.txt... (5 items)
INFO: Running 2: 
INFO: output:   1.out 2.out... (3 items in 5 groups)
INFO: Running 3: 
ERROR: [3]: Failed to process step output (f'{_input:n}.csv'): Output .csv from substep 4 overlaps with output from a previous substep

I'm on the current master.

gaow reopened this Jan 5, 2019
BoPeng commented Jan 5, 2019

This is because there are two empty groups, so you end up with two .csv outputs with the same name.
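The name collision can be sketched like this (a simplification of the assumed behavior: `{_input:n}` formats the input path without its extension, so an empty `_input` from a stopped substep yields the bare name `.csv`):

```python
# Sketch of the overlap: format each group's input as f'{_input:n}.csv'.
# os.path.splitext stands in for the ':n' (no-extension) path format.
import os

def output_name(input_path):
    stem, _ext = os.path.splitext(input_path)
    return f"{stem}.csv"

# The last two groups were emptied by stop_if, leaving '' as their input.
groups = ["1.out", "2.out", "3.out", "", ""]
names = [output_name(g) for g in groups]
print(names)  # ['1.csv', '2.csv', '3.csv', '.csv', '.csv'] -- duplicate '.csv'
```

Two empty groups therefore both claim the output `.csv`, which triggers the "overlaps with output from a previous substep" error.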

gaow commented Jan 5, 2019

Yes, I understand that ... this is another type of hole that we should take care of.

BoPeng commented Jan 5, 2019

This is the consequence of stop_if resetting _output. We should either add something like skip_if, or do something like:

[3]
stop_if(not _input)

gaow commented Jan 5, 2019

So we are back to discussing #1132 ?

BoPeng commented Jan 5, 2019

I guess so, because in any case collapsing step_output (and thus changing the number of groups) does not sound like a good idea.

gaow commented Jan 5, 2019

I added this line to my workflow for now:

input: [x for x in output_from(-1) if x], group_by = 1

It helps. I will close this ticket and add more info at #1132.
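The workaround boils down to filtering out the empty (falsy) paths that stop_if leaves behind before regrouping. In plain Python terms (a plain list stands in for what output_from(-1) returns in SoS):

```python
# Minimal sketch of the workaround: drop falsy entries (the emptied groups)
# before they are regrouped with group_by = 1.
previous_output = ["1.out", "2.out", "3.out", "", ""]  # '' = group emptied by stop_if
filtered = [x for x in previous_output if x]
print(filtered)  # ['1.out', '2.out', '3.out']
```

With the empty entries gone, every remaining group produces a distinct output name and no overlap is reported.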
