Purge retired batches from the batch table #3148

Merged
merged 3 commits into from
May 20, 2019

Conversation

lfield
Contributor

@lfield lfield commented May 15, 2019

Fixes #3005

The change adds the option --batches to delete all retired batches from the batch table. It is an alternative implementation to #3101 which does this in a separate php script.
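As a rough illustration of what purging retired batches amounts to, here is a minimal in-memory sketch. It is not the code from this pull request: `BatchRow` and `purge_retired_batches` are hypothetical names, a `std::vector` stands in for the batch table, and the state values are the ones from the lib/common_defs.h excerpt quoted later in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// State values as listed in the lib/common_defs.h excerpt quoted in this thread.
enum BatchState {
    BATCH_STATE_INIT = 0,
    BATCH_STATE_IN_PROGRESS = 1,
    BATCH_STATE_COMPLETE = 2,
    BATCH_STATE_ABORTED = 3,
    BATCH_STATE_RETIRED = 4
};

// Hypothetical stand-in for one row of the batch table.
struct BatchRow {
    int id;
    int state;
};

// Hypothetical helper: remove every retired row from the in-memory table
// and return how many rows were removed. All other states are left alone.
std::size_t purge_retired_batches(std::vector<BatchRow>& table) {
    const std::size_t before = table.size();
    table.erase(
        std::remove_if(table.begin(), table.end(),
                       [](const BatchRow& b) {
                           return b.state == BATCH_STATE_RETIRED;
                       }),
        table.end());
    return before - table.size();
}
```

Only rows in the retired state are touched; completed or aborted batches stay in the table until something retires them.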

@codecov

codecov bot commented May 15, 2019

Codecov Report

Merging #3148 into master will decrease coverage by <.01%.
The diff coverage is 0%.

@@            Coverage Diff            @@
##           master   #3148      +/-   ##
=========================================
- Coverage    6.04%   6.04%   -0.01%     
=========================================
  Files          36      36              
  Lines        5967    5972       +5     
=========================================
  Hits          361     361              
- Misses       5606    5611       +5
Impacted Files         Coverage Δ
db/boinc_db.h          0% <0%> (ø) ⬆️
db/boinc_db.cpp        0.4% <0%> (-0.01%) ⬇️
db/boinc_db_types.h    0% <0%> (ø) ⬆️

@TheAspens
Member

@lfield - two questions for you.

Question 1
The db_purge utility has the option to write workunit and result records into files for archive purposes. Since the function to purge batch records is similar to that for purging workunit and result records, do you think that the archive function needs to be added?

Question 2
I want to confirm the meaning of the state: BATCH_STATE_RETIRED.

lib/common_defs.h says:

// values of batch.state
// see html/inc/common_defs.inc
//
#define BATCH_STATE_INIT            0
#define BATCH_STATE_IN_PROGRESS     1
#define BATCH_STATE_COMPLETE        2
    // "complete" means all workunits have either
    // a canonical result or an error
#define BATCH_STATE_ABORTED         3
#define BATCH_STATE_RETIRED         4
    // input/output files can be deleted,
    // result and workunit records can be purged.

The comment below BATCH_STATE_RETIRED implies that once this state is reached, there are actions that should be taken. However, the code elsewhere suggests a different order, for example in html/inc/submit_util.inc:

function retire_batch($batch) {
    $wus = BoincWorkunit::enum("batch=$batch->id");
    $now = time();
    foreach ($wus as $wu) {
        $wu->update(
            "assimilate_state=".ASSIMILATE_DONE.", transition_time=$now"
        );
    }
    $batch->update("state=".BATCH_STATE_RETIRED);
}

It looks like the workunits are marked as assimilated first (so that they are processed for delete and purging). So it might be the case that it only reaches the BATCH_STATE_RETIRED state after all related workunits are assimilated.
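The ordering described above can be modelled in a few lines of C++. This is a minimal in-memory sketch, not BOINC code: the `Workunit` and `Batch` structs, the `AssimilateState` and `BatchStatus` enums, and this `retire_batch` function are hypothetical stand-ins for the database rows, the constants, and the PHP function quoted above.

```cpp
#include <ctime>
#include <vector>

// Hypothetical enums; the numeric values used by BOINC are deliberately not assumed here.
enum class AssimilateState { Init, Ready, Done };
enum class BatchStatus { Init, InProgress, Complete, Aborted, Retired };

// Hypothetical stand-ins for rows of the workunit and batch tables.
struct Workunit {
    int id;
    int batch_id;
    AssimilateState assimilate_state;
    std::time_t transition_time;
};

struct Batch {
    int id;
    BatchStatus state;
};

// Mirrors the order of operations in the quoted PHP: first flag every
// workunit of the batch as assimilated and due for a transition, then
// mark the batch itself as retired.
void retire_batch(Batch& batch, std::vector<Workunit>& workunits, std::time_t now) {
    for (Workunit& wu : workunits) {
        if (wu.batch_id != batch.id) {
            continue;  // workunits of other batches are untouched
        }
        wu.assimilate_state = AssimilateState::Done;
        wu.transition_time = now;
    }
    batch.state = BatchStatus::Retired;
}
```

In this model the batch only becomes retired after its workunits have been flagged, which is the sequence the question above is asking about.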

Is the comment in lib/common_defs.h misleading? That is, should it state that BATCH_STATE_RETIRED means that all workunits are assimilated and will be deleted and purged? Or should there be a new state, BATCH_STATE_READY_TO_PURGE?

Let me know your perspective.

NOTE: WCG does not use the batch table, so I apologize if I misunderstand.

@davidpanderson
Contributor

The meaning of "retired" can vary between projects,
but the basic idea is that there's a person (say, a nanoHUB user) who submitted the batch.
They see (through a web interface) a list of their batches.
They may want to keep these around for a while, e.g. to compare results or runtimes.
When they're not interested in a batch anymore, they retire it.
I.e. retiring a batch is a user action, not something that's done automatically.

But that's just one use case. Other projects might want to have completed
batches retired (and purged) automatically.

@ChristianBeer
Member

Question 1
The db_purge utility has the option to write workunit and result records into files for archive purposes. Since the function to purge batch records is similar to that for purging workunit and result records, do you think that the archive function needs to be added?

Archiving results might be a good idea for projects that want to keep track of how many batches a user submitted. That might be needed for some kind of accounting system, whether to bill the user or just to create a diagram at the end of the year that shows the usage fractions per user or per batch.

@lfield
Contributor Author

lfield commented May 20, 2019

Our use of the batch table is limited. An automated tool is submitting the jobs and we have one work unit per batch.

@TheAspens
In response to Question 1, we don't need the archive. If this is needed it can always be added at a later date once the requirement has been identified.

For Question 2, I don't see a contradiction. In common_defs.h, BATCH_STATE_RETIRED means the batch is not required anymore, so its files and db records can be deleted. In submit_util.inc, the work units are set to ASSIMILATE_DONE so that they can be deleted.

@TheAspens
Member

@lfield - I can agree with that being a feature that can be added later as needed.

All my questions are answered.

I've updated the documentation page here: https://boinc.berkeley.edu/trac/wiki/DbPurge

@TheAspens TheAspens merged commit 16aa138 into master May 20, 2019
@AenBleidd AenBleidd deleted the db_purge_batch branch May 21, 2019 20:58
@AenBleidd AenBleidd added this to the Server Release 1.2.0 milestone Aug 14, 2023
Successfully merging this pull request may close these issues.

db_purge should also purge the batch table
5 participants