Preserving the backlog of items created in the last year #2566

Closed
1 task done
pgwillia opened this issue Oct 20, 2021 · 6 comments

@pgwillia
Member

pgwillia commented Oct 20, 2021

  • @pgwillia is going to investigate whether there is a queue for pmpy ready to go, or how the backlog will be resolved.

Kenton was asking whether the workflow will just work again once we release our pushmi_pullyu fix, or whether there is anything we have to do to get it to preserve the backlog. I'm going to take a look at what is currently in the production queue and whether the format is correct. If it turns out that some items haven't been preserved, I'll also check how we might kick off proper jobs for them.

Originally posted by @pgwillia in #2524 (comment)

@pgwillia pgwillia self-assigned this Oct 20, 2021
@pgwillia
Member Author

The queue currently holds only the ids:
[image]
This is the cause of our issues with pushmi_pullyu:

$ pushmi_pullyu
Loading PushmiPullyu 2.0.1
Running in ruby 2.7.4p191 (2021-07-07 revision a21a3b7d23) [x86_64-linux]
Starting processing, hit Ctrl-C to stop
Traceback (most recent call last):
        8: from /home/pjenkins/.asdf/installs/ruby/2.7.4/bin/pushmi_pullyu:23:in `<main>'
        7: from /home/pjenkins/.asdf/installs/ruby/2.7.4/bin/pushmi_pullyu:23:in `load'
        6: from /home/pjenkins/.asdf/installs/ruby/2.7.4/lib/ruby/gems/2.7.0/gems/pushmi_pullyu-2.0.1/exe/pushmi_pullyu:7:in `<top (required)>'
        5: from /home/pjenkins/.asdf/installs/ruby/2.7.4/lib/ruby/gems/2.7.0/gems/pushmi_pullyu-2.0.1/lib/pushmi_pullyu/cli.rb:38:in `run'
        4: from /home/pjenkins/.asdf/installs/ruby/2.7.4/lib/ruby/gems/2.7.0/gems/pushmi_pullyu-2.0.1/lib/pushmi_pullyu/cli.rb:54:in `start_server'
        3: from /home/pjenkins/.asdf/installs/ruby/2.7.4/lib/ruby/gems/2.7.0/gems/pushmi_pullyu-2.0.1/lib/pushmi_pullyu/cli.rb:214:in `run_tick_loop'
        2: from /home/pjenkins/.asdf/installs/ruby/2.7.4/lib/ruby/gems/2.7.0/gems/pushmi_pullyu-2.0.1/lib/pushmi_pullyu/cli.rb:185:in `run_preservation_cycle'
        1: from /home/pjenkins/.asdf/installs/ruby/2.7.4/lib/ruby/gems/2.7.0/gems/json-2.6.0/lib/json/common.rb:216:in `parse'
/home/pjenkins/.asdf/installs/ruby/2.7.4/lib/ruby/gems/2.7.0/gems/json-2.6.0/lib/json/common.rb:216:in `parse': 859: unexpected token at 'b4b61a-bdc0-4e27-9970-e754b789615c' (JSON::ParserError)

This happens because pushmi_pullyu expects each queue entry to contain both the type and the uuid:
https://github.com/ualbertalib/pushmi_pullyu/blob/c0bccd20ed049afa3d26de0b9896f87855a053c2/lib/pushmi_pullyu/cli.rb#L184-L190
An entry in that format looks like this (from our master branch):
[image]

Each time pushmi_pullyu is started and fails, the top item is removed from the queue, so we cannot use the existing queue to reliably clear the backlog. In fact, we must ensure the queue is empty before we start pushmi_pullyu.
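
To illustrate the failure, here is a minimal sketch of why a bare id cannot be parsed while a JSON entry can. The key names `type` and `uuid`, the value `'items'`, the example uuid, and the helper `parse_entry` are assumptions for illustration only; the exact entry format is whatever the linked pushmi_pullyu code expects.

```ruby
require 'json'

# Hypothetical queue entries. The key names are an assumption based on the
# description above ("both the type and the uuid"), not the confirmed format.
bare_id    = 'abcdef12-3456-4789-8abc-def012345678'
json_entry = { type: 'items', uuid: bare_id }.to_json

# Hypothetical helper: returns the parsed entry, or nil when the raw value
# is not valid JSON (which is the case for a bare id).
def parse_entry(raw)
  JSON.parse(raw, symbolize_names: true)
rescue JSON::ParserError
  nil
end

parse_entry(bare_id)    # nil, a bare id is not valid JSON
parse_entry(json_entry) # hash with :type and :uuid
```

The traceback above is the same `JSON::ParserError`, only unrescued.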

@pgwillia
Member Author

We do have a task that will preserve all items and theses:

desc 'queue all items and theses in the system for preservation'
task :preserve_all_items_and_theses, [:batch_size] => :environment do |_, args|
  # NOTE: nil.to_i is 0, and 0 is truthy in Ruby, so the 1000 default is never applied.
  desired_batch_size = args.batch_size.to_i ||= 1000
  puts 'Adding all Items and Theses to preservation queue...'
  Item.find_each(batch_size: desired_batch_size) { |item| item.tap(&:preserve) }
  Thesis.find_each(batch_size: desired_batch_size) { |item| item.tap(&:preserve) }
  puts 'All Items and Theses have been added to preservation queue!'
end

However, this commit made it a no-op: e363d15#diff-18a0ef032f862c5c6e3b8c95e7326c439bb62039b96dd19b57dbdc786b8dbcad

The batch size logic is also not working. So the task needs some improvement, but it could be useful.
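
One possible way to repair the batch size default, as a sketch only: `String#to_i` and `NilClass#to_i` return 0 for missing or non-numeric input, and 0 is truthy in Ruby, so `||=` never falls back to the default. The helper name `batch_size_from` and the constant `DEFAULT_BATCH_SIZE` are hypothetical and not part of the existing task.

```ruby
DEFAULT_BATCH_SIZE = 1000

# Hypothetical helper: converts a raw rake argument (a String or nil) into a
# usable batch size, falling back to the default for missing, non-numeric,
# zero, or negative values.
def batch_size_from(raw, default: DEFAULT_BATCH_SIZE)
  size = raw.to_i # nil.to_i and ''.to_i are both 0
  size.positive? ? size : default
end
```

Inside the task this could replace the `||=` line, for example `desired_batch_size = batch_size_from(args.batch_size)`.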

@pgwillia
Member Author

If we didn't want to push all items, we could use an existing scope:
Item.updated_on_or_after(DATE_THAT_PRESERVATION_STOPPED_WORKING).each {|item| item.push_entity_for_preservation}

@pgwillia
Member Author

To delete the zset/queue:

queue_name = Rails.application.secrets.preservation_queue_name
$queue ||= ConnectionPool.new(size: 1, timeout: 5) { Redis.current }
$queue.with {|connection| connection.del queue_name }
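
Since the queue must be empty before pushmi_pullyu is started, a check along these lines might be useful after the delete. This is a sketch that assumes the queue is a Redis sorted set (hence `zcard`, which returns 0 for a missing key); the helper `queue_empty?` and the `FakeConnection` stand-in are hypothetical and exist only so the example runs without a Redis server.

```ruby
# Hypothetical helper: `connection` is anything that responds to `zcard`,
# such as a redis-rb client checked out of the connection pool above.
def queue_empty?(connection, queue_name)
  connection.zcard(queue_name).zero?
end

# Stand-in for a Redis connection so the sketch runs without a server.
# Like Redis, it reports a cardinality of 0 for keys that do not exist.
FakeConnection = Struct.new(:sizes) do
  def zcard(key)
    sizes.fetch(key, 0)
  end
end
```

With the real pool this would be called as `$queue.with { |connection| queue_empty?(connection, queue_name) }`.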

@pgwillia
Member Author

@henryzhang87 identified 2021-01-19 as the date of the last preservation event.

We could run
bundle exec rails jupiter:preserve_all_items_and_theses[2021-01-19]
to queue all items created or updated on or after that date.
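
As a rough illustration of what a date argument implies, the sketch below parses an ISO 8601 date and keeps only records updated on or after it. It uses plain Ruby stand-ins rather than the real Item and Thesis models; `Record` and `updated_on_or_after` here are hypothetical and only mirror the idea of the existing scope, not its implementation.

```ruby
require 'date'

# Hypothetical stand-in for an Item or Thesis; only updated_at matters here.
Record = Struct.new(:id, :updated_at)

# Hypothetical helper: keeps the records updated on or after the given date.
# Date.iso8601 raises on malformed input, so a bad argument fails loudly
# instead of silently queueing everything or nothing.
def updated_on_or_after(records, raw_date)
  cutoff = Date.iso8601(raw_date)
  records.select { |record| record.updated_at >= cutoff }
end

records = [
  Record.new('a', Date.new(2021, 1, 18)),
  Record.new('b', Date.new(2021, 1, 19)),
  Record.new('c', Date.new(2021, 6, 1))
]

selected = updated_on_or_after(records, '2021-01-19')
```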

@pgwillia
Member Author

Confirmed to work with era-test
