Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow UpdateCachedAppealsAttributesJob to complete and send warnings #15482

Merged
merged 6 commits into from
Oct 26, 2020

Conversation

yoomlam
Copy link
Contributor

@yoomlam yoomlam commented Oct 22, 2020

Resolves #15416

Description

If a BGS connection error occurs when getting the POA representative, log a warning and allow UpdateCachedAppealsAttributesJob to complete without failure and log the warnings to Slack.

The UpdateCachedAppealsAttributesJob runs many times per day. The warning logs should provide more info to determine if we want to automatically retry for certain situations.

There may be other errors caused by BGS, but we'll just rescue from Errno::ECONNRESET for now.

Acceptance Criteria

  • Recover from the SSL_connect error
  • Code compiles correctly and tests pass

Testing Plan

It's hard to test unless we know which appeal causes the BGS connection error.

In prod, monkey-patch class with code changes, skipping cache_ama_appeals and limiting to first few legacy appeals:

TEST_LIMIT=1500
class UpdateCachedAppealsAttributesJob 
  def perform
    RequestStore.store[:current_user] = User.system_user
    ama_appeals_start = Time.zone.now
    ### Ignore this for testing: cache_ama_appeals
    datadog_report_time_segment(segment: "cache_ama_appeals", start_time: ama_appeals_start)

    legacy_appeals_start = Time.zone.now
    cache_legacy_appeals
    datadog_report_time_segment(segment: "cache_legacy_appeals", start_time: legacy_appeals_start)

    datadog_report_runtime(metric_group_name: METRIC_GROUP_NAME)
  rescue StandardError => error
    log_error(@start_time, error)
  else
    puts "warnings: #{warning_msgs.count}"
    log_warning unless warning_msgs.empty?
  end

### overridding to limit number of appeals
  def cache_legacy_appeals
    # Avoid lazy evaluation bugs by immediately plucking all VACOLS IDs. Lazy evaluation of the LegacyAppeal.find(...)
    # was previously causing this code to insert legacy appeal attributes that corresponded to NULL ID fields.
    puts "cache_legacy_appeals"
    legacy_appeals = LegacyAppeal.includes(:available_hearing_locations)
      .where(id: open_appeals_from_tasks(LegacyAppeal.name)).limit(10) ### limit to a few
    all_vacols_ids = legacy_appeals.pluck(:vacols_id).flatten
    puts "vacols_ids count: #{all_vacols_ids.count}"

    cache_postgres_data_start = Time.zone.now
    cache_legacy_appeal_postgres_data(legacy_appeals)
    datadog_report_time_segment(segment: "cache_legacy_appeal_postgres_data", start_time: cache_postgres_data_start)

    ### cache_vacols_data_start = Time.zone.now
    ### cache_legacy_appeal_vacols_data(all_vacols_ids)
    ### datadog_report_time_segment(segment: "cache_legacy_appeal_vacols_data", start_time: cache_vacols_data_start)
  end

  def cache_legacy_appeal_postgres_data(legacy_appeals)
    # this transaction times out so let's try to do this in batches
### limit to a subset of appeals; may need to increase in order to get to an appeal that causes a BGS error
    legacy_appeals.first(TEST_LIMIT).in_groups_of(POSTGRES_BATCH_SIZE, false) do |batch_legacy_appeals| 
      values_to_cache = batch_legacy_appeals.map do |appeal| 
        puts "batch_legacy_appeals: #{appeal.id}"
        regional_office = RegionalOffice::CITIES[appeal.closest_regional_office]
        {
          vacols_id: appeal.vacols_id,
          appeal_id: appeal.id,
          appeal_type: LegacyAppeal.name,
          closest_regional_office_city: regional_office ? regional_office[:city] : COPY::UNKNOWN_REGIONAL_OFFICE,
          closest_regional_office_key: regional_office ? appeal.closest_regional_office : COPY::UNKNOWN_REGIONAL_OFFICE,
          docket_type: appeal.docket_name, # "legacy"
          power_of_attorney_name: poa_representative_name_for(appeal),
          suggested_hearing_location: appeal.suggested_hearing_location&.formatted_location
        }
      end

      puts "import values_to_cache: #{values_to_cache.count}"
      CachedAppeal.import values_to_cache, on_duplicate_key_update: { conflict_target: [:appeal_id, :appeal_type],
                                                                      columns: [
                                                                        :closest_regional_office_city,
                                                                        :closest_regional_office_key,
                                                                        :vacols_id,
                                                                        :docket_type,
                                                                        :power_of_attorney_name,
                                                                        :suggested_hearing_location
                                                                      ] }
      increment_appeal_count(batch_legacy_appeals.length, LegacyAppeal.name)
    end
  end

  # bypass PowerOfAttorney model completely and always prefer BGS cache
  def poa_representative_name_for(appeal)
    bgs_poa = fetch_bgs_power_of_attorney_by_file_number(appeal.veteran_file_number)
    # both representative_name calls can result in BGS connection error
    bgs_poa&.representative_name || appeal.representative_name
  rescue Errno::ECONNRESET => error
    puts "RESCUED error for: #{appeal.id}"
    warning_msgs << "#{appeal.class.name} #{appeal.id}: #{error}" if warning_msgs.count < 100
    nil
  end

  def warning_msgs
    @warning_msgs ||= []
  end

  def log_warning
    slack_msg = "[WARN] UpdateCachedAppealsAttributesJob first 100 warnings: \n#{warning_msgs.join("\n")}"
    slack_service.send_notification(slack_msg)
  end
end

# Then run it. Keep an eye on #appeals-job-alerts Slack channel
UpdateCachedAppealsAttributesJob.perform_now

You may see a bunch of ROLLBACKs due to BGS like the following, but the job should keep running.

[2020-10-22 12:13:57 -0400]   BgsPowerOfAttorney Load (1.6ms)  SELECT  "bgs_power_of_attorneys".* FROM "bgs_power_of_attorneys" WHERE "bgs_power_of_attorneys"."file_number" = $1 LIMIT $2 ...
[2020-10-22 12:13:57 -0400]    (1.0ms)  BEGIN
[2020-10-22 12:13:57 -0400]    (1.1ms)  ROLLBACK

and

[2020-10-22 12:16:28 -0400] FINISHED BGS: fetch poa for file number: ...
[2020-10-22 12:16:28 -0400]    (2.6ms)  ROLLBACK

@yoomlam yoomlam added Product: caseflow-queue Integration: BGS BGS integration Team: Echo 🐬 Source: Sentry Alert created because of a Sentry alert Priority: Medium Blocking issue w/workaround, or "second in" priority for new work. labels Oct 22, 2020
@yoomlam yoomlam self-assigned this Oct 22, 2020
@codeclimate
Copy link

codeclimate bot commented Oct 22, 2020

Code Climate has analyzed commit a394649 and detected 0 issues on this pull request.

View more on Code Climate.

@ferristseng
Copy link
Contributor

@yoomlam Just a heads up, we are making a big change to the UpdateCachedAppealsAttributesJob to move a lot of the logic into a Service class so we can reuse the code (see #15487). Looking at this, there will definitely be merge conflicts, which I'd be happy to handle!

Copy link
Contributor

@lomky lomky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :shipit:

@yoomlam yoomlam added the Ready-to-Merge This PR is ready to be merged and will be picked up by va-bot to automatically merge to master label Oct 26, 2020
@va-bot va-bot merged commit cdbe69c into master Oct 26, 2020
@va-bot va-bot deleted the yoom/15416-handle-bgs-connection-error branch October 26, 2020 20:55
va-bot pushed a commit that referenced this pull request Oct 28, 2020
Resolves #15426 

### Description

- Moved `FetchHearingLocationsForVeteransJob` logic to `GeomatchService`
- Moved `UpdateCachedAppealsAttributesJob` logic to `CachedAppealService`
  - Refactored some existing logic to get rid of some CodeClimate warnings. The remaining ones I think we can ignore
- Add `Hearings::GeomatchAndCacheAppealJob`, and start it when completing `ChangeHearingRequestTypeTask`
- Update and add tests
- Resolves merge conflicts from #15482 and #15478

### Acceptance Criteria
- [x] When a user completes the "Convert hearing to virtual" action on a Travel Board hearing, it immediately shows up in the geo-matched RO queue

### Testing Plan
1. Login as BVASYELLOW
2. Go to http://localhost:3000/queue/appeals/1986897
3. Work the change appeal type task
4. Go to the schedule veteran page for St. Petersburg, and ensure the case shows up
va-bot pushed a commit that referenced this pull request Oct 30, 2020
…nue processing (#15537)

UpdateCachedAppealsAttributesJob [failed](https://dsva.slack.com/archives/CN3FQR4A1/p1604044974000100) [due to BGS `Savon::HTTPError: HTTP error (408): stream timeout`](https://sentry.prod.appeals.va.gov/department-of-veterans-affairs/caseflow/issues/12047/events/903946/?environment=production)

### Description
Rescue from the Savon::HTTPError and log it as a warning so that the job can complete.

### Acceptance Criteria
- [ ] Code compiles correctly

### Testing Plan
Can do something like the test plan described in #15482, but it's hard to replicate the error other than what's done in the rspec test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Integration: BGS BGS integration Priority: Medium Blocking issue w/workaround, or "second in" priority for new work. Product: caseflow-queue Ready-to-Merge This PR is ready to be merged and will be picked up by va-bot to automatically merge to master Source: Sentry Alert created because of a Sentry alert Team: Echo 🐬
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recover from SSL_connect error in UpdateCachedAppealsAttributesJob
4 participants