Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working version of Aip api #1441

Merged
merged 27 commits into from
Feb 5, 2020

Conversation

lagoan
Copy link
Contributor

@lagoan lagoan commented Jan 27, 2020

This PR includes the AIP API in working condition.

Changes will need to be made after some discussions with the metadata team regarding the predicates we use for each API file used to create the AIP.

Feedback is welcome from everyone.

Simple tests checking if access is managed correctly.

AIP V1 API follows the same pundit policies as the Items controller for
its show view.
Update the acts_as_rdfable gem
Add entry points for aip item aip aip
Add File attachment policies
Moved the logic from the items controller to the base class
AIPBaseController so we can serve the information required by PMPY
from the different objects we may serve from jupiter.

This commit has problems with obtaining the rdf annotations from files.
ActsAsRdfable.add_annotation_bindings! is not modifying the class to
include the rdf_annotations method, still need to figure out why that is
happening.
As part of the WIP strings were hardcoded in the AIP API controller to
define the namespaces that are included for each metadata file. These
harcoded strings have now been changed to use the strings from the
rdf-vocab gem and terms.yml file.

The harcoded strings represented namespaced that need to be repalced
after further research is done by the metadata team.
As part of the WIP the AIP API controller included hardcoded strings for
ontology namespaces that were treated as placeholders. These strings
have now been removed instead the namespaces from the rdf-vocab gem and
terms.yml file.

These last namespaces that were still being represented by hardcoded
strings will need to be replaced to reflect the changes moving away from
fedora. The new ontologies will be provided by the metadata team.
The AIP API will be the communication layer between Jupiter phase 2 an
the PushmiPullyu client which creates the AIP file for long term
archival. AIP was built using the information provided from fedora, as
we move away from fedora and onto PostgreSQL we need to provide an
alternative.

This API provides the information required for the AIP file in a simpler
manner avoiding fedora's overhead.
@lagoan lagoan requested a review from mbarnett January 27, 2020 17:18
@ualbertalib-bot
Copy link

1 Warning
⚠️ This PR is too big! Consider breaking it down into smaller PRs.

Generated by 🚫 Danger

A migration from previous work was still on local branch after it was
added to main development branch. Everything else should be OK.
Copy link
Contributor

@mbarnett mbarnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks solid overall. Concerns largely revolve around efficiency of continuing to have PMPY download files when it could just operate off of the SAN, tests violating encapsulation when making assumptions about ActiveStorage internals, and tests assuming the AIP API isn't private.

config/routes.rb Outdated Show resolved Hide resolved
test/controllers/aip/v1/aip_filesets_controller_test.rb Outdated Show resolved Hide resolved
test/controllers/aip/v1/aip_filesets_controller_test.rb Outdated Show resolved Hide resolved
test/controllers/aip/v1/aip_items_contoller_test.rb Outdated Show resolved Hide resolved

graph = get_n3_graph(url)
assert_response :success
assert_equal true, graph.graph?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and everywhere else that graph.graph? is being used as a test) this is probably a bit too cursory? It's probably worth testing that the graph actually contains what you expect, because any trivial graph will pass this test eg)

RDF::Graph.new.graph?
=> true

We probably shouldn't pass the test if some random bug leads us to successfully doing no work

Copy link
Contributor Author

@lagoan lagoan Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are simple tests for now. As I wrote in the comment above:

  # Basic tests checking if response has n3 serialization.
  # This should be changed to verify the response content is correct.

We will be sure of the response content after some discussions with the metadata team. I need to make this clearer.

From @mbarnett
If this is coming soon we might as well wait to make those tests useful before merging this. If we're expecting it's going to take a while before we have an idea of what graph contents to expect we need to create a ticket so that this work doesn't get lost

@lagoan
This is currently in the works, will add a ticket so that it does not get lost

test/fixtures/active_storage_blobs.yml Outdated Show resolved Hide resolved
app/controllers/aip/v1/aip_controller.rb Outdated Show resolved Hide resolved
app/policies/item_policy.rb Show resolved Hide resolved
app/policies/thesis_policy.rb Show resolved Hide resolved
test/test_helper.rb Outdated Show resolved Hide resolved
Add model route constraint

A route constraint will bounce a request for an ubnknown model faster
than catchin it on the AIP API controller.

Change AIP API to be private

The AIP API will be closed off from the public and will only be used
by admins for maintanence and the PMPY client.

Rename model and asset references to entity

There was a discrepancy in the naming conventions for parameters,
methods, and variables throughout the AIP API that referenced objects
in the system.

References to these objects will be named entity in order to avoid
naming issues with rails commonly used words and OOP definitions.

Improve way helper module is loaded

Move module from concerns directory to a support directory and handle
it as a helper. Require it only when needed.

Add AIP API file path entry point

The PMPY client will not download files directly through the API,
instead it will fetch the file path and access the file directly on the
system.

This implementation provides the path for a single fileset, this can be
changed to provide the paths for all files to lower calls made to server

Add json schema validator to entity tests

Remove previous place holder test and now properly check for json
response format for Item and Thesis entities.

Add tests checking if files on entity actually exist
@lagoan lagoan closed this Jan 29, 2020
@lagoan lagoan mentioned this pull request Jan 30, 2020
2 tasks
Copy link
Contributor Author

@lagoan lagoan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks solid overall. Concerns largely revolve around efficiency of continuing to have PMPY download files when it could just operate off of the SAN, tests violating encapsulation when making assumptions about ActiveStorage internals, and tests assuming the AIP API isn't private.

Addressing your comments @mbarnett :

  • The AIP API now returns the paths for the entity's files
  • Tests do not violate encapsulation from ActiveStorage
  • AIP API is now private

config/routes.rb Outdated

# AIP API v1

entity_contraint_regex = /(?:#{Item.name.underscore.pluralize}|#{Thesis.name.underscore.pluralize})/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From @mbarnett :
let's encapsulate this into a proper constraint class. The logic is only going to get more complicated over time as new models are added, and it's better to keep the routes file simple & declarative.

@lagoan lagoan requested a review from mbarnett January 31, 2020 18:10
@lagoan lagoan requested a review from murny January 31, 2020 18:45
The files will be moved directly on the server. By adding the file uuid
we know the destination path directly with no need of extra calls or
processing.
Logic will get more complicated as new entities are introduced. This
class helps keep the routes.rb file simpler.
# AIP API v1

namespace :aip,
constraints: EntityConstraint.new,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work with the constraint 👍

:original_file
]

skip_before_action :verify_authenticity_token
Copy link
Contributor

@murny murny Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a bit about this? This is for CSRF tokens verification correct? Is there a reason we want to skip this check?

This check shouldn't run on any of these controller actions anyways since they are all get requests. GETs should be safe and idempotent. You can see the underlying code here:

https://github.com/rails/rails/blob/f33d52c95217212cbacc8d5e44b5a8e3cdc6f5b3/actionpack/lib/action_controller/metal/request_forgery_protection.rb#L233-L246

And you will notice it will never get ran, so we shouldn't need to skip this check manually here since its already skipped

I think? Unless I am missing something 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct on this. I just made the change.

As we are using the AIP API for getting information there is no need to
skip the verification for CSRF tokens. This API has not been designed to
modify Jupiter so this should not be a requirement.
Change from instantiation to fixtures when using entities on AIP API
tests. As we upgrade to Rails 6 all of the tests will need to use
fixtures so we can parallelize our test suite.
@lagoan lagoan changed the title Working version of Aip api WIP Working version of Aip api Feb 5, 2020
Using fixtures continued to be a problem under certain conditions. Move
back to instantiating classes on setup since this will not be a problem
upgrading to rails 6.
@lagoan lagoan changed the title WIP Working version of Aip api Working version of Aip api Feb 5, 2020
@lagoan
Copy link
Contributor Author

lagoan commented Feb 5, 2020

Fixtures continued to be a problem failing on some tests based on the seed used. Tests flip too much and it is something we need to address.

For now we can instantiate entities on the setup method as needed. We can look back on this problem once we move away from minitest-hooks.

Copy link
Contributor

@mbarnett mbarnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one tiny note on ways we can document the issue with fixtures, and a bit of a pitfall with opening files in Ruby.

}
end

def create_entity(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest that we should add a TODO: mentioning that this can be removed when we settle on a consistent way of defining test data and explaining the issues you ran into when trying to replace this with fixtures, so that someone doesn't try to clean this up and just run right into the issues you've already discovered.

Sidekiq::Testing.inline! do
entity.add_and_ingest_files(
files.map do |file|
File.open(file, 'r')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a non-obvious Ruby anti-pattern: you're actually leaking file handles (the OS never sees them as closed until the process ends).

The only guaranteed-safe way to use File.open is with a block:

files.map do |file|
  File.open(file, 'r') do |file_handle|
    entity.add_and_ingest_files([file_handle])
  end
end

file_handle is guaranteed to be closed after the block even in the presence of exceptions or underlying OS interrupts hitting the process.

(I know this is less efficient in that you're ingesting in two separate calls rather than one, which is an issue with the ingest API, but it's easier to stick to the safe pattern)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know this. Will make the change right now.

This commit includes the following changes:

- Fix problem with adding files to an entity that was leaking file
handles.
- Document a possible change in the future for AIP API entity tests with
how the entities are instantieted instead of using fixtures.
Copy link
Contributor

@mbarnett mbarnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Just to cross-link it here, the ticket capturing the remaining work on the n3 graph content tests is over at: #1448

# with the order the tests were run. The errors included:
# - No items on the database
# - Items not being found on database by id
# - Items defined to have files did not contain them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@murny murny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

gif

@lagoan
Copy link
Contributor Author

lagoan commented Feb 5, 2020

Yes!

@lagoan lagoan merged commit 7616920 into ualbertalib:integration_postmigration Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants