Support Single-Lock Access for Legal Deposit Restrictions #6

Closed
ikreymer opened this issue Feb 9, 2018 · 8 comments

@ikreymer
Contributor

ikreymer commented Feb 9, 2018

Add support in pywb to allow 'single session access' to every top-level page, but not embedded resources.

Current workflow to be supported:

  • A user visits, and is given a session that is stored using a browser cookie.
  • Each timestamp+URL combination the user visits is ‘locked’ to that session. If another user requests the same URL and the same timestamp, they should get a suitable error page.
  • There should be a ‘logoff’ page that expires the cookie and releases the locks.
  • There should be a page for administrators that lists all the sessions and the items locked by each session, and provides a link for logging a session off (expiring the cookie and releasing all its locks).
  • It is sufficient that the locks are held in RAM. e.g. using a HashMap. They do not need to be stored permanently.
  • All locks should expire overnight. This can be done within pywb or the service could provide a ‘release all locks’ URL that can be initiated from outside.
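
The workflow above can be sketched with two in-memory dictionaries. This is only an illustration of the requirements as listed, not the reference implementation or the pywb code; the class and method names (`SessionLocks`, `acquire`, `release_session`, `release_all`) are hypothetical.

```python
class SessionLocks:
    """Hypothetical in-memory lock table: one session per (timestamp, url)."""

    def __init__(self):
        self.lock_owner = {}      # (timestamp, url) -> session id
        self.session_items = {}   # session id -> set of (timestamp, url)

    def acquire(self, session_id, timestamp, url):
        """Return True if the session holds (or now holds) the lock."""
        key = (timestamp, url)
        owner = self.lock_owner.setdefault(key, session_id)
        if owner != session_id:
            # Locked by another session: the caller would show an error page
            return False
        self.session_items.setdefault(session_id, set()).add(key)
        return True

    def release_session(self, session_id):
        """Log a session off by dropping every lock it holds."""
        for key in self.session_items.pop(session_id, set()):
            self.lock_owner.pop(key, None)

    def release_all(self):
        """Overnight reset: drop every lock for every session."""
        self.lock_owner.clear()
        self.session_items.clear()
```

The per-session set is what an admin page would iterate over to list the items locked by each session.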

EDIT Some clarifications:

  • Not all UKWA deployments need this feature, so it should be configurable per deployment.
  • If it helps, the 'reference implementation' is here
  • The above was implemented as a servlet filter, and uses a series of checks as to whether to lock specific resources, e.g. embeds/transclusions are not locked
ikreymer added a commit that referenced this issue Feb 10, 2018
support single-use session lock per url/ts (#6):
- add redis based session locking
- embedded resources -- any modifier other than mp_ and ajax requests skipped
- keys used:
   * lock:{coll}/{ts}/{url} for each locked url pointing to the sesh_id
   * sesh:{sesh_id} for each session, containing list of locked urls
- cookie only saved when lock is added, refreshed if new session key
- sessions expire at end of day (rounded up to next day)
- full tests for session limiting and expiration

Docker: add Dockerfile for launching with UKWApp extensions,
add .coveragerc and .dockerignore files
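
A rough sketch of how the key scheme in the commit message above could be used. The `lock:` and `sesh:` key formats come from the commit message; everything else is an assumption for illustration: `add_lock` is a hypothetical helper, `store` is any object exposing redis-py style `set`/`get`/`rpush`/`expire`, and `DictStore` is a tiny stand-in (without real expiry) so the example runs without a Redis server.

```python
def lock_key(coll, ts, url):
    # Key format taken from the commit message: lock:{coll}/{ts}/{url}
    return "lock:{0}/{1}/{2}".format(coll, ts, url)


def session_key(sesh_id):
    # Key format taken from the commit message: sesh:{sesh_id}
    return "sesh:{0}".format(sesh_id)


def add_lock(store, coll, ts, url, sesh_id, ttl):
    """Hypothetical helper: try to lock coll/ts/url for sesh_id for ttl seconds."""
    key = lock_key(coll, ts, url)
    # nx=True makes the SET succeed only if no session holds the key yet
    if store.set(key, sesh_id, nx=True, ex=ttl):
        store.rpush(session_key(sesh_id), key)
        store.expire(session_key(sesh_id), ttl)
        return True
    # Already locked: allowed only if this session is the owner.
    # (A real redis-py client returns bytes unless decode_responses=True.)
    return store.get(key) == sesh_id


class DictStore:
    """Stand-in for a Redis client, for illustration only (ignores expiry)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.data:
            return None
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def rpush(self, key, *values):
        self.data.setdefault(key, []).extend(values)
        return len(self.data[key])

    def expire(self, key, seconds):
        return key in self.data
```

Doing the check and the write in one `SET ... NX` step avoids a race between two sessions requesting the same page at the same time.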
ikreymer added a commit that referenced this issue Feb 10, 2018
@ikreymer
Contributor Author

A preliminary version is now part of the integration test. A 403 Not Allowed message should be shown if a page is locked.

> It is sufficient that the locks are held in RAM. e.g. using a HashMap. They do not need to be stored permanently.

Using Redis instead, so that locks persist across pywb restarts and can be shared by multiple instances.
All locks expire in Redis automatically, though a separate endpoint can also be added to flush all locks.

ikreymer added a commit to ukwa/pywb that referenced this issue Feb 10, 2018
- support extending with custom rewriterapp by setting REWRITER_APP_CLASS
- correctly default to 'config.yaml' if no config file specified
ikreymer added a commit to ukwa/pywb that referenced this issue Feb 10, 2018
… for ukwa/ukwa-pywb#6)

- 'ba_' - for <base> rewriting
- 'je_' - 'javascript-embed' default for client-side rewriting in wombat

better modifiers for css rewriting (server and client):
- 'ce_' - 'css-embed' for any url() embeds in CSS
- 'cs_' - for css stylesheet @import rewriting/other .css
ikreymer added a commit that referenced this issue Feb 10, 2018
- `/_locks` shows locks.html template listing all sessions and locks
- `/_locks/clear` - clears locks for current session, if any
- `/_locks/clear/<id>` - clears all locks for urls locked by session <id>
- `/_locks/clear_url/<url>` - clears lock for <url>
@ikreymer
Contributor Author

Added to integration deployment.
http://localhost:8081/_locks can be used to view current list of locks, clear by session and by individual url

@anjackson
Contributor

I added a few additional notes to this ticket.

Ideally, it would be good to have some kind of automated test running on the integration test docker images, but I'm not quite sure how best to achieve that.

@anjackson
Contributor

Okay, I managed to use the Python Robot Framework to set up Selenium tests to check the lock. If it's possible to add a hook to clear all locks, that would make the testing a bit more robust.

@ikreymer
Contributor Author

Ah ok, was going to suggest headless Chrome/Firefox and Selenium. Haven't used Robot before, seems like a nice integration.

Can add the clear-all endpoint and update the tests. It's also possible to set the timeout interval via SESSION_LOCK_INTERVAL, so it can be set to a short value to test auto-expiration.

ikreymer added a commit that referenced this issue Feb 12, 2018
- add clear/reset all endpoint /_locks/reset
- update tests

integration-tests:
- set expiry to 30 seconds
- increase wait time to ensure time for init
- update to check for only one lock being set -- embeds are not locked
- update tests to check expiry -- resources become unlocked after 30 seconds
@ikreymer
Contributor Author

Updated the tests to ensure that only one lock is created and that it expires. May be good to test with more complex pages as well.

@ikreymer
Contributor Author

The detection of embedded resources has also been improved, though it is not perfect. It's not possible to detect this conclusively from the modifier or url placement alone, so I think the system currently errs on the side of treating a resource as embedded (and so not locking it).

One alternative/addition to looking at the url modifier (im_, js_, cs_ etc.) is to assume an embedded resource unless it has been loaded in the top frame, e.g. set the lock not when the resource is being loaded, but via a special message from the top frame.
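
For reference, the modifier-based check could look roughly like the sketch below. It is not the actual ukwa-pywb code: `should_lock` and `REPLAY_PATH` are hypothetical names, the path layout `/<coll>/<timestamp><modifier>/<url>` and the `X-Requested-With` header check for ajax are assumptions, and `headers` is assumed to be a plain dict. Anything that is not clearly an `mp_` page request is treated as an embed.

```python
import re

# Assumed replay path layout: /<coll>/<timestamp><modifier>/<url>
REPLAY_PATH = re.compile(
    r"^/(?P<coll>[^/]+)/(?P<ts>\d{1,14})(?P<mod>[a-z]{2}_)?/(?P<url>.+)$"
)


def should_lock(path, headers):
    """Hypothetical check: lock only top-level ('mp_') non-ajax page requests."""
    match = REPLAY_PATH.match(path)
    if not match:
        # Not a replay url at all (static files, admin pages, etc.)
        return False
    # Assumption: ajax requests are identified by this conventional header
    if headers.get("X-Requested-With", "").lower() == "xmlhttprequest":
        return False
    # Err on the side of 'embedded': only an explicit mp_ modifier is locked
    return match.group("mod") == "mp_"
```

Usage: `should_lock("/pywb/20180209120000im_/http://example.com/logo.png", {})` is false because `im_` marks an image embed, while the same url with `mp_` and no ajax header would be locked.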

ikreymer added a commit that referenced this issue Feb 13, 2018
…OCKS_USERNAME and LOCKS_PASSWORD env vars are set

update unit tests to test basic auth
integration test: add /_logout test to test instant clearing of locks
ikreymer added a commit that referenced this issue Feb 16, 2018
- test locks with multiple browsers
- ensure all acid tests pass
- improve naming, messaging when sleeping
- close browsers on startup, hopefully make test rerunnable
ratelimitapp: only clamp expire time to N sec boundary if using full day, otherwise expire after N seconds (for easier testing)
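
The expiry rule in the ratelimitapp note above can be sketched as follows. This is an assumption-laden illustration, not the actual code: `lock_ttl` is a hypothetical name, and the day boundary is computed in UTC from an epoch timestamp, which may differ from how the deployment defines "end of day".

```python
DAY = 24 * 60 * 60  # seconds in a full day


def lock_ttl(interval, now):
    """Seconds until a lock taken at epoch time `now` should expire."""
    if interval == DAY:
        # Full-day interval: clamp to the next day boundary (UTC in this sketch)
        return interval - (int(now) % interval)
    # Shorter interval (e.g. 30 seconds in tests): expire exactly that long after locking
    return interval
```

With a full-day interval, a lock taken one hour after midnight UTC lives for 23 hours; with a 30 second interval it always lives for 30 seconds, which is what makes the integration tests predictable.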
@ikreymer
Contributor Author

Updated integration tests, see #8

Also supports basic auth for all lock admin ops if LOGIN_USER and LOGIN_PASSWORD are set.

Additional unit tests can be run via py.test -vv ./ukwa_pywb/test_sessionlimit.py
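
For anyone unfamiliar with the basic auth check mentioned above, a minimal sketch of validating an `Authorization` header is shown here. It is not the code in this repo: `check_basic_auth` is a hypothetical helper, and the expected username and password are passed in as arguments rather than read from any particular environment variable.

```python
import base64
import hmac


def check_basic_auth(auth_header, username, password):
    """Return True if an HTTP Basic Authorization header matches the credentials."""
    scheme, _, encoded = (auth_header or "").partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and invalid UTF-8 (both subclass ValueError)
        return False
    user, sep, pwd = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how much of the value matched via timing
    user_ok = hmac.compare_digest(user.encode("utf-8"), username.encode("utf-8"))
    pwd_ok = hmac.compare_digest(pwd.encode("utf-8"), password.encode("utf-8"))
    return user_ok and pwd_ok
```

A request that fails this check would normally get a 401 response with a `WWW-Authenticate: Basic` header so the browser prompts for credentials.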

ikreymer added a commit that referenced this issue Feb 18, 2018
single-use-lock (#6) enabled in collections only if 'single-use-lock: true' is set in per-collection config
N0taN3rd pushed a commit to webrecorder/pywb that referenced this issue Sep 3, 2019
- support extending with custom rewriterapp by setting REWRITER_APP_CLASS
- correctly default to 'config.yaml' if no config file specified
N0taN3rd pushed a commit to webrecorder/pywb that referenced this issue Sep 3, 2019
… for ukwa/ukwa-pywb#6)

- 'ba_' - for <base> rewriting
- 'je_' - 'javascript-embed' default for client-side rewriting in wombat

better modifiers for css rewriting (server and client):
- 'ce_' - 'css-embed' for any url() embeds in CSS
- 'cs_' - for css stylesheet @import rewriting/other .css