Permissions don't work after a complete reindex #402

Open
hackartisan opened this issue Feb 21, 2017 · 10 comments

@hackartisan
Contributor

reindex_everything must be invoked twice to get permissions into solr

@hackartisan changed the title from "Permissions don't worry after a complete reindex" to "Permissions don't work after a complete reindex" on Feb 21, 2017
@jcoyne
Member

jcoyne commented Feb 21, 2017

This may be unfixable, because the indexer does a solr query to get the permissions objects (which are indexed in the first pass) in order to write the permissions onto the actual object (second pass).

@barmintor
Member

Is this because there's no way to query the repo by type, and the permissions objects point to the objects they govern?

@hackartisan
Contributor Author

So the second time through it doesn't actually need to index permissions objects. The indexing job could add a step that queries the index itself instead of the repo, and only updates the index for non-permissions objects. Does that sound right?
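
The proposal above can be sketched with a toy model. This is not the ActiveFedora or Solr API: `REPO`, `build_doc`, `full_pass`, and `works_only_pass` are hypothetical names, a plain Hash stands in for Solr, and the sketch assumes (as described earlier in the thread) that a work's permissions are looked up in the index rather than in the repo.

```ruby
# Toy model only: the "repo" is an array of hashes, the "index" is a Hash.
REPO = [
  { id: 'work1', type: :work },
  { id: 'work1/permissions/abc', type: :permission, governs: 'work1', agent: 'alice' }
].freeze

# Build an index document. Works read their permissions from the index,
# so they only see permission objects that were indexed before them.
def build_doc(obj, index)
  doc = { id: obj[:id], type: obj[:type] }
  if obj[:type] == :permission
    doc[:governs] = obj[:governs]
    doc[:agent] = obj[:agent]
  else
    doc[:read_access] = index.values
                             .select { |d| d[:type] == :permission && d[:governs] == obj[:id] }
                             .map { |d| d[:agent] }
  end
  doc
end

# First pass: index every object in repo order.
def full_pass(repo, index)
  repo.each { |obj| index[obj[:id]] = build_doc(obj, index) }
  index
end

# "Half" pass: re-index only the non-permission objects.
def works_only_pass(repo, index)
  repo.reject { |obj| obj[:type] == :permission }
      .each { |obj| index[obj[:id]] = build_doc(obj, index) }
  index
end

index = full_pass(REPO, {})
p index['work1'][:read_access] # => []
works_only_pass(REPO, index)
p index['work1'][:read_access] # => ["alice"]
```

In this sketch the second step skips the permission objects entirely, which is where the "1.5 times" saving would come from. It only holds if every document can be rebuilt without re-reading permission objects.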

@jcoyne
Member

jcoyne commented Feb 23, 2017

@HackMasterA only if you use the default indexers. You might have an indexer that uses a value out of the Fedora model to conditionally create a solr document. Thus, you are unable to derive the next solr document just from the last solr document.

@hackartisan
Contributor Author

@jcoyne interesting; if you know of or could think of an example, it'd be helpful. A conditional that would not add the object on the first pass but would add the object on the second pass?

I still think this could be a useful way to do it to keep from running the entire thing twice; you'd run it more like 1.5 times. But I guess if you had a case like the above, you'd be in an even worse situation than before because you'd still have to run the whole job again.

@jcoyne
Member

jcoyne commented Feb 23, 2017

There's an ordering problem involved here too. Let's say we have these models:

class Library < ActiveFedora::Base
  has_many :books
end

class Book < ActiveFedora::Base
  belongs_to :library
  property :title
end

Now let's say the to_solr method for Library wants all the book titles:

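 def to_solr(doc = {})
    # Return the solr document itself; ending on the assignment would return only the titles array.
    super(doc).tap do |solr_doc|
      solr_doc['book_titles'] = books.map(&:title)
    end
 end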

This works fine, so long as we can guarantee the books are indexed before the Library. If the books are indexed after the Library, the library will have an incomplete set of titles in book_titles. Thus we need a two-pass index: one pass to build the relationships and a second pass to do the rest of the indexing.
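
The ordering dependence can be reproduced without Fedora or Solr. The following is a minimal sketch, assuming (as the comment implies) that the books lookup is answered from the index; `Record`, `index_record`, and `reindex` are hypothetical stand-ins, not ActiveFedora calls.

```ruby
# Toy stand-ins for the Library and Book models; a Hash plays the role of solr.
Record = Struct.new(:id, :kind, :title, :library_id)

# A library resolves its books through the index, so it only "sees"
# books that have already been indexed.
def index_record(record, index)
  doc = { kind: record.kind, title: record.title, library_id: record.library_id }
  if record.kind == :library
    doc[:book_titles] = index.values
                             .select { |d| d[:kind] == :book && d[:library_id] == record.id }
                             .map { |d| d[:title] }
  end
  index[record.id] = doc
end

# Index the records in the order given, optionally on top of an existing index.
def reindex(records, index = {})
  records.each { |r| index_record(r, index) }
  index
end

LIBRARY = Record.new('lib1', :library, nil, nil)
BOOKS = [
  Record.new('b1', :book, 'Dune', 'lib1'),
  Record.new('b2', :book, 'Emma', 'lib1')
].freeze

library_first = reindex([LIBRARY] + BOOKS)
p library_first['lib1'][:book_titles] # => []

reindex([LIBRARY] + BOOKS, library_first) # second pass over the same index
p library_first['lib1'][:book_titles] # => ["Dune", "Emma"]
```

Indexing the books before the library gives the full list in a single pass, but a generic reindex cannot know that ordering in advance.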

@hackartisan
Contributor Author

@jcoyne good point.

@hackartisan
Contributor Author

@jcoyne but my proposed second pass would catch that, since it would reindex everything that isn't a permissions object.

@carolyncole
Contributor

carolyncole commented Feb 24, 2017

@HackMasterA I added a solution to our sufia6 instance of ScholarSphere a while back for this very issue. Not sure if I am in love with it, but here it is: https://github.com/psu-stewardship/scholarsphere/blob/master/app/jobs/resolrize_job.rb

I used the id length to determine which objects were permission items (since those have very long ids) and then indexed those first.

It does not take into account @jcoyne's member issue though.
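
The id-length idea might look roughly like the sketch below. It is not the code from the linked job: `permissions_first`, the cutoff value, and the sample ids are all hypothetical, and the right cutoff depends on how your ids are minted.

```ruby
# Hypothetical cutoff: ids at least this long are treated as permission objects.
PERMISSION_ID_MIN_LENGTH = 20

# Return the ids reordered so that the long (assumed permission) ids come
# first, keeping the original relative order within each group.
def permissions_first(ids, min_length = PERMISSION_ID_MIN_LENGTH)
  permission_ids, other_ids = ids.partition { |id| id.length >= min_length }
  permission_ids + other_ids
end

# Made-up ids for illustration only.
ids = ['abc123', 'abc123/permissions/9f8e7d6c', 'def456']
p permissions_first(ids)
# => ["abc123/permissions/9f8e7d6c", "abc123", "def456"]
```

Indexing in the returned order means the permission documents are already in the index when the works are processed, so a single pass is enough for permissions. Other relationships, such as the Library and Book case, would still need their own ordering.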

@hackartisan
Contributor Author

@Cam156 thanks, that's a good idea, and in my case I think it could work. I do have nested attribute objects with the longer IDs, but it should be fine to index those before the works since the relationship is stored on the work side. I'll try this out.

Status: Backlog

4 participants