Indexing file content in Solr #1043
Comments
Two possible strategies,
FWIW I was considering @seth-shaw-unlv's first strategy for stuff like (H)OCR and transcripts. Having that as a field you index but don't display is hands down the simplest way to go. Make an action that runs in response to updates of a media and have it dump the media's contents into a field. For something that would require a transform, I'm less certain how it would play out. You could transform in Drupal with Twig templates and JSON, or make a microservice so you're no longer constrained by PHP's limited XML handling. It all depends on the use case, I guess.
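To illustrate the "dump its contents into a field" idea, here is a minimal Python sketch, not Islandora or Drupal code, of what a small microservice might do with an hOCR file: flatten it to plain text that could then be stored in a hidden, indexed field. The name `hocr_to_text` and the sample markup are hypothetical, and how the result gets written back to Drupal is left out entirely.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes from the document body, ignoring <head> content."""

    def __init__(self):
        super().__init__()
        self._in_head = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False

    def handle_data(self, data):
        # Keep only non-empty text found outside <head>.
        if not self._in_head and data.strip():
            self.chunks.append(data.strip())


def hocr_to_text(hocr: str) -> str:
    """Hypothetical helper: reduce hOCR markup to whitespace-normalized text."""
    parser = _TextCollector()
    parser.feed(hocr)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


sample = (
    "<html><head><title>page 1</title></head><body>"
    '<span class="ocrx_word">Hello</span> <span class="ocrx_word">world</span>'
    "</body></html>"
)
print(hocr_to_text(sample))
```

The word coordinates in the hOCR are discarded here, which is fine for full-text search but would not support hit highlighting on the page image.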
For full text, the first approach can work. But there are many modules and use cases out there that require a transform (TEI, oral history, etc.), so having a way to support that would be helpful.
@Natkeeran could you flesh out the requirement of transform a little bit? I am unclear on how you would use TEI in an Islandora 8 context.
@whikloj
Though we may not need all the above use cases in 8.x, the question remains whether we need a generic way to index media/datastreams in Solr and then make them available for search, faceting, etc. in Drupal.
My concern is thinking in 7.x terms for 8. For instance (IMHO) media !== datastream; it's more that media & file == datastream, but even that seems a little wrong, as a datastream in Fcrepo 3 only has one parent. In Drupal 8 we could have multiple content nodes pointing to the same file with separate media entities. Maybe we need some sort of special entity to store file information. These entities would reference a file and could contain the FITS-type data. If more than one node references the file, this data is still only stored once, and perhaps not as XML. Could we convert it to some usable JSON that would be easier to work with? This data is meant to be machine readable. I guess what I'm saying is that most people in the Islandora 7.x world have trouble with the XSLTs and then learn to hate them. So I think it might be nice to dump them. But I'm good with XSLTs, so I can go either way.
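As a rough sketch of the "convert it to some usable JSON" idea, the Python below turns an arbitrary XML document into a generic JSON structure without any XSLT. It is only an illustration under assumptions: `element_to_dict` and `xml_to_json` are hypothetical names, the output shape (`tag`, `attributes`, `text`, `children`) is made up for this example, and the sample input is not real FITS output.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into a plain dict (hypothetical shape)."""
    node = {"tag": elem.tag.split("}")[-1]}  # drop any "{namespace}" prefix
    if elem.attrib:
        node["attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_string: str) -> str:
    """Parse an XML string and serialize the generic structure as JSON."""
    return json.dumps(element_to_dict(ET.fromstring(xml_string)), sort_keys=True)


# Made-up input for illustration only.
print(xml_to_json('<file><size unit="bytes">42</size></file>'))
```

A generic conversion like this keeps everything, which is simple but verbose; a real implementation would probably pick out just the properties worth indexing.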
One of the powerful features of Islandora 7.x is its ability to index content in the datastreams. In Islandora 8, we can index field information in content types. However, there is no prescribed way to index file content (e.g., XML or JSON files). What approach will be taken to support this feature in Islandora 8?