-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add logic for DIRECTORY_SEPARATOR to account for "/" or "\" in reposi… #432
base: master
Are you sure you want to change the base?
Conversation
…tory unique identifier
@@ -133,6 +133,8 @@ public function writeMetadataFile($metadata, $path, $overwrite = true) | |||
public function normalizeFilename($string) | |||
{ | |||
$string = urldecode($string); | |||
$pattern = '#'.DIRECTORY_SEPARATOR.'#'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We run MIK on Windows, and since DIRECTORY_SEPARATOR on Windows is \
, we've been burned using DIRECTORY_SEPARATOR unless the pattern is run through PHP's preg_quote()
, since \
is also an escape character in regexes. So I recommend, and have used, something like this:
$pattern = preg_quote('#' . DIRECTORY_SEPARATOR . '#');
I can test with that code if you want, report back, and if it works on Windows and Linux, I'll merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit more end-of-the-day hacking around has produced:
/**
* Convert %3A (:) and other characters in filenames into underscores (_).
*/
public function normalizeFilename($string)
{
$string = urldecode($string);
$bad_chars = [
':' => '_',
DIRECTORY_SEPARATOR => '_',
];
$string = str_replace(array_keys($bad_chars), $bad_chars, $string);
return $string;
}
This replaces :
in my local tests. Could you test to see if it catches the directory separator? The benefit of str_replace()
is that you don't need to worry about nasty cross-platform issues (I hope). Also adding new bad characters is pretty easy.
It's awesome you're using OAI toolchains! I'd love to know more. |
@flummingbird WRT you comments about testing, I'll update the docs in response. Thanks for the feedback. |
Also, we'd prefer if you open an issue describing the problem you've found before opening a PR. No sweat for this PR though, this is a good catch, thanks for finding it. |
not sure what the WRT stands for... Most of the credit should go to @jpeak5 |
Sorry, 'with regard to.' |
@flummingbird @jpeak5 looking at this a bit more closely, I wonder if we should define a list of potentially problematic characters to replace with _. For example, $pattern = preg_quote('#(' . DIRECTORY_SEPARATOR . '|:)#');
$string = preg_replace($pattern, '_', $string); Would this address the situation you encountered? |
indeed, we will give it a whirl in the am; thx Mark
Sent from my iPhone
On Jul 19, 2017, at 6:24 PM, Mark Jordan <[email protected]<mailto:[email protected]>> wrote:
@mjordan commented on this pull request.
________________________________
In src/writers/Oaipmh.php<#432 (comment)>:
@@ -133,6 +133,8 @@ public function writeMetadataFile($metadata, $path, $overwrite = true)
public function normalizeFilename($string)
{
$string = urldecode($string);
+ $pattern = '#'.DIRECTORY_SEPARATOR.'#';
A bit more end-of-the-day hacking around has produced:
/**
* Convert %3A (:) and other characters in filenames into underscores (_).
*/
public function normalizeFilename($string)
{
$string = urldecode($string);
$bad_chars = [
':' => '_',
DIRECTORY_SEPARATOR => '_',
];
$string = str_replace(array_keys($bad_chars), $bad_chars, $string);
return $string;
}
This replaces : in my local tests. Could you test to see if it catches the directory separator? The benefit of str_replace() is that you don't need to worry about nasty cross-platform issues (I hope).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#432 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ABEMrF3EQr7-yzWfH6SnbUDKwVnqvo-6ks5sPpA5gaJpZM4OdH6P>.
|
Mark, Your end-of-day hack works just fine in our scenario. I'm curious though about the cross-platform issues you mention with
Any rate- I'd like to find out more about this OAI workflow, specifically, how, once you have the records, you ingest such that the Islandora items present thumbnails and links back to the original. I'm finally able next week to put some time into looking closely at this part of mik... We plan to harvest collections from only a single institution to start with, but I'm pretty interested in knowing more about how you or others are leveraging OAI. Alright- thx as always! jrp |
@jpeak5 Thanks for testing. The cross-platform issue we encountered with On to OAI... MIK's OAI toolchains spit out ordinary Islandora ingest packages, although usually with DC instead of MODS XML files. The OAI toolchains don't directly influence the behavior of the resulting Islandora objects in any special way. If I am understanding what you are describing, if you want the thumbnails of the objects (in collection browse lists or search result sets) to lead users offsite to the originating URLs, I think you'd need to do some stuff within Islandora itself to get that to work. That said, you could probably harvest thumbnails from the source objects to reuse in Islandora if you wanted, using a post-write hook script or a custom OAI filegetter. I'd have to know more about what you'd expect the Islandora objects corresponding to the harvested remote objects would be like to speculate on other solutions. |
@jpeak5 sorry to leave this PR open so long, but now that it sounds like you guys are fully back in operation, and that the OAI angle was mentioned in Mike Waugh's email announcing the launch of http://louisianadigitallibrary.org/, we thought we'd ping you. We'd be interested in knowing more about how you integrated the data you retrieved via OAI into the site. |
Also, mind if we link to a few of the collections on http://louisianadigitallibrary.org/ under the "Islandora content that has been prepared using MIK" section of the MIK README? |
Hi Mark, We've definitely been busy - I'm sorry to say that I'd forgotten all about this! Definitely no problem to link to the collections. We are still working on problems with harvesting the last 4 of about 80 Tulane collections, so this story is by no means done (and we have much code cleanup and at least one more PR), but here's a brief sketch of what we've done: We have used mik/oai to pull MODS + TN for all of the collections and items in Tulane University's standalone Islandora (at their request, of course). We pulled MODS because it was expedient - we needed a field (containing the source URL) that seemed to be getting stripped out of the DC. We injected a string into MODS.abstract containing a link back to the item in Tulane's site: Then, with some spaghetti, sorely in need of refactor, we alter the display of harvested items in our solr search results: https://github.com/lsulibraries/islandora_namespace_homepage/blob/master/islandora_namespace_homepage.module#L346 When we finish with this round of OAI, I'm hoping that we will fix up our code, PR, and then do some better documentation of what we've done (and how we intend to keep our Tulane items in sync with them...)... jrp |
Cool - so the thumbnail links back to the object at Tulane. What content model are those objects in your Islandora? |
How you've used MIK to harvest the OAI, and integrated the resulting objects into Islandora, would make a super-mega-awesome Migration Guide for the MIK wiki... 😀, don't you think so @MarcusBarnes? |
Agreed: super-mega-awesome. :) |
Hey Mark! |
@flummingbird neat. I just opened a PR to add links to sample collections. Nice work guys! |
@flummingbird @jpeak5 inspired by your use of OAI and redirecting to a remote object, I've developed a solution pack that does a lot of the same things but treats the local objects as a separate content model. It's in our local gitlab at https://git.lib.sfu.ca/mjordan/islandora_solution_pack_remote_resource. Sorry I didn't let you know back when I was working on it, I got distracted..... anyway, I'd welcome any comments you have before publicizing it more widely. There may be a use case for it here in British Columbia, maybe even within my own library since we still have a couple bespoke repository platforms that we'll probably never fully migrate into our Islandora. |
Hey Marcus,
We're currently auditing our OAI-PMH resources. and we were looking at
trying out this content type.
I'm a bit flummoxed. I've been trying to use the '--url_list' flag, in the
remote_resources_bach with very little luck. It clams that it has created
an ingest set, but the set is always 0 items.
If I can get a proof of concept working for this, I think we may be
interested in using the --url_list to replace our whole oai-pmh workflow.
It looks like you and Mark are working on it today... I'll keep trying the
module out. but if you have some time I I'd love some pointers. I tried
finding the project while logged into git lab, I guess it's hidden...
Let me know
…-Will
On Wed, Oct 11, 2017 at 12:29 PM, Mark Jordan ***@***.***> wrote:
@flummingbird <https://github.com/flummingbird> @jpeak5
<https://github.com/jpeak5> inspired by your use of OAI and redirecting
to a remote object, I've developed a solution pack that does a lot of the
same things but treats the local objects as a separate content model. It's
in our local gitlab at https://git.lib.sfu.ca/mjordan/islandora_solution_
pack_remote_resource. Sorry I didn't let you know back when I was working
on it, I got distracted..... anyway, I'd welcome any comments you have.
There may be a use case for it here in British Columbia, maybe even within
my own library since we still have a couple bespoke repository platforms
that we'll probably never fully migrate into our Islandora.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#432 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ADe_3xvTsyHMjpXYwFvkf8g7nPIfxL9Hks5srPsDgaJpZM4OdH6P>
.
|
@flummingbird yes, I've been working on this lately to get ready for our local code4lib meetup, where I'm going to do a lightning talk on it. I've ripped out the In this new workflow, you'd run |
@flummingbird I just spent some time fixing up the CSV tool so that you can give it a spin: https://git.lib.sfu.ca/mjordan/islandora_remote_resource_batch_tools/tree/master/csv As I mentioned above, this tool is intended to be a more robust replacement for the |
Latest...
I've also added a GUI for loading a zip containing the data files, and resolved a few issues. I'll be porting over the remaining issues from gitlab to the GitHub repo soon. I'll be applying all updates to the GitHub versions and deprecating the gitlab repos from now on. |
OK, all ported to GitHub other than my notes for the code4lib LT. |
awesome! I'm going to try it out !!! Thanks a ton Mark!
…-Will
On Wed, Nov 8, 2017 at 12:12 AM, Mark Jordan ***@***.***> wrote:
OK, all ported to GitHub other than my notes for the code4lib LT.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#432 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ADe_308QAnskudMMzla18a7wWBn-wNVDks5s0UYygaJpZM4OdH6P>
.
|
@flummingbird I'm going to demo this solution pack at our local code4lib on Friday. If you've had a chance to test it out I'd love to get some feedback. No worries if you haven't though. |
@mjordan I replied via email, but since it hasn't shown up here after an hour, I'm gonna paste my reply below... Mark, I'm sorry I haven't written sooner to say that this is fantastic! - exactly the content type we need. mjordan/islandora_solution_pack_remote_resource@7.x...lsulibraries:FR-oai In the short-term, I expect that that is how we will run the ingest, but I have begun planning for a GUI sync framework that will allow management and reporting over collections that use the Remote Resource in addition to automated (likely cron) updates. Thank you so very much for this, again! |
@jpeak5 awesome. I agree that OAI is a very important source of data for this solution pack. I put together some helper scripts, https://github.com/mjordan/islandora_remote_resource_batch_tools, which include an MIK post-write hook script to convert that output of its OAI crawl to a format that the batch loader could use as input. But, including OAI functionality directly in the batch loader makes a lot of sense and removes the dependency on MIK. |
…tory unique identifier
What does this Pull Request do?
We encountered a problem harvesting OAI, where someone's repository unique identifier had problematic characters IE: '/'
What's new?
We added logic to the src/writers/Oaipmh.php normalize_filename function, that would account for system directory separators. we chose to use hyphens to replace.
How should this be tested?
Set your repository unique identifier in islandora_oai config to http://example.com
Then try to harvest oai-pmh from there.
any sample oai-pmh sample config should work.
set_spec was a little hard to figure out. also the OAI example for harvesting from other islandoras, wasn't in the wiki TOC.
Interested parties @mjordan
Tag interested parties using @ mentions