This is a replacement pull request for #323 #332

Open, wants to merge 3 commits into base: 7.x

@dnwk commented Apr 14, 2020

This is a replacement pull request for #323

This pull request makes the Google Scholar submodule honor the custom XPath settings in admin/islandora/solution_pack_config/scholar/xpaths when generating Google Scholar meta tags, especially for authors.
That settings page lets you set custom XPaths for author information: First Name, Last Name, and one other field labeled Authors. Currently, line 43 of islandora_google_scholar.module uses the hard-coded XPath “mods:name” and completely ignores the custom XPath settings. Furthermore, lines 50-54 require mods:role/mods:roleTerm to exist, and the roleTerm must be “author”. If roleTerm is missing from the MODS, or the role is not author, no author information is generated. Since Google Scholar tags require author information, no Google Scholar tags are generated at all unless the MODS matches the hard-coded format exactly. Here is a sample of our MODS for which the current module cannot generate Google Scholar tags:

<name displayLabel="Creator(s)" type="personal">
    <namePart type="given">Brian</namePart>
    <namePart type="given">M*****</namePart>
    <namePart type="family">G****</namePart> 
    <displayForm>Brian M***** G*****</displayForm>
    <role>
        <roleTerm>Creator(s)</roleTerm>
    </role>
</name>

A “name” in MODS can also be an editor or an advisor. However, since the admin panel states that this custom XPath is for authors, whatever it selects should be accepted as an author without the roleTerm restriction. I also made the code accommodate middle names, since a middle name is also a namePart of type “given”.

All the other required fields in the Google Scholar module work as intended and take their custom XPath settings from the admin panel. I have tested my code in our Islandora staging environment, and it generates Google Scholar meta tags in the HTML header.
What's new?

Use variable_get('islandora_scholar_xpaths_authors_xpath') to read the custom authors XPath, keeping mods:name as the default.
Ignore roleTerm in MODS, since the custom XPath setting specifically says it is for authors.
Use mods:displayForm as a fallback source for the name if no namePart type matches “family” or “given”.
Accommodate more than one “given” name, since a middle name is marked as “given” in MODS.
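
A minimal sketch of the name handling described in this list, not the PR's actual code. The helper name `demo_format_author`, the sample record, and the "Family, Given Middle" output format are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, for illustration only: builds one author string
// from a single mods:name node.
function demo_format_author(SimpleXMLElement $name) {
  // Namespaces registered on the document do not carry over to the
  // elements returned by xpath(), so register the prefix again here.
  $name->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
  $given = array();
  $family = array();
  foreach ($name->xpath('mods:namePart') as $part) {
    $text = trim((string) $part);
    if ($text === '') {
      continue;
    }
    $type = (string) $part['type'];
    if ($type === 'given') {
      // A middle name is also typed "given", so collect every one.
      $given[] = $text;
    }
    elseif ($type === 'family') {
      $family[] = $text;
    }
  }
  if ($family || $given) {
    $last = implode(' ', $family);
    $first = implode(' ', $given);
    return ($last !== '' && $first !== '') ? "$last, $first" : $last . $first;
  }
  // No namePart typed "family" or "given": fall back to mods:displayForm.
  $display = $name->xpath('mods:displayForm');
  return $display ? trim((string) $display[0]) : '';
}

// Made-up sample data.
$xml = <<<XML
<mods xmlns="http://www.loc.gov/mods/v3">
  <name type="personal">
    <namePart type="given">Ada</namePart>
    <namePart type="given">Bee</namePart>
    <namePart type="family">Example</namePart>
  </name>
  <name type="personal">
    <namePart>Untyped Name</namePart>
    <displayForm>Only Display Form</displayForm>
  </name>
</mods>
XML;

$mods = simplexml_load_string($xml);
$mods->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
$authors = array();
foreach ($mods->xpath('//mods:name') as $name) {
  $authors[] = demo_format_author($name);
}
print_r($authors);
```

With this sample, the first name yields "Example, Ada Bee" and the second falls back to its displayForm.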

How should this be tested?

Find an object in Islandora that’s covered by the Islandora Scholar module.
In admin/islandora/solution_pack_config/scholar/xpaths, set custom XPaths for authors.
Visit the object page and check the HTML header to make sure meta name="citation_author" exists and shows the correct information.

@bondjimbond
Contributor

Can you try restarting those two failing builds in Travis?

@manez
Member

manez commented Apr 14, 2020

@bondjimbond I hit the refresh button on those. Fingers crossed 🤞

@bondjimbond
Contributor

@dnwk Can you provide a bit more detail on how to replicate the actual problem that this PR solves? e.g. some sample data (both for MODS and a specific setting for the Custom XPATH field) that we can use to verify the issue and see what's different after checking out this PR?

@dnwk
Author

dnwk commented Apr 14, 2020

> @dnwk Can you provide a bit more detail on how to replicate the actual problem that this PR solves? e.g. some sample data (both for MODS and a specific setting for the Custom XPATH field) that we can use to verify the issue and see what's different after checking out this PR?

I am fixing a bug here. The admin settings include a custom XPath field that lets you set the XPath for the Google Scholar tags. However, the code that actually generates the tags ignores whatever you set in that admin panel and just uses the default. That is how the author tag is currently generated.

@bondjimbond
Contributor

OK, so the problem I'm encountering here...

  1. Changed the "Authors" xpath to //mods:mods[1]/mods:genre
  2. Checked the source on the page -- citation_author is no longer present.
  3. Changed "Authors" xpath back to the default
  4. citation_author is STILL not present.

Using the default setting for the Authors field, here's the output to compare:

7.x branch:

<head profile="http://www.w3.org/1999/xhtml/vocab">
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta name="citation_title" content="#NoHomo: men&#039;s friendships, or &#039;something else&#039;" />
  <meta name="DC.rights" content="http://rightsstatements.org/vocab/InC/1.0/" />
  <meta name="DC.relation" content="Michel Foucault and sexualities and genders in education : friendship as ascesis" />
  <meta name="DC.identifier" content="isbn: 9783030317379" />
  <meta name="DC.identifier" content="doi: 10.1039/B005762M" />
  <meta name="citation_publication_date" content="2019" />
  <meta name="citation_isbn" content="9783030317379" />
  <meta name="citation_abstract_html_url" content="http://localhost:8000/islandora/object/ir%3A4/" />
  <link rel="shortcut icon" href="http://localhost:8000/misc/favicon.ico" type="image/vnd.microsoft.icon" />
  <meta name="citation_doi" content="10.1039/B005762M" />
  <meta name="citation_lastpage" content="22" />
  <meta name="citation_firstpage" content="9" />
  <meta name="citation_journal_title" content="Michel Foucault and sexualities and genders in education : friendship as ascesis" />
  <meta name="DC.identifier" content="ir:4" />
  <meta name="DC.contributor" content="Carlson, David Lee (editor)" />
  <meta name="DC.contributor" content="Rodriguez, Nelson M. (editor)" />
  <meta name="DC.contributor" content="Karioris, Frank G. (author)" />
  <meta name="DC.publisher" content="Palgrave MacMillan" />
  <meta name="Generator" content="Drupal 7 (http://drupal.org)" />
  <meta name="DC.title" content="#NoHomo: men&#039;s friendships, or &#039;something else&#039;" />
  <meta name="DC.contributor" content="Allan, Jonathan A. (Jonathan A. Allan (0000-0001-6702-7214)) (author)" />
  <meta name="DC.contributor" content="(funder)" />
  <meta name="DC.contributor" content="(author)" />
  <meta name="DC.type" content="Text" />
  <meta name="DC.type" content="chapters (layout features)" />
  <meta name="DC.contributor" content="(translator)" />
  <meta name="DC.date" content="2019" />
  <meta name="citation_author" content="Karioris, Frank G." />
  <meta name="citation_author" content="Allan, Jonathan A." />

This PR:


<head profile="http://www.w3.org/1999/xhtml/vocab">
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta name="DC.identifier" content="ir:4" />
  <meta name="DC.type" content="Text" />
  <meta name="DC.type" content="chapters (layout features)" />
  <meta name="DC.identifier" content="isbn: 9783030317379" />
  <meta name="DC.identifier" content="doi: 10.1039/B005762M" />
  <link rel="shortcut icon" href="http://localhost:8000/misc/favicon.ico" type="image/vnd.microsoft.icon" />
  <meta name="DC.rights" content="http://rightsstatements.org/vocab/InC/1.0/" />
  <meta name="DC.relation" content="Michel Foucault and sexualities and genders in education : friendship as ascesis" />
  <meta name="DC.date" content="2019" />
  <meta name="DC.contributor" content="(translator)" />
  <meta name="DC.publisher" content="Palgrave MacMillan" />
  <meta name="DC.title" content="#NoHomo: men&#039;s friendships, or &#039;something else&#039;" />
  <meta name="Generator" content="Drupal 7 (http://drupal.org)" />
  <meta name="DC.contributor" content="Karioris, Frank G. (author)" />
  <meta name="DC.contributor" content="Carlson, David Lee (editor)" />
  <meta name="DC.contributor" content="(funder)" />
  <meta name="DC.contributor" content="Allan, Jonathan A. (Jonathan A. Allan (0000-0001-6702-7214)) (author)" />
  <meta name="DC.contributor" content="Rodriguez, Nelson M. (editor)" />

@dnwk
Author

dnwk commented Apr 15, 2020

@bondjimbond For authors, my code looks for mods:namePart under the nodes selected by the authors XPath. I suppose you could argue it should instead take whatever the user-designated XPath returns, as is.
There is a variable_get for first name and last name, since those fields are in the control panel. Or, if those fields are empty, should it use the Authors one and take whatever value that XPath specifies, as is?

@dnwk
Author

dnwk commented Apr 15, 2020

@bondjimbond Could you also post the MODS?

@bondjimbond
Contributor

Sure, here's the MODS...

<mods xmlns="http://www.loc.gov/mods/v3" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<titleInfo>
<title>#NoHomo: men's friendships, or 'something else'</title>
</titleInfo>
<genre authority="aat">chapters (layout features)</genre>
<originInfo>
<dateIssued>2019</dateIssued>
<publisher>Palgrave MacMillan</publisher>
<place>
<placeTerm type="text">New York</placeTerm>
</place>
</originInfo>
<name type="personal">
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
<namePart type="given">Frank G.</namePart>
<namePart type="family">Karioris</namePart>
</name>
<name type="personal">
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
<namePart type="given">David Lee</namePart>
<namePart type="family">Carlson</namePart>
</name>
<name type="personal">
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
<namePart type="given">Nelson M.</namePart>
<namePart type="family">Rodriguez</namePart>
</name>
<typeOfResource>text</typeOfResource>
<identifier type="isbn">9783030317379</identifier>
<part>
<date>2019</date>
<extent unit="page">
<start>9</start>
<end>22</end>
</extent>
</part>
<relatedItem type="host">
<titleInfo>
<title>
Michel Foucault and sexualities and genders in education : friendship as ascesis
</title>
</titleInfo>
<name type="corporate">
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
</relatedItem>
<extension type="submissionAgreement">Yes</extension>
<accessCondition type="use and reproduction" displayLabel="Rights Statement">http://rightsstatements.org/vocab/InC/1.0/</accessCondition>
<name type="personal">
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
<namePart type="given">Jonathan A.</namePart>
<namePart type="family">Allan</namePart>
<displayForm>Jonathan A. Allan (0000-0001-6702-7214)</displayForm>
<nameIdentifier type="orcid">0000-0001-6702-7214</nameIdentifier>
</name>
<name type="corporate">
<role>
<roleTerm authority="marcrelator" type="text">funder</roleTerm>
</role>
</name>
<name type="personal">
<role>
<roleTerm authority="marcrelator" type="text">translator</roleTerm>
</role>
</name>
<name type="corporate">
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<identifier type="doi">10.1039/B005762M</identifier>
</mods>

@@ -171,7 +178,7 @@ function islandora_scholar_create_meta_tags($object) {

   $online_date = $mods_xml->xpath(variable_get('islandora_scholar_xpaths_online_date', '//mods:recordInfo/mods:recordCreationDate'));
   if ($online_date) {
-    $date_string = islandora_scholar_parse_date_foryear($online_date);
+    $date_string = islandora_parse_date_foryear($online_date);

Member

This change breaks the page load for me; the function islandora_parse_date_foryear does not exist in any of the Islandora modules. islandora_scholar_parse_date_foryear does exist later on in this same file though at https://github.com/Islandora/islandora_scholar/blob/7.x/includes/google_scholar.inc#L210-L229.

I'm assuming this is just a typo, but this needs to be changed back to the way it was originally before this PR can be accepted.

Author

yep, it is a typo. Fixing it

@@ -44,36 +43,46 @@ function islandora_scholar_create_meta_tags($object) {
   else {
     return FALSE;
   }
-  foreach ($mods_xml->xpath('mods:name') as $name_xml) {
+  foreach ($mods_xml->xpath(variable_get('islandora_scholar_xpaths_authors_xpath', 'mods:name')) as $name_xml) {

Member

This is good, we should definitely not be using the hardcoded xpath if a configurable one exists.
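
A minimal sketch of the difference between the hard-coded lookup and the configurable one, runnable outside Drupal. `demo_variable_get` is a hypothetical stand-in for Drupal 7's `variable_get()`, and the sample record and saved setting are made up.

```php
<?php
// Stand-in for Drupal 7's variable_get() so the sketch runs without Drupal.
// The static array plays the role of the saved admin configuration.
function demo_variable_get($name, $default) {
  static $settings = array(
    'islandora_scholar_xpaths_authors_xpath' => '//mods:mods[1]/mods:name[@displayLabel="Creator(s)"]',
  );
  return isset($settings[$name]) ? $settings[$name] : $default;
}

// Made-up sample data: one creator and one other name.
$xml = <<<XML
<mods xmlns="http://www.loc.gov/mods/v3">
  <name displayLabel="Creator(s)" type="personal">
    <namePart type="family">Example</namePart>
  </name>
  <name displayLabel="Advisor" type="personal">
    <namePart type="family">Other</namePart>
  </name>
</mods>
XML;

$mods_xml = simplexml_load_string($xml);
$mods_xml->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');

// Hard-coded: every mods:name child is selected, whatever the admin saved.
$hardcoded = $mods_xml->xpath('mods:name');
// Configurable: the saved XPath wins, with mods:name only as the fallback.
$configured = $mods_xml->xpath(demo_variable_get('islandora_scholar_xpaths_authors_xpath', 'mods:name'));

printf("hardcoded: %d, configured: %d\n", count($hardcoded), count($configured));
```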

includes/google_scholar.inc
@bryjbrown dismissed their stale review April 17, 2020 21:07

My brain isn't working

@bryjbrown
Member

bryjbrown commented Apr 17, 2020

@dnwk Okay! I've spent a long time looking at this and messing with it, and I've updated the code review several times as my understanding evolved, up to the point of dismissing the code review entirely so we can just talk about it.

The first thing that needs to be fixed is the bad function name, but I see that you saw that and will put in a fix.

The second thing I realized is that by adding the call to the variable for custom authors xpath and removing the pivot on roleTerm, you are shifting the roleTerm that gets selected onto the configured xpath, and this is very smart. Too smart for me to have fully understood it at first, unfortunately.

The problem that remains is that most people who were using the old code that assumed roleTerm = author never NEEDED to have a custom author xpath configured, and as a result they might not have custom xpaths enabled and are going to default to mods:name which WILL open the floodgate to ALL mods:names coming through and being listed as authors (this is the part that I got stuck on for a while). In order for this to work, if you take out the roleTerm = author pivot you'll have to make the default value being used select on 'author' so that it doesn't break the existing behavior.
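
A minimal sketch of the concern raised here, using a made-up record. The role-restricted expression is only one possible way to write such a default and is an assumption, as is the helper name `demo_family_names`.

```php
<?php
// Made-up record with one author and one editor.
$xml = <<<XML
<mods xmlns="http://www.loc.gov/mods/v3">
  <name type="personal">
    <namePart type="given">Arthur</namePart>
    <namePart type="family">Author</namePart>
    <role><roleTerm type="text">author</roleTerm></role>
  </name>
  <name type="personal">
    <namePart type="given">Edward</namePart>
    <namePart type="family">Editor</namePart>
    <role><roleTerm type="text">editor</roleTerm></role>
  </name>
</mods>
XML;

$mods = simplexml_load_string($xml);
$mods->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');

// Hypothetical helper: family names found under each selected mods:name.
function demo_family_names(array $names) {
  $out = array();
  foreach ($names as $name) {
    $name->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
    foreach ($name->xpath('mods:namePart[@type="family"]') as $part) {
      $out[] = (string) $part;
    }
  }
  return $out;
}

// Bare mods:name with no roleTerm check: every name comes through.
$all = demo_family_names($mods->xpath('mods:name'));
// Role restriction moved into the XPath itself: only authors come through.
$authors_only = demo_family_names($mods->xpath('//mods:mods[1]/mods:name[mods:role/mods:roleTerm = "author"]'));

print_r($all);
print_r($authors_only);
```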

Last but not least, when I upload a MODS record with roleTerm being "Creator(s)" like you had in your example, even when I set the custom authors xpath to use "Creator(s)" as the selected roleTerm:

//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "Creator(s)"]/../mods:namePart[@type="family"]

I still don't see a citation_author appearing in the metatags so it doesn't appear to be working.

Moving forward, after fixing the function name I'd say the next step is to make sure that the citation_author selection is working with a default value so that names with roleTerm as 'author' show up in the citation_author metatags even when NO custom xpath is configured and/or custom xpaths are not enabled. Once you have this working, THEN make sure that if you upload a MODS record that has a roleTerm that isn't 'author', you can select it by changing the custom xpaths and have it display citation_authors correctly.

@bryjbrown
Member

Also just because I didn't say it before, THANK YOU for sticking with this PR and bearing with folks as they try to grapple with it. There's an intense amount of cognitive load going on here to even understand the problem or how the solution works, but this is a good idea and definitely addresses a bug that ought to be corrected. ❤️

@dnwk
Author

dnwk commented Apr 17, 2020

> //mods:mods[1]/mods:name/mods:role[mods:roleTerm = "Creator(s)"]/../mods:namePart[@type="family"]

@bryjbrown My XPath stops at //mods:mods[1]/mods:name[@displayLabel= "Creator(s)"] so that my code can search for the namePart elements underneath it. I am sure there is a way to write an XPath that points to mods:name where it has roleTerm=author.
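
A minimal sketch of why the two expressions behave differently when the code goes on to look for mods:namePart under each selected node. The record is made up (modelled on the sample in the PR description), and `demo_count_name_parts` is a hypothetical helper, not code from the PR.

```php
<?php
// Made-up record modelled on the sample in the PR description.
$xml = <<<XML
<mods xmlns="http://www.loc.gov/mods/v3">
  <name displayLabel="Creator(s)" type="personal">
    <namePart type="given">Ada</namePart>
    <namePart type="family">Example</namePart>
    <role><roleTerm>Creator(s)</roleTerm></role>
  </name>
</mods>
XML;

// Hypothetical helper: runs $xpath, then counts the mods:namePart children
// found under each selected node.
function demo_count_name_parts(SimpleXMLElement $mods, $xpath) {
  $mods->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
  $selected = $mods->xpath($xpath);
  $count = 0;
  foreach (is_array($selected) ? $selected : array() as $node) {
    $node->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
    $parts = $node->xpath('mods:namePart');
    $count += is_array($parts) ? count($parts) : 0;
  }
  return $count;
}

$mods = simplexml_load_string($xml);

// Ends at a namePart element, which has no namePart children of its own.
$ends_at_name_part = demo_count_name_parts($mods, '//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "Creator(s)"]/../mods:namePart[@type="family"]');
// Ends at the mods:name element, whose namePart children can then be read.
$ends_at_name = demo_count_name_parts($mods, '//mods:mods[1]/mods:name[@displayLabel= "Creator(s)"]');

printf("ends at namePart: %d, ends at name: %d\n", $ends_at_name_part, $ends_at_name);
```

Under these assumptions, an authors XPath has to stop at mods:name for a namePart lookup underneath it to find anything.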

So, probably in
foreach ($mods_xml->xpath(variable_get('islandora_scholar_xpaths_authors_xpath', 'mods:name')) as $name_xml)

the default "mods:name" should be rewritten so that it somehow requires an author in roleTerm. Let me research how to write an XPath that defaults to mods:name where it has a child role/roleTerm="author". If you have an idea, let me know.

Thanks

@bryjbrown
Member

bryjbrown commented Apr 17, 2020

@dnwk I think that's what the .. in

//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "Creator(s)"]/../mods:namePart[@type="family"]

is doing. //mods:mods[1]/mods:name/mods:role[mods:roleTerm = "Creator(s)"] selects the mods:role elements whose mods:roleTerm matches "Creator(s)", and then the following /../mods:namePart[@type="family"] part is saying, "when you've found one, go back up to grab the mods:namePart[@type="family"] of that section."

I'm no xpath wizard so I can't verify one off the top of my head, but if you are trying specifically to get the mods:name element that has 'author' roleTerms, I'd try

//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "author"]/../mods:name

and see if that works as a default value.
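
A minimal sketch for checking what the parent step returns, using a made-up record. In XPath, `..` already moves from mods:role up to its parent mods:name, so a further `/mods:name` step looks for a mods:name nested inside that name.

```php
<?php
// Made-up record with one author and one editor.
$xml = <<<XML
<mods xmlns="http://www.loc.gov/mods/v3">
  <name type="personal">
    <namePart type="given">Arthur</namePart>
    <namePart type="family">Author</namePart>
    <role><roleTerm type="text">author</roleTerm></role>
  </name>
  <name type="personal">
    <namePart type="given">Edward</namePart>
    <namePart type="family">Editor</namePart>
    <role><roleTerm type="text">editor</roleTerm></role>
  </name>
</mods>
XML;

$mods = simplexml_load_string($xml);
$mods->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');

$base = '//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "author"]';

// Stop at the parent of the matching mods:role.
$parent = $mods->xpath($base . '/..');
// Take one more step down to a mods:name child of that parent.
$parent_then_name = $mods->xpath($base . '/../mods:name');

$parent_names = array();
foreach (is_array($parent) ? $parent : array() as $node) {
  $parent_names[] = $node->getName();
}
$extra_step_count = is_array($parent_then_name) ? count($parent_then_name) : 0;

print_r($parent_names);
printf("nodes after the extra step: %d\n", $extra_step_count);
```

With this record, the expression ending in `/..` returns the author's name element, while the one with the trailing `/mods:name` returns nothing.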

@dnwk
Author

dnwk commented Apr 17, 2020

@bryjbrown I used an online XPath testing tool and settled on

/mods/name/role[roleTerm = "author"]/..

So that it will select the entire mods:name node for further processing.

… and change mods:name default xpath to "//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "author"]/.." for legacy usage that relied on "author" role term
@dnwk
Author

dnwk commented Apr 24, 2020

@bryjbrown I updated my code to fix the typo and to update the default mods:name XPath. I have no way to test whether the new XPath matches the previous XPath assumption. Could you test it?
Thanks

@bryjbrown
Member

@dnwk I see your new commit and it looks good at first glance; putting this on my to do list to review sometime this week hopefully. Thanks for following up on this!

@@ -171,7 +179,7 @@ function islandora_scholar_create_meta_tags($object) {

   $online_date = $mods_xml->xpath(variable_get('islandora_scholar_xpaths_online_date', '//mods:recordInfo/mods:recordCreationDate'));
   if ($online_date) {
-    $date_string = islandora_scholar_parse_date_foryear($online_date);
+    $date_string = islandora_scholar_date_foryear($online_date);

Member

We originally had islandora_scholar_parse_date_foryear which was correct but the original commits changed this to islandora_parse_date_foryear leaving out the _scholar_ bit. Now it appears that you have added _scholar_ but in the process left out the _parse_ part. This needs to be islandora_scholar_parse_date_foryear, anything else causes the object page to fail to load. Please test a page load for the object before committing to verify that the object page loads.

@bryjbrown
Member

@dnwk The new default xpath looks like it's heading in the right direction, but it is still not working for me. When I switch from the main 7.x branch to this PR with the custom xpaths turned off, the citation_author metatag info drops away for me, implying that there's an xpath issue going on.

Furthermore, it looks like in adding the _scholar_ part back into islandora_scholar_parse_date_foryear, you dropped the _parse_ part which is also needed in order for the function to work, and with the function broken the object page does not load. Please do test that the object page loads before making any more commits, because if the object page doesn't load it requires us to fix the problems before we can test the PR.

Since I'm rapidly approaching the limits of my xpath knowledge, I'm requesting @DonRichards take a look at this. He wrote the custom xpath bits of Scholar so he should have a better idea of what's going on here.

Member

@bryjbrown left a comment

Needs to fix islandora_scholar_parse_date_foryear before testing can continue.

@CanOfBees

CanOfBees commented Apr 29, 2020

Hi @bryjbrown @dnwk and @DonRichards -

Don and I were talking about this on our local Slack. While an XPath expression like //mods:mods[1]/mods:name/mods:role[mods:roleTerm = "author"]/.. is valid, if you want to capture the entirety of the mods:name in this case, something like

//mods:mods[1]/mods:name[mods:role/mods:roleTerm[. = 'author']]

//mods:mods[1]/mods:name[mods:role/mods:roleTerm = 'author']

would possibly make more sense.
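
A minimal sketch for comparing these expressions against a made-up record in which each role carries both a text and a code roleTerm. It makes no claim about how the module consumes the result; it only lists the family names found under whatever each expression selects.

```php
<?php
// Made-up record: each role has a text and a code roleTerm.
$xml = <<<XML
<mods xmlns="http://www.loc.gov/mods/v3">
  <name type="personal">
    <namePart type="given">Arthur</namePart>
    <namePart type="family">Author</namePart>
    <role>
      <roleTerm type="text">author</roleTerm>
      <roleTerm type="code">aut</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart type="given">Edward</namePart>
    <namePart type="family">Editor</namePart>
    <role>
      <roleTerm type="text">editor</roleTerm>
      <roleTerm type="code">edt</roleTerm>
    </role>
  </name>
</mods>
XML;

$mods = simplexml_load_string($xml);
$mods->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');

$expressions = array(
  '//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "author"]/..',
  "//mods:mods[1]/mods:name[mods:role/mods:roleTerm[. = 'author']]",
  "//mods:mods[1]/mods:name[mods:role/mods:roleTerm = 'author']",
);

$results = array();
foreach ($expressions as $expression) {
  $families = array();
  $names = $mods->xpath($expression);
  foreach (is_array($names) ? $names : array() as $name) {
    $name->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
    foreach ($name->xpath('mods:namePart[@type="family"]') as $part) {
      $families[] = (string) $part;
    }
  }
  $results[] = $families;
}
print_r($results);
```

On this record all three select the same single mods:name. The predicate forms keep the context on mods:name throughout, so no parent step is needed.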

I'm not sure exactly what I would need to do locally to fully test this PR and help with any XPath issues, so any notes along those lines would be welcome if you'd like help testing the XPath expressions here. Would the islandora labs vagrant be sufficient?

Lastly, @dnwk in your initial comment, if you have local MODS that uses name/@displayLabel consistently, your custom expression could be //mods:mods[1]/mods:name[@displayLabel='Creator(s)'].

In any case, I hope everyone doesn't mind an additional voice in the conversation. Apologies if I'm disrupting a mostly-finished PR.

Best!

Edit: I'm asleep at the wheel this morning. ☕

@bryjbrown
Member

@CanOfBees I have been using the Islandora Vagrant VM to test, and using this MODS record:

<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:flvc="info:flvc/manifest/v1" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:dcterms="http://purl.org/dc/terms/" xsi:schemaLocation="http://www.loc.gov/standards/mods/v3/mods-3-4.xsd" version="3.4">
  <titleInfo lang="eng">
    <title>Scholar Test Document</title>
  </titleInfo>
  <name type="personal" authority="local">
    <namePart type="given">Arthur</namePart>
    <namePart type="family">Author</namePart>
    <role>
      <roleTerm authority="rda" type="text">author</roleTerm>
      <roleTerm authority="marcrelator" type="code">aut</roleTerm>
    </role>
  </name>
  <name type="personal" authority="local">
    <namePart type="given">Edward</namePart>
    <namePart type="family">Editor</namePart>
    <role>
      <roleTerm authority="rda" type="text">editor</roleTerm>
      <roleTerm authority="marcrelator" type="code">ed</roleTerm>
    </role>
  </name>
  <abstract>Test document for testing capture of authors.</abstract>
  <typeOfResource>text</typeOfResource>
  <genre authority="rdacontent">text</genre>
  <language>
    <languageTerm type="text">English</languageTerm>
    <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
  </language>
  <physicalDescription>
    <form authority="rdamedia" type="RDA media terms">computer</form>
    <form authority="rdacarrier" type="RDA carrier terms">online resource</form>
    <extent>1 online resource</extent>
    <digitalOrigin>born digital</digitalOrigin>
  </physicalDescription>
</mods:mods>

It has a name with roleTerm=author and a name with roleTerm=editor, so you can test that the default works and should pick up "Arthur Author", but you should also be able to turn on the custom xpath queries and set the "author" selector xpath to point to roleTerm=editor and grab "Edward Editor" instead.

In my experience, the master 7.x branch grabs "Arthur Author" by default, but when custom xpaths are turned on and the author xpath is set to target roleTerm=editor instead of roleTerm=author, it STILL grabs "Arthur Author" as the author when it should be grabbing "Edward Editor". With the PR branch, it fails to grab anything whether custom xpaths are turned on or not.

@bryjbrown
Member

Re-reviewed this PR as a result of today's 7x Committers Call. I think this PR is still valid and addresses an important bug that should get taken care of before the next release, so we do need to get this wrapped up and merged.

The current problem that needs to be solved before this PR can be merged is that we need the citation_author metatag to show up with the proper value in these 3 contexts:

  1. When Enable Custom XPaths is disabled
  2. When Enable Custom XPaths is enabled and the default value is set.
  3. When Enable Custom XPaths is enabled and a custom nondefault value is set.

It seems like the 7.x branch of Islandora Scholar currently works for 1, but not 2 or 3. I'm unsure if this PR gets 2 and 3 to work, but it definitely does not work with 1 currently. We need to get all 3 working as expected to move forward.
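
A minimal sketch of the three contexts as a small test matrix. Everything here is hypothetical: the resolver `demo_effective_authors_xpath`, its `$enabled` flag, and the assumption that the default applies whenever custom XPaths are disabled are illustrative and not taken from the module. The default expression is the one used in this PR.

```php
<?php
// Hypothetical resolver. The function name and the $enabled flag are
// illustrative and are not taken from the module.
function demo_effective_authors_xpath($enabled, $custom, $default) {
  if (!$enabled) {
    // Context 1: custom XPaths disabled, so the default applies.
    return $default;
  }
  // Contexts 2 and 3: use the saved value, falling back to the default
  // if the field was left empty.
  $custom = trim((string) $custom);
  return $custom === '' ? $default : $custom;
}

// Hypothetical helper: family names under the nodes selected by $xpath.
function demo_family_names(SimpleXMLElement $mods, $xpath) {
  $mods->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
  $out = array();
  $names = $mods->xpath($xpath);
  foreach (is_array($names) ? $names : array() as $name) {
    $name->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');
    foreach ($name->xpath('mods:namePart[@type="family"]') as $part) {
      $out[] = (string) $part;
    }
  }
  return $out;
}

// Made-up record with one author and one editor.
$xml = <<<XML
<mods xmlns="http://www.loc.gov/mods/v3">
  <name type="personal">
    <namePart type="family">Author</namePart>
    <role><roleTerm type="text">author</roleTerm></role>
  </name>
  <name type="personal">
    <namePart type="family">Editor</namePart>
    <role><roleTerm type="text">editor</roleTerm></role>
  </name>
</mods>
XML;
$mods = simplexml_load_string($xml);

$default = '//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "author"]/..';
$editor = '//mods:mods[1]/mods:name/mods:role[mods:roleTerm = "editor"]/..';

// Each entry: array(custom XPaths enabled?, saved authors XPath).
$contexts = array(
  'disabled' => array(FALSE, $editor),
  'enabled, default value' => array(TRUE, $default),
  'enabled, custom value' => array(TRUE, $editor),
);
$seen = array();
foreach ($contexts as $label => $context) {
  list($enabled, $custom) = $context;
  $seen[$label] = demo_family_names($mods, demo_effective_authors_xpath($enabled, $custom, $default));
}
print_r($seen);
```

A check like this could be repeated against a real object page by comparing the citation_author tags in each of the three settings.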

Since we haven't heard from @dnwk in a bit, I'd like to ask @DonRichards (and perhaps @CanOfBees if possible) to move forward with this since @DonRichards built the custom XPath machinery and has the most insight into how it works. I believe the core of the issue revolves around https://github.com/dnwk/islandora_scholar/blob/7.x-Kun-GoogleScholarDev/includes/google_scholar.inc#L47 which uses the value of the islandora_scholar_xpaths_authors_xpath variable if it is set, and //mods:mods[1]/mods:name/mods:role[mods:roleTerm = "author"]/.. if it isn't. Either something is up with that variable, or something is up with the default XPath that's used when the variable isn't set. Possibly both.
