Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unable to convert (2017) into YYYY format and couldn't index it (dsDescriptionDate) #4558

Closed
andrewSC opened this issue Mar 29, 2018 · 14 comments
Assignees
Labels
Type: Bug a defect

Comments

@andrewSC
Copy link

After clearing the Solr index and kicking off a full-reindex I noticed multiple log messages similar to:

[2018-03-29T14:41:40.296-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.search.IndexAllServiceBean] [tid: _ThreadID=160 _ThreadName=__ejb-thread-pool10] [timeMillis: 1522348900296] [levelValue: 800] [[
indexing dataset 618 of 975 (id=4684, persistentId=<omitted>)]]

[2018-03-29T14:41:40.346-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=160 _ThreadName=__ejb-thread-pool10] [timeMillis: 1522348900346] [levelValue: 800] [[
unable to convert (2017-02-22) into YYYY format and couldn't index it (dsDescriptionDate)]]

[2018-03-29T14:41:40.777-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.search.IndexAllServiceBean] [tid: _ThreadID=160 _ThreadName=__ejb-thread-pool10] [timeMillis: 1522348900777] [levelValue: 800] [[
indexing dataset 619 of 975 (id=4686, persistentId=<omitted>)]]

After running asadmin set-log-levels edu.harvard.iq.dataverse.search.IndexServiceBean=FINE I was able to capture some more descriptive output (of a different dataset failing) from the logging in IndexServiceBean.java#L713 which provided:

15906-[2018-03-29T15:14:02.360-0400] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=158 _ThreadName=__ejb-thread-pool8] [timeMillis: 1522350842360] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.IndexServiceBean] [METHODNAME: addOrUpdateDataset] [[
15907-  indexing dsDescriptionDate:[(2017)] into dsDescriptionDate and maybe dsDescriptionDate_ss]]
15908-
15909-[2018-03-29T15:14:02.360-0400] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=158 _ThreadName=__ejb-thread-pool8] [timeMillis: 1522350842360] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.IndexServiceBean] [METHODNAME: addOrUpdateDataset] [[
15910-  date as string: (2017)]]
15911-
15912-[2018-03-29T15:14:02.360-0400] [glassfish 4.1] [FINE] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=158 _ThreadName=__ejb-thread-pool8] [timeMillis: 1522350842360] [levelValue: 500] [CLASSNAME: edu.harvard.iq.dataverse.search.IndexServiceBean] [METHODNAME: addOrUpdateDataset] [[
15913:  Trying to convert (2017) to a YYYY date from dataset 5135]]
15914-
15915-[2018-03-29T15:14:02.360-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=158 _ThreadName=__ejb-thread-pool8] [timeMillis: 1522350842360] [levelValue: 800] [[
15916-  unable to convert (2017) into YYYY format and couldn't index it (dsDescriptionDate)]]

Which seems to imply that the conversion is failing due to parenthesis existing around the date at the time of conversion. I did some quick spelunking and it doesn't seem immediately obvious what's adding them? Here's what I found in the codebase:

$ ag '\"\(\"\ \+'
main/java/edu/harvard/iq/dataverse/mydata/MyDataFilterParams.java
273:            valStr = "(" + valStr + ")";
294:            valStr = "(" + valStr + ")";
297:        return  "(" + SearchFields.PUBLICATION_STATUS + ":" + valStr + ")";

main/java/edu/harvard/iq/dataverse/mydata/MyDataFinder.java
354:            return "(" + entityIdClause + " OR " + parentIdClause + ")";

main/java/edu/harvard/iq/dataverse/mydata/RoleTagRetriever.java
395:            //msg("dv id: " + dvId + "(" + this.idToDvObjectType.get(dvId) + ") | roleId: "
396:            //        + roleId + "(" +  this.rolePermissionHelper.getRoleName(roleId)+")");
496:           //msgt("dvIdForCard: " + dvIdForCard + "(" + this.idToDvObjectType.get(dvIdForCard) + ")");

main/java/edu/harvard/iq/dataverse/mydata/SolrQueryFormatter.java
53:                            String qPart = "(" + paramName  + ":(" + orClause + "))";
55:             qPart = "(" + paramName  + ":(" + orClause + ") AND " + SearchFields.TYPE + ":(" +  dvObjectType + "))";

main/java/edu/harvard/iq/dataverse/authorization/AuthenticationServiceBean.java
849:                            String error = "at " + stacktrace.getClassName() + "." + stacktrace.getMethodName() + "(" + stacktrace.getFileName() + ":" + lineNumber + ") ";

main/java/edu/harvard/iq/dataverse/harvest/server/OAISetServiceBean.java
222:        query = "(" + query + ")";

main/java/edu/harvard/iq/dataverse/api/Index.java
191:                            String error = "at " + stacktrace.getClassName() + "." + stacktrace.getMethodName() + "(" + stacktrace.getFileName() + ":" + lineNumber + ") ";
279:                            String error = "at " + stacktrace.getClassName() + "." + stacktrace.getMethodName() + "(" + stacktrace.getFileName() + ":" + lineNumber + ") ";

main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java
503:                    indirectParentIds = "(" + Integer.toString(dvIdAsInt);
525:                        indirectDatasetParentIds = "(" + Integer.toString(dvIdAsInt);

main/java/edu/harvard/iq/dataverse/dataaccess/ImageThumbConverter.java
347:        logger.fine("image dimensions: " + width + "x" + height + "(" + storageIO.getDataFile().getStorageIdentifier() + ")");

main/java/edu/harvard/iq/dataverse/search/IndexServiceBean.java
133:        logger.fine("indexDataverse called on dataverse id " + dataverse.getId() + "(" + dataverse.getAlias() + ")");
@scolapasta
Copy link
Contributor

The () are likely being added by the display formatting, but I would've though the indexing code would not be using that. Not sure if this is a new regression or not, but thanks for bringing to our attention.

@pdurbin
Copy link
Member

pdurbin commented Mar 29, 2018

Over at http://irclog.iq.harvard.edu/dataverse/2018-03-29#i_65142 @andrewSC mentioned he's running 4.8.5 but I wanted to mention that I see this problem in the develop branch as well. The phoenix server is running 3845816 and this is what I see:

[root@dvnweb-vm6 ~]# grep dsDescriptionDate /usr/local/glassfish4/glassfish/domains/domain1/logs/server.log -B1
[2018-03-29T09:54:44.537-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=51 _ThreadName=jk-connector(2)] [timeMillis: 1522331684537] [levelValue: 800] [[
  unable to convert (2015-01-14) into YYYY format and couldn't index it (dsDescriptionDate)]]
--
[2018-03-29T09:54:45.379-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=51 _ThreadName=jk-connector(2)] [timeMillis: 1522331685379] [levelValue: 800] [[
  unable to convert (2015-01-14) into YYYY format and couldn't index it (dsDescriptionDate)]]
--
[2018-03-29T09:54:48.233-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=50 _ThreadName=jk-connector(1)] [timeMillis: 1522331688233] [levelValue: 800] [[
  unable to convert (2015-01-14) into YYYY format and couldn't index it (dsDescriptionDate)]]
--
[2018-03-29T09:54:49.031-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.search.IndexServiceBean] [tid: _ThreadID=54 _ThreadName=jk-connector(5)] [timeMillis: 1522331689031] [levelValue: 800] [[
  unable to convert (2015-01-14) into YYYY format and couldn't index it (dsDescriptionDate)]]
[root@dvnweb-vm6 ~]# 

Here's where the exception is being caught: https://github.com/IQSS/dataverse/blob/v4.8.5/src/main/java/edu/harvard/iq/dataverse/search/IndexServiceBean.java#L730

@pameyer
Copy link
Contributor

pameyer commented Mar 29, 2018

Checked my notes; similar behavior seen for dateOfCollectionStart and timePeriodCoveredEnd earlier this month (no commit tag in my notes, unfortunately). But not restricted to dsDescriptionDate

@scolapasta
Copy link
Contributor

Right, I'd expect you'd see it with any date subfield of a compound field that has a display format. @pameyer at some point you had made a change to how these things were indexed, no? Could that have introduced this regression? (not trying to call you out, sorry)

@pameyer
Copy link
Contributor

pameyer commented Mar 30, 2018

@scolapasta I did make some changes there; but I'm pretty sure I didn't add parenthesis anywhere.

@scolapasta
Copy link
Contributor

@pameyer not that you added the parenthesis, but that if you use the displayformat version of the field the parenthesis would be added due to being defined that way for display in the db.

@pameyer
Copy link
Contributor

pameyer commented Mar 30, 2018

@scolapasta if I'm remembering correctly, the changes I made sent value instead of display value to solr. I'd expect that to mean that things that were sent to solr with parenthesis in displayvalue wouldn't have the parenthesis anymore - but that doesn't seem to be what's happening here.

Not at all discounting the possibility that this was something that change broke without noticing (especially since that's happened previously)...

@scolapasta
Copy link
Contributor

@pameyer OK, went and found that old pull request: #4038

You are correct re: the change here - I wonder if something got reverted.

@bikramj
Copy link
Contributor

bikramj commented Apr 26, 2018

I have tried upgrades from 4.7.1 -> 4.8 -> 4.8.1 -> 4.8.2 -> 4.8.3 -> 4.8.4 -> 4.8.5 in our test instance and I saw these indexing errors in each released build!

unable to convert Start: 2014-05  into YYYY format and couldn't index it (dateOfCollectionStart)]]
unable to convert End: 2014-09  into YYYY format and couldn't index it (dateOfCollectionEnd)]]
unable to convert (2015-02) into YYYY format and couldn't index it (dsDescriptionDate)]]
unable to convert Start: 2009  into YYYY format and couldn't index it (dateOfCollectionStart)]]
unable to convert End: 2010  into YYYY format and couldn't index it (dateOfCollectionEnd)]]
unable to convert Start: 1871  into YYYY format and couldn't index it (timePeriodCoveredStart)]]
unable to convert End: 1994  into YYYY format and couldn't index it (timePeriodCoveredEnd)]]

Does it mean we will have to fix dates in DB!

@pdurbin
Copy link
Member

pdurbin commented Apr 26, 2018

@bikramj I don't think you need to worry about your dates in general but "(2015-02)" doesn't pass our validator. Check this out:

screen shot 2018-04-26 at 1 57 50 pm

We haven't investigated this issue yet. It's still in the "inbox" according to our kanban board: https://waffle.io/IQSS/dataverse

If anyone wants to poke around and try figure out what's happening, please go ahead!

@pameyer
Copy link
Contributor

pameyer commented Apr 26, 2018

Took a quick look since I'm investigating #4560 in the same general area. I'm pretty sure that the parenthesis are coming from logger.info("unable to convert " + dateAsString + " into YYYY format and couldn't index it (" + dsfType.getName() + ")"); in IndexServiceBean.java; rather than indicating issues with the data.

@pdurbin
Copy link
Member

pdurbin commented Aug 8, 2018

#4906 is related and when we fix it hopefully this issue (#4558) will be resolved as well. Same error.

@pdurbin
Copy link
Member

pdurbin commented Aug 9, 2018

Some possible clues into how this regression was introduced: #4906 (comment)

@matthew-a-dunlap
Copy link
Contributor

The same fix was used for this issue and #4906. They should be reviewed together.

@scolapasta scolapasta assigned scolapasta and unassigned scolapasta Sep 26, 2018
@kcondon kcondon self-assigned this Sep 26, 2018
kcondon added a commit that referenced this issue Sep 27, 2018
@kcondon kcondon closed this as completed Sep 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug a defect
Projects
None yet
Development

No branches or pull requests

7 participants