-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A fix for the harvesting regression/tests introduced in 10836 #10990
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - I see it addresses bad values which were the source of the test fails.
PR looks good but I'm having trouble passing the 2 API tests. Here's what was done:
Issue: API tests are both failing throwing similar error: |
Note that you do not want to perform steps 3. and 4. before step 5. That will result in the API tests failing - because the tests will be trying again to harvest the datasets that are already harvested and exist locally on your system. (We should probably modify and improve the tests so that they work without the expectation that the datasets do not exist in the db; but that would be outside of this PR. They were written to run under Jenkins, and the database is always blank there. For now, make sure to remove the harvesting client and the associated datasets before running the API tests.) But, since you reported that
there must be something else going on. Please copy and paste all the console output from these tests, plus any error messages from server.log around the time of failures, plus (probably most importantly) the dedicated log files left from the failed harvesting runs. These will be in the same directory as server.log, and named like
|
P.S. I just kicked another Jenkins run at https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/view/change-requests/job/PR-10990/ |
Just adding console log outputs for the API tests. |
@landreev @ofahimIQSS it's not the harvest that's failing - at least not when I test it locally. What's failing is there's a search run on "q": "metadataSource:h45c7704" - and it's not getting any results, when it's expected to return all of the harvested datasets. |
@sekmiller @ofahimIQSS - that's exactly what I suspected was taking place. I got it to fail on my local build once too; and it was just that - harvests successful, but no hits from the search engine. Due, I'm assuming, to the (relatively) recent autocommit/softcommit changes in the solr config, that in practice added delays between indexing something and seeing it in search results. The tests in question have passed on Jenkins 3 time in a row in the meantime. @ofahimIQSS Could you post the actual harvesting logs, to confirm for sure that that's what's happening on your local build as well? It would be great to resolve and merge this asap, seeing how the Jenkins tests are failing on all PRs on these tests. |
@ofahimIQSS @sekmiller (I just had another guess though, as to why it's failing on our local builds and not on Jenkins, that has nothing to do with soft commit or indexing delays; let me verify) |
@landreev I looked in /usr/local/payara6/glassfish/domains/domain1/logs/ and couldn't find the harvesting logs since the script didn't get up to the point of adding a client. I'm assuming thats the reason it wasn't generated. |
No, you definitely went way past creating the clients. Your tests ran the actual harvests according to the error messages. So the harvest logs must be there. Strictly speaking, we don't need to see your logs to know that the actual harvests succeeded during your run, similarly to what @sekmiller saw on his instance. Because there are things like this in the console output that you posted:
(the above is the output of |
@sekmiller @ofahimIQSS
If you look at the line that Stephen posted earlier:
i.e., the test expects to find these datasets in the search engine under |
Might be a little late to the party but here are the files requested. I will create a PR to document how to access the Harvesting logs for future reference. harvest_cleanup_ha6ead1a_2024-11-01T16-37-53.txt Edit: #10996 --- Doc on accessing harvesting logs PR |
That's a great idea! ... and yeah, the log files do confirm what we assumed was taking place. like this line at the end:
i.e., harvests were in fact successful, but the results not indexed as the tests expected... |
@landreev do you have another theory beyond the soft commit? Would it make sense to put a pause in before the search? Sorry - missed the feature flag comment above. |
Any chance this could be merged today? - just to have all the Jenkins tests passing again by next week. |
(I just realized it was already pretty late :) - can definitely wait if need be) |
I'm happy to report the 2 tests passed after setting the feature flag as mentioned above. Testing Complete, merging PR. |
What this PR does / why we need it:
In one of the last commits in PR #10836, while addressing feedback from review, I rearranged/tried to clean up some validation and sanitizing code. Unfortunately, that introduced an error when importing harvested datasets (specifically, metadata-poor datasets created from oai_dc, where sanitizing invalid, or filling in missing required values is usually required). This PR fixes the regression.
The tests that are failing in develop have passed in https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/view/change-requests/job/PR-10990/1/.
Which issue(s) this PR closes:
Special notes for your reviewer:
Suggestions on how to test this:
Very straightforward; trying to harvest anything from demo.dataverse.org using oai_dc is going to fail in develop as of now.
This configuration, for example:
There are 7 datasets in the set; all 7 will fail in a develop build; all 7 should succeed with this PR.
The 2 api tests that are failing in the dev. branch:
testHarvestingClientRun_AllowHarvestingMissingCVV_False
testHarvestingClientRun_AllowHarvestingMissingCVV_True
should now be passing.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: