Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cleanup activity for URL field #9970

Merged
merged 22 commits into from
Jun 8, 2023

Conversation

Alexandra-Stath
Copy link
Contributor

Fixes koppor#216

Add cleanup activity for URL field. Specifically when a url is found in the note field of a BibTex entry, we place it under the URL field.

  • I have already made the necessary adjustments to several classes under gui and logic packages to ensure cleanup's functionality.
  • Added parameterized tests that validate the correctness of the process. Concidered various types of urls that need to be matched accordingly.

URLcleanup

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.


String NoteFieldValue = entry.getField(NOTE_FIELD).orElse(null);
String urlRegex = (
"(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment/link on the regex, e.g where you got it from

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally fetched from stackoverflow, but there were some modifications in order for it to be functional. Shall I comment that link in my code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and the modifications you did

@@ -1136,6 +1136,7 @@ Cleanup\ entries=Καθαρισμός καταχωρήσεων
Automatically\ assign\ new\ entry\ to\ selected\ groups=Αυτόματη ανάθεση νέας καταχώρησης στις επιλεγμένες ομάδες
%0\ mode=λειτουργία %0
Move\ DOIs\ from\ note\ and\ URL\ field\ to\ DOI\ field\ and\ remove\ http\ prefix=Μετακίνηση του DOI (Ψηφιακό Αναγνωριστικό Αντικειμένου) από το πεδίο σημείωσης και διεύθυνσης URL, στο πεδίο DOI και αφαίρεση του προθέματος http
Move\ URLs\ in\ note\ field\ to\ url\ field=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please only edit the en file. All other languages are handled by crowdin

@Siedlerchr
Copy link
Member

Thanks for your contribution and really great that you added tests as well!
Overall looks already good.

@Siedlerchr
Copy link
Member

There is one code style check failing (open rewrite)

These recipes would make results to src/main/java/org/jabref/logic/cleanup/URLCleanup.java:
org.jabref.config.rewrite.cleanup
org.openrewrite.java.cleanup.UnnecessaryParentheses
Report available:

Run 'gradle rewriteRun' to apply the recipes.

// Expected entry in case note field holds more than one URLs
BibEntry expectedWithMultipleURLs = new BibEntry()
.withField(StandardField.URL, "https://hdl.handle.net/10442/hedi/6089")
.withField(StandardField.NOTE, "/url{http://142.42.1.1:8080}");
Copy link
Member

@Siedlerchr Siedlerchr Jun 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please test with \url{} ? See also the link from koppor in the description. That would make more sense because it's a LaTeX command that is very liked to end up in the field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import org.jabref.model.entry.field.StandardField;

/**
* Checks whether URL exists in note field, and stores it under url field .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No space before final dot of a sentence.

Suggested change
* Checks whether URL exists in note field, and stores it under url field .
* Checks whether URL exists in note field, and stores it under url field.

Comment on lines 83 to 85
/**
* Moves URL from NOTE to URL field.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comment necessary - the comment is in URLCleanup class - similar to the other cleanup jobs.

Suggested change
/**
* Moves URL from NOTE to URL field.
*/

@@ -1055,7 +1055,7 @@ exportFormat=Format d'exportation
Output\ file\ missing=Fichier de sortie manquant
The\ output\ option\ depends\ on\ a\ valid\ input\ option.=L'option de sortie dépend d'une option d'entrée valide.
Linked\ file\ name\ conventions=Conventions pour les noms de fichiers liés
Filename\ format\ pattern=Modèle de format de nom de fichier
Filename\ format\ pattern=Modèle de format de nom de fichier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes will be overwritten. You need to go to Crowdin service to update issues at the translations.


private static Stream<Arguments> provideURL() {
/*
* Expected entries with various types of URLs(e.g, not secure protocol, password included for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space before opening brace in English text

Suggested change
* Expected entries with various types of URLs(e.g, not secure protocol, password included for
* Expected entries with various types of URLs (e.g, not secure protocol, password included for

Comment on lines 30 to 38
BibEntry[] expectedwithURL = {
new BibEntry().withField(StandardField.URL, "https://hdl.handle.net/10442/hedi/6089"),
new BibEntry().withField(StandardField.URL, "http://hdl.handle.net/10442/hedi/6089"),
new BibEntry().withField(StandardField.URL, "http://userid:[email protected]:8080"),
new BibEntry().withField(StandardField.URL, "http://142.42.1.1:8080"),
new BibEntry().withField(StandardField.URL, "http://☺.damowmow.com"),
new BibEntry().withField(StandardField.URL, "http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com"),
new BibEntry().withField(StandardField.URL, "https://www.example.com/foo/?bar=baz&inga=42&quux"),
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values are used only ones. Please inline them below.

Background: We aim to keep expected and actual togeher so that one can read them easily. Using this array approach, reading will get difficult. See one inline-proposal below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koppor So I follow this approach expected/actual together, as well for the other expected entries (e.g., expectedWithTwoNoteValues and expectedWithMultipleURLs

Comment on lines 65 to 66
Arguments.of(expectedwithURL[0], new BibEntry().withField(
StandardField.NOTE, "\\url{https://hdl.handle.net/10442/hedi/6089}")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Arguments.of(expectedwithURL[0], new BibEntry().withField(
StandardField.NOTE, "\\url{https://hdl.handle.net/10442/hedi/6089}")),
Arguments.of(
new BibEntry().withField(StandardField.URL, "https://hdl.handle.net/10442/hedi/6089"),
new BibEntry().withField(StandardField.NOTE, "\\url{https://hdl.handle.net/10442/hedi/6089}")),

String newNoteFieldValue = NoteFieldValue
.replace(url, "")
.replace("\\url{}", "")
.replace(",", "").trim();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work. in call cases. Maybe remove it - or fix it with maybe checking for regex "^, " - so that , appears at the beginning of the line AND is followed by a space.

Examples for note content:

\url{http://example.org}, cited by Kramer, 2002.

And some other strings one cannot forsee. Thus, simply removing all commas will produce wrong content.

I agree, this happens in edge cases. If you don't find a quick solution, keep as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koppor, three proposed solutions here:

  1. Either url is preceded or followed by a comma
String newNoteFieldValue = NoteFieldValue
                    .replace(url, "")
                    .replaceAll("(, )?\\\\?url\\{\\}(, )?", "").trim();
  1. Not preferred but works
String newNoteFieldValue = NoteFieldValue
                    .replace(url, "")
                    .replace(", \\url{}", "")
                    .replace("\\url{}, ", "")
                    .replace("\\url{}", "").trim();
  1. Another solution, working with the current cases
// works with: \url{http://example.org}, cited by Kramer, 2002.
// doesn't work with: Cited by Kramer, 2002, \url{http://example.org}
String newNoteFieldValue = NoteFieldValue
                    .replace(url, "")
                    .replace("\\url{}", "")
                    .replaceFirst(", ", "").trim();

Overall, I could go with the first proposed solution as is more effective and covers plenty of cases
Also added a test case for: \url{http://example.org}, cited by Kramer, 2002.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fist one looks good, you can also add a comment before the RegEx to explain it

Siedlerchr
Siedlerchr previously approved these changes Jun 7, 2023
Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing test is not related

koppor
koppor previously approved these changes Jun 8, 2023
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpicks regarding casing of variable names (they should all be lower case in Java). I'll fix them for myself and then merge.

Thank you for working on this!

src/main/java/org/jabref/logic/cleanup/URLCleanup.java Outdated Show resolved Hide resolved
src/main/java/org/jabref/logic/cleanup/URLCleanup.java Outdated Show resolved Hide resolved
src/main/java/org/jabref/logic/cleanup/URLCleanup.java Outdated Show resolved Hide resolved
src/main/java/org/jabref/logic/cleanup/URLCleanup.java Outdated Show resolved Hide resolved
src/main/java/org/jabref/logic/cleanup/URLCleanup.java Outdated Show resolved Hide resolved
@koppor koppor merged commit b4d2d51 into JabRef:main Jun 8, 2023
@koppor
Copy link
Member

koppor commented Jun 8, 2023

@Alexandra-Stath Thank you for the follow-up. I did not use any IDE and thus missed one thing. Merged now!

@Alexandra-Stath
Copy link
Contributor Author

@koppor @Siedlerchr
Thank you both very much for your time and cooperation.
Wish you all the best!

* remove it from the note field, and no other action is performed.
*/
if (entry.hasField(URL_FIELD)) {
String urlFieldValue = entry.getField(URL_FIELD).orElse(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought, this could lead to issues, but I did not see it. There were no tests covering the orElse branch.

In future: In case Optionals are available, make use of them!

Fix and link to issue at #10435

Siedlerchr added a commit that referenced this pull request Jan 1, 2024
* upstream/main: (68 commits)
  Fix issue 9863 - Change Tab selection order (#9907)
  New Crowdin updates (#9994)
  Enable editing typo with double clicking on field name in custom entry type (#9977)
  Remove "Field name:" from localization - we already have "Field name" (#9991)
  Changed database to catalog in org.jabref.gui.slr and org.jabref.logic.crawler (#9989)
  Use environment variables for hostname detection (#9910)
  Add cleanup activity for URL field (#9970)
  Fix freezing when fetching IBSN and no results are found (#9987)
  Make Group(Node)TreeViewModel more OO (#9978)
  Fix container for group item count still visible if display count is off  (#9980)
  Fix paste of BibTeX data (#9985)
  Bring back SimplifyBoolean* and UnnecessaryParantheses (and refine guide) (#9981)
  Add new menu entry to remove braces from selection, aka unprotect it. (#9968)
  User specific comment (#9727)
  Bump org.openrewrite.recipe:rewrite-recipe-bom from 1.19.3 to 2.0.0 (#9975)
  Fix typos
  Remove non-needed link
  Fix typos
  Enable RemoveJavaDocAuthorTag (#9973)
  Bump com.fasterxml.jackson.datatype:jackson-datatype-jsr310 (#9974)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade note field if containing url
4 participants