Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: The "Database Lookup" step trim function does not seem to work #2806

Closed
duursma opened this issue Apr 4, 2023 · 8 comments · Fixed by #2837
Closed

[Bug]: The "Database Lookup" step trim function does not seem to work #2806

duursma opened this issue Apr 4, 2023 · 8 comments · Fixed by #2837
Labels
Milestone

Comments

@duursma
Copy link

duursma commented Apr 4, 2023

Apache Hop version?

2.4.0

Java version?

11

Operating system

Windows

What happened?

I select a number of records from a database with a field know to have trailing spaces/newlines. After enabeling 'trim - both' on the field the trailing whitespace remains. After including a "String operation" step with the trim option set to 'both' the trailing whitespace does disappear.

Issue Priority

Priority: 3

Issue Component

Component: Database

@nadment
Copy link
Contributor

nadment commented Apr 18, 2023

In which transform you put 'trim - both' on the field?
Can you provide a basic pipeline to illustrate the problem?

@duursma
Copy link
Author

duursma commented Apr 18, 2023

test-trim.hpl.gz

@liosc
Copy link

liosc commented Apr 19, 2023

Hi. I just tried a very similar pipeline and the database lookup trim worked. My environment is Hop 2.4.0 (March, 26 snapshot), Windows and jdk-11.0.15.10. Could you please share the exact version of Java and the stream metadata after the look up (you can add a "Metadata structure of stream" transform and share the output of that).

@duursma
Copy link
Author

duursma commented Apr 20, 2023

My environment is: hop 2.4.0 (2023-03-27 07.51.24)
openjdk version "19.0.1" 2022-10-18
OpenJDK Runtime Environment (build 19.0.1+10-21)
OpenJDK 64-Bit Server VM (build 19.0.1+10-21, mixed mode, sharing)
ojdbc8-21.4.0.0.1.jar
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.18.0.0.0

Metadata output:
Position Fieldname Comments Type Length Precision Origin
2 REMARK String -1 -1 Database lookup
1 SAMPLE_NR SAMPLE_NR String 13 -1 Table input

Two sample lines from the database input:
SAMPLE_NR REMARK
L1995-06-7019 MON_DAT is analysedatum. Feitelijke MON_DAT is onbekend

L1995-06-7018 MON_DAT is analysedatum. Feitelijke MON_DAT is onbekend

Two sample lines from the database input:
SAMPLE_NR REMARK
L1995-06-7019 MON_DAT is analysedatum. Feitelijke MON_DAT is onbekend
L1995-06-7018 MON_DAT is analysedatum. Feitelijke MON_DAT is onbekend

So if copy/paste works the same, it looks newline related. The two trims do not do the same function with respect to newlines.

@sramazzina sramazzina added this to the 2.5 milestone Apr 20, 2023
@sramazzina
Copy link
Contributor

I had a look at the code and I confirm that the two transforms does the same thing in a different way. Apart from this, regarding the newline character in the output, that could be acceptable so I'm not conridering the behavior as a bug. The idea could be to add a new option parameter to let the user decide if the newline char must be removed by the trim or not. In that case we can change the type of the issue from bug to feature request. @hansva what do you think?

@duursma
Copy link
Author

duursma commented Apr 20, 2023

You could also make it a documentation issue. Make clear what trimming does in each case. But I would prefer to have 'trim' have a single meaning everywhere.

@sramazzina
Copy link
Contributor

@duursma I agree with you and we will take care about that but, apart from this techincal detail, we must decide how to manage the newline char. Someone could disaagree about removing the newline char, someone else could agree (like you for example). Who knows. Therefore the idea of providing a configuration option to guide that choice seems to be the best one.

@hansva
Copy link
Contributor

hansva commented Apr 20, 2023

I disagree with adding an option. Trim should be as simple as a java String.Trim() and should be uniform everywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants