[Bug]: Text file output - concurrent writes issue #4558

Open
dave-csc opened this issue Nov 12, 2024 · 1 comment
@dave-csc
Contributor

Apache Hop version?

2.10.0

Java version?

17.0.2

Operating system

Linux

What happened?

This might be a very specific scenario, but I'll file it as a bug anyway.

I set up a Text file output transform inside a "mappable" transform to create a structured log file of what happens in the "mapping" transform. In short, the mapping transform generates the data needed for the log and then passes it to the sub-pipeline, which writes it to the log file. Hence, the Text file output is set to always write to the same file in "append" mode.

The parent transform calls the "logger" in multiple places, and sometimes the writes are interleaved in the resulting file, for example:

1596213;2023;30833201;E;DOE;JOHN;0;1596238;2023;30835173;S;MOE;JANE;1;Data correctly sent - HTTP status: 200 - Server response: true
+++ Invalid data provided

whereas the expected output should be:

1596213;2023;30833201;E;DOE;JOHN;0;+++ Invalid data provided
1596238;2023;30835173;S;MOE;JANE;1;Data correctly sent - HTTP status: 200 - Server response: true

I could probably mitigate this with some Blocking transforms here and there, but a better option would probably be to check whether the file is already in use before writing, and to wait for its release before actually writing.
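
To illustrate the "wait for its release before writing" idea, here is a minimal Java sketch. It is not Hop code: `LockedAppender` and `appendLine` are hypothetical names, and it assumes that all writers run either as threads in the same JVM or as separate processes that cooperate through the same advisory file lock. Each row is written as one unit while both an in-JVM lock and an OS-level file lock are held, so rows from different callers cannot be mixed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper for illustration only; not part of Apache Hop. */
final class LockedAppender {
  // One lock per target file, shared by every writer thread in this JVM.
  // FileLock alone is held on behalf of the whole JVM, so it cannot
  // serialize threads inside the same process.
  private static final ConcurrentHashMap<Path, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

  private LockedAppender() {}

  /** Appends one complete line, waiting until no other writer holds the file. */
  static void appendLine(Path file, String line) throws IOException {
    Path key = file.toAbsolutePath().normalize();
    ReentrantLock jvmLock = LOCKS.computeIfAbsent(key, k -> new ReentrantLock());
    jvmLock.lock(); // blocks while another thread in this JVM is writing
    try (FileChannel channel = FileChannel.open(
            key,
            StandardOpenOption.CREATE,
            StandardOpenOption.WRITE,
            StandardOpenOption.APPEND);
        // Blocks while another cooperating process holds the (advisory) lock.
        FileLock ignored = channel.lock()) {
      ByteBuffer buffer =
          ByteBuffer.wrap((line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
      while (buffer.hasRemaining()) {
        channel.write(buffer);
      }
    } finally {
      jvmLock.unlock();
    }
  }
}
```

Opening and locking the file for every row is slow, so this would only suit low-volume logging like the case described here.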

Issue Priority

Priority: 3

Issue Component

Component: Pipelines, Component: Transforms

@hansva
Contributor

hansva commented Nov 12, 2024

If they are being called from multiple places, then these are distinct instances that have no knowledge of the other instances running at the same time. The only solution is to call the mapper only once in each pipeline. The same issue can happen when running multiple copies at the same time.
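
The "only one instance writes" idea can be shown outside Hop with a small single-writer sketch in Java. This is only an analogy, not how Hop's mapping or Text file output transforms are implemented; `SingleWriterLog` and `log` are hypothetical names. Any number of producers hand complete rows to a queue, and a single thread owns the file, so rows cannot interleave.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical single-writer log for illustration only; not part of Apache Hop. */
final class SingleWriterLog implements AutoCloseable {
  // Sentinel compared by identity, so no real log row can be mistaken for it.
  private static final String STOP = new String("STOP");

  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final Thread writer;

  SingleWriterLog(Path file) {
    // The only thread that ever touches the file.
    writer = new Thread(() -> {
      try (BufferedWriter out = Files.newBufferedWriter(
          file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
        for (String line = queue.take(); line != STOP; line = queue.take()) {
          out.write(line);
          out.newLine();
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    writer.start();
  }

  /** Called from any number of producer threads; each call enqueues one whole row. */
  void log(String line) {
    queue.add(line);
  }

  /** Signals the writer to finish the queued rows and waits for it to exit. */
  @Override
  public void close() throws InterruptedException {
    queue.add(STOP);
    writer.join();
  }
}
```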
