How to approach multi codepage datasets? #574

Closed
BenceBenedek opened this issue Jan 24, 2023 · 3 comments
Labels
accepted (Accepted for implementation), question (Further information is requested)

@BenceBenedek

Background [Optional]

Hi, I'm currently working on a use case where we have:

- a fairly complex copybook (some 1700 lines)
- several record types (variable length)
- several fields that contain free text
- several different code pages in use (depending on country code)

One example would be this record:

   10  FILLER                    REDEFINES   ...-DATAPART.

*** ========= ... ... TEXT==========================
15 ...-REC....
20 ...-...-TIMESTAMP PIC X(26).
20 ...-...-CLTX-TEXT PIC X(4026).

I may be overlooking something, but to get readable data for all countries I currently have to parse the COBOL file once for every code page in use (the code page has to be set in the Cobrix configuration), filter each result by country code, write out the DataFrame, and finally merge all the DataFrames.
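For readers following along, the multi-pass workaround described above could be sketched in PySpark roughly as follows. This is only an illustration under assumptions: the country-to-code-page mapping, the paths, and the country column name are hypothetical placeholders, and the options needed for variable-length, multi-record-type files are omitted. The `cobol` format name and the `copybook` and `ebcdic_code_page` options are Cobrix reader options.

```python
from functools import reduce


def group_countries_by_code_page(code_page_by_country):
    """Invert {country: code_page} into {code_page: [countries]}."""
    grouped = {}
    for country, code_page in code_page_by_country.items():
        grouped.setdefault(code_page, []).append(country)
    return grouped


def read_all_code_pages(spark, data_path, copybook_path,
                        code_page_by_country, country_col):
    """Parse the file once per code page, keep the matching rows, union the parts.

    All arguments are supplied by the caller; the mapping must not be empty.
    """
    grouped = group_countries_by_code_page(code_page_by_country)
    parts = []
    for code_page, countries in grouped.items():
        df = (spark.read.format("cobol")
              .option("copybook", copybook_path)
              .option("ebcdic_code_page", code_page)  # one full parse per code page
              .load(data_path))
        # Keep only the rows whose country actually uses this code page.
        parts.append(df.filter(df[country_col].isin(countries)))
    # Every part comes from the same copybook, so the schemas line up by name.
    return reduce(lambda left, right: left.unionByName(right), parts)
```

The cost is visible in the loop: the whole file is read and decoded once per code page, which is what a per-field setting would avoid.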

Ideally, only the specific fields would be decoded with their specific code pages, and this would happen in a single parse.

Question

Is there a way to apply business logic during parsing and, based on that, choose the correct code page?

Many thanks for your help.

@BenceBenedek added the "question" (Further information is requested) label Jan 24, 2023
@yruslan
Collaborator

yruslan commented Jan 25, 2023

This is a very good question. This is not supported at the moment, but shouldn't be very hard to add.

@yruslan added the "accepted" (Accepted for implementation) label Jan 25, 2023
@yruslan
Collaborator

yruslan commented Feb 15, 2023

This is how it is supported in the current master, and will be in 2.6.4:

        .option("field_code_page:cp037", "FIELD-1,FIELD_2")
        .option("field_code_page:cp870", " FIELD-3 ")

You can specify a code page, and the list of fields that have that encoding.
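As a rough illustration of how these per-field options might be assembled when the field lists are long, here is a small PySpark-oriented sketch. The helper names `field_code_page_options` and `with_options` are hypothetical and not part of Cobrix; only the `field_code_page:<code page>` option key comes from the snippet above. The paths in the commented usage are placeholders.

```python
def field_code_page_options(fields_by_code_page):
    """Build {"field_code_page:<cp>": "FIELD-A,FIELD-B"} option pairs."""
    return {
        f"field_code_page:{code_page}": ",".join(fields)
        for code_page, fields in fields_by_code_page.items()
    }


def with_options(reader, options):
    """Apply each key/value pair to a reader object exposing .option(key, value)."""
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader


# Hypothetical usage with a Spark session named `spark`:
#
# opts = field_code_page_options({
#     "cp037": ["FIELD-1", "FIELD_2"],
#     "cp870": ["FIELD-3"],
# })
# reader = spark.read.format("cobol").option("copybook", "/path/to/copybook.cpy")
# df = with_options(reader, opts).load("/path/to/data")
```

Note that this assigns a code page per field name, not per record, so it fits the case where each free-text field has a fixed encoding.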

@BenceBenedek
Author

Thank you, @yruslan, I will test it out.
