How to approach multi codepage datasets? #574

BenceBenedek · 2023-01-24T14:03:52Z

Background [Optional]

Hi, I'm currently working on a use case where we have:

- a fairly complex copybook (some 1700 lines)
- several record types (variable length)
- several fields that contain free text
- several different code pages in use (depending on country code)

One example would be this record:

        10  FILLER                    REDEFINES   ...-DATAPART.
  *** ========= ... ... TEXT==========================
            15 ...-REC....
                20 ...-...-TIMESTAMP PIC X(26).
                20 ...-...-CLTX-TEXT PIC X(4026).

I may be overlooking something, but to get readable data for all countries I currently have to parse the COBOL file once for every code page in use (the code page has to be set in the Cobrix configuration), filter each result by country code, write out the DataFrame, and finally merge all the DataFrames.
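For reference, the workaround described above could look roughly like this in PySpark. This is only a sketch under assumptions: the helper name `read_per_code_page`, the mapping `CODE_PAGE_COUNTRIES`, and the country codes in it are hypothetical, and the `cobol` format with the `copybook` and `ebcdic_code_page` options reflects my understanding of the Cobrix reader rather than anything confirmed in this thread. The Spark session must have Cobrix on its classpath.

```python
from functools import reduce

# Hypothetical mapping: EBCDIC code page -> country codes encoded with it.
CODE_PAGE_COUNTRIES = {
    "cp037": ["US", "CA"],
    "cp870": ["HU", "PL"],
}


def read_per_code_page(spark, copybook_path, data_path, country_column,
                       code_page_countries):
    """Parse the file once per code page, keep matching countries, union all."""
    parts = []
    for code_page, countries in code_page_countries.items():
        # One full parse of the data file for each code page (assumed options).
        df = (
            spark.read.format("cobol")
            .option("copybook", copybook_path)
            .option("ebcdic_code_page", code_page)
            .load(data_path)
        )
        # Keep only the rows whose country uses this code page.
        parts.append(df.filter(df[country_column].isin(countries)))
    # Merge the per-code-page results into a single DataFrame.
    return reduce(lambda left, right: left.unionByName(right), parts)
```

Every code page costs one complete pass over the file, which is why decoding everything in a single parse would be preferable.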

Ideally, only the specific fields should be decoded with specific codepages, and this should be done by one parse action.

Question

Is there a way to apply business logic and, based on that, use the correct code page during parsing?

Many thanks for your help.

yruslan · 2023-01-25T11:13:41Z

This is a very good question. This is not supported at the moment, but it shouldn't be very hard to add.

yruslan · 2023-02-15T07:49:31Z

This is how it is supported in the current master, and will be in 2.6.4:

        .option("field_code_page:cp037", "FIELD-1,FIELD_2")
        .option("field_code_page:cp870", " FIELD-3 ")

You can specify a code page, and the list of fields that have that encoding.
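As a small illustration, the per-field options could be generated from a mapping instead of being written out by hand. This is a sketch in PySpark, not part of Cobrix: the helper names `field_code_page_options` and `read_with_field_code_pages` are hypothetical, the `cobol` format and `copybook` option are assumed from typical Cobrix usage, and only the `field_code_page:<code page>` option key comes from the snippet above.

```python
def field_code_page_options(fields_by_code_page):
    """Build 'field_code_page:<code page>' options from a mapping.

    Each value is the comma-separated list of fields using that code page.
    """
    return {
        f"field_code_page:{code_page}": ",".join(fields)
        for code_page, fields in fields_by_code_page.items()
    }


def read_with_field_code_pages(spark, copybook_path, data_path,
                               fields_by_code_page):
    """Read the file in one parse, applying a code page per listed field."""
    # Assumed reader setup; the session needs Cobrix on its classpath.
    reader = spark.read.format("cobol").option("copybook", copybook_path)
    for key, value in field_code_page_options(fields_by_code_page).items():
        reader = reader.option(key, value)
    return reader.load(data_path)


# The option set from the snippet above, expressed as a mapping.
options = field_code_page_options({
    "cp037": ["FIELD-1", "FIELD_2"],
    "cp870": ["FIELD-3"],
})
```

Note that the mapping is by field name, so each listed field is decoded with one fixed code page for every record.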

BenceBenedek · 2023-02-21T07:08:18Z

Thank you @yruslan, I will test it out.

BenceBenedek added the "question" (Further information is requested) label · Jan 24, 2023

yruslan added the "accepted" (Accepted for implementation) label · Jan 25, 2023

yruslan added a commit that referenced this issue · Feb 10, 2023

#574 Add support for reading data with fields having multiple code pages. (444ccb0)

yruslan added a commit that referenced this issue · Feb 13, 2023

#574 Add support for reading data with fields having multiple code pages. (b76804c)

yruslan closed this as completed · Mar 6, 2023

Beno922 mentioned this issue Oct 25, 2023

Multiple codepages in the same file #631

Open
