Non-obvious local file loading paths #581

gsantia · 2021-03-03T21:59:18Z

It took me a while to figure out how to load a local file from an extract config, because neither the documentation nor the library makes clear which directory relative paths are resolved against. My directory structure looks like this:

my_study/
├── data/
│   ├── gen_manifest.csv
│   └── s3_scrape.csv
└── ingest_package/
    └── extract_configs/
        └── genomic.py

After some trial and error, I got what I wanted (load gen_manifest.csv for extraction and use s3_scrape.csv to supplement it), but only by guessing what the paths should be. Here's what I mean:

genomic.py

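import pandas as pd

# file:// URL: has to go two directories up from extract_configs/ to reach data/
source_data_url = "file://../../data/gen_manifest.csv"

def do_after_read(df):
    # Plain relative path: pandas resolves it against the current working
    # directory, so no "../" prefix was needed here
    s3_scrape = pd.read_csv("data/s3_scrape.csv")
    # Keep only the manifest rows that have a matching entry in the S3 scrape
    merged_df = df.merge(
        right=s3_scrape,
        left_on="filepath",
        right_on="Filepath",
        how="inner",
    )
    return merged_df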

So with the file:// protocol I have to go up two directories, but for reading with pandas I don't. Changing how this works in the library would probably break some existing extract configs, so I assume a change to the documentation is preferable.
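For illustration, the mismatch can be sketched with pathlib alone. This is only a sketch under stated assumptions: it assumes the file:// URL is resolved relative to the directory holding genomic.py and that the ingest was run with my_study/ as the working directory, which is what the behavior above suggests but is not confirmed here. The helper `resolve_relative_to` and the `/tmp/my_study` location are hypothetical and not part of the library.

```python
from pathlib import Path


def resolve_relative_to(base_dir, relative_path):
    """Resolve relative_path against base_dir rather than the process's cwd."""
    return (Path(base_dir) / relative_path).resolve()


# Hypothetical location of the study, mirroring the directory tree above.
study_root = Path("/tmp/my_study")
config_dir = study_root / "ingest_package" / "extract_configs"

# Assumption: the file:// URL is interpreted relative to the config's directory,
# so it needs "../.." to climb back up to my_study/.
from_config = resolve_relative_to(config_dir, "../../data/gen_manifest.csv")

# Assumption: the ingest was launched from my_study/, so a plain relative path
# (as passed to pd.read_csv) starts there and needs no "../".
from_cwd = resolve_relative_to(study_root, "data/s3_scrape.csv")

# Both lookups end up in the same data/ directory.
print(from_config.parent == from_cwd.parent)
```

If `__file__` is set when the library loads an extract config (an assumption, not something verified here), passing `Path(__file__).resolve().parent` as `base_dir` inside `do_after_read` would make the pandas path independent of where the ingest is launched from.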


fiendish · 2021-03-04T18:55:48Z

Side question: Is there a reason why you're loading a second file manually and merging it in do_after_read instead of combining them in the transform module?

gsantia · 2021-03-04T19:44:13Z

> Side question: Is there a reason why you're loading a second file manually and merging it in do_after_read instead of combining them in the transform module?

No. I haven't done an ingest in a while, so I was referencing the Chung one I did almost a year ago. But I did it there because I was again referencing someone else's ingest package. So it seems like it happens sometimes.

fiendish · 2021-03-08T15:07:30Z

It seems like any compatibility concerns from a code change could be addressed by #504.

gsantia · 2021-03-09T20:31:27Z

Is changing the code preferable? I don't mind adding a snippet to the documentation.

fiendish · 2021-03-09T21:47:40Z

It might be!

gsantia added the "documentation" label (Regarding developer or user documentation) Mar 3, 2021

gsantia self-assigned this Mar 3, 2021
