Non-obvious local file loading paths #581

Open
gsantia opened this issue Mar 3, 2021 · 5 comments
Labels
documentation Regarding developer or user documentation


@gsantia
Contributor

gsantia commented Mar 3, 2021

It took me a while to figure out how to load a local file in an extract config, because neither the documentation nor the library makes it clear which directory relative paths are resolved against. My directory structure looks like this:

```
my_study/
├── data/
│   ├── gen_manifest.csv
│   └── s3_scrape.csv
└── ingest_package/
    └── extract_configs/
        └── genomic.py
```

After some trial and error I got what I wanted (load gen_manifest.csv for extraction and use s3_scrape.csv to supplement it), but only by guessing what the paths should be. Here's what I mean:

genomic.py

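```python
import pandas as pd

# Appears to be resolved relative to this config's directory (extract_configs/), hence "../../"
source_data_url = "file://../../data/gen_manifest.csv"


def do_after_read(df):
    # Resolved relative to the process working directory (presumably my_study/ here)
    s3_scrape = pd.read_csv("data/s3_scrape.csv")
    # Inner join: keep only manifest rows whose filepath also appears in the S3 scrape
    merged_df = df.merge(
        right=s3_scrape,
        left_on="filepath",
        right_on="Filepath",
        how="inner",
    )
    return merged_df
```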

So with the file:// protocol I have to go up two directories (from extract_configs/ back to my_study/), but with pandas I don't. Changing this behavior in the library would probably break existing extract configs, so I assume updating the documentation is preferable.
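To make the asymmetry concrete, here is a minimal sketch that resolves both strings from genomic.py with plain path arithmetic. It assumes, based only on the behavior described above, that the `file://` path is resolved against the extract config's own directory (`extract_configs/`) while `pd.read_csv` resolves against the process working directory, taken here to be `my_study/`. The absolute location `/home/user/my_study` and the `resolve` helper are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical absolute location of the study directory (illustration only).
STUDY_DIR = PurePosixPath("/home/user/my_study")
CONFIG_DIR = STUDY_DIR / "ingest_package" / "extract_configs"


def resolve(base: PurePosixPath, relative: str) -> PurePosixPath:
    """Hypothetical helper: join a relative path onto a base directory and collapse '..'."""
    return PurePosixPath(posixpath.normpath(str(base / relative)))


# Assumed base for the file:// URL: the extract config's directory.
from_config = resolve(CONFIG_DIR, "../../data/gen_manifest.csv")
# Assumed base for pd.read_csv: the working directory (my_study/).
from_cwd = resolve(STUDY_DIR, "data/s3_scrape.csv")
# The same "../../" string applied to the working directory escapes the study entirely.
wrong_base = resolve(STUDY_DIR, "../../data/gen_manifest.csv")

print(from_config)  # /home/user/my_study/data/gen_manifest.csv
print(from_cwd)     # /home/user/my_study/data/s3_scrape.csv
print(wrong_base)   # /home/data/gen_manifest.csv
```

Both paths in genomic.py end up in my_study/data/; they only look inconsistent because they start from different bases.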

@gsantia added the documentation label Mar 3, 2021
@gsantia self-assigned this Mar 3, 2021
@fiendish
Contributor

fiendish commented Mar 4, 2021

Side question: Is there a reason why you're loading a second file manually and merging it in do_after_read instead of combining them in the transform module?

@gsantia
Contributor Author

gsantia commented Mar 4, 2021

> Side question: Is there a reason why you're loading a second file manually and merging it in do_after_read instead of combining them in the transform module?

No particular reason. I haven't done an ingest in a while, so I used the Chung ingest I did almost a year ago as a reference, and I did it that way there because I was following someone else's ingest package. So this pattern does seem to show up sometimes.

@fiendish
Contributor

fiendish commented Mar 8, 2021

It seems like any compatibility concerns from a code change could be addressed by #504.

@gsantia
Contributor Author

gsantia commented Mar 9, 2021

Is changing the code preferable? I don't mind adding a snippet to the documentation.
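One possible shape for such a snippet, offered as a sketch rather than the library's documented behavior: anchor paths to the config file's own location via `__file__`, so they no longer depend on which directory the ingest is run from. The `study_path` helper is hypothetical, and it assumes the layout shown at the top of this issue, where the study root is two directories above the config file.

```python
from pathlib import Path


def study_path(config_file: str, *parts: str) -> Path:
    """Hypothetical helper: build an absolute path under the study root.

    Assumes the config lives at my_study/ingest_package/extract_configs/<name>.py,
    so parents[2] of the resolved config path is my_study/.
    """
    # parents[0] = extract_configs/, parents[1] = ingest_package/, parents[2] = my_study/
    return Path(config_file).resolve().parents[2].joinpath(*parts)
```

Inside genomic.py this would be used as `pd.read_csv(study_path(__file__, "data", "s3_scrape.csv"))` (pandas accepts `pathlib.Path` objects). The trade-off is that the helper hard-codes the depth of the config below the study root, so it would need adjusting for a different layout.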

@fiendish
Contributor

fiendish commented Mar 9, 2021

It might be!

2 participants