[Bug] Duplicated source when seed and model paths overlap #6102

Closed
2 tasks done
jmg-duarte opened this issue Oct 19, 2022 · 6 comments · Fixed by #6179
Labels
bug Something isn't working

Comments

@jmg-duarte
Contributor

jmg-duarte commented Oct 19, 2022

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

If I have the following structure:

proj_name/
- data_provider
  - seeds
  - models
    - sources.yml
- < consider the previous structure repeated for N providers >

And the following configuration:

model-paths: 
  - proj_name/data_provider/
seed-paths: 
  - proj_name/data_provider

I will get an error like:

10:32:20  Running with dbt=1.3.0
10:32:21  Encountered an error:
Compilation Error
  dbt found two sources with the name "some_source".
  
  Since these resources have the same name, dbt will be unable to find the correct resource
  when looking for source("data_provider", "some_source").
  
  To fix this, change the name of one of these resources:
  - source.proj_name.data_provider.some_source (proj_name/data_provider/models/sources.yml)
  - source.proj_name.data_provider.some_source (proj_name/data_provider/models/sources.yml)

I am aware that I can do:

model-paths: 
  - proj_name/data_provider/models
seed-paths: 
  - proj_name/data_provider/seeds

But this is not practical with several data providers. Furthermore, since each provider's data is usually not mixed with data from other providers, we would like to keep the structure as close as possible to what it is now.

Expected Behavior

No name clash.

My intuition says that sources shouldn't be analysed when searching for seeds, but there may be a reason behind this.

Steps To Reproduce

Described in current behavior.

Relevant log output

Described in current behavior.

Environment

- OS: MacOS
- Python: 3.8.10
- dbt: 1.3.0

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@jmg-duarte jmg-duarte added bug Something isn't working triage labels Oct 19, 2022
@github-actions github-actions bot changed the title [Bug] <title> [CT-1374] [Bug] <title> Oct 19, 2022
@jmg-duarte jmg-duarte changed the title [CT-1374] [Bug] <title> [Bug] Duplicated model when seed and model paths overlap Oct 19, 2022
@jmg-duarte jmg-duarte changed the title [Bug] Duplicated model when seed and model paths overlap [Bug] Duplicated source when seed and model paths overlap Oct 19, 2022
@jtcohen6
Contributor

jtcohen6 commented Oct 19, 2022

Hey @jmg-duarte - I agree this is a bug! This sounds the same as one previously reported: #5120. We included a fix for that (#5176) in dbt-core v1.2.0 and v1.1.1.

As such, I wasn't able to reproduce this while running locally, using the latest version of dbt-core.

Is it possible that this regressed in v1.3? I would expect that this functional test should have started failing: https://github.com/dbt-labs/dbt-core/blob/main/tests/functional/configs/test_dupe_paths.py

@jtcohen6 jtcohen6 added duplicate This issue or pull request already exists awaiting_response and removed triage labels Oct 19, 2022
@jmg-duarte
Contributor Author

Hey @jtcohen6, thanks for the quick response! I think this still happens if the paths resolve to the same location but are written differently.

Such as:

path/to/folder
path/to/folder/

I haven't had the chance to look too deep into this, but I've been seeing this since 1.2.2.
I'll try to create a repo with a minimal reproducible example so we can address this with confidence.
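
A minimal sketch of why the two spellings above are not collapsed, assuming the configured paths are deduplicated by plain string comparison (the variable names here are illustrative, not taken from dbt-core):

```python
from itertools import chain

# Two spellings of the same directory, differing only by a trailing slash.
model_paths = ["path/to/folder/"]
seed_paths = ["path/to/folder"]

# set() compares the raw strings, so both entries survive deduplication
# and the same directory would be visited twice.
unique_paths = sorted(set(chain(model_paths, seed_paths)))
print(unique_paths)  # ['path/to/folder', 'path/to/folder/']
```

If the same directory is then read once per entry, any sources.yml inside it would be parsed twice, which matches the duplicate reported in the error message.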

@jmg-duarte
Contributor Author

I have a minimal reproducible example, steps to reproduce:

  1. dbt init
  2. Modify dbt_project.yml so that the seed and model paths point to the same folder
     • The two paths must differ only by a trailing /, for example:
seed-paths: ["path/to/folder"]
model-paths: ["path/to/folder/"]
  3. dbt compile

From an outside perspective, it seems as though the paths are not resolved before being used, so my suggestion would be to resolve them.

I've also updated the original issue to reflect this finding.

@jtcohen6 jtcohen6 removed duplicate This issue or pull request already exists triage labels Oct 27, 2022
@jtcohen6
Contributor

Ah, good to know that / makes the difference! That would explain why the fix for #5120 isn't working here.

@jmg-duarte
Contributor Author

Turning this

def _all_source_paths(
    model_paths: List[str],
    seed_paths: List[str],
    snapshot_paths: List[str],
    analysis_paths: List[str],
    macro_paths: List[str],
) -> List[str]:
    # We need to turn a list of lists into just a list, then convert to a set to
    # get only unique elements, then back to a list
    return list(
        set(list(chain(model_paths, seed_paths, snapshot_paths, analysis_paths, macro_paths)))
    )

Into

def _all_source_paths(
    model_paths: List[str],
    seed_paths: List[str],
    snapshot_paths: List[str],
    analysis_paths: List[str],
    macro_paths: List[str],
) -> List[str]:
    # We need to turn a list of lists into just a list, then convert to a set to
    # get only unique elements, then back to a list
    paths = chain(model_paths, seed_paths, snapshot_paths, analysis_paths, macro_paths)
    paths = map(lambda s: str(pathlib.Path(s).resolve()), paths)
    return list(set(paths))

This should work. I can try to submit a PR if this looks OK to you.
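
For anyone who wants to try the idea outside dbt-core, here is a self-contained sketch. The helper names `dedupe_resolved` and `dedupe_normalized` are hypothetical and exist only for this illustration. The first mirrors the proposal above. The second is an alternative that normalizes paths lexically with `os.path.normpath`, which also drops the trailing slash but keeps relative paths relative, whereas `Path.resolve()` returns absolute paths.

```python
import os
import pathlib
from itertools import chain
from typing import Iterable, List


def dedupe_resolved(*path_lists: Iterable[str]) -> List[str]:
    # Hypothetical helper mirroring the proposal: resolve each path, then dedupe.
    # resolve() makes paths absolute, relative to the current working directory.
    return sorted({str(pathlib.Path(p).resolve()) for p in chain(*path_lists)})


def dedupe_normalized(*path_lists: Iterable[str]) -> List[str]:
    # Hypothetical alternative: purely lexical normalization, which removes
    # the trailing separator without touching the filesystem.
    return sorted({os.path.normpath(p) for p in chain(*path_lists)})


resolved = dedupe_resolved(["path/to/folder/"], ["path/to/folder"])
normalized = dedupe_normalized(["path/to/folder/"], ["path/to/folder"])

# Both approaches collapse the two spellings into a single entry.
print(len(resolved), len(normalized))
```

Which variant is preferable depends on whether the rest of the code expects relative or absolute paths; that is not something this sketch can determine.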

@jtcohen6
Contributor

@jmg-duarte That makes sense to me! If you're up to contribute the fix, plus another test like this one, that would be very much appreciated :)

emmyoop pushed a commit that referenced this issue Feb 21, 2023
* Remove trailing slashes from source paths (#6102)

* Run changie

* Handle mypy complaints

* Revert format