Get files from SFTP server using pattern of files #353

Closed
AashishMTech opened this issue Aug 8, 2024 · 13 comments
Labels
enhancement New feature or request

Comments

@AashishMTech

Feature Description

SFTP connector with pattern search using regex.

What would you like to see added to Sling?

The current SFTP connector works perfectly for a single file (e.g. abc.csv) or for multiple files (abc_def.csv or abcdef.csv).
However, it does not seem to accept a pattern such as ^abc_\d{4}-\d{2}-\d{2}.csv$

Is there a way to solve this in Sling?

@AashishMTech AashishMTech added the enhancement New feature or request label Aug 8, 2024
@flarco
Collaborator

flarco commented Aug 8, 2024

Regex are not currently accepted, but glob is.

Can you try abc_????-??-??.csv?
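To see what the suggested glob accepts compared with the regex from the original request, here is a minimal sketch using Python's standard `fnmatch` and `re` modules. It illustrates general glob semantics only and is not Sling's implementation; the sample file names and the `compare` helper are made up for illustration.

```python
import fnmatch
import re

# Regex from the original request, with the dot escaped so it only
# matches a literal "." (the original left it unescaped).
DATE_REGEX = re.compile(r"^abc_\d{4}-\d{2}-\d{2}\.csv$")

# Glob suggested in the reply: each "?" matches exactly one character
# of any kind, not just a digit.
DATE_GLOB = "abc_????-??-??.csv"

# Hypothetical file names, for illustration only.
SAMPLES = ["abc_2024-08-08.csv", "abc_def.csv", "abc_abcd-ef-gh.csv"]


def compare(names):
    """Return {name: (regex_matches, glob_matches)} for each name."""
    return {
        name: (bool(DATE_REGEX.match(name)), fnmatch.fnmatchcase(name, DATE_GLOB))
        for name in names
    }


if __name__ == "__main__":
    for name, (by_regex, by_glob) in compare(SAMPLES).items():
        print(f"{name}: regex={by_regex} glob={by_glob}")
```

The glob is looser than the regex: it fixes the length and the position of the dashes, but a name like `abc_abcd-ef-gh.csv` would still pass because `?` does not restrict the character to a digit.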

@AashishMTech
Author

AashishMTech commented Aug 8, 2024

It does not work.
Here is the error; it skips everything after the _:

abc_"
file does not exist
Sling command failed:

abc_*.csv works, but we have some files with similar names, so we wanted a more granular approach.
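The over-matching described here can be reproduced with a small sketch. It uses Python's `fnmatch` purely to illustrate glob semantics and assumes nothing about how Sling matches files; the directory listing and the `select` helper are hypothetical.

```python
import fnmatch

# Hypothetical directory listing with similarly named files.
LISTING = [
    "abc_2024-08-08.csv",
    "abc_2024-08-09.csv",
    "abc_summary.csv",
    "abc_2024-08-08_backup.csv",
]


def select(names, pattern):
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


# "*" matches any run of characters, so every abc_ file is selected.
broad = select(LISTING, "abc_*.csv")

# "?" matches exactly one character, so only the dated names are selected.
narrow = select(LISTING, "abc_????-??-??.csv")

if __name__ == "__main__":
    print("broad: ", broad)
    print("narrow:", narrow)
```

With this listing, the `*` pattern selects all four files, while the `?` pattern selects only the two dated ones.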

@AashishMTech
Author

Here are some more similar error lines:

4:27PM INF [1 / 1] running stream sftp://transfers2.vendor.com/my_company/abc_?.csv
4:27PM INF connecting to target database (snowflake)
4:27PM INF reading from source file system (sftp)
4:27PM INF execution failed
4:27PM INF ~ error listing path: "my_company/abc_" file does not exist

@flarco
Collaborator

flarco commented Aug 14, 2024

Should be good now with 5f9d3a5
Feel free to build on that branch and test.
Closing.

sling conns discover local -p 'sling-cli/core/dbio/iop/???.go'
+---+-----------------------------------+------+---------+--------------------------------+
| # | NAME                              | TYPE | SIZE    | LAST UPDATED (UTC)             |
+---+-----------------------------------+------+---------+--------------------------------+
| 1 | sling-cli/core/dbio/iop/csv.go    | file | 14 KiB  | 2024-04-15 14:48:13 (120d ago) |
| 2 | sling-cli/core/dbio/iop/ssh.go    | file | 8.5 KiB | 2024-08-13 10:16:20 (15h ago)  |
+---+-----------------------------------+------+---------+--------------------------------+

@flarco flarco closed this as completed Aug 14, 2024
@AashishMTech
Author

AashishMTech commented Aug 15, 2024

This works when the pattern is the whole file name, e.g. ???.csv returns asc.csv.

But when we prefix or suffix it with a string, it does not work. For example, "abc_???.csv" does not return abc_def.csv; it fails saying no file found.

@flarco flarco reopened this Aug 15, 2024
@flarco
Collaborator

flarco commented Aug 19, 2024

OK, applied ffe5a02
Can you try a dev build?

https://f.slingdata.io/dev/latest/sling_linux_amd64.tar.gz
https://f.slingdata.io/dev/latest/sling_darwin_arm64.tar.gz

@AashishMTech
Author

AashishMTech commented Aug 20, 2024

I am using Version v1.2.16.dev

Replication not working

~ failure running replication (see docs @ https://docs.slingdata.io/sling-cli)

--------------------------- sftp://transfers2.mysftp.com/my_folder/SITE_????-??-??.csv ---------------------------
~ error listing path: "my_folder/SITE_" file does not exist

It looks like it's skipping everything after the _.
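The path in the error ("my_folder/SITE_") is the stream path cut at the first `?`. One plausible explanation, offered as a guess and not as a description of Sling's actual code, is that the path is truncated at the first wildcard and the remainder is then listed as if it were a directory. The sketch below contrasts that with cutting at the last `/` before the first wildcard, which yields a directory that can be listed and then filtered. Both function names are hypothetical.

```python
# Characters treated as glob wildcards in this sketch (an assumption).
WILDCARDS = "*?["


def split_naive(path):
    """Cut the path at the first wildcard character.

    This reproduces the prefix seen in the error message.
    """
    for index, char in enumerate(path):
        if char in WILDCARDS:
            return path[:index]
    return path


def split_glob(path):
    """Return (parent_dir, pattern).

    parent_dir is the deepest directory that contains no wildcard, and
    pattern is the full path to match each listed entry against.
    """
    first = next((i for i, c in enumerate(path) if c in WILDCARDS), None)
    if first is None:
        # No wildcard: the parent is everything before the last "/".
        return path.rpartition("/")[0], path
    # Keep only whole path segments that come before the first wildcard.
    parent = path[:first].rpartition("/")[0]
    return parent, path


if __name__ == "__main__":
    stream = "my_folder/SITE_????-??-??.csv"
    print(split_naive(stream))
    print(split_glob(stream))
```

Listing `my_folder` and filtering its entries against the full pattern avoids asking the server for a path such as `my_folder/SITE_`, which does not exist.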

@flarco
Collaborator

flarco commented Aug 20, 2024

Can you share your replication?

@AashishMTech
Author

AashishMTech commented Aug 20, 2024

source: MY_SFTP
target: SNOWFLAKE

defaults:
  mode: truncate


streams:
  "sftp://transfers2.mysftp.com/myfolder/SITE_????-??-??.csv":
    object: 'myschema.site'
    single: true
    
env:
  SAMPLE_SIZE: 2000 # increase the sample size to infer types (default=900).
  SLING_STREAM_URL_COLUMN: true # adds a _sling_stream_url column with file path
  SLING_LOADED_AT_COLUMN: timestamp
  

@flarco
Collaborator

flarco commented Aug 25, 2024

Can you try again with the latest build?

@AashishMTech
Author

Same error as before when I tried with the dev build.

Version: 1.2.16.dev (2024-08-22)

~ failure running replication (see docs @ https://docs.slingdata.io/sling-cli)

--------------------------- sftp://transfers2.mysftp.com/my_folder/SITE_????-??-??.csv ---------------------------
~ error listing path: "my_folder/SITE_" file does not exist

@flarco
Collaborator

flarco commented Aug 26, 2024

Version should be: Version 1.2.16.dev (2024-08-25)
You have to download again.

Also, remove single: true:

source: MY_SFTP
target: SNOWFLAKE

defaults:
  mode: truncate


streams:
  "myfolder/SITE_????-??-??.csv":
    object: 'myschema.site'
    single: false    
env:
  SAMPLE_SIZE: 2000 # increase the sample size to infer types (default=900).
  SLING_STREAM_URL_COLUMN: true # adds a _sling_stream_url column with file path
  SLING_LOADED_AT_COLUMN: timestamp

@AashishMTech
Author

This works, I am able to get files now.

@flarco flarco closed this as completed Aug 26, 2024