Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Stage READ/WRITE support and restrict URL format #17820

Closed
1 task done
cpegeric opened this issue Jul 31, 2024 · 1 comment
Closed
1 task done

[Feature Request]: Stage READ/WRITE support and restrict URL format #17820

cpegeric opened this issue Jul 31, 2024 · 1 comment
Assignees
Labels
kind/feature phase/testing priority/p0 Critical feature that should be implemented in this version
Milestone

Comments

@cpegeric
Copy link
Contributor

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

- Current stage implementation only supports SELECT INTO OUTFILE to write to a local file system but not S3. (see #17748)
- READ is not supported by Stage yet.
- the Stage URL should be restricted to URL format only, i.e. file:///, s3:// or stage://.  
- Current supported URL format such as stagename:/path or simply a path should not be supported.

Describe the feature you'd like

  1. Able to import data from SQL "LOAD DATA INFILE" and External table and read from local file or S3.
  2. SELECT INTO OUTFILE accept stage URL as input filepath and write to local file or S3.
  3. stage URL can specify to a local file, s3 file and another stage with subpath.
  4. restrict the stage URL format by only supporting file:///, s3:// or stage://

Describe implementation you've considered

Stage URL format

The possible format of URL:

1. s3://<bucket>/<path> e.g. s3://bucket/path/to
2. file:///<path> e.g. file///path/to
3. stage://<stagename>/<path>

CREATE/DROP/SHOW Stage

CREATE STAGE [ IF NOT EXISTS ] { stage_name }
   { StageParams }
   [ COMMENT = '<string_literal>' ]
   
DROP STAGE [IF EXISTS] { stage_name };

SHOW STAGES;

StageParams (for Amazon S3) :
URL =  "s3://<bucket>[/<path>/]" CREDENTIALS = {"AWS_KEY_ID"='<string>', "AWS_SECRET_KEY"='<string>', "AWS_ROLE"='<string>', "AWS_TOKEN"='<string>', "AWS_REGION"='<string>', "COMPRESSION"='<string>', 'PROVIDER'='<string>', 'ENDPOINT'='<string>'}�
                                                 
StageParams (for File System) :
URL= 'file:///[/path/]'

StageParams (for sub-stage):
URL= "stage://<stagename>[/path/]"

External stages

Amazon S3
URL = s3://<bucket>[/path/]
Specifies the URL for the external location (existing bucket accessed using an S3 API endpoint) used to store data files, where:

  • is the name of the bucket
  • is an optional case-sensitive path (or prefix in S3 terminogloy) for files in the cloud storage location (i.e. files with names that begin with a common string)

CREDENTIALS = {'key1'='value1', 'key2'='value2',...}
Specifies the credentials for the external cloud storage

Mandatory keys are

  • AWS_KEY_ID specifies the key ID provided by the storage
  • AWS_SECRET_KEY specifies the secret key provided by the storage
  • AWS_REGION specifies the region of the storage
  • PROVIDER specifies the provider of the storage. Possible values are "amazon" and "minio".
  • ENDPOINT specifies the host of the storage. For Amazon S3, set endpoint as empty string.

Examples

Amazon S3
CREATE STAGE my_stage URL='s3://bucket/data/files/'
CREDENTIALS={'AWS_KEY_ID'='1a2b3c', 'AWS_SECRET_KEY='4x6y6z', 'AWS_REGION'='en_US', 'PROVIDER'='amazon'}

Amazon S3-compatible storage (Minio)
CREATE STAGE my_stage URL='s3://bucket/data/files/'
CREDENTIALS={'AWS_KEY_ID'='1a2b3c', 'AWS_SECRET_KEY'='4x6y6z', 'AWS_REGION'='en_US', 'PROVIDER'='minio', 'ENDPOINT'='endpoint-host'}

Sub-Stage
Credential of the sub-stage will be inherited from the parent stage. Path conversion and credential look up from parent stage will be executed in execution time.
CREATE STAGE my_substage URL='stage://stagename/sub/path/';


File based stage
CREATE STAGE my_filestage URL='file:///path/to/somewhere/'

READ/WRITE

Loading from stage

LOAD DATA INFILE 'stage://dbname/my_stage/abc.csv' INTO TABLE my_table;

External Table

create external table t(...) INFILE '<string>' FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';

Example,

create external table ext_table INFILE 'stage://stagename/path/to/somewhere' fields terminated by ',' enclosed by '\"' lines terminated by '\n';

For Datalink support,

select load_file(" stage://stagename/path/abc.dat?offset=0&size=23");

Writing to the stage

SELECT * FROM my_table INTO OUTFILE 'stage://my_stage/data.csv';

Documentation, Adoption, Use Case, Migration Strategy

No response

Additional information

No response

@guguducken guguducken assigned matrix-meow and unassigned matrix-meow Aug 6, 2024
@sukki37 sukki37 added this to the 1.3.0 milestone Aug 7, 2024
@sukki37 sukki37 added the priority/p0 Critical feature that should be implemented in this version label Aug 12, 2024
@sukki37 sukki37 assigned cpegeric and unassigned matrix-meow Aug 12, 2024
@sukki37 sukki37 mentioned this issue Aug 22, 2024
7 tasks
@cpegeric cpegeric assigned aressu1985 and unassigned cpegeric Oct 23, 2024
@Ariznawlll
Copy link
Contributor

test done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature phase/testing priority/p0 Critical feature that should be implemented in this version
Projects
None yet
Development

No branches or pull requests

5 participants