Read multiple file(s) at a time when wildcard in file path #204

Closed
porscheme opened this issue Mar 31, 2022 · 2 comments
Labels
type/enhancement Type: make the code neat or more efficient

Comments

@porscheme

@wey-gu

Using the config file below:

  • When multiple CSV data files are located at the ./students/*.CSV path, Importer tries to read all of the files at once.
  • Each CSV data file is 4 GB in size.
  • Why not read one file at a time?

Thanks in advance

version: v2
description: example
removeTempFiles: false
clientSettings:
  retry: 3
  concurrency: 1 # number of graph clients
  channelBufferSize: 1
  space: StudentCentral
  connection:
    user: root
    password: nebula
    address: rp-nebula-graphd-svc:9669
  postStart:
    commands: |
      DROP SPACE IF EXISTS StudentCentral;    
      CREATE SPACE IF NOT EXISTS StudentCentral(partition_num=6, replica_factor=2, vid_type=FIXED_STRING(80));
      USE StudentCentral;
      CREATE TAG IF NOT EXISTS Student(sudentId string, hcs string, docInstance string);
    afterPeriod: 8s
logPath: /csv_data/err/test.log
files:
  - path: ./students/*.CSV
    batchSize: 10000
    inOrder: false
    type: csv
    csv:
      withHeader: false
      withLabel: false
      delimiter: ","
    schema:
      type: vertex
      vertex:
        vid:
          type: string
          index: 0
        tags:
          - name: Student
            props:
              - name: sudentId
                type: string
              - name: hcs
                type: string
              - name: docInstance
                type: string
@wey-gu
Contributor

wey-gu commented Apr 21, 2022

Sorry for the late response; I hadn't managed to clear the notifications in my mailbox.

Yes, this should be done on demand, yielding each file separately instead of loading all of them into RAM in one go.
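
The on-demand approach described above could look roughly like the sketch below. It is written in Python purely for illustration and is not the Importer's actual code; the function name `iter_batches` and its parameters are hypothetical. It expands the wildcard, opens one matching file at a time, and yields rows in batches, so at most one batch from one file is held in memory at any moment. Batches do not span files in this sketch.

```python
import csv
import glob
from typing import Iterator, List


def iter_batches(pattern: str, batch_size: int = 10000) -> Iterator[List[List[str]]]:
    """Yield batches of CSV rows from each file matching `pattern`.

    Hypothetical helper for illustration only: files are visited one
    after another, so only one file is open at any moment.
    """
    for path in sorted(glob.glob(pattern)):
        # The file is streamed row by row instead of being read whole.
        with open(path, newline="") as handle:
            batch: List[List[str]] = []
            for row in csv.reader(handle, delimiter=","):
                batch.append(row)
                if len(batch) >= batch_size:
                    yield batch
                    batch = []
            if batch:
                # Flush the final, possibly smaller, batch of this file.
                yield batch
```

A consumer would loop with `for batch in iter_batches("./students/*.CSV"):` and send each batch to the graph client before the next one is read.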

@Sophie-Xie Sophie-Xie added type/enhancement Type: make the code neat or more efficient and removed enhancement labels Nov 29, 2022
@veezhang
Contributor

With #264, how many files are read at a time is now configurable.
