Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long upload times #20

Open
jnnr opened this issue Dec 17, 2018 · 7 comments
Open

Long upload times #20

jnnr opened this issue Dec 17, 2018 · 7 comments
Assignees

Comments

@jnnr
Copy link

jnnr commented Dec 17, 2018

Uploading a timeseries of 8760 entries (380kB) takes 4049.0355615615845 sec. which is more than an hour.
Download, on the other side, is done within milliseconds.

  • is this normal?
  • can it be accelerated?

This is the function I execute, if that helps.

def upload_to_oep(df, Table, engine, metadata):
    table_name = Table.name
    schema_name = Table.schema

    if not engine.dialect.has_table(engine, table_name, schema_name):
        Table.create()
        print('Created table')
    else:
        print('Table already exists')

    Session = sessionmaker(bind=engine)
    session = Session()
    # insert data
    try:
        dtype = {key: Table.columns[key].type for key in Table.columns.keys()}
        df.to_sql(table_name, engine,
                          schema='sandbox',
                          if_exists='replace',
                          dtype=dtype)
        print('Inserted to ' + table_name)
    except Exception as e:
        session.rollback()
        session.close()
        raise
        print('Insert incomplete!')
    finally:
        session.close()

    return Table
@henhuy
Copy link

henhuy commented Dec 18, 2018

Same here with your example!
When switching to my local db the df is uploaded within 2 sec...
I think oedialect is not able to bulk insert the df...?

If you want to move on, maybe switching to ARRAY(FLOAT) and uploading your series into one row is a workaround? This is fast as well with oedialect.

@oakca
Copy link

oakca commented Dec 18, 2018

Following df is uploaded to OEP (w/ same script), and it took 3376.480009082 seconds:

df
         t      Mid.Elec    South.Elec    North.Elec
0        0      0.000000      0.000000      0.000000
1        1  43102.490062  43102.490062  43102.490062
2        2  41692.002871  41692.002871  41692.002871
3        3  40592.822117  40592.822117  40592.822117
...
8758  8758  47676.965212  47676.965212  47676.965212
8759  8759  46671.945046  46671.945046  46671.945046
8760  8760  46671.945046  46671.945046  46671.945046

[8761 rows x 4 columns]

@oakca
Copy link

oakca commented Mar 2, 2019

Well I searched little bit about "to_sql being slow". I ended up finding bunch of post in stackoverflow, as well as github-issues on this topic.

Upon reading the following: pandas-dev/pandas#8953, the only possible solution for this could be using some package called odo or something called d6tstack.

I checked the d6tstack, so it is basically creating the engine as we do, but it uses engine.raw_connection() instead of engine.connect() and creates something called engine.cursor(). It sends the header(first raw) of the df frame via to_sql and then changes the dataframe into .csv and uses cursor.copy_from() and cursor.commit()

I tried to implement that but it failed...

I checked also all of the stackoverflow posts, they are also mentioning same things, there are some workarounds but not for all cases, so kinda dead-end.

Probably best way: checking odo? or creating an algorithm to create dictionaries from the dataframe and apply following as in the examples https://github.com/OpenEnergyPlatform/examples/blob/feature/tutorial_reworked/api/OEP-oedialect_upload_from_csv.ipynb

try:
    insert_statement = ExampleTable.insert().values(
        [
            dict(variable='fairy dust', unit='t', year=2020, value=200),
            dict(variable='mana', unit='kg', year=1999, value=120),
            dict(variable='the force', unit='l', year=1998, value=1100)
        ]
    )
    session.execute(insert_statement)
    session.commit()
    print('Insert successful!')
except Exception as e:
    session.rollback()
    raise
    print('Insert incomplete!')
finally:
    session.close()

@Ludee
Copy link
Member

Ludee commented Mar 13, 2019

I think @MGlauer fixed this issue in the oedialect. Not sure if released yet.

@MGlauer
Copy link
Contributor

MGlauer commented Mar 13, 2019

Not yet. There is a fix, but atm it is rather fragile and I have to make it a bit more robust.

@Bachibouzouk Bachibouzouk transferred this issue from OpenEnergyPlatform/oeplatform Aug 28, 2019
@Bachibouzouk
Copy link
Contributor

@MGlauer - is the fix more robust now?

@jh-RLI
Copy link
Contributor

jh-RLI commented May 7, 2020

@MGlauer this is fixed now? You mentioned something this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants