Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

when write back to GP locked by Postgres scanner #209

Open
2 tasks done
wonb168 opened this issue Apr 10, 2024 · 2 comments
Open
2 tasks done

when write back to GP locked by Postgres scanner #209

wonb168 opened this issue Apr 10, 2024 · 2 comments

Comments

@wonb168
Copy link

wonb168 commented Apr 10, 2024

What happens?

I am writing functions in PL/Python in Greenplum (GP) with three main steps:

  1. Using DuckDB to read GP tables.
  2. Calculating a result table, "rst," in DuckDB, extracting the DDL of the table, and creating the table in GP (need to add distribution keys, compression, etc., which default CREATE does not include,use plpy.execute(ddl)).
tb="gto_skc_store_step_kpi_summary_duck"
sql="""select sql from duckdb_tables() where table_name like 'tmp_rst' and schema_name = 'main';"""
ddl=dd.execute(sql).fetchone()[0]
ddl=f"""drop table if exists tenant_{tenant}_adm.{tb};
"""+ddl.replace('CREATE TABLE tmp_rst', f'CREATE TABLE tenant_{tenant}_adm.{tb}') \
	.replace('DOUBLE', 'numeric').replace(';',"""WITH (
	appendonly=true,
	orientation=column,
	compresslevel=6,
	compresstype=zlib
)
DISTRIBUTED BY (skc_sk);""")

plpy.notice(ddl)
plpy.execute(ddl)
  1. Writing the "rst" result table to GP (using COPY or INSERT).
insert into gp.tenant_{tenant}_adm.{tb}
select * from tmp_rst
-- or
COPY gp.tenant_{tenant}_adm.gto_skc_store_step_kpi_summary_duck FROM '/home/gpadmin/{tb}.parquet';

The third step never completes, and checking pg_stat_activity shows a locked state.
However, if split it into two functions, the first completes the first two steps, and write the result to a Parquet file, and the second reads the Parquet file, writing to GP, finishing in just tens of seconds.
Why?
pg_stat_activity:
image
and pg_locks in following excel:
pg_locks.xlsx

To Reproduce

import duckdb as dd 
sql=f"""COPY gp.tenant_{tenant}_adm.gto_skc_store_step_kpi_summary_duck FROM '/home/gpadmin/{tb}.parquet';"""
dd.execute(sql)

OS:

centos7

DuckDB Version:

0.10.1

DuckDB Client:

python3.9

Full Name:

wang cz

Affiliation:

Linezone

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
@szarnyasg
Copy link
Contributor

Hi @wonb168, thanks for opening this issue. Could you please make the script self-contained with imports?

@szarnyasg szarnyasg transferred this issue from duckdb/duckdb Apr 10, 2024
@wonb168
Copy link
Author

wonb168 commented Apr 10, 2024

I write a minest demo:

-- create extension plpython3u;
CREATE OR REPLACE FUNCTION test_ddl() RETURNS text AS $$ 
import duckdb 
def exesql(sql):
	plpy.notice(sql)
	dd.execute(sql)

dd=duckdb.connect('gp.duckdb')
dd=duckdb.connect()
dbname="mdmaster_hggp7_dev"	
tenant=dbname.split('_')[1]
dburl=f"dbname={dbname} user=gpadmin host=127.0.0.1 port=2345"
sql=f"load postgres;ATTACH '{dburl}' AS gp (TYPE postgres);"
exesql(sql)
sql="drop table if exists tmp_rst;create table tmp_rst(id int,name text);insert into tmp_rst values (1,'a');"
exesql(sql)
sql="""select sql from duckdb_tables() where table_name like 'tmp_rst' and schema_name = 'main';"""
sql="drop table if exists public.tmp_rst;"+(dd.execute(sql).fetchone()[0]).replace('tmp_rst','public.tmp_rst') 
plpy.notice(sql)
plpy.execute(sql) 

sql=f"detach gp;ATTACH '{dburl}' AS gp (TYPE postgres);"
#sql="CALL pg_clear_cache();"
#sql="call postgres_execute(sql)" #Table Function with name postgres_execute does not exist!
exesql(sql)

sql="insert into gp.public.tmp_rst select * from tmp_rst"
exesql(sql)

return 'done' 
$$ LANGUAGE plpython3u;
select test_ddl();

It raise error, no table.
I guess duck got the gp tables at the ATTACH moment, it can't know the new table,
then can't exe insert sql,
but how to get the newest table from gp,
I test CALL pg_clear_cache() or re attatch , both NOT WORK!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants