Fix memory leak when executing vectorized quals #242

Merged (2 commits) on Mar 20, 2024

@japinli commented Feb 23, 2024

Currently, when vectorized quals are executed, a VectorColumn structure is created for each tuple in the ExecutorState memory context. That memory is not freed until execution finishes, so it accumulates for the lifetime of the query. This commit allocates the structure in a tuple memory context instead.

Here are steps to reproduce:

  1. Clone tpch-kit.
    git clone https://github.com/gregrahn/tpch-kit.git
    cd tpch-kit/dbgen
  2. Compile tpch-kit.
    Make sure the following are set in the Makefile:
    DATABASE = POSTGRESQL
    MACHINE  = LINUX
    WORKLOAD = TPC
    
  3. Generate data.
    ./dbgen -s 1
    mkdir s1 && for i in $(ls *.tbl); do sed 's/|$//' $i > s1/${i/tbl/csv}; rm $i; done;
  4. Generate load schema and data script.
    cd s1
    echo "CREATE EXTENSION columnar;" > load.sql
    sed 's|);$|) USING columnar;|g' ../dss.ddl >> load.sql
    for csv in $(ls *.csv); do table=$(echo $csv | cut -d. -f1); echo "COPY $table FROM '$PWD/$csv' WITH (FORMAT csv, DELIMITER '|');"; done >> load.sql
  5. Initialize the database and load data.
    initdb -D s1data
    pg_ctl -l logfile -D s1data start
    psql -f load.sql postgres
  6. Test
    1. Start a new connection using psql.
    2. Use top to monitor the memory used by the backend process.
    3. Execute the following query:
      select
        s_name, s_address
      from
        supplier, nation
      where
        s_suppkey in (
          select
            ps_suppkey
          from
            partsupp
          where
            ps_partkey in (
              select
                p_partkey
              from
                part
              where
                p_name like 'khaki%'
            )
            and ps_availqty > (
              select
                0.5 * sum(l_quantity)
              from
                lineitem
              where
                l_partkey = ps_partkey
                and l_suppkey = ps_suppkey
                and l_shipdate >= date '1997-01-01'
                and l_shipdate < date '1997-01-01' + interval '1' year
            )
        )
        and s_nationkey = n_nationkey
        and n_name = 'EGYPT'
      order by
        s_name;

In top, you will see the backend's memory usage keep growing until the process finally runs out of memory.
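
To record the growth rather than watch top interactively, the backend's resident set size can be sampled at intervals. The following is a minimal sketch, assuming Linux with /proc mounted; the helper names rss_kib and watch_rss are hypothetical and not part of this PR.

```shell
# Hypothetical helpers (not part of the PR); assumes Linux with /proc mounted.

# Print the resident set size, in KiB, of the process with the given PID.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print up to <samples> lines of "<unix_time> <rss_kib>" for <pid>,
# sleeping <interval> seconds between samples; stops early if the process exits.
watch_rss() {
    pid=$1
    samples=$2
    interval=$3
    i=0
    while [ "$i" -lt "$samples" ] && [ -d "/proc/$pid" ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kib "$pid")"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

To use it, get the backend PID with `SELECT pg_backend_pid();` in the psql session, then run something like `watch_rss <pid> 60 1` in another terminal while the query executes. With the leak present, the second column should rise steadily.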

In ReadStripeNextVector(), the memory for columnValueOffset is allocated
in the ExecutorState memory context and is not freed until execution
finishes, so call pfree() to release it explicitly and keep memory usage
from growing.
@wuputah requested a review from mkaruza March 20, 2024 01:28
@wuputah merged commit 3194451 into hydradatabase:main Mar 20, 2024
@wuputah mentioned this pull request Apr 1, 2024
@japinli deleted the memory-leak branch April 3, 2024 05:37