
How to handle DDL when using MaterializedPostgreSQL #29535

Closed
kartaris opened this issue Sep 29, 2021 · 21 comments · Fixed by #33200
Assignee: kssenii
Labels: question

Comments

@kartaris

There is a statement "DDL is not replicated, but can be handled (described below)." in the documentation, but it is not actually described there.

How can it be handled?

@kartaris kartaris added the question label Sep 29, 2021
@kssenii
Member

kssenii commented Sep 29, 2021

Also, there is a second (but not automatic) option added recently: you can remove the table from replication and add it back (it will have the updated structure) via detach table and attach table, as sketched below.
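
(For illustration, a minimal sketch of the detach/attach cycle described above; the database and table names are placeholders, and the exact syntax should be checked against the MaterializedPostgreSQL docs for your version:)

    -- Remove the table from replication, then re-add it;
    -- on re-attach it is fetched with the updated structure.
    DETACH TABLE postgresql_replica.my_table;
    ATTACH TABLE postgresql_replica.my_table;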

@kartaris
Author

A bad doc... See https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/operations/settings/settings.md#materialized_postgresql_allow_automatic_update-materialized-postgresql-allow-automatic-update.

I just tried setting this option to 1 when creating the database with this command:

CREATE DATABASE postgresql_replica
ENGINE = MaterializedPostgreSQL('...')
SETTINGS materialized_postgresql_allow_automatic_update = 1;

Altered a table in Postgres, inserted some rows into it, and nothing changed in CH. Maybe I'm doing something wrong?

@kssenii
Member

kssenii commented Sep 29, 2021

It takes some time to update. The table is reloaded in the background.

@kartaris
Author

kartaris commented Sep 29, 2021

I think I've found the problem.
The server failed and exited with code 139.
Here are some logs:
https://gist.github.com/kartaris/7dee797717bc7be2be6e831fc012a4bf

@kssenii
Member

kssenii commented Sep 29, 2021

Thank you. Is this a result of setting this setting to 1, or not?

@kartaris
Author

> Thank you. Is this a result of setting this setting to 1, or not?

It happens after a while. But yeah, it fails only when it is set to 1.
My steps (a reproduction sketch follows the list):

  1. Create a database using MaterializedPostgreSQL with the setting materialized_postgresql_allow_automatic_update = 1
  2. Alter a table in the Postgres database and insert some rows
  3. After some time, CH fails
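
(For illustration, a minimal sketch of these steps; the table name, columns, and connection string are placeholders, not taken from the report:)

    -- ClickHouse side: create the replicated database with automatic update enabled
    CREATE DATABASE postgresql_replica
    ENGINE = MaterializedPostgreSQL('postgres-host:5432', 'mydb', 'user', 'password')
    SETTINGS materialized_postgresql_allow_automatic_update = 1;

    -- Postgres side: change the schema, then write some rows
    ALTER TABLE test_table ADD COLUMN extra_col INT;
    INSERT INTO test_table (id, extra_col) VALUES (1, 42);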

@kssenii
Member

kssenii commented Sep 29, 2021

Hm, OK, thank you, I'll look into it. Our integration tests cover the same steps, and I also remember someone wrote that they set this setting to 1 as the default, so I thought it should be OK. I'll add more tests.

@kssenii kssenii reopened this Nov 17, 2021
@kssenii kssenii self-assigned this Nov 17, 2021
@singggum3b

I'm also having this issue; CH dies after running for some time.
Plus, there are cases where the MaterializedPostgreSQL engine can't convert data to string and the replication gets stuck there.
Also, column renaming is not recognized; the table only gets updated after adding a new column.

@singggum3b

detach table does not work:

 DB::Exception: Failed to remove table `test_table` from replication. Info: std::exception. Code: 1001, type: pqxx::undefined_table, e.what() = ERROR: relation "test_table" does not exist (version 21.12.2.17 (official build)). (POSTGRESQL_REPLICATION_INTERNAL_ERROR) (version 21.12.2.17 (official build))

@kssenii
Member

kssenii commented Dec 23, 2021

@singggum3b

> Plus, there are cases where the MaterializedPostgreSQL engine can't convert data to string and the replication gets stuck there.

Just complaining that something does not work will not make any difference. You could attach some information about how it broke (what value failed to be converted), some error logs, stack traces...

> Also, column renaming is not recognized; the table only gets updated after adding a new column.

That is because DDL cannot be fully replicated via the PostgreSQL logical replication protocol; only changes which can break replication are detected. Column renaming does not break replication because insertion is done by position, not by column name. Positional insertion is OK because the addition/removal of a column is always detected and passed by the protocol, and as a result a full table update is performed. The sketch below illustrates the difference.
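
(For illustration, a sketch of the two kinds of DDL on the Postgres side, per the explanation above; the table name is a placeholder:)

    -- A rename is invisible to positional replication: inserts keep working,
    -- but the ClickHouse-side column keeps its old name until a full update.
    ALTER TABLE test_table RENAME COLUMN old_name TO new_name;

    -- Adding (or dropping) a column changes the tuple shape, is detected via
    -- the protocol, and triggers a full reload of the table.
    ALTER TABLE test_table ADD COLUMN added_col INT;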

> detach table does not work:

The exception message states the reason... It is expected behaviour. But now I see that it would be better to ignore the exception if the table does not exist; I'll make a fix.

@singggum3b

Yeah, sorry @kssenii, I was testing the MaterializedPostgreSQL engine locally, and there were so many errors that I forgot to capture the logs for everything.
Not complaining about anything; I know this is an experimental feature.
It's a super useful one though; it could eliminate the need for an ETL tool to transfer data to ClickHouse.

> detach table does not work:

But do you suggest using attach and detach to refresh the table? The test table actually exists in the materialized DB.

Btw, could you advise what the current recommended way is to replicate data from PG into ClickHouse in a near-realtime manner?
Thank you!

@singggum3b

I managed to extract some logs for the conversion error. I was just doing some data editing, altering a column name, and adding a column at the source.

 2021.12.22 13:41:41.877203 [ 300 ] {} <Error> PostgreSQLReplicaConsumer(test): Conversion error: Unexpected text after s: '2021-12-22 13:41:15.251712'.
2021.12.22 13:41:41.976827 [ 123 ] {} <Error> void DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(DB::TaskRuntimeDataPtr) [Queue = DB::MergeMutateRuntimeQueue]: Code: 241. DB::Exception: Memory limit (total) exceeded: would use 1.75 GiB (attempt to allocate chunk of 4219396 bytes), maximum: 1.75 GiB. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa20d85a in /usr/bin/clickhouse
1. DB::Exception::Exception<char const*, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, long&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*&&, char const*&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&, long&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&) @ 0xa222669 in /usr/bin/clickhouse
2. MemoryTracker::allocImpl(long, bool) @ 0xa221fe3 in /usr/bin/clickhouse
3. MemoryTracker::allocImpl(long, bool) @ 0xa221db0 in /usr/bin/clickhouse
4. MemoryTracker::allocImpl(long, bool) @ 0xa221db0 in /usr/bin/clickhouse
5. DB::Memory<Allocator<false, false> >::alloc() @ 0xa25c34d in /usr/bin/clickhouse
6. DB::WriteBufferFromFile::WriteBufferFromFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long, int, unsigned int, char*, unsigned long) @ 0xa3120ee in /usr/bin/clickhouse
7. DB::DiskLocal::writeFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long, DB::WriteMode) @ 0x12b5c9b7 in /usr/bin/clickhouse
8. DB::MergeTreeDataPartWriterOnDisk::Stream::Stream(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::IDisk>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::ICompressionCodec> const&, unsigned long) @ 0x13aadbc3 in /usr/bin/clickhouse
9. ? @ 0x13ab82cd in /usr/bin/clickhouse
10. DB::ISerialization::enumerateStreams(DB::ISerialization::SubstreamPath&, std::__1::function<void (DB::ISerialization::SubstreamPath const&)> const&, std::__1::shared_ptr<DB::IDataType const>, COW<DB::IColumn>::immutable_ptr<DB::IColumn>) const @ 0x12a25739 in /usr/bin/clickhouse
11. DB::MergeTreeDataPartWriterWide::addStreams(DB::NameAndTypePair const&, std::__1::shared_ptr<DB::IAST> const&) @ 0x13ab1d27 in /usr/bin/clickhouse
12. DB::MergeTreeDataPartWriterWide::MergeTreeDataPartWriterWide(std::__1::shared_ptr<DB::IMergeTreeDataPart const> const&, DB::NamesAndTypesList const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeIndex const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeIndex const> > > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::ICompressionCodec> const&, DB::MergeTreeWriterSettings const&, DB::MergeTreeIndexGranularity const&) @ 0x13ab1b8a in /usr/bin/clickhouse
13. DB::MergeTreeDataPartWide::getWriter(DB::NamesAndTypesList const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeIndex const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeIndex const> > > const&, std::__1::shared_ptr<DB::ICompressionCodec> const&, DB::MergeTreeWriterSettings const&, DB::MergeTreeIndexGranularity const&) const @ 0x13aa1342 in /usr/bin/clickhouse
14. DB::MergedBlockOutputStream::MergedBlockOutputStream(std::__1::shared_ptr<DB::IMergeTreeDataPart const> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::NamesAndTypesList const&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeIndex const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeIndex const> > > const&, std::__1::shared_ptr<DB::ICompressionCodec>, bool) @ 0x13b7e64a in /usr/bin/clickhouse
15. void std::__1::allocator_traits<std::__1::allocator<DB::MergedBlockOutputStream> >::__construct<DB::MergedBlockOutputStream, std::__1::shared_ptr<DB::IMergeTreeDataPart>&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, DB::NamesAndTypesList&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeIndex const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeIndex const> > >, std::__1::shared_ptr<DB::ICompressionCodec>&, bool&>(std::__1::integral_constant<bool, true>, std::__1::allocator<DB::MergedBlockOutputStream>&, DB::MergedBlockOutputStream*, std::__1::shared_ptr<DB::IMergeTreeDataPart>&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, DB::NamesAndTypesList&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeIndex const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeIndex const> > >&&, std::__1::shared_ptr<DB::ICompressionCodec>&, bool&) @ 0x139f7db1 in /usr/bin/clickhouse
16. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::prepare() @ 0x139ed528 in /usr/bin/clickhouse
17. bool std::__1::__function::__policy_invoker<bool ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::MergeTask::ExecuteAndFinalizeHorizontalPart::subtasks::'lambda'(), bool ()> >(std::__1::__function::__policy_storage const*) @ 0x139faf89 in /usr/bin/clickhouse
18. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::execute() @ 0x139f1d0b in /usr/bin/clickhouse
19. DB::MergeTask::execute() @ 0x139f6a3a in /usr/bin/clickhouse
20. DB::MergePlainMergeTreeTask::executeStep() @ 0x139e8dcc in /usr/bin/clickhouse
21. DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(std::__1::shared_ptr<DB::TaskRuntimeData>) @ 0xa1e5a1a in /usr/bin/clickhouse
22. DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::threadFunction() @ 0xa1e5679 in /usr/bin/clickhouse
23. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0xa2512aa in /usr/bin/clickhouse
24. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0xa2530c4 in /usr/bin/clickhouse
25. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa24e6b7 in /usr/bin/clickhouse
26. ? @ 0xa2520bd in /usr/bin/clickhouse
27. ? @ 0x400086b609 in ?
28. __clone @ 0x40009a7293 in ?
 (version 21.12.2.17 (official build))
2021.12.22 13:41:42.170574 [ 129 ] {} <Error> void DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(DB::TaskRuntimeDataPtr) [Queue = DB::MergeMutateRuntimeQueue]: Code: 241. DB::Exception: Memory limit (total) exceeded: would use 1.75 GiB (attempt to allocate chunk of 4219556 bytes), maximum: 1.75 GiB. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

@kssenii
Member

kssenii commented Dec 23, 2021

> But do you suggest using attach and detach to refresh the table? The test table actually exists in the materialized DB.

Does the table have a non-default schema? Actually, just detach should also work, but I tested only with the default schema. Support for non-default schemas was added very recently, so I might have missed it for detach; I will check.

> I was testing the MaterializedPostgreSQL engine locally, and there were so many errors

OK, if you send them all, with some examples / error logs (so I can reproduce), then I promise to fix them all this weekend.

> Btw, could you advise what the current recommended way is to replicate data from PG into ClickHouse in a near-realtime manner?

Right now there is an exponential backoff for the update check (up to 10 seconds). If there are no changes to replicate, the check is rescheduled after 1 second; then, if there are still no changes, after 2 seconds, then 4, 8, and finally 10 (10 seconds is the maximum delay). I think I can make this behaviour configurable by settings if that is useful (something like min_delay, max_delay, delay_factor); a hypothetical example follows.
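
(Purely hypothetical sketch of the proposal above: these setting names were only floated in this comment and are not an existing API, so do not use them as-is:)

    -- NOT real settings: this only illustrates what the proposed knobs might look like
    CREATE DATABASE postgresql_replica
    ENGINE = MaterializedPostgreSQL('...')
    SETTINGS min_delay = 1, max_delay = 10, delay_factor = 2;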

@singggum3b

I don't know if it helps, but I'll attach the full error logs and my local setup here for reference:
ClickHouse version: yandex/clickhouse-server:21.12.2.17-alpine on Mac M1

Test source DB: PG 14
[screenshot of the test source schema]

CREATE DATABASE IF NOT EXISTS test_materialized ENGINE = MaterializedPostgreSQL('host.docker.internal:5432', 'test', 'postgres', '1234')
    SETTINGS
        materialized_postgresql_schema = 'test',
        materialized_postgresql_allow_automatic_update = 1;
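
(For completeness: the engine relies on Postgres logical replication, so the source server must have it enabled; a minimal sketch of the standard Postgres-side prerequisite, run as a superuser and followed by a server restart:)

    -- Logical decoding must be on for MaterializedPostgreSQL to work
    ALTER SYSTEM SET wal_level = 'logical';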

Hardware:
RAM is quite low; it runs in a container with ~1 GB.

Test case:
I'm just doing random schema altering, column type altering, and data CRUD, and watching what gets reflected on the destination; nothing special.
Please let me know if you need more details.

@singggum3b

The log file is too big, so I had to upload it to Google Drive:
https://drive.google.com/file/d/1CxARSwt-Y8XdNmihi7bZMAf0H8T2mJsm/view?usp=sharing

@kssenii
Member

kssenii commented Jan 9, 2022

@singggum3b @kartaris the DDL-related issue should be fixed by #33200.
It will be in version 22.1.
Please reopen the issue if it still happens, but I believe it will not after this fix.

@singggum3b

Thanks @kssenii, we plan to use this feature extensively in the near future ❤️

@singggum3b

Hi @kssenii, I've done some simple tests on the 22.1 version.
It looks like simple DDL no longer breaks the replication; however, there are some issues I encountered:

  • Detaching/attaching any materialized table breaks the replication (no more updates after re-attaching a detached table)
  • Postgres starts spamming logs like crazy about FATAL: out of relcache_callback_list slots.

@coffenbacher

It seems like materialized_postgresql_allow_automatic_update no longer exists. Is there a new way to handle basic column additions/removals going forward? Going in manually to attach/detach isn't super slick, but it's also not a big deal; my bigger pain point is that I'm now going to have to somehow monitor the replication to see if there's a schema issue (see the sketch below for one option).
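
(For illustration, one possible way to watch for a stalled replication slot from the Postgres side; this uses standard Postgres catalogs and functions and is not specific to this engine:)

    -- Growing lag_bytes for the slot used by ClickHouse suggests replication
    -- has stalled (for example, due to a schema issue).
    SELECT slot_name,
           active,
           pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
    FROM pg_replication_slots;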

@singggum3b

singggum3b commented Apr 8, 2024

FYI for anyone trying to use this engine: I ended up writing a custom ETL that reads PG logical replication and streams to ClickHouse instead, and it's working quite well in production after several iterations (around 5,000 logical update items/sec).
There are too many problems and edge cases with data types, plus batching/performance issues, which this engine simply does not address.
P.S. Forgot to mention the deduplication issue. Also, with an external ETL you can have nice materialized view chaining, which is quite useful if handled right. (A minimal sketch of reading logical replication follows.)
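
(For illustration, a minimal sketch of consuming PG logical replication with the built-in test_decoding plugin; the slot name is a placeholder, and a production ETL would typically use the streaming replication protocol rather than polling like this:)

    -- One-time setup: create a logical replication slot
    SELECT pg_create_logical_replication_slot('ch_etl_slot', 'test_decoding');

    -- Poll for changes; each row describes an INSERT/UPDATE/DELETE that the
    -- ETL would batch and write into ClickHouse
    SELECT lsn, xid, data
    FROM pg_logical_slot_get_changes('ch_etl_slot', NULL, NULL);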
