This repository has been archived by the owner on May 17, 2022. It is now read-only.

Plugin request: Postgres #11

Closed · timgl opened this issue Feb 26, 2021 · 7 comments

timgl (Contributor) commented Feb 26, 2021

I want to dump all my people and events into a Postgres database so I can use Metabase to run queries.

mariusandra (Contributor) commented

Was this plugin requested by someone? What's the urgency? Will a posthog -> posthog export be good enough, like we have now with the replicator plugin?

The trouble is, if we expose pg for a plugin to use, we might be opening ourselves up to security issues. The connections will be made from our VPC, so they could theoretically find their way toward sensitive data.

Are there any ways we can get around this without seriously complicating the network setup? Is this something we should be worried about?

CC @macobo @fuziontech

timgl (Contributor, Author) commented May 7, 2021

@mariusandra Yeah, it was requested by someone using Cloud who wanted to do their own analysis in Metabase. I haven't heard from them in a while, so it's probably not top urgency (especially as our focus is shifting to self-hosted).

Will a posthog -> posthog export be good enough, like we have now with the replicator plugin?

I think something that works out of the box would be nicer, but this could work for now if anyone asks.

fuziontech (Member) commented

I would say the best way for them to load Postgres would be from an S3 dump. Having that gap would be more secure, and the database would be less likely to get knocked over by volume.
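For illustration, a minimal sketch of that load path, assuming the dump is plain CSV in S3 and that the pg, pg-copy-streams and @aws-sdk/client-s3 packages are available; the bucket, key, table name and connection string are placeholders:

```ts
// Sketch only: stream a CSV dump from S3 into Postgres via COPY.
// Assumes `pg`, `pg-copy-streams` and `@aws-sdk/client-s3` are installed;
// the bucket, key and table name below are placeholders.
import { Client } from 'pg'
import { from as copyFrom } from 'pg-copy-streams'
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3'
import { pipeline } from 'stream/promises'
import type { Readable } from 'stream'

async function loadDump(): Promise<void> {
  const s3 = new S3Client({ region: 'us-east-1' })
  const { Body } = await s3.send(
    new GetObjectCommand({ Bucket: 'my-posthog-dumps', Key: 'events.csv' })
  )

  const client = new Client({ connectionString: process.env.DATABASE_URL })
  await client.connect()
  try {
    // COPY is much faster than row-by-row INSERTs for bulk loads.
    const sink = client.query(copyFrom('COPY posthog_events FROM STDIN WITH (FORMAT csv)'))
    await pipeline(Body as Readable, sink)
  } finally {
    await client.end()
  }
}

loadDump().catch(console.error)
```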

As for Metabase, the best suggestion is for them to spin up a small Redshift cluster and query the data in S3 from there. Loading data into PG is just a bad pattern IMO.
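And a minimal sketch of the Redshift side of that suggestion: park the export in S3 and pull it in with COPY. Redshift speaks the Postgres wire protocol, so the same pg client can issue the statement; the cluster endpoint, table, bucket and IAM role ARN are placeholders:

```ts
// Sketch only: load an S3 export into a small Redshift cluster with COPY.
// The endpoint, credentials, table, bucket and IAM role are placeholders.
import { Client } from 'pg'

async function copyIntoRedshift(): Promise<void> {
  const redshift = new Client({
    host: 'my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com',
    port: 5439,
    database: 'analytics',
    user: 'loader',
    password: process.env.REDSHIFT_PASSWORD,
  })
  await redshift.connect()
  try {
    // COPY pulls the files straight from S3, so nothing needs to connect
    // back to the source database.
    await redshift.query(`
      COPY posthog_events
      FROM 's3://my-posthog-dumps/events/'
      IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read'
      FORMAT AS JSON 'auto';
    `)
  } finally {
    await redshift.end()
  }
}

copyIntoRedshift().catch(console.error)
```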

yakkomajuri (Contributor) commented

Ok, so I've now run into this barrier. I'm building a Redshift plugin and wanted to leverage pg to access it.

There's probably some way to do it via HTTP for Redshift, but not for any random Postgres instance, I'd assume. So it'd be great to be able to use the package.

Of course, as a general rule an S3 dump might be best (we could then use COPY instead of INSERT), but I'd love to find ways to make plugins easy to use (i.e. you don't need another service just to export your data).
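For illustration only, a rough sketch of what a plugin could do if the plugin VM exposed pg. The hook names follow the plugin scaffold (setupPlugin / onEvent / teardownPlugin), but the table, config field and event shape are placeholders, not the actual export plugin:

```ts
// Rough sketch only: a Postgres export plugin, assuming the plugin VM exposed `pg`.
// Hook names follow the plugin scaffold; the table and config field are made up.
import { Pool } from 'pg'

type Meta = { config: Record<string, string>; global: Record<string, any> }

export async function setupPlugin({ config, global }: Meta): Promise<void> {
  global.pool = new Pool({ connectionString: config.databaseUrl })
  await global.pool.query(`
    CREATE TABLE IF NOT EXISTS posthog_event_export (
      uuid TEXT,
      event TEXT,
      distinct_id TEXT,
      properties JSONB,
      "timestamp" TIMESTAMPTZ
    )
  `)
}

export async function onEvent(event: Record<string, any>, { global }: Meta): Promise<void> {
  // Row-by-row INSERTs keep the sketch simple; batching into COPY (as noted
  // above) would be the faster path for real volume.
  await global.pool.query(
    `INSERT INTO posthog_event_export (uuid, event, distinct_id, properties, "timestamp")
     VALUES ($1, $2, $3, $4, $5)`,
    [event.uuid, event.event, event.distinct_id, JSON.stringify(event.properties ?? {}), event.timestamp]
  )
}

export async function teardownPlugin({ global }: Meta): Promise<void> {
  await global.pool.end()
}
```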

macobo (Contributor) commented May 19, 2021

The trouble is, if we expose pg for a plugin to use, we might be opening ourselves up to security issues. The connections will be made from our VPC, so they could theoretically find their way toward sensitive data.

This already applies to fetch: e.g. ClickHouse exposes an HTTP API which can be used for evil things there. Exposing pg does not change that equation.

yakkomajuri (Contributor) commented

Ah, well, people aren't exported yet. How to export people is something we've been discussing.
