Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per-connection decoders with PoC citext and hstore support #89

Closed
wants to merge 10 commits into from

Conversation

bjeanes
Copy link

@bjeanes bjeanes commented Mar 25, 2017

This is a follow up to my thoughts in #88.

Summary

Reading from a result set picks a decoder via a connection-local decoder map instead of the global list. Misses on the local map fallback to the existing global map of handlers.

When a connection is first initialised, it is immediately used to interrogate pg_type and expand on the statically-defined global decoder map.

This mechanism is used to implement citext support.

Notes
  • This will cause types to be determined multiple types in a connection pooling scenario. It may instead be preferable to do this once per DB. However, given that connections may or may not be pooled, the current approach seemed the most prudent, at least from a proof-of-concept standpoint.
  • Changes to the types during a connection won't be visible (e.g. CREATE EXTENSION). There's a few options here, but they don't seem deeply important to address now. It does mean the test for citext has to jump through some hoops, though.
  • The statically-defined global map of decoder handlers is depended upon to interpret the result set from querying pg_type.
  • I've left more comments and commented-out code than I usually would as the point of this PR is foremost to communicate the idea. E.g. A small snippet of code shows how a specialised HstoreDecoder could be wired up, too.

bjeanes added 8 commits March 25, 2017 23:21
No-op for now but it introduces some indirection to experiment with decoders
for known types whose OIDs differ from database-to-database, such as `hstore`
or `citext`.
Probably this could do with its own error type or even fail gracefully to the
built-in static list.
Duh... `oid` is a hidden column that I can just query. No need to monkey around
to figure it out.
This refactors the existing behaviour to have a shared interface (which could
be an abstract module) for registering decoders:

    Decoders.register_decoder(decoder : Decoders::Decoder, oid)
    PG::Connection#register_decoder(decoder : Decoders::Decoder, oid)

While the `Decoders` module maintains responsibility for discovering types, it
adds them to the connection's map via this interface, instead of returning a
new `DecoderMap` for the `PG::Connection` to assign.

This also means the connection's `DecoderMap` is never nil and isn't swapped
out. However, it is mutated. This only happens automatically before the
connection is available for use by the caller but the `#register_decoder`
method could be called later.
I just reverse engineered this by looking at the output of the `ByteaDecoder`
on these test values so this may not handle some other scenarios I haven't
accounted for.
I think this could probably take most maps (e.g. Map(Symbol, T)) by always
`#to_s`ing everything except `nil`.

This encoder will need some stronger test cases to ensure values are escaped
properly (I don't think `#inspect` will necessarily cut it).
@bjeanes
Copy link
Author

bjeanes commented Mar 26, 2017

Added an HstoreDecoder and param handling but it's probably pretty naïve.

@bjeanes bjeanes force-pushed the per-connection-decoders branch from aa24fb6 to 2dc4bda Compare March 26, 2017 05:02
@bjeanes bjeanes changed the title Per-connection decoders with PoC citext support Per-connection decoders with PoC citext and hstore support Mar 26, 2017
@bjeanes
Copy link
Author

bjeanes commented Mar 26, 2017

Further thoughts:

  • A lot of types will use the same decoder and as more types are supported, the number of these decoder instances will grow with the connections. They aren't stateful so perhaps stand to be made singletons.
  • This first pass just demonstrates loading a fallback decoder based on the type category and loading one based on the type name. However, for other DOMAINs actually looking up the decoder based on the base type's OID would be good.
  • Array decoding doesn't (yet) benefit from this approach. pg_type does contain enough information that we could add extra array decoders. Though I don't yet know enough about Crystal to see how this dynamic approach could support the recursive type definition employed now.
  • Is there any reason read_array and read approaches can't be unified? Array entries in pg_type include OID for the elements' type, which could be used to pass in the element decoder.
  • This code still needs a bit of factoring (I don't think this new code all belongs in Decoders directly—probably in a child namespace).

PG_DB.exec "drop table if exists test_table"
PG_DB.exec "create table test_table (v #{datatype})"
begin
PG_DB.exec "create extension \"#{extension}\"" if extension
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add if exists here and in the drop extension later on, to avoid errors if the test database isn't 100% clean

@will
Copy link
Owner

will commented Mar 29, 2017

I need to think more on your questions. I'm at a postgres conf this week (and giving a talk including the protocol parts of this project 😀), so I'll dive in next week. Thanks for putting this together!

@bjeanes
Copy link
Author

bjeanes commented Mar 29, 2017 via email

@will
Copy link
Owner

will commented Apr 15, 2017

Hey @bjeanes did you have any progress on the other approach?

@bjeanes
Copy link
Author

bjeanes commented Apr 18, 2017

@will ah no I got sidetracked with other things.

I was basically wanting to turn the DecoderMap as Hash(OID, Decoder) to TypeMap as Hash(OID, PG::Type). PG::Type is a struct which includes a (singleton) reference to decoder but also also more information from the pg_type table. I don't know if this is actually valuable but I have a hunch that it could make array decoding less of a special-case and allow for some sane assumptions for newly encountered types based on the category (geometric, numeric, etc).

If not that approach, I think my actual PoC here could be factored to be less ad-hoc as I kinda just dumped some code in the existing decoder code.

I dunno... what do you think? Did you have any imagined direction when you've contemplated per-connection type support? Anything you'd want to see to have a this or a derivative of this be merged?

@bjeanes
Copy link
Author

bjeanes commented Apr 18, 2017

Oh I also think there's more opportunity for composite or higher-order decoders to leverage the primitive decoders and perhaps for some symmetry between encoding/decoding, which might make it a bit clearer when adding support for future types.

@bjeanes bjeanes force-pushed the per-connection-decoders branch from d5ffde1 to 58e1211 Compare April 18, 2017 01:42
def self.register_connection_decoders(connection : PG::Connection) : Void
types = connection.query_all(TYPE_SQL, as: {UInt32, String, Char})
types.each do |oid, name, category|
oid = oid.to_i32 # Query execution if I read straight to Int32 :\
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops this comment should read "query execution hangs if...".

That was a reminder to me to ask @will why that is? I was fighting a lot of mysterious hangs if I didn't get things right. It can be hard to debug sometimes. Any advice?

@bjeanes
Copy link
Author

bjeanes commented Feb 18, 2018

I've been thinking about picking back up my side project that needed this functionality and drove this PR. @will are you still working on this lib and if so what do you see as the path forward for hstore/citext support?

@will
Copy link
Owner

will commented Feb 19, 2018

Yeah, this is still a project. I think the only way for any sort of extension data type where the oid changes from database to database will require a per-connection mapping. If you're interested in working on this more, I'm happy to help if you run into anything.

@bjeanes bjeanes closed this Nov 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants