Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify registration of custom types with pgxpool #2054

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nicois
Copy link
Contributor

@nicois nicois commented Jun 23, 2024

Sometimes pgx is not capable of inferring the correct codec by
inspecting the database, as is done with LoadType(s), and requires
a user-defined operation to be performed instead.

To allow these custom types to also benefit from connection
initialisation, it is possible to register these functions with pgxpool
using the new RegisterCustomType method.
When using the default configuration, this will still require each new
connection to perform one query to the backend to retrieve the OIDs for
these custom types. This is already a benefit, instead of requiring
a query for each custom type, with the associated latency.

Even better, when the reuseTypeMap pgxpool configuration is selected,
only the first connection requires this query; subsequent connections
will execute the custom registration code, using the cached OID mapping.

@jackc
Copy link
Owner

jackc commented Jun 25, 2024

I'm not following all that's going on here, but I still wonder if we really want helper configuration like AutoLoadTypeNames on the pool or if that should really be part of conn.

... 🤔 ... 🤔 ...

Hmm... I think you're half way to an enormous improvement in making it easy to support custom types. What if the AutoLoadTypeNames and the custom type registration function map was defined on the conn? And that was the only way to access this new functionality?

There would be no new public function or methods, only a couple additions to pgx.ConnConfig. The existing LoadType would be soft deprecated as this approach would be much easier. The only possible use for the existing approach I can imagine is dynamically registering types created after the connection is established.

@nicois
Copy link
Contributor Author

nicois commented Jun 25, 2024

The reason I see pgxpool as the right place for this is that if someone is directly using connections, they want more control over their operations, and can handle the sequencing of calling to autoload.

The place where caching the calls comes in is when you are using connection pooling, and it doesn't make sense to have a mutex on each connection, as it's the pool which should coordinate the loading phase of things. I haven't fully reflected on this, so might follow up a little later in more detail.

This PR is mostly meant to capture my idea, and it's better IMO to look at the other 2 PRs first, with this one just showing you what is going on in my head for subsequent enhancements.

@jackc
Copy link
Owner

jackc commented Jun 25, 2024

Okay, I'll comment on #2046.

@nicois
Copy link
Contributor Author

nicois commented Jun 25, 2024

The problem of pushing this logic down the the connection (config) is that it gets harder to be smart about only doing this once for the connection pool. If every connection automatically makes up to 2 queries to the database when it's created (one if there are custom registrations, like hstore, one for any other non-default types which need loading), this overhead is going to be done for every new connection made by pgxpool. While if this information is provided to the pool instead, these queries are only done once, then every new connection made by the pool is instantly available for use.

If this logic is moved from pgxpool to the connection, either it would still need to be done here, causing code duplication, or we would need to provide a way for the pool to provide the *[]pgtype.Types information which was retrieved for the first connection, to avoid needless re-querying by subsequent connections.

I really think that if someone has elected to not use pgxpool, they want more fine-grained control of exactly how each connection is set up, including how types are registered. They are only creating a single connection, so whether they register a handler with ConnConfig or just call the function themself, it's all the same; there is no win for a single connection in registering a handler which is only called once.

In general I'd say we want to promote the use of pgxpool over directly making connections, unless you know what you are doing. By keeping these optimisations on pgxpool, we aren't preventing people using LoadTypes directly on a single connection, but if someone wants to have multiple connections, they should really use a pool.

@nicois nicois force-pushed the nicois/pgxpool-register-custom-types branch from 0097c59 to 021bf72 Compare June 25, 2024 08:22
@jackc
Copy link
Owner

jackc commented Jun 25, 2024

or we would need to provide a way for the pool to provide the *[]pgtype.Types information which was retrieved for the first connection

Right. That's what I was thinking we would want to do.

I'm not sure of the exact interface, but it just feels off to me to have a config option that really is for a connection to not be available on the connection and only be available on the pool.


Perhaps the interface on the conn should be a callback function specifically for loading types. There could be a helper function to build this that would take the list of derived types and the map of custom types.

The pool would replace this function on the config with a function that copied the the type map for each connection after the first. IDK...

@nicois
Copy link
Contributor Author

nicois commented Jun 25, 2024

If someone is making a single connection, they need to:

  • call some functions to register special types
  • provide a list of type names which can be converted into an array of types using the single SQL call
  • register those types, in that order, with the connection's typemap.

We could have one ConnConfig field which can hold the sequence of functions, for the first step, and a second one holding a list of type names, for the second. But we would also need a third field which would hold a precalculated sequence of types. This last field would be populated by pgxpool with the types calculated by the first two steps (and one way or another, if the last field is populated, the first two fields would not be used). Is this what you mean?

It can be done this way. If it was, I guess the process would be something like this, when pgxpool is used:

  • a ConnConfig is created holding the special registration functions and the list of type names
  • the first time a connection is made, if the renew config option is set, pgxpool acquires a mutex and waits for the connection to be completed, which would include any ConnConfig AfterConnect and ValidateConnect calls. pgxpool then calls conn.TypeMap().Copy() to retrieve a copy of the types and saves them. Subsequent connections use an altered ConnConfig which instead of defining the first two items, instead defines just the third configuration value, which has the list of types to be registered.

This can be done, and ultimately it's your call. If the majority of users did not want to reuse the type mapping, I would agree with you that it's better, as the same approach is used to define the type names and the registration functions regardless of whether the connection is going to be used with a pool or not.

I really just want to repeat what I said earlier: that if someone is managing their connections manually (without using a pool), they are probably doing something exotic. If they have custom registrations, it doesn't help them to define them as functions. If they want to load types after (or before) doing their custom registrations, calling LoadTypes and then TypeMap().RegisterTypes(...) is probably what they would prefer than magic configuration settings. There is a good chance they would want to cache the typemap themselves, too, which they cannot do if these two steps are done automatically, behind the scenes, exposed only via config settings.

In a sense, Conn is a low-level object, providing finer-level control on what is happening. Allowing more explicit control over things like how types are registered is a positive in that situation.
On the other hand, I see PgxPool as a high-level interface to postgresql, and people should be encouraged to use that. Having options to automatically apply types to new connections makes more sense to me here, as we want the pool to properly manage the use and reuse of these.

From my perspective, my primary goal is to get these PRs merged. It's going to work either way. I would be wary of making LoadTypes and RegisterTypes private methods though, as the config options alone are going to prevent certain uses of these methods which people might want. If you're happy with this duplication, I can modify the PRs to work this way.

@nicois nicois force-pushed the nicois/pgxpool-register-custom-types branch 2 times, most recently from 44eb7b7 to cc208ca Compare June 27, 2024 12:05
nicois added 3 commits June 27, 2024 22:18
When loading even a single type into pgx's type map, multiple SQL
queries are performed in series. Over a slow link, this is not ideal.
Worse, if multiple types are being registered, this is repeated multiple
times.

This commit add LoadTypes, which can retrieve type
mapping information for multiple types in a single SQL call, including
recursive fetching of dependent types.
RegisterTypes performs the second stage of this operation.
Provide a backwards-compatible configuration option for pgxpool
which streamlines the use of the bulk loading and registration of types:
- ReuseTypeMaps: if enabled, pgxpool will cache the typemap information,
  avoiding the need to perform any further queries as new connections
  are created.

ReuseTypeMaps is disabled by default as in some situations, a
connection string might resolve to a pool of servers which do not share
the same type name -> OID mapping.
Provide a backwards-compatible configuration option for pgxpool
which streamlines the use of the bulk loading and registration of types:
- ReuseTypeMaps: if enabled, pgxpool will cache the typemap information,
  avoiding the need to perform any further queries as new connections
  are created.

ReuseTypeMaps is disabled by default as in some situations, a
connection string might resolve to a pool of servers which do not share
the same type name -> OID mapping.
@nicois nicois force-pushed the nicois/pgxpool-register-custom-types branch from cc208ca to 43edb0e Compare June 27, 2024 12:19
@nicois nicois marked this pull request as ready for review June 27, 2024 12:20
@nicois nicois marked this pull request as draft July 1, 2024 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants