Skip to content
This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

Feature hostnames #309

Merged
merged 14 commits into from
Sep 21, 2022
Merged

Conversation

NorbertHeusser
Copy link
Contributor

This is the second PR to finalize hostname support for the raft library. This will close #298

@NorbertHeusser
Copy link
Contributor Author

NorbertHeusser commented Sep 8, 2022

OK. Fixed all build issues now. And signed all commits.

Re-signed the contributor agreement the third time now :-(

@codecov-commenter
Copy link

codecov-commenter commented Sep 9, 2022

Codecov Report

Merging #309 (0a3a323) into master (09a821f) will increase coverage by 0.06%.
The diff coverage is 93.85%.

@@            Coverage Diff             @@
##           master     #309      +/-   ##
==========================================
+ Coverage   83.54%   83.60%   +0.06%     
==========================================
  Files          49       49              
  Lines        9268     9339      +71     
  Branches     2477     2499      +22     
==========================================
+ Hits         7743     7808      +65     
- Misses        969      970       +1     
- Partials      556      561       +5     
Impacted Files Coverage Δ
src/uv_tcp_connect.c 89.94% <90.90%> (+1.18%) ⬆️
src/uv_tcp_listen.c 91.12% <93.54%> (+1.18%) ⬆️
src/uv_ip.c 78.37% <100.00%> (+0.32%) ⬆️
src/uv_tcp.c 94.52% <100.00%> (+1.18%) ⬆️
src/heap.c 96.15% <0.00%> (-3.85%) ⬇️
src/uv_send.c 88.25% <0.00%> (-1.52%) ⬇️
src/uv_append.c 90.78% <0.00%> (-0.02%) ⬇️
src/uv_writer.c 77.44% <0.00%> (ø)
src/uv_segment.c 88.36% <0.00%> (ø)
src/uv.c 84.64% <0.00%> (+0.21%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@NorbertHeusser
Copy link
Contributor Author

Did an additional force push with just a whitespace formatting correction to force a new cla-check.

@MathieuBordere
Copy link
Contributor

OK. Fixed all build issues now. And signed all commits.

Re-signed the contributor agreement the third time now :-(

The cla-check finally recognized you ... hooray for that :-) Will review soon.

@NorbertHeusser
Copy link
Contributor Author

NorbertHeusser commented Sep 12, 2022

I am happy the cla-check works now for :-)

Perhaps a few comments on the design. As the getaddrinfo may result into multiple IP addresses I need to deal with that in the different parts of the solution. This holds for the synchronous system call as well as for the asynchronous lbuv based invocation. The solution design I have chosen are:

  • For the outgoing connection I need to try one IP after the other until I find a first one working. This will cover the case one of the IP addresses is not open for listening. Once I am able to connect to the port I expect the handshake to work as well. Another solution would have been to try the next IP additionally in case the handshake is failing. But I did not consider this to be a common error scenario. As I need to close the tcp handle after each attempt and reopen it I added an additional bool flag retry to react on it in the abort (otherwise we could try to close the handle twice). It would be possible to combine the two flags resolving and retry in a single one as never both will be set at the same time, but I find the current solution better maintainable.
  • For the listen socket I have to bind all the resolved IP addresses. But to my experience I have seen a bind address resolving to multiple IP addresses very rarely. For this reason I have chosen to keep the current dedicated listener and rename it to default_listener as this single listener will be used in most cases. As I know after the name resolve wether I need multiple listeners I will create multiple listener only in case they are needed and in this rare case don't use the default_listener. This ways it's easier to close everything in a clear way as well.
  • Due to an old glibc behavior to reply with duplicate entries (which is fixed in some linux distros like SLES) I added a small workaround for this as well. Otherwise I would end up in an address already in use error.
  • For testing purpose I implement a kind of getaadrinfo/freeaddrinfo mock. By providing a local implementation of these in function in the testing library. As they will be bound by the linker first. The behavior of this helper file is described in the header. To avoid the need to provide all getaddrinfo mock responses for all test cases and to make sure everything works with the real system getaddrinfo the mock function will forward the calls to the real implementation, if it is not enabled. And the mock functionality is used in the tests cases which want to test some special cases like multiple IPs only.

src/uv_tcp_connect.c Outdated Show resolved Hide resolved
src/uv_tcp_connect.c Outdated Show resolved Hide resolved
src/uv_ip.c Outdated Show resolved Hide resolved
@freeekanayaka
Copy link
Contributor

Looks overall good to me at first sight. Before I dig deeper: how much important is it to support multiple listeners? Could we instead return an error if the hostname resolves to multiple IPs? Or choose the first IP.

I know that's sub-optimal, just wondering if it is worth the cost of additional complexity or if it's something that will probably never be used in practice. For example is it relevant for your own use case?

Since you already did all the work it's a pity to simplify it, but if there's no immediate need for multiple listeners we could keep this code around in a branch and only introduce this support when there's a request for it.

I guess same reasoning for outgoing connections.

test/lib/addrinfo.c Show resolved Hide resolved
test/lib/addrinfo.h Show resolved Hide resolved
@NorbertHeusser
Copy link
Contributor Author

Regarding the listen sockets dealing with multiple IP addresses immediately comes into play, if we would introduce handling of IPv6 addresses. As systems with IPv6 only are still extremely rare today most systems have dual mode using IPv4 and IPV6 at the same time. So with activation of IPv6 additionally to IPv4 most hostnames will resolve one (or more) IPv6 addresses and an IPv4 address.

For outgoing TCP connections the most common use case I have seen in the past is the configuration of multiple network interfaces and IP addresses to achieve fail safety. And this is beside the cluster scale-out concept one of the most common use cases to install a cluster as a HA solution. This might be less common over the last year due to the cheaper availability of network components supporting network bonding.

Right now I need the hostname feature to be able to run our cluster in a docker swarm environment. And this does not support static IPs. But the customer plans to use multiple different virtual networks over his docker swarm. So I donÄt know right now, how fast we need it. But sooner or later we will have a customer asking for it.

And in general the I don't like implementing half of a solution (was already thinking about adding IPv6 support straight away as well). Hostnames resolving to multiple IPs is nothing new, but has been the case since the beginning of the IP protocol definition. And to my opinion dealing with this should be default implementation in all IP network base applications.

@freeekanayaka
Copy link
Contributor

Fair enough, thanks for explaining. I'll give a deeper look at the PR.

@NorbertHeusser
Copy link
Contributor Author

Fixed the requested typos/errors in comments. And applied a clang-format (except for struct comments) to align the formatting.

@freeekanayaka
Copy link
Contributor

Looks good overall.

The main bit I'm not entirely convinced about is the differentiation between default_listener and listeners. In my mind it'd be simpler to actually have just listeners, because in any case you have to handle all the intricacies of the multiple-listeners case no matter what. It's true that listeners will have only 1 item most of the times, but that's fine, the logic will stay the same.

@NorbertHeusser
Copy link
Contributor Author

NorbertHeusser commented Sep 14, 2022

The main bit I'm not entirely convinced about is the differentiation between default_listener and listeners. In my mind it'd be simpler to actually have just listeners, because in any case you have to handle all the intricacies of the multiple-listeners case no matter what. It's true that listeners will have only 1 item most of the times, but that's fine, the logic will stay the same.

What you describe was the first attempt I tried to implement, but I ran into problems with the close/destruction order of some corner cases, when I tried to do this. Will try to describe what I observed. The root cause, if I manage the listeners totally dynamically the listeners array (even for a single listener) will be created after the successful response from the name resolving. In case the name resolving does not work we will end up without any listener tcp handle. But in this case we don't have any handle to close in the UvTcpListenClose and therefor don't trigger the uvTcpListenCloseCbListener. And therefore the uv_tcp_listen will not trigger the UvTcpMaybeFireCloseCb. This all work fine as long as there are outgoing connection, which where connected successfully. In this case the outgoing connections will trigger the UvTcpMaybeFireCloseCb and all structures will get removed. But without any outgoing connections the structures will not be freed. I tried to solve this by direct invocation of UvTcpMaybeFireCloseCb from the UvTcpListenClose in case the listener array is not created. But this will fail, if there are outgoing connections, because they expect the whole tcp object to exists until their close_cb functions are finished.

More generally description:
So there is a destructions dependency between the tcp_listen and tcp_connect part of the code, which work fine right now as both code parts will always have at least have one handle to close. So the code flow will allways close all handles. All close-CB will be invoced and the last one will finally trigger the destruction of the tcp object grouping both.
This clear order of close/destruction no longer works , if one source code does not have a handle to close any more.

Any suggestion welcome how to solve this problem. So how do I trigger the UvTcpMaybeFireCloseCb from the tcp_listen part of the code without knowing, if there are outgoing connections or not.

@freeekanayaka
Copy link
Contributor

I see, thanks. I'll have to look more closely myself as well then, perhaps there's a way out.

@freeekanayaka
Copy link
Contributor

Ok, after reading the code I understand the situation better. I believe we should handle this ordering issue with the same pattern that we use throughout the code: queues (which is what libuv itself does, and basically most asynchronous programs, in way form or another).

My suggestion would be

  1. Replace the struct uv_tcp_s listener; field of struct UvTcp with a queue listeners queue object. This queue will contain all active listeners (see the next point).
  2. Add a new struct uvTcpListener to uv_tcp_listen.c, it should look roughly like this:
/* Hold information about a single listening socket. */
struct uvTcpListener
{
    struct UvTcp *t;                 /* Transport implementation */
    struct uv_tcp_s tcp;             /* Listening TCP socket handle */
    queue queue;                     /* Current queue (either listeners or aborting) */
};
  1. Modify UvTcpListen to create an uvTcpListener object for each required address, initialize its tcp object and call uv_tcp_bind/uv_listen on it, and finally insert it in the listeners queue. This is basically a slight modification of what you do in UvTcpListenOnMultipleIP
  2. Modify UvTcpListenClose to start closing all active listeners, something like:
void UvTcpListenClose(struct UvTcp *t)
{
    queue *head;
    assert(t->closing);
    assert(!QUEUE_IS_EMPTY(&t->listeners));

    while (!QUEUE_IS_EMPTY(&t->accepting)) {
        struct uvTcpIncoming *incoming;
        head = QUEUE_HEAD(&t->accepting);
        incoming = QUEUE_DATA(head, struct uvTcpIncoming, queue);
        uvTcpIncomingAbort(incoming);
    }

    while (!QUEUE_IS_EMPTY(&t->listeners)) {
        struct uvTcpListener *listener;
        head = QUEUE_HEAD(&t->listener);
        listener = QUEUE_DATA(head, struct uvTcpListener, queue);
        uvTcpListenerAbort(listener);
    }
}
  1. Implement uvTcpListenerAbort with something like:
/* The listening TCP handle has been closed, release all memory
 * associated with the listener object. */
static void uvTcpListenerCloseCb(struct uv_handle_s *handle)
{
    struct uvTcpListener *listener = handle->data;
    struct UvTcp *t = listener->t;
    QUEUE_REMOVE(&listener->queue);
    RaftHeapFree(listener);
    UvTcpMaybeFireCloseCb(t);
}

/* Stop listening for incoming connection on the given listener. */
static void uvTcpListenerAbort(struct uvTcpListener *listener)
{
    struct UvTcp *t = listener->t;
    /* After uv_close() returns we are guaranteed that no more listen callback
     * will be called. */
    QUEUE_REMOVE(&listener->queue);
    QUEUE_PUSH(&t->aborting, &listener->queue);
    uv_close((struct uv_handle_s *)&listener->tcp, uvTcpListenerCloseCb);
}
  1. Modify UvTcpMaybeFireCloseCb so it checks only the aborting queue and not t->listener.data (which does not exist anymore):
void UvTcpMaybeFireCloseCb(struct UvTcp *t)
{
    if (!t->closing) {
        return;
    }

    assert(QUEUE_IS_EMPTY(&t->listeners));
    assert(QUEUE_IS_EMPTY(&t->accepting));
    assert(QUEUE_IS_EMPTY(&t->connecting));

    if (!QUEUE_IS_EMPTY(&t->aborting)) {
        return;
    }

    if (t->close_cb != NULL) {
        t->close_cb(t->transport);
    }
}

I believe that should work neatly, unless I'm missing something.

One important note to add is that in the case you mentioned, where the name resolving does not work for the listening address, then the UvTcpListen function should just return an error and the closing sequence shouldn't even be triggered (essentially this error will bubble up and raft_start() will return it, before the event loop is even started).

I'm even not entirely sure why you raised the issue of the name resolving not working, because as I said in that case things are simple, it's just error propagation before the loop is even started, so no actual closing sequence is involved (and certainly there won't be any outgoing connections attempts ongoing).

Hope it all makes sense.

@NorbertHeusser
Copy link
Contributor Author

After reading your long comment about adding an additional queue I was not really sure, if this is really what we need.
The problem I was facing was not about the UvTcpMaybeFireCloseCb being invoked to ealry, but instead nto being invoked at all. Especially your comments about everything around listen is synchronous and should not be a problem let me get back one more time and check, if the problems I was facing were about the unit tests only.

So I removed the default listener from the code, which made the code around the listener much more straight forward. To avoid the UvTcpMaybeFireCloseCb not being invoked at all I now invoke it directly from UvTcpListenClose in case there is no listener.
I got it all working by replacing a assert in the UvTcpMaybeFireCloseCb with an if check (as the UvTcpMaybeFireCloseCb maybe invoked one more time now). And remove one assertion from the unit test (as the tcp close cb maybe invoked in some unit test now directly from the close).

Please take a look, if the PR maybe accepted this way.

test/integration/test_uv_tcp_listen.c Outdated Show resolved Hide resolved
test/integration/test_uv_tcp_listen.c Outdated Show resolved Hide resolved
src/uv_tcp_listen.c Outdated Show resolved Hide resolved
src/uv_tcp_listen.c Outdated Show resolved Hide resolved
src/uv_tcp_listen.c Outdated Show resolved Hide resolved
@freeekanayaka
Copy link
Contributor

This approach looks good to me, I've left a few minor comments, I'll give it another pass later.

Copy link
Contributor

@freeekanayaka freeekanayaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of comments.

src/uv_tcp.c Show resolved Hide resolved
src/uv_tcp.c Outdated Show resolved Hide resolved
src/uv_tcp.h Outdated Show resolved Hide resolved
src/uv_tcp_listen.c Outdated Show resolved Hide resolved
Copy link
Contributor

@freeekanayaka freeekanayaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added just one small nit.

Looks good to me, thanks for your work and for bearing with review feedback :)

src/uv_tcp.c Outdated Show resolved Hide resolved
@freeekanayaka
Copy link
Contributor

@stgraber @MathieuBordere this can be merged as far as I'm concerned.

@stgraber
Copy link
Contributor

@MathieuBordere LGTM, feel free to merge if happy with it

@MathieuBordere
Copy link
Contributor

Thanks a lot for your contribution!

@MathieuBordere MathieuBordere merged commit 5b2d180 into canonical:master Sep 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for hostnames instead of static IP addresses
5 participants