-
-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Was trying to build and run the distributed file system called SeaWeedFS. #1188
Comments
System call 102 is getuid, 104 is getgid, those should easy to implement (just add one linux in linux.cc). The netlink thing is a harder problem... It's a Linux-specific interface for inquiring about routing another other stuff, that we never implemented. It's not impossible to implement, but the first thing I would check is whether this SeaWeedFS can gracefully recover from failing to use it. If it can continue to work without netlink, I wouldn't rush to implement it. |
We actually have at least partial netlink support on ipv6 branch but given how rich the netlink interface is it is hard to know if it will be enough. In ipv6 branch netlink is used to implement @Akilan1999 would you mind sending a patch to create a simple app to demo running SeaWeedFS on OSv (please see other apps under https://github.com/cloudius-systems/osv-apps for an example). |
sure |
I finally found a bit of time to work on it and I have managed to get SeaweedFS running on OSv:
I do not know SeaweedFS much so I could not really tell how well it works. But it seems to respond to some curl calls. The netlink support does not seem to be critical. From what I could tell it is used by golang to detect network interfaces (see https://go.dev/src/syscall/netlink_linux.go) but it seems to fall back to another mechanism or assumes some defaults. In any case, I am still planning to port the netlink implementation from the ipv6 branch. My initial experiments seem to confirm that it would be enough to satisfy the needs of golang network interfaces discovery logic. I should be sending some patches soon - mostly to add a number of syscalls:
|
Wow amazing ! |
This patch adds 4 new syscalls that map one-to-one to the four functions listed in the title above. These are required to run SeaweedFS on OSv. Refs #1188 Signed-off-by: Waldemar Kozaczuk <[email protected]> Message-Id: <[email protected]>
V2: The only difference is removed delete_dir() function was accidentally left from previous attempts to implement this syscall. It looks like the golang apps that need to iterate over entries in a directory use a system call getdents64 which is documented in https://man7.org/linux/man-pages/man2/getdents.2.html. Normally this functionality is provided by the libc functions like opendir(), readdir(), etc which actually do delegate to getdents64. Go is known of bypassing libc in such cases. So this patch implements the syscall getdents64 by adding a utility function to VFS main.cc that is then called by syscall in linux.cc. For details of how this function works please look at the comments. This patch also adds a unit test to verify this syscall works. Refs #1188 Signed-off-by: Waldemar Kozaczuk <[email protected]>
This patch modifies the OSv routecache to handle the case when a route entry gets added for a non-loopback address and loopback device before packets start coming from outside of the local network. The practical example is an app that on startup calls itself on its HTTP endpoint but using non-loopback address like 192.168.122.15 before any external requests from outside of the guest. This is exactly what would happen when running SeaWeedFS (see #1188). This bug can be also replicated with a golang-httpclient and httpserver-api apps: ./scripts/build image=golang-httpclient,httpserver-monitoring-api //Change '0.0.0.0' in modules/httpserver-api/global_server.cc to '192.168.122.15' ./script/run.py -api -e '/httpclient.so http://192.168.122.15:8000/os/version 0 10' //The app will wait for 10 seconds before shutting down After it starts, run 'curl http://localhost:8000/os/version' and observe curl never receive the response. Let us assume we have an app that binds to 192.168.122.15 and listens on some port. When a call to connect on a non-loopback address comes from the app itself, it goes through the layers of the networking stack and eventually calls route_cache::lookup() to find a route entry for 192.168.122.15 as in the stack trace below. in route_cache::lookup (dst=dst@entry=0xffff8000014bb84c, fibnum=<optimized out>, ret=ret@entry=0xffff8000014bb860) at ./bsd/sys/net/routecache.hh:197 in_pcbladdr (inp=inp@entry=0xffffa0000155a200, faddr=faddr@entry=0xffff8000014bb98c, laddr=laddr@entry=0xffff8000014bb988, cred=0x0) at bsd/sys/netinet/in_pcb.cc:881 in_pcbconnect_setup (inp=0xffffa0000155a200, nam=0xffffa000008e0510, laddrp=0xffff8000014bb9e4, lportp=0xffff8000014bb9e2, faddrp=0xffffa0000155a264, fportp=0xffffa0000155a254, oinpp=0xffff8000014bb9e8, cred=0x0) at bsd/sys/netinet/in_pcb.cc:1056 tcp_connect (tp=tp@entry=0xffffa00000f76800, nam=nam@entry=0xffffa000008e0510, td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:1089 tcp_usr_connect (td=<optimized out>, nam=0xffffa000008e0510, so=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:463 tcp_usr_connect (so=<optimized out>, nam=0xffffa000008e0510, td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:436 kern_connect (fd=<optimized out>, sa=0xffffa000008e0510) at bsd/sys/kern/uipc_syscalls.cc:374 Initially the route cache is empty so it adds new entry that associates the address 192.168.122.15, the loopback device and its netmask 255.0.0.0. Once all subsequent networking stack calls to handle this particular HTTP request complete, at some point later external client calls the same HTTP endpoint from outside of OSv guest. While handling this call, at some point we get to this point as illustrated by the stack trace below: in route_cache::lookup (dst=<optimized out>, fibnum=<optimized out>, ret=0xffff800000c4ead0) at ./bsd/sys/net/routecache.hh:197 in tcp_maxmtu (inc=<optimized out>, flags=0x0) at bsd/sys/netinet/tcp_subr.cc:1690 in tcp_mssopt (inc=0xffffa00000f7b910) at bsd/sys/netinet/tcp_input.cc:3131 in syncookie_generate (flowlabel=<synthetic pointer>, sc=0xffffa00000f7b900, sch=<optimized out>) at bsd/sys/netinet/tcp_syncache.cc:1536 in _syncache_add (inc=<optimized out>, to=<optimized out>, th=<optimized out>, inp=<optimized out>, lsop=0xffff800000c4edc8, m=0xffffa00000f7bf00, toepcb=0x0, tu=0x0) at bsd/sys/netinet/tcp_syncache.cc:1210 in tcp_input (m=0xffffa00000f7bf00, off0=<optimized out>) at bsd/sys/netinet/tcp_input.cc:941 in ip_input (m=<optimized out>) at bsd/sys/netinet/ip_input.cc:774 in netisr_dispatch_src (proto=1, source=<optimized out>, m=0xffffa00000f7bf00) at bsd/sys/net/netisr.cc:769 in netisr_dispatch_src (proto=9, source=<optimized out>, m=0xffffa00000f7bf00) at bsd/sys/net/netisr.cc:769 in virtio::net::receiver (this=0xffff900000ba3000) at drivers/virtio-net.cc:545 This time the destination address (dst) is 192.168.122.1 which is a gateway for the non-loopback (eth0) network. However at this point we have a single route entry created when handling the first request and the search() method used by route_cache::lookup() simply compares this entry network with the network of the IP address in question by applying netmask 255.0.0.0 and it matches so that entry is returned. The problem is that this and all subsequent packets get routed to the loopback device instead of the virtio device in this case. And the client never receives relevant packets to complete the handshake and TCP connection does not get established from the client perspective. Effectively the application will never be able to receive any external requests, however it can still handle any internal ones like the 1st one in this example. What is interesting if we revert the sequence in this example and send external request before the app makes an internal request to itself everything works fine. It is because when handling the external call, the initial and only route entry gets created for IP 192.168.122.1 and netmask 255.255.255.0 and non-loopback (eth0) device. The subsequent internal request would match the netmask and go through the ifloop() like this illustrates: 0xffff80000119b040 /httpclient.so 0 15.368111615 net_packet_in b'IP 192.168.122.15.46890 > 192.168.122.15.8000: Flags [S], seq 744996567, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 15368 ecr 0], length 0' log_packet_in(mbuf*, int) core/net_trace.cc:143 netisr_queue_workstream(netisr_workstream*, unsigned int, netisr_work*, mbuf*, int*) [clone .constprop.0] bsd/sys/net/netisr.cc:633 netisr_queue_src bsd/sys/net/netisr.cc:684 netisr_queue_src bsd/sys/net/netisr.cc:712 if_simloop bsd/sys/net/if_loop.cc:291 ether_output bsd/sys/net/if_ethersubr.cc:256 ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621 tcp_output bsd/sys/netinet/tcp_output.cc:1385 tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:465 tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:436 kern_connect bsd/sys/kern/uipc_syscalls.cc:374 So this patch fixes this problem by changing the search() method to handle the first scenario. In essence, instead of simply comparing the networks of the entry IP address and dst, it first identifies type of the device the entry is associated with. If non-loopback it does the same as before, if device is loopback if checks if dst is loopback in which case it is a match otherwise it compares full IP addresses. With the patch applied, in first scenario when the second external call is handled the entry device is checked which is loopback, and new logic returns null as full addresses would not match. This patch has been backported from the original one by Jan-Michael Kho contributed to the Spirent fork of OSv - SpirentOrion@f6e6e54. V2: Comparing to V1 this version adds more comments to the code and removes some unnecessary includes. Co-authored-by: "Jan-Michael Kho" <[email protected]> Co-authored-by: Waldemar Kozaczuk <[email protected]> Signed-off-by: Waldemar Kozaczuk <[email protected]>
@Akilan1999 I have recently fixed an important bug that makes running SeaweedFS much better on OSv (please see a0251df). In essence, it allows SeaweedFS to bind to an individual interface and talk to itself at the same time. That is why the older examples had to use the I have also run some experiments and benchmarks. I have described pretty detailed steps in the README I have added to the app: |
Awesome ! |
I was able to build it successfully. But when running it I ran into the following issue.
I would assume it would make sense to open the following issue to provide support for the following Syscall.
The text was updated successfully, but these errors were encountered: