Was trying to build and run the distributed file system called SeaWeedFS. #1188

Closed
Akilan1999 opened this issue Apr 3, 2022 · 7 comments

@Akilan1999

I was able to build it successfully, but when running it I ran into the following issues:

syscall(): unimplemented system call 102
syscall(): unimplemented system call 102
syscall(): unimplemented system call 104
syscall(): unimplemented system call 102
syscall(): unimplemented system call 104
syscall(): unimplemented system call 102
syscall(): unimplemented system call 104
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
syscall(): unimplemented system call 263
syscall(): unimplemented system call 263
syscall(): unimplemented system call 266
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol
I0403 21:45:05     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: address family not supported by protocol

I assume it makes sense to open this issue to request support for these syscalls.

@nyh
Contributor

nyh commented Apr 4, 2022

System call 102 is getuid and 104 is getgid; those should be easy to implement (just add one line in linux.cc).
263 is unlinkat and 266 is symlinkat. These will take slightly more work because we need to implement symlinkat() (although it should be easy, since it is similar to other functions we already have).

The netlink thing is a harder problem. It is a Linux-specific interface for querying routing and other information, which we never implemented. It is not impossible to implement, but the first thing I would check is whether SeaweedFS can gracefully recover from failing to use it. If it can continue to work without netlink, I wouldn't rush to implement it.

@wkozaczuk
Collaborator

We actually have at least partial netlink support on the ipv6 branch, but given how rich the netlink interface is, it is hard to know whether it will be enough. In the ipv6 branch, netlink is used to implement getifaddrs() and if_nameindex() (see b687b7c).

@Akilan1999 would you mind sending a patch to create a simple app to demo running SeaWeedFS on OSv (please see other apps under https://github.com/cloudius-systems/osv-apps for an example).

@Akilan1999
Author

sure

@wkozaczuk
Collaborator

wkozaczuk commented May 16, 2022

I finally found a bit of time to work on it and I have managed to get SeaweedFS running on OSv:

./scripts/run.py -e '/weed master -port 9333' --forward 'tcp::9333-:9333'
OSv v0.56.0-96-g45990a64
eth0: 192.168.122.15
Booted up in 278.20 ms
Cmdline: /weed master -port 9333
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:09:29     2 file_util.go:23] Folder /tmp Permission: -rwxrwxr-x
I0516 03:09:29     2 master.go:232] current: :9333 peers:
I0516 03:09:29     2 master_server.go:122] Volume Size Limit is 30000 MB
I0516 03:09:29     2 master.go:143] Start Seaweed Master 30GB 2.96  at :9333
I0516 03:09:29     2 raft_server.go:80] Starting RaftServer with :9333
I0516 03:09:29     2 raft_server.go:129] current cluster leader: 
I0516 03:09:47     2 master.go:176] Start Seaweed Master 30GB 2.96  grpc server at :19333
I0516 03:09:48     2 masterclient.go:80] No existing leader found!
I0516 03:09:48     2 raft_server.go:146] Initializing new cluster
I0516 03:09:48     2 master_server.go:165] leader change event:  => :9333
I0516 03:09:48     2 master_server.go:168] [ :9333 ] :9333 becomes leader.
I0516 03:09:52     2 master_grpc_server.go:278] + client master@:9333
curl http://localhost:9333/cluster/status?pretty=y
{
  "IsLeader": true,
  "Leader": ":9333"
}
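The same `/cluster/status` check can be scripted. Below is a minimal Go client sketch, assuming the master's port 9333 is forwarded to localhost as in the `run.py --forward` command above. Only the two fields shown in the curl output are decoded, and `fetchClusterStatus`/`parseClusterStatus` are illustrative names, not a SeaweedFS client API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// clusterStatus mirrors the two fields shown in the curl output.
type clusterStatus struct {
	IsLeader bool   `json:"IsLeader"`
	Leader   string `json:"Leader"`
}

// parseClusterStatus decodes a /cluster/status response body.
func parseClusterStatus(body []byte) (clusterStatus, error) {
	var st clusterStatus
	err := json.Unmarshal(body, &st)
	return st, err
}

// fetchClusterStatus GETs <baseURL>/cluster/status with a timeout.
func fetchClusterStatus(baseURL string) (clusterStatus, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/cluster/status")
	if err != nil {
		return clusterStatus{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return clusterStatus{}, fmt.Errorf("unexpected HTTP status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return clusterStatus{}, err
	}
	return parseClusterStatus(body)
}

func main() {
	st, err := fetchClusterStatus("http://localhost:9333")
	if err != nil {
		fmt.Fprintln(os.Stderr, "master not reachable:", err)
		os.Exit(1)
	}
	fmt.Printf("leader=%q isLeader=%v\n", st.Leader, st.IsLeader)
}
```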
./scripts/run.py -e '/weed server -dir=/tmp' --forward 'tcp::9333-:9333'
OSv v0.56.0-96-g45990a64
eth0: 192.168.122.15
Booted up in 282.69 ms
Cmdline: /weed server -dir=/tmp
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 master.go:232] current: :9333 peers:
I0516 03:18:42     2 file_util.go:23] Folder /tmp Permission: -rwxrwxr-x
I0516 03:18:42     2 file_util.go:23] Folder /tmp Permission: -rwxrwxr-x
I0516 03:18:42     2 network.go:14] failed to detect net interfaces: route ip+net: netlinkrib: invalid argument
I0516 03:18:42     2 volume.go:195] detected volume server ip address: 
I0516 03:18:42     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:42     2 master.go:232] current: :9333 peers::9333
I0516 03:18:42     2 master_server.go:122] Volume Size Limit is 30000 MB
I0516 03:18:42     2 master.go:143] Start Seaweed Master 30GB 2.96  at :9333
I0516 03:18:42     2 raft_server.go:80] Starting RaftServer with :9333
I0516 03:18:42     2 raft_server.go:129] current cluster leader: 
I0516 03:18:44     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:45     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:47     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:49     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:51     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:53     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:54     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:56     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:18:58     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:19:00     2 volume_grpc_client_to_master.go:41] checkWithMaster :9333: get master :9333 configuration: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp :19333: connect: connection refused"
I0516 03:19:00     2 master.go:176] Start Seaweed Master 30GB 2.96  grpc server at :19333
I0516 03:19:01     2 masterclient.go:80] No existing leader found!
I0516 03:19:01     2 raft_server.go:146] Initializing new cluster
I0516 03:19:01     2 master_server.go:165] leader change event:  => :9333
I0516 03:19:01     2 master_server.go:168] [ :9333 ] :9333 becomes leader.
syscall(): unimplemented system call 137
I0516 03:19:02     2 disk_location.go:396] dir /tmp disk free 0.00% < required 1.00%
I0516 03:19:02     2 disk_location.go:182] Store started on dir: /tmp with 0 volumes max 8
I0516 03:19:02     2 disk_location.go:185] Store started on dir: /tmp with 0 ec shards
I0516 03:19:02     2 volume_grpc_client_to_master.go:50] Volume server start with seed master nodes: [:9333]
I0516 03:19:02     2 volume.go:364] Start Seaweed volume server 30GB 2.96  at :8080
I0516 03:19:02     2 volume_grpc_client_to_master.go:107] Heartbeat to: :9333
I0516 03:19:02     2 node.go:222] topo adds child DefaultDataCenter
I0516 03:19:02     2 node.go:222] topo:DefaultDataCenter adds child DefaultRack
I0516 03:19:02     2 node.go:222] topo:DefaultDataCenter:DefaultRack adds child :8080
I0516 03:19:02     2 node.go:222] topo:DefaultDataCenter:DefaultRack::8080 adds child 
I0516 03:19:02     2 master_grpc_server.go:72] added volume server 0: :8080
I0516 03:19:05     2 master_grpc_server.go:278] + client master@:9333
curl http://localhost:9333/dir/status?pretty=y
{
  "Topology": {
    "DataCenters": [
      {
        "Id": "DefaultDataCenter",
        "Racks": [
          {
            "DataNodes": [
              {
                "EcShards": 0,
                "Max": 8,
                "PublicUrl": ":8080",
                "Url": ":8080",
                "VolumeIds": " ",
                "Volumes": 0
              }
            ],
            "Id": "DefaultRack"
          }
        ]
      }
    ],
    "Free": 8,
    "Layouts": null,
    "Max": 8
  },
  "Version": "30GB 2.96 "
}

I do not know SeaweedFS well, so I cannot really tell how well it works, but it does seem to respond to some curl calls.

The netlink support does not seem to be critical. From what I can tell, Go uses it to detect network interfaces (see https://go.dev/src/syscall/netlink_linux.go), but it seems to fall back to another mechanism or assume some defaults. In any case, I am still planning to port the netlink implementation from the ipv6 branch. My initial experiments seem to confirm that it would be enough to satisfy the needs of Go's network interface discovery logic.
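For reference, the request behind the `netlinkrib` error can be issued directly. This Linux-only sketch uses the standard `syscall.NetlinkRIB` and `syscall.ParseNetlinkMessage` functions to dump the kernel's address table with RTM_GETADDR, the kind of request Go's `net` package makes for interface discovery on Linux. `countAddrMessages` is an illustrative name; on a kernel without netlink sockets the first call is expected to fail with an error like the `address family not supported by protocol` seen at the top of this issue.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// countAddrMessages (hypothetical) dumps the address table over netlink
// and returns how many RTM_NEWADDR messages came back.
func countAddrMessages() (int, error) {
	rib, err := syscall.NetlinkRIB(syscall.RTM_GETADDR, syscall.AF_UNSPEC)
	if err != nil {
		// Wrapped with the same "netlinkrib" label used by the net package.
		return 0, os.NewSyscallError("netlinkrib", err)
	}
	msgs, err := syscall.ParseNetlinkMessage(rib)
	if err != nil {
		return 0, os.NewSyscallError("parsenetlinkmessage", err)
	}
	n := 0
	for _, m := range msgs {
		if m.Header.Type == syscall.RTM_NEWADDR {
			n++
		}
	}
	return n, nil
}

func main() {
	n, err := countAddrMessages()
	if err != nil {
		fmt.Println("netlink unavailable:", err)
		return
	}
	fmt.Println("address entries:", n)
}
```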

I should be sending some patches soon, mostly to add a number of syscalls:

+    SYSCALL0(getgid);
+    SYSCALL0(getuid);
+    SYSCALL2(getcwd_syscall, char *, size_t);
+    SYSCALL3(unlinkat, int , const char *, int);
+    SYSCALL3(symlinkat, const char *, int, const char *);
+    SYSCALL3(getdents64, int, void *, size_t);
+    SYSCALL4(renameat, int, const char *, int, const char *);
+    SYSCALL3(lseek, int, off_t, int);

@Akilan1999
Copy link
Author

Akilan1999 commented May 16, 2022

Wow amazing !
Thanks a lot for the patch.

nyh pushed a commit that referenced this issue May 18, 2022
This patch adds 4 new syscalls that map one-to-one to the
four functions listed in the title above. These are required to
run SeaweedFS on OSv.

Refs #1188

Signed-off-by: Waldemar Kozaczuk
wkozaczuk added a commit that referenced this issue May 27, 2022
V2: The only difference is the removal of the delete_dir() function,
which was accidentally left over from previous attempts to implement
this syscall.

It looks like Go apps that need to iterate over the entries in a
directory use the getdents64 system call, which is documented in
https://man7.org/linux/man-pages/man2/getdents.2.html. Normally this
functionality is provided by libc functions like opendir(), readdir(),
etc., which themselves delegate to getdents64. Go is known for
bypassing libc in such cases.

So this patch implements the syscall getdents64 by adding a utility
function to VFS main.cc that is then called by syscall in linux.cc.
For details of how this function works please look at the comments.

This patch also adds a unit test to verify this syscall works.

Refs #1188

Signed-off-by: Waldemar Kozaczuk
wkozaczuk added a commit that referenced this issue Jul 26, 2022
This patch modifies the OSv route cache to handle the case when a route
entry gets added for a non-loopback address and the loopback device
before packets start coming from outside of the local network. A
practical example is an app that on startup calls itself on its HTTP
endpoint, but using a non-loopback address like 192.168.122.15, before
any external requests arrive from outside of the guest. This is exactly
what happens when running SeaweedFS (see #1188). This bug can also be
reproduced with the golang-httpclient and httpserver-api apps:

./scripts/build image=golang-httpclient,httpserver-monitoring-api # first change '0.0.0.0' in modules/httpserver-api/global_server.cc to '192.168.122.15'
./scripts/run.py -api -e '/httpclient.so http://192.168.122.15:8000/os/version 0 10' # the app waits 10 seconds before shutting down

After it starts, run 'curl http://localhost:8000/os/version' and
observe that curl never receives a response.

Let us assume we have an app that binds to 192.168.122.15 and listens
on some port. When the app itself connects to that non-loopback address,
the call goes through the layers of the networking stack and eventually
calls route_cache::lookup() to find a route entry for 192.168.122.15,
as in the stack trace below:

  in route_cache::lookup (dst=dst@entry=0xffff8000014bb84c, fibnum=<optimized out>, ret=ret@entry=0xffff8000014bb860) at ./bsd/sys/net/routecache.hh:197
  in_pcbladdr (inp=inp@entry=0xffffa0000155a200, faddr=faddr@entry=0xffff8000014bb98c, laddr=laddr@entry=0xffff8000014bb988, cred=0x0) at bsd/sys/netinet/in_pcb.cc:881
  in_pcbconnect_setup (inp=0xffffa0000155a200, nam=0xffffa000008e0510, laddrp=0xffff8000014bb9e4, lportp=0xffff8000014bb9e2, faddrp=0xffffa0000155a264, fportp=0xffffa0000155a254, oinpp=0xffff8000014bb9e8, cred=0x0) at bsd/sys/netinet/in_pcb.cc:1056
  tcp_connect (tp=tp@entry=0xffffa00000f76800, nam=nam@entry=0xffffa000008e0510, td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:1089
  tcp_usr_connect (td=<optimized out>, nam=0xffffa000008e0510, so=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:463
  tcp_usr_connect (so=<optimized out>, nam=0xffffa000008e0510, td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:436
  kern_connect (fd=<optimized out>, sa=0xffffa000008e0510) at bsd/sys/kern/uipc_syscalls.cc:374

Initially the route cache is empty, so it adds a new entry that associates
the address 192.168.122.15 with the loopback device and its netmask
255.0.0.0. Once the networking stack has finished handling this particular
HTTP request, an external client later calls the same HTTP endpoint from
outside the OSv guest. While handling this call, we eventually reach
route_cache::lookup() again, as illustrated by the stack trace below:

  in route_cache::lookup (dst=<optimized out>, fibnum=<optimized out>, ret=0xffff800000c4ead0) at ./bsd/sys/net/routecache.hh:197
  in tcp_maxmtu (inc=<optimized out>, flags=0x0) at bsd/sys/netinet/tcp_subr.cc:1690
  in tcp_mssopt (inc=0xffffa00000f7b910) at bsd/sys/netinet/tcp_input.cc:3131
  in syncookie_generate (flowlabel=<synthetic pointer>, sc=0xffffa00000f7b900, sch=<optimized out>) at bsd/sys/netinet/tcp_syncache.cc:1536
  in _syncache_add (inc=<optimized out>, to=<optimized out>, th=<optimized out>, inp=<optimized out>, lsop=0xffff800000c4edc8, m=0xffffa00000f7bf00, toepcb=0x0, tu=0x0) at bsd/sys/netinet/tcp_syncache.cc:1210
  in tcp_input (m=0xffffa00000f7bf00, off0=<optimized out>) at bsd/sys/netinet/tcp_input.cc:941
  in ip_input (m=<optimized out>) at bsd/sys/netinet/ip_input.cc:774
  in netisr_dispatch_src (proto=1, source=<optimized out>, m=0xffffa00000f7bf00) at bsd/sys/net/netisr.cc:769
  in netisr_dispatch_src (proto=9, source=<optimized out>, m=0xffffa00000f7bf00) at bsd/sys/net/netisr.cc:769
  in virtio::net::receiver (this=0xffff900000ba3000) at drivers/virtio-net.cc:545

This time the destination address (dst) is 192.168.122.1, which is the
gateway for the non-loopback (eth0) network. However, at this point we
have a single route entry, created when handling the first request, and
the search() method used by route_cache::lookup() simply compares this
entry's network with the network of the IP address in question by
applying the netmask 255.0.0.0. The networks match, so that entry is
returned. The problem is that this packet and all subsequent ones get
routed to the loopback device instead of the virtio device. The client
never receives the packets it needs to complete the handshake, so from
the client's perspective the TCP connection is never established.
Effectively, the application will never be able to receive any external
requests, although it can still handle internal ones like the first one
in this example.

Interestingly, if we reverse the sequence in this example and send the
external request before the app makes an internal request to itself,
everything works fine. This is because when handling the external call,
the initial and only route entry gets created for IP 192.168.122.1,
netmask 255.255.255.0 and the non-loopback (eth0) device. The subsequent
internal request matches that netmask and still goes through the
loopback path (if_simloop()), as this trace illustrates:

0xffff80000119b040 /httpclient.so   0        15.368111615 net_packet_in        b'IP 192.168.122.15.46890 > 192.168.122.15.8000: Flags [S], seq 744996567, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 15368 ecr 0], length 0'
  log_packet_in(mbuf*, int) core/net_trace.cc:143
  netisr_queue_workstream(netisr_workstream*, unsigned int, netisr_work*, mbuf*, int*) [clone .constprop.0] bsd/sys/net/netisr.cc:633
  netisr_queue_src bsd/sys/net/netisr.cc:684
  netisr_queue_src bsd/sys/net/netisr.cc:712
  if_simloop bsd/sys/net/if_loop.cc:291
  ether_output bsd/sys/net/if_ethersubr.cc:256
  ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621
  tcp_output bsd/sys/netinet/tcp_output.cc:1385
  tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:465
  tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:436
  kern_connect bsd/sys/kern/uipc_syscalls.cc:374

So this patch fixes the problem by changing the search() method
to handle the first scenario. In essence, instead of simply
comparing the networks of the entry IP address and dst, it first
identifies the type of device the entry is associated with. If the
device is non-loopback, it does the same as before; if the device is
loopback, it checks whether dst is a loopback address, in which case it
is a match, and otherwise it compares the full IP addresses. With the
patch applied, in the first scenario, when the external call is handled,
the entry's device is found to be loopback, and the new logic returns
null because the full addresses do not match.

This patch has been backported from the original one by Jan-Michael Kho,
contributed to the Spirent fork of OSv:
SpirentOrion@f6e6e54.

V2: Compared to V1, this version adds more comments to the code
and removes some unnecessary includes.

Co-authored-by: "Jan-Michael Kho"
Co-authored-by: Waldemar Kozaczuk
Signed-off-by: Waldemar Kozaczuk
@wkozaczuk
Collaborator

@Akilan1999 I have recently fixed an important bug that makes SeaweedFS run much better on OSv (please see a0251df). In essence, it allows SeaweedFS to bind to an individual interface and talk to itself at the same time. Without this fix, the older examples had to use --ip 0.0.0.0, which made it impossible to test any practical setups.

I have also run some experiments and benchmarks, and described the steps in detail in the README I added to the app:

@Akilan1999
Author

Awesome !
Thanks a lot for that.
