Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

svc_destroy_it will destroy an xprt without xp_ops initialized #288

Open
xueqianhu47 opened this issue Feb 21, 2024 · 4 comments
Open

svc_destroy_it will destroy an xprt without xp_ops initialized #288

xueqianhu47 opened this issue Feb 21, 2024 · 4 comments

Comments

@xueqianhu47
Copy link

During the shutdown process in Ganesha, the sequence svc_shutdown() -> svc_xprt_shutdown() -> svc_destroy_it() is invoked to destroy all transports (xprts) in the svc_xprt_fd tree. We encountered a crash within svc_destroy_it() due to a transport (xprt) lacking initialized xp_ops being processed by this function. Examination of this xprt via the core file revealed it was not fully initialized, with most of its fields remaining unset (zeroed).

Upon reviewing the allocation, insertion, and initialization process for new xprts within the svc_xprt_fd tree, we identified the absence of synchronization mechanisms ensuring that an xprt is fully initialized before its insertion into the tree. For instance, svc_vc_ncreatef() first creates a new xprt and inserts it into the svc_xprt_fd tree using svc_xprt_lookup, followed by the initialization of xp_ops through svc_vc_rendezvous_ops(). However, there is no safeguard to guarantee that these steps are executed atomically. Should a Ganesha shutdown occur midway, a crash could result in svc_destroy_it due to a null pointer dereference. A similar process for creating an xprt is also present in svc_fd_ncreatef().

Is there a strategy to mitigate the aforementioned issues?

@ffilz
Copy link
Member

ffilz commented Feb 21, 2024

See also Ganesha issues:

nfs-ganesha/nfs-ganesha#1073
nfs-ganesha/nfs-ganesha#1086

I think there needs to be more synchronization for svc_shutdown.

@ffilz ffilz added the bug label Feb 21, 2024
@xueqianhu47
Copy link
Author

xueqianhu47 commented Mar 7, 2024

@ffilz this issue got hit again in another case: svc_rqst_xprt_task_recv -> svc_vc_rendezvous -> svc_destroy_it and xp_ops == null.

In svc_vc_rendezvous, svc_destroy_it will be called when makefd_xprt finds an existing xprt and the xprt does not have the SVC_XPRT_FLAG_INITIAL flag.

	newxprt = makefd_xprt(fd, req_xd->sx_dr.sendsz, req_xd->sx_dr.recvsz,
			      &si, SVC_XPRT_FLAG_CLOSE);
	if ((!newxprt) || (!(newxprt->xp_flags & SVC_XPRT_FLAG_INITIAL))) {
		close(fd);

		if (newxprt) {
			SVC_DESTROY(newxprt);
			/* Was never added to epoll */
			SVC_RELEASE(newxprt, SVC_RELEASE_FLAG_NONE);
		} else {
			close(fd);
		}

Looks like this is a case that an xprt without SVC_XPRT_FLAG_INITIAL has not yet been initialized for the xp_ops.

The SVC_DESTROY call was added last year, we used to just close(fd):
ac75536
I am just thinking, what if the svc_vc_rendezvous get called twice, the first time we get a newly created xprt(xp_ops has not yet been initialized), then second call comes, makefd_xprt-> svc_xprt_lookup finds the newly created xprt and clear the SVC_XPRT_FLAG_INITIAL flag:

	rpc_dplx_rli(rec);
	xp_flags = atomic_clear_uint16_t_bits(&xprt->xp_flags,
					      SVC_XPRT_FLAG_INITIAL);
	rpc_dplx_rui(rec);

and try to detroy the xprt when be back in svc_vc_rendezvous. Would you this could be the case

@ffilz
Copy link
Member

ffilz commented May 14, 2024

Has a proposed fix here: #289

@ffilz
Copy link
Member

ffilz commented Dec 5, 2024

The patch was merged. Does that resolve the issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants