Edge pull often fails with ret=1018 (Device or resource busy) and ret=1018 (No such file or directory). #511
Comments
This error means "cannot connect to the server".
|
Please specify the version, environment, and reproduction method.
|
Version: 2.0.195 Reproduction method:
|
According to the technical documentation I found, once a socket is set to non-blocking, calling recv before any data has arrived returns this error. The program should ignore the error and keep looping to read. I hope this helps in fixing the issue. Thank you.
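The behaviour described in this comment can be sketched with plain POSIX sockets. This is an illustration only, not SRS code: `nonblocking_recv_demo` is a hypothetical name, and the sketch assumes that the "would block" condition is reported as `EAGAIN`/`EWOULDBLOCK`, which is how POSIX reports it for a non-blocking `recv`.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: returns true if recv() on an empty non-blocking socket
// fails with EAGAIN/EWOULDBLOCK and a later recv() succeeds once data exists.
bool nonblocking_recv_demo() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;

    // Switch the receiving end to non-blocking mode.
    int flags = fcntl(sv[0], F_GETFL, 0);
    assert(flags != -1);
    fcntl(sv[0], F_SETFL, flags | O_NONBLOCK);

    // Nothing has been sent yet, so recv() must not block and reports "try again".
    char buf[8];
    ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
    bool would_block = (n == -1) && (errno == EAGAIN || errno == EWOULDBLOCK);

    // Once the peer has sent data, the same call succeeds.
    const char msg[] = "ping";
    send(sv[1], msg, sizeof(msg), 0);
    n = recv(sv[0], buf, sizeof(buf), 0);
    bool got_data = (n == static_cast<ssize_t>(sizeof(msg)));

    close(sv[0]);
    close(sv[1]);
    return would_block && got_data;
}
```

A caller that sees the "would block" result normally waits for readiness (for example with `poll`) and retries, rather than treating it as a connection failure.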
|
Let me take a look. The message seems to say that SRS cannot connect to your HTTP callback server. It does not look like a recv problem at this point.
|
After disabling the hook, the result is the same: connecting to the publishing server also reports 1018 (No such file or directory).
|
Is the publishing server SRS?
|
Yes.
|
When a connection on the edge server hits the 1018 (No such file or directory) error, clients can still connect to the edge SRS and play, but playback drops frames and freezes.
|
Addendum: I am using version 2.0.197 with SRS configured as an edge node. I investigated the code to find where the error message occurs. The problem seems to arise when the origin-pull connection is closed a few seconds after the last client disconnects and nobody is watching. If a stream is played for the first time during that window, the origin-pull connection is prone to failure. The failure then repeats in a loop, and different vhosts are affected as well. This problem does not occur in version 1.0.
|
Also, the same stream in the same app had two edge connections at the same time, and this occurred twice (SrsComplexHandshake::handshake_with_server).
|
Having two origin-pull connections for the same stream will definitely cause a crash.
|
More precisely: if the same stream repeatedly disconnects from the edge node and quickly reconnects, then from the third attempt onwards the edge enters an error loop while pulling the stream.
|
The Dragon God said that this is a thread issue. I have handed this bug over to him.
|
Fixed in 2.0.199.
|
Version 2.0.199 did fix the issues above, but a new, less frequent problem has emerged: when multiple edge streams disconnect almost simultaneously, SRS core dumps while closing the thread.
|
@zhengfl Please summon the Dragon God.
|
If I download the latest 2.0 release now, will this problem still exist?
|
You can try the latest version, which is 209: https://github.com/ossrs/srs/tree/2.0release#history.
|
It seems like it was fixed in 2.0.203.
|
ENOENT is most likely caused by a runaway thread, with the error code set by another thread.
|
This happens because close(stfd) is not performed correctly. As a result, SRS has disabled the feature that disconnects all client connections when a virtual host is deleted, because that feature triggers the ENOENT issue. The fd must not be blocked on read or write when it is closed.
|
Fixed in 2.0.211.
|
Fly FD
A "flying FD" is an FD that runs away because it was not closed properly. When an FD flies away, it can leak memory and FDs, or an FD can even end up being closed mysteriously. FDs must therefore never fly: close(stfd) must return 0, which we can enforce with an assert. How do we ensure that close(stfd) succeeds? When stfd is closed, no thread may be waiting on it for reading or writing. Consider the cases where a thread is reading or writing stfd.
With a single thread, it cannot be reading or writing stfd at the moment it closes it. With multiple threads, for example one thread that receives data and another that sends and processes it, a separate thread has to be created when they need to exit.
If the receiving thread is still active, stfd is in the EBUSY state and cannot be closed. To close it safely, the thread must be interrupted first.
Therefore, in SRS, whenever a thread is reading or writing stfd, that thread must be stopped before stfd is closed, as in the forwarder.
If the order is reversed and stfd is closed before the thread is stopped, it will crash.
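The "stop the reader first, then close the fd" ordering can be sketched as follows. This is a minimal analogue built on `std::thread` and a POSIX pipe, not SRS or State Threads code: `Receiver`, `stop_then_close`, and `demo_stop_then_close` are hypothetical names, and a plain kernel `close()` does not report EBUSY the way the thread describes for stfd, so the sketch only demonstrates the ordering and the check that `close` returns 0.

```cpp
#include <atomic>
#include <cassert>
#include <poll.h>
#include <thread>
#include <unistd.h>

// Hypothetical reader: it borrows `fd` and never closes it itself.
class Receiver {
public:
    explicit Receiver(int fd) : fd_(fd), stop_(false), worker_(&Receiver::cycle, this) {}
    ~Receiver() { stop(); }

    // Ask the reader to quit and wait until it has left poll()/read().
    void stop() {
        stop_.store(true);
        if (worker_.joinable()) worker_.join();
    }

private:
    void cycle() {
        char buf[256];
        while (!stop_.load()) {
            pollfd p = {fd_, POLLIN, 0};
            int n = poll(&p, 1, 20);  // short timeout so the stop flag is noticed
            if (n > 0 && (p.revents & POLLIN)) {
                if (read(fd_, buf, sizeof(buf)) <= 0) break;  // EOF or error
            } else if (n != 0) {
                break;  // poll error, or hang-up without readable data
            }
        }
    }

    int fd_;
    std::atomic<bool> stop_;
    std::thread worker_;  // declared last so it starts after the other members
};

// Correct order: stop the thread that uses the fd, then close the fd.
int stop_then_close(Receiver& r, int fd) {
    r.stop();
    return close(fd);
}

// Returns the result of close() on the read end; 0 means a clean close.
int demo_stop_then_close() {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    Receiver r(fds[0]);
    int ret = stop_then_close(r, fds[0]);
    close(fds[1]);
    return ret;
}
```

In a caller, `assert(stop_then_close(receiver, fd) == 0)` mirrors the suggestion above of asserting on the return value of close.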
|
st_thread_interrupt interrupts st_read and st_write.
If the read system call is interrupted by the system (EINTR), ST retries it, so that case is not a problem.
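The retry behaviour mentioned here follows a common POSIX idiom, sketched below. This is not the State Threads source: `read_retry` and `read_retry_demo` are hypothetical names, and the sketch only shows the general pattern of repeating `read` while it fails with `EINTR`.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical wrapper: repeat read() while it is interrupted by a signal.
ssize_t read_retry(int fd, void* buf, size_t nbyte) {
    ssize_t n;
    do {
        n = read(fd, buf, nbyte);
    } while (n < 0 && errno == EINTR);  // interrupted before any data: try again
    return n;  // bytes read, 0 on EOF, or -1 with errno set to a real error
}

// Writes three bytes into a pipe and returns how many read_retry() gets back.
ssize_t read_retry_demo() {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    ssize_t w = write(fds[1], "abc", 3);
    assert(w == 3);
    char buf[8];
    ssize_t n = read_retry(fds[0], buf, sizeof(buf));
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

An interrupt of this kind is transparent to the caller, unlike st_thread_interrupt, which is meant to make the pending st_read or st_write return.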
|
Winlin brother:
The stack trace is as follows:
Second time: err=-1
|
Fixed in 49853d2
|
…4126)
1. Should always stop coroutine before close fd, see #511, #1784
2. When edge forwarder coroutine quit, always set the error code.
3. Do not unpublish if invalid state.
---------
Co-authored-by: Jacob Su <[email protected]>