CLUSTER: When pulling a stream from a node in the origin server cluster, if the player disconnects and pulls the stream again within 3 seconds, an error is reported. #2901
Business scenario supplement:
|
This seems to be a retry-strategy problem, a boundary condition that needs optimization. You could reduce the retry interval a bit. However, no matter what, when an edge pulls from the origin, it must eventually destroy the pulling stream to avoid a large number of streams pulling from the origin, so there will always be a time window in which playback is rejected; no amount of optimization can remove it. A better way to bypass the issue is for the client to support multiple edges and retry several times on failure, which solves the problem effectively (see the sketch below). Of course, the edge's retry interval is still worth optimizing.
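As a rough illustration of that multi-edge retry idea, here is a minimal client-side sketch in generic C++. Everything in it is hypothetical (the edge list, the caller-supplied try_play function, and the retry parameters); it is not SRS or flv.js API:

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Rotate across several edges, retrying a few rounds with a short
// backoff, so one edge refusing playback (for example while it is
// still stopping its pull from the origin) does not fail the session.
// try_play is a caller-supplied function that attempts playback from
// one edge URL and returns true on success.
bool play_with_failover(const std::vector<std::string>& edges,
                        const std::function<bool(const std::string&)>& try_play,
                        int max_rounds = 3) {
    for (int round = 0; round < max_rounds; round++) {
        for (const auto& edge : edges) {
            if (try_play(edge)) {
                return true; // playback started successfully
            }
            // Brief backoff before trying the next edge.
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
        }
    }
    return false; // every edge failed in every round
}
```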
|
First of all, thank you for the patient answers. Today I followed your suggestion and tried flv.js. Since it only supports reconnection by listening for its events and reconnecting manually, I had to experiment with that approach.
|
Thank you. I have read #2215 and also #2707. Indeed, judging from the discussion, this issue comes up every few months. For now we can only work around it with the "listen and reconnect" approach mentioned above; we will try other solutions when a better one appears.
|
Um, client reconnection is definitely necessary; don't rely solely on server-side improvements. Stability is a systems-engineering problem, and the client is of course part of it. In general, there is always a way to bypass a common problem: solutions are always more plentiful than problems.
|
I added an event that supports notify-and-wait. Additionally, I am considering adding a configuration option to enable or disable this mode. (A sketch of the notify-and-wait idea follows.)
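A generic sketch of that notify-and-wait idea, using std::condition_variable for brevity (SRS itself is built on state-threads coroutines; EdgeStopGate and its methods are hypothetical names, not the actual SRS implementation):

```cpp
#include <condition_variable>
#include <mutex>

// A new play request that arrives while the edge is still stopping its
// origin pull waits for a "stopped" notification instead of failing.
class EdgeStopGate {
public:
    // Called when the last player leaves and the stop begins.
    void begin_stopping() {
        std::lock_guard<std::mutex> lock(mutex_);
        stopping_ = true;
    }

    // Called by the stop path once the upstream pull is fully closed.
    void notify_stopped() {
        std::lock_guard<std::mutex> lock(mutex_);
        stopping_ = false;
        cv_.notify_all();
    }

    // Called by a new play request: block until any in-flight stop has
    // finished, then start a fresh pull from the origin.
    void wait_until_stopped() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !stopping_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```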
|
Sorry for keeping everyone waiting so long; I have resolved it in 4.0.267. The solution of waiting for a notification is good (although there is an even simpler one). 👍 Hahaha, everyone offered quite a few solutions, but the best place to fix it is still SRS itself, so we no longer need the recurring workaround patches. 😄

I checked, and the retry time is around 3 seconds. The reason is that when the last player stops, the edge stops pulling the stream, and this stop takes about 3 seconds:

```
[2022-10-10 08:01:45.207] Debug: Start to stop edge coroutine
[2022-10-10 08:01:48.207] Debug: Start to stop edge upstream
[2022-10-10 08:01:48.208] Debug: Start to unpublish source
[2022-10-10 08:01:48.209] Debug: Edge stopped
```

Looking at the code, it waits a fixed 3 seconds, so the mechanism here is not reasonable:

```cpp
void SrsPlayEdge::on_all_client_stop() {
    if (state == SrsEdgeStatePlay || state == SrsEdgeStateIngestConnected) {
        state = SrsEdgeStateIngestStopping;
        ingester->stop();
    }
}

void SrsEdgeIngester::stop() {
    trd->stop();
    upstream->close();
    source->on_unpublish();
}

srs_error_t SrsEdgeIngester::cycle() {
    while (true) {
        do_cycle(); // Interrupted by stop.
        srs_usleep(SRS_EDGE_INGESTER_CIMS); // 3s
    }
}
```

A more reasonable approach is to improve the thread-stopping mechanism: when the thread is stopped, the interrupt takes effect in do_cycle(), but the fixed srs_usleep(SRS_EDGE_INGESTER_CIMS) still sleeps out its full 3 seconds. After the improvement, the stop operation completes within 2ms, a window so short that normal user operations will generally not hit it.
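The improved stop mechanism itself is not quoted in this thread, but as a generic illustration of the idea, here is how a fixed sleep can be made interruptible so that stopping completes in milliseconds. This is plain std::thread-style C++ with a hypothetical InterruptibleSleeper; SRS actually uses state-threads coroutines:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// A sleep that stop() can cut short: the ingester cycle waits up to the
// full interval between retries, but an interrupt wakes it immediately.
class InterruptibleSleeper {
public:
    // Returns true if the full interval elapsed, false if interrupted.
    bool sleep_for(std::chrono::milliseconds interval) {
        std::unique_lock<std::mutex> lock(mutex_);
        return !cv_.wait_for(lock, interval, [this] { return interrupted_; });
    }

    // Called from the stop path; the sleeper wakes within milliseconds.
    void interrupt() {
        std::lock_guard<std::mutex> lock(mutex_);
        interrupted_ = true;
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool interrupted_ = false;
};
```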
Although condition variables or delayed release could also solve this problem, destroying and re-creating the object (somewhat like a restart) is the simplest solution: the object's state stays clear and simple, which minimizes the problem. (A generic sketch of this re-create-on-stop idea follows.)
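Here is a generic sketch of that re-create-on-stop idea (the class and method names are hypothetical, not the actual SRS code):

```cpp
#include <memory>

// Hypothetical ingester that pulls the stream from the origin.
class Ingester {
public:
    void start() { /* begin pulling from the origin */ }
    void stop()  { /* interrupt the pull immediately */ }
};

class PlayEdge {
public:
    // When the last player leaves, stop the old ingester and replace it
    // with a fresh object. A new play request never observes a
    // half-stopped ingester, so there is no multi-second "stopping"
    // window for it to collide with.
    void on_all_client_stop() {
        ingester_->stop();
        ingester_ = std::make_unique<Ingester>(); // pristine state again
    }

    void on_client_play() {
        ingester_->start(); // safe: the ingester is always in a clean state
    }

private:
    std::unique_ptr<Ingester> ingester_ = std::make_unique<Ingester>();
};
```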
|
@winlinvip This modification is relatively simple, but if the edge nodes are attacked, it may bring down the origin server. Doesn't that contradict the purpose of our origin servers?
|
Preventing attacks is a separate solution; this retry interval is not designed for attack prevention.
|
Description

Configuration files:

- origin.cluster.serverA.conf
- origin.cluster.serverB.conf
- origin.cluster.serverC.conf
- origin.cluster.edge.conf

Commands to start the servers, push the stream, and play it:

```
./objs/srs -c ./conf/origin.cluster.serverB.conf
./objs/srs -c ./conf/origin.cluster.serverC.conf
./objs/srs -c ./conf/origin.cluster.edge.conf
ffmpeg -i rtsp://admin:[email protected]:554/h264/ch1/main/av_stream -c copy -f flv rtmp://192.168.1.185:19352/live/test01
ffplay -i rtmp://192.168.1.185:1935/live/test01
```

Expected Behavior (Expect)
In the origin cluster environment, I expect the edge nodes to pull streams successfully, just like nodes in the previous standalone deployment, regardless of when the stream is pulled (assuming the stream is being pushed).