Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLUSTER: Pulling stream from a node within the origin server cluster, if disconnected and attempting to pull stream again within 3 seconds, an error will be reported. #2901

Closed
wr-ning opened this issue Feb 7, 2022 · 10 comments
Assignees
Labels
Enhancement Improvement or enhancement. good first issue Easy to fix issues, good for newcomers TransByAI Translated by AI/GPT.
Milestone

Comments

@wr-ning
Copy link

wr-ning commented Feb 7, 2022

Note: Before asking a question, please read the FAQ (Please read FAQ before filing an issue) #2716

Description

Please describe your issue here

There are currently 3 nodes in the origin server cluster, A, B, and C, and one edge streaming node. There is a stream being pushed from node C.
When the player pulls the stream from the edge node for the first time, it plays normally. However, if the player stops and immediately tries to pull the stream again, it fails to play and the SRS log reports an error.
But when the player pulls the stream from node C in the origin server cluster, it plays normally. Even if the player stops and immediately tries to pull the stream again, it plays without any issues. This can be repeated multiple times.

After continuous testing, it has been discovered that when the player pulls the stream from the edge node, if it disconnects and tries to pull the stream again within 3 seconds, it will result in a failed stream pull.
Even the latest version of srs-server-4.0-b6 has been tested and it also experiences this issue.

After checking the issues, I thought it was #1520, but upon careful reading, I realized it is not the same issue. (#1520)

The error message is as follows:
[2022-02-07 16:34:54.525][Error][109157][t4269891][11] serve error code=2020 : service cycle : rtmp: stream service : rtmp: create consumer : play edge : state is stopping
From the error message, it seems that the current edge node is performing a stop operation, preventing the stream from being pulled.


  1. SRS Version: srs-server-4.0-b1(4.0.206)
  2. Operating System: ubuntu-16.04.6-server-amd64
  3. SRS Installation Package: srs-server-4.0-b1.tar.gz
  1. SRS edge node log is as follows (Log):
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3] XCORE-SRS/4.0.206(Leo)
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3] config parse complete
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3] write log to console
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3][MAIN] SRS/4.0.206(Leo), MIT
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3] authors: https://github.com/ossrs/srs/blob/4.0release/trunk/AUTHORS.txt
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3] contributors: Winlin<[email protected]> Winlin<[email protected]> Winlin<[email protected]> Wenjie<[email protected]> Wenjie<[email protected]> Wenjie<[email protected]> xiangcheng.liu<[email protected]> naijia.liu<[email protected]> alcoholyi<[email protected]> byteman<[email protected]> chad.wang<[email protected]> suhetao<[email protected]> Johnny<[email protected]> karthikeyan<[email protected]> StevenLiu<[email protected]> zhengfl<[email protected]> tufang14<[email protected]> allspace<[email protected]> niesongsong<[email protected]> rudeb0t<[email protected]> CallMeNP<[email protected]> synote<[email protected]> lovecat<[email protected]> panda1986<[email protected]> YueHonghui<[email protected]> ThomasDreibholz<[email protected]> JuntaoLiu<[email protected]> RocFang<[email protected]> MakarovYaroslav<[email protected]> MirkoVelic<[email protected]> HuiZhang(huzhang2)<[email protected]> OtterWa<[email protected]> walkermi<[email protected]> haofz<[email protected]> ME_Kun_Han<[email protected]> ljx0305<[email protected]> cenxinwei<[email protected]> StarBrilliant<[email protected]> xubin<[email protected]> intliang<[email protected]> flowerwrong<[email protected]> YLX<[email protected]> J<[email protected]> Harlan<[email protected]> hankun<[email protected]> JonathanBarratt<[email protected]> KeeganH<[email protected]> StevenLiu<[email protected]> liuxc0116<[email protected]> ChengdongZhang<[email protected]> lovacat<[email protected]> qiang.li<[email protected]> HungMingWu<[email protected]> Himer<[email protected]> XiaLixin<[email protected]> XiaLixin<[email protected]> XiaLixin<[email protected]> XiaLixin<[email protected]> alphonsetai<[email protected]> Michael.Ma<[email protected]> lam2003<[email protected]> ShiWei<[email protected]> ShiWei<[email protected]> XiaofengWang<[email protected]> XiaoZhihong<[email protected]> XiaoZhihong<[email protected]> XiaoZhihong<[email protected]> yanghuiwen<[email protected]> WuPengqiang<[email protected]> WuPengqiang<[email protected]> l<22312935+lam2003@github> xfalcon<x-falcon@github> ChenGuanghua<[email protected]> ChenGuanghua<[email protected]> LiPeng<[email protected]> LiPeng<[email protected]> yajun18<[email protected]> liulichuan<[email protected]> yapingcat<[email protected]> chenchengbin<[email protected]> ChenHaibo<[email protected]> ChenHaibo<[email protected]> jasongwq<[email protected]> yinjiaoyuan<[email protected]> PieerePi<[email protected]> JesseXi<[email protected]> PieerePi<40780488+PieerePi@github> ghostsf<[email protected]> xbpeng121<53243357+xbpeng121@github> johzzy<[email protected]> stone<[email protected]> cfw11<34058899+cfw11@github> Hung-YiChen<[email protected]> long<[email protected]> matthew1838<77285055+matthew1838@github> rise<[email protected]> 
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3] cwd=/home/wsm/srs-server-4.0-b1/trunk, work_dir=./, build: 2021-12-20 20:18:37, configure: --x86-x64, uname: Linux ubuntu 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux, osx: 0
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3] configure detail: --prefix=/usr/local/srs --hls=on --hds=off --dvr=on --ssl=on --https=on --ssl-1-0=off --ssl-local=off --sys-ssl=off --transcode=on --ingest=on --stat=on --http-callback=on --http-server=on --stream-caster=on --http-api=on --utest=off --cherrypy=off --srt=off --rtc=on --simulator=off --cxx11=off --cxx14=off --ffmpeg-fit=on --nasm=on --srtp-nasm=on --clean=on --gperf=off --gmc=off --gmd=off --gmp=off --gcp=off --gprof=off --static=off --shared-st=off --shared-srt=off --shared-ffmpeg=off --log-verbose=off --log-info=off --log-trace=on --gcov=off --debug=off --debug-stats=off --cross-build=off --cc=gcc --cxx=g++ --ar=ar --ld=ld --randlib=randlib
[2022-02-07 16:34:29.058][Trace][109157][jp1i16y3] srs checking config...
[2022-02-07 16:34:29.059][Trace][109157][jp1i16y3] ips, iface[0] bond0 ipv4 0x11443 192.168.1.185, iface[1] bond0 ipv6 0x11443 fe80::1618:77ff:fe4d:57f%bond0
[2022-02-07 16:34:29.059][Trace][109157][jp1i16y3] devices, intranet bond0 192.168.1.185, intranet bond0 fe80::1618:77ff:fe4d:57f%bond0
[2022-02-07 16:34:29.059][Warn][109157][jp1i16y3][22] stats network use index=0, ip=192.168.1.185, ifname=bond0
[2022-02-07 16:34:29.059][Warn][109157][jp1i16y3][22] stats disk not configed, disk iops disabled.
[2022-02-07 16:34:29.059][Trace][109157][jp1i16y3] write log to console
[2022-02-07 16:34:29.059][Trace][109157][jp1i16y3] features, rch:on, dash:on, hls:on, hds:off, srt:off, hc:on, ha:on, hs:on, hp:on, dvr:on, trans:on, inge:on, stat:on, sc:on
[2022-02-07 16:34:29.059][Trace][109157][jp1i16y3] SRS on  amd64 x86_64, conf:./conf/origin.cluster.edge.conf, limit:1000, writev:1024, encoding:little-endian, HZ:100
[2022-02-07 16:34:29.059][Trace][109157][jp1i16y3] mw sleep:350ms. mr enabled:on, default:0, sleep:350ms
[2022-02-07 16:34:29.059][Trace][109157][jp1i16y3] gc:on, pq:30000ms, cscc:[0,16), csa:on, tn:on(may hurts performance), ss:auto(guess by merged write)
[2022-02-07 16:34:29.059][Trace][109157][jp1i16y3] system default latency(ms): mw(0-350) + mr(0-350) + play-queue(0-30000)
[2022-02-07 16:34:29.059][Warn][109157][jp1i16y3][22] SRS/4.0.206 is not stable
[2022-02-07 16:34:29.066][Trace][109157][jp1i16y3] fingerprint=C7:56:04:AE:A3:19:FD:1B:D3:2B:AD:B3:2E:A9:BF:6C:6E:6E:7E:12:C9:5F:D9:06:CF:5F:A1:7D:04:FB:A7:80
[2022-02-07 16:34:29.066][Trace][109157][jp1i16y3] CircuitBreaker: enabled=1, high=2x90, critical=1x95, dying=5x99
[2022-02-07 16:34:29.066][Trace][109157][jp1i16y3] http: root mount to ./objs/nginx/html
[2022-02-07 16:34:29.066][Trace][109157][jp1i16y3] server main cid=jp1i16y3, pid=109157, ppid=53212, asprocess=0
[2022-02-07 16:34:29.067][Trace][109157][jp1i16y3] write pid=109157 to objs/edge.pid success!
[2022-02-07 16:34:29.067][Trace][109157][jp1i16y3] RTMP listen at tcp://0.0.0.0:1935, fd=6
[2022-02-07 16:34:29.067][Trace][109157][jp1i16y3] signal installed, reload=1, reopen=10, fast_quit=15, grace_quit=3
[2022-02-07 16:34:29.067][Trace][109157][jp1i16y3] http: api mount /console to ./objs/nginx/html/console
[2022-02-07 16:34:29.068][Trace][109157][4s1606j8] Hybrid cpu=0.00%,11MB
[2022-02-07 16:34:29.068][Warn][109157][z993989r][22] use private address as ip: 192.168.1.185, ifname=bond0
[2022-02-07 16:34:29.068][Trace][109157][z993989r] Startup query id=c3ed94bc87f011ec84a7cfcef7e51d2e, eip=192.168.1.185, wait=312s
[2022-02-07 16:34:29.068][Trace][109157][mp510936] TCP: connection manager run, conns=0
[2022-02-07 16:34:29.069][Trace][109157][0tz9q447] RTC: connection manager run, conns=0
[2022-02-07 16:34:34.055][Trace][109157][4s1606j8] Hybrid cpu=0.00%,11MB
[2022-02-07 16:34:39.055][Trace][109157][4s1606j8] Hybrid cpu=1.00%,11MB
[2022-02-07 16:34:44.055][Trace][109157][4s1606j8] Hybrid cpu=1.00%,11MB, cid=1,1, timer=63,0,0, clock=0,49,1,0,0,0,0,0,0
[2022-02-07 16:34:49.055][Trace][109157][4s1606j8] Hybrid cpu=0.00%,11MB, cid=1,1, timer=63,0,0, clock=0,49,1,0,0,0,0,0,0
[2022-02-07 16:34:50.965][Trace][109157][55u9592w] RTMP client ip=192.168.1.210:62588, fd=7
[2022-02-07 16:34:50.970][Trace][109157][55u9592w] complex handshake success
[2022-02-07 16:34:50.971][Trace][109157][55u9592w] connect app, tcUrl=rtmp://192.168.1.185:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=192.168.1.185, port=1935, app=live, args=null
[2022-02-07 16:34:50.971][Trace][109157][55u9592w] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2022-02-07 16:34:51.052][Trace][109157][55u9592w] ignore AMF0/AMF3 command message.
[2022-02-07 16:34:51.093][Trace][109157][55u9592w] ignore AMF0/AMF3 command message.
[2022-02-07 16:34:51.093][Trace][109157][55u9592w] client identified, type=rtmp-play, vhost=192.168.1.185, app=live, stream=test01, param=, duration=-1ms
[2022-02-07 16:34:51.093][Trace][109157][55u9592w] connected stream, tcUrl=rtmp://192.168.1.185:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, stream=test01, param=, args=null
[2022-02-07 16:34:51.093][Trace][109157][55u9592w] new source, stream_url=/live/test01
[2022-02-07 16:34:51.093][Trace][109157][55u9592w] source url=/live/test01, ip=192.168.1.210, cache=1, is_edge=1, source_id=/
[2022-02-07 16:34:51.093][Trace][109157][55u9592w] ignore disabled exec for vhost=__defaultVhost__
[2022-02-07 16:34:51.094][Trace][109157][55u9592w] dispatch cached gop success. count=0, duration=-1
[2022-02-07 16:34:51.094][Trace][109157][55u9592w] create consumer, active=1, queue_size=0.00, jitter=30000000
[2022-02-07 16:34:51.094][Trace][109157][55u9592w] set fd=7, SO_SNDBUF=87040=>175000, buffer=350ms
[2022-02-07 16:34:51.094][Trace][109157][55u9592w] start play smi=0ms, mw_sleep=350, mw_msgs=8, realtime=0, tcp_nodelay=0
[2022-02-07 16:34:51.094][Trace][109157][55u9592w] update source_id=ueeyh653/55u9592w
[2022-02-07 16:34:51.096][Trace][109157][ueeyh653] complex handshake success.
[2022-02-07 16:34:51.097][Trace][109157][ueeyh653] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2022-02-07 16:34:51.176][Trace][109157][ueeyh653] connected, version=4.0.206.0, ip=127.0.0.1, pid=109153, id=0, dsu=1
[2022-02-07 16:34:51.176][Trace][109157][ueeyh653] edge-pull publish url rtmp://127.0.0.1:19351/live/test01, stream=test01 as test01
[2022-02-07 16:34:51.176][Trace][109157][ueeyh653] edge change from 100 to state 101 (pull).
[2022-02-07 16:34:51.216][Warn][109157][ueeyh653][11] RTMP redirect /live/test01 from 127.0.0.1:19351 to rtmp://127.0.0.1:19352/live/test01
[2022-02-07 16:34:51.219][Trace][109157][ueeyh653] complex handshake success.
[2022-02-07 16:34:51.219][Trace][109157][ueeyh653] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2022-02-07 16:34:51.296][Trace][109157][ueeyh653] connected, version=4.0.206.0, ip=127.0.0.1, pid=109154, id=0, dsu=1
[2022-02-07 16:34:51.296][Trace][109157][ueeyh653] edge-pull publish url rtmp://127.0.0.1:19352/live/test01, stream=test01 as test01
[2022-02-07 16:34:51.297][Trace][109157][ueeyh653] got metadata, width=1920, height=1080, vcodec=7, acodec=7
[2022-02-07 16:34:51.297][Trace][109157][ueeyh653] 42B video sh,  codec(7, profile=High, level=4, 1920x1080, 0kbps, 0.0fps, 0.0s)
[2022-02-07 16:34:53.129][Warn][109157][ueeyh653][4] origin disconnected, retry, error code=1007 : recv message : recv interlaced message : read basic header : basic header requires 1 bytes : read bytes : read
thread [109157][ueeyh653]: ingest() [src/app/srs_app_edge.cpp:320][errno=4]
thread [109157][ueeyh653]: recv_message() [src/protocol/srs_rtmp_stack.cpp:372][errno=4]
thread [109157][ueeyh653]: recv_interlaced_message() [src/protocol/srs_rtmp_stack.cpp:859][errno=4]
thread [109157][ueeyh653]: read_basic_header() [src/protocol/srs_rtmp_stack.cpp:954][errno=4]
thread [109157][ueeyh653]: grow() [src/protocol/srs_protocol_stream.cpp:162][errno=4]
thread [109157][ueeyh653]: read() [src/protocol/srs_service_st.cpp:514][errno=4]
[2022-02-07 16:34:54.056][Trace][109157][4s1606j8] Hybrid cpu=0.00%,14MB, cid=1,1, timer=63,0,0, clock=0,49,1,0,0,0,0,0,0
[2022-02-07 16:34:54.392][Trace][109157][t4269891] RTMP client ip=192.168.1.210:62591, fd=9
[2022-02-07 16:34:54.406][Trace][109157][t4269891] complex handshake success
[2022-02-07 16:34:54.406][Trace][109157][t4269891] connect app, tcUrl=rtmp://192.168.1.185:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=192.168.1.185, port=1935, app=live, args=null
[2022-02-07 16:34:54.406][Trace][109157][t4269891] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2022-02-07 16:34:54.484][Trace][109157][t4269891] ignore AMF0/AMF3 command message.
[2022-02-07 16:34:54.524][Trace][109157][t4269891] ignore AMF0/AMF3 command message.
[2022-02-07 16:34:54.524][Trace][109157][t4269891] client identified, type=rtmp-play, vhost=192.168.1.185, app=live, stream=test01, param=, duration=-1ms
[2022-02-07 16:34:54.524][Trace][109157][t4269891] connected stream, tcUrl=rtmp://192.168.1.185:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, stream=test01, param=, args=null
[2022-02-07 16:34:54.524][Trace][109157][t4269891] source url=/live/test01, ip=192.168.1.210, cache=1, is_edge=1, source_id=ueeyh653/55u9592w
[2022-02-07 16:34:54.525][Trace][109157][t4269891] TCP: before dispose resource(RtmpConn)(0x13f4090), conns=2, zombies=0, ign=0, inz=0, ind=0
[2022-02-07 16:34:54.525][Error][109157][t4269891][11] serve error code=2020 : service cycle : rtmp: stream service : rtmp: create consumer : play edge : state is stopping
thread [109157][t4269891]: do_cycle() [src/app/srs_app_rtmp_conn.cpp:217][errno=11]
thread [109157][t4269891]: service_cycle() [src/app/srs_app_rtmp_conn.cpp:414][errno=11]
thread [109157][t4269891]: playing() [src/app/srs_app_rtmp_conn.cpp:659][errno=11]
thread [109157][t4269891]: create_consumer() [src/app/srs_app_source.cpp:2582][errno=11]
thread [109157][t4269891]: on_client_play() [src/app/srs_app_edge.cpp:674][errno=11](Resource temporarily unavailable)
[2022-02-07 16:34:54.525][Trace][109157][mp510936] TCP: clear zombies=1 resources, conns=2, removing=0, unsubs=0
[2022-02-07 16:34:54.525][Trace][109157][t4269891] TCP: disposing #0 resource(RtmpConn)(0x13f4090), conns=2, disposing=1, zombies=0
[2022-02-07 16:34:55.340][Trace][109157][h94o04l7] RTMP client ip=192.168.1.210:62592, fd=9
[2022-02-07 16:34:55.355][Trace][109157][h94o04l7] complex handshake success
[2022-02-07 16:34:55.356][Trace][109157][h94o04l7] connect app, tcUrl=rtmp://192.168.1.185:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=192.168.1.185, port=1935, app=live, args=null
[2022-02-07 16:34:55.356][Trace][109157][h94o04l7] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2022-02-07 16:34:55.436][Trace][109157][h94o04l7] ignore AMF0/AMF3 command message.
[2022-02-07 16:34:55.476][Trace][109157][h94o04l7] ignore AMF0/AMF3 command message.
[2022-02-07 16:34:55.476][Trace][109157][h94o04l7] client identified, type=rtmp-play, vhost=192.168.1.185, app=live, stream=test01, param=, duration=-1ms
[2022-02-07 16:34:55.476][Trace][109157][h94o04l7] connected stream, tcUrl=rtmp://192.168.1.185:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, stream=test01, param=, args=null
[2022-02-07 16:34:55.476][Trace][109157][h94o04l7] source url=/live/test01, ip=192.168.1.210, cache=1, is_edge=1, source_id=ueeyh653/55u9592w
[2022-02-07 16:34:55.477][Trace][109157][h94o04l7] TCP: before dispose resource(RtmpConn)(0x13f3a00), conns=2, zombies=0, ign=0, inz=0, ind=0
[2022-02-07 16:34:55.477][Error][109157][h94o04l7][11] serve error code=2020 : service cycle : rtmp: stream service : rtmp: create consumer : play edge : state is stopping
thread [109157][h94o04l7]: do_cycle() [src/app/srs_app_rtmp_conn.cpp:217][errno=11]
thread [109157][h94o04l7]: service_cycle() [src/app/srs_app_rtmp_conn.cpp:414][errno=11]
thread [109157][h94o04l7]: playing() [src/app/srs_app_rtmp_conn.cpp:659][errno=11]
thread [109157][h94o04l7]: create_consumer() [src/app/srs_app_source.cpp:2582][errno=11]
thread [109157][h94o04l7]: on_client_play() [src/app/srs_app_edge.cpp:674][errno=11](Resource temporarily unavailable)
[2022-02-07 16:34:55.477][Trace][109157][mp510936] TCP: clear zombies=1 resources, conns=2, removing=0, unsubs=0
[2022-02-07 16:34:55.477][Trace][109157][h94o04l7] TCP: disposing #0 resource(RtmpConn)(0x13f3a00), conns=2, disposing=1, zombies=0
[2022-02-07 16:34:56.129][Trace][109157][55u9592w] cleanup when unpublish
[2022-02-07 16:34:56.129][Trace][109157][55u9592w] edge change from 101 to 300 then 0 (init).
[2022-02-07 16:34:56.129][Trace][109157][55u9592w] TCP: before dispose resource(RtmpConn)(0x13656d0), conns=1, zombies=0, ign=0, inz=0, ind=0
[2022-02-07 16:34:56.129][Warn][109157][55u9592w][4] client disconnect peer. ret=1007
[2022-02-07 16:34:56.129][Trace][109157][mp510936] TCP: clear zombies=1 resources, conns=1, removing=0, unsubs=0
[2022-02-07 16:34:56.129][Trace][109157][55u9592w] TCP: disposing #0 resource(RtmpConn)(0x13656d0), conns=1, disposing=1, zombies=0
[2022-02-07 16:34:56.180][Trace][109157][5rz6t681] RTMP client ip=192.168.1.210:62595, fd=7
[2022-02-07 16:34:56.184][Trace][109157][5rz6t681] complex handshake success
[2022-02-07 16:34:56.184][Trace][109157][5rz6t681] connect app, tcUrl=rtmp://192.168.1.185:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=192.168.1.185, port=1935, app=live, args=null
[2022-02-07 16:34:56.184][Trace][109157][5rz6t681] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2022-02-07 16:34:56.266][Trace][109157][5rz6t681] ignore AMF0/AMF3 command message.
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] ignore AMF0/AMF3 command message.
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] client identified, type=rtmp-play, vhost=192.168.1.185, app=live, stream=test01, param=, duration=-1ms
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] connected stream, tcUrl=rtmp://192.168.1.185:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, stream=test01, param=, args=null
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] source url=/live/test01, ip=192.168.1.210, cache=1, is_edge=1, source_id=/ueeyh653
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] ignore disabled exec for vhost=__defaultVhost__
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] dispatch cached gop success. count=0, duration=-1
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] create consumer, active=1, queue_size=0.00, jitter=30000000
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] set fd=7, SO_SNDBUF=87040=>175000, buffer=350ms
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] start play smi=0ms, mw_sleep=350, mw_msgs=8, realtime=0, tcp_nodelay=0
[2022-02-07 16:34:56.305][Trace][109157][5rz6t681] update source_id=rh46sb82/ueeyh653
[2022-02-07 16:34:56.308][Trace][109157][rh46sb82] complex handshake success.
[2022-02-07 16:34:56.308][Trace][109157][rh46sb82] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2022-02-07 16:34:56.384][Trace][109157][rh46sb82] connected, version=4.0.206.0, ip=127.0.0.1, pid=109153, id=0, dsu=1
[2022-02-07 16:34:56.384][Trace][109157][rh46sb82] edge-pull publish url rtmp://127.0.0.1:19351/live/test01, stream=test01 as test01
[2022-02-07 16:34:56.384][Trace][109157][rh46sb82] edge change from 100 to state 101 (pull).
[2022-02-07 16:34:56.424][Warn][109157][rh46sb82][11] RTMP redirect /live/test01 from 127.0.0.1:19351 to rtmp://127.0.0.1:19352/live/test01
[2022-02-07 16:34:56.426][Trace][109157][rh46sb82] complex handshake success.
[2022-02-07 16:34:56.427][Trace][109157][rh46sb82] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2022-02-07 16:34:56.504][Trace][109157][rh46sb82] connected, version=4.0.206.0, ip=127.0.0.1, pid=109154, id=0, dsu=1
[2022-02-07 16:34:56.504][Trace][109157][rh46sb82] edge-pull publish url rtmp://127.0.0.1:19352/live/test01, stream=test01 as test01
[2022-02-07 16:34:56.505][Trace][109157][rh46sb82] got metadata, width=1920, height=1080, vcodec=7, acodec=7
[2022-02-07 16:34:56.505][Trace][109157][rh46sb82] 42B video sh,  codec(7, profile=High, level=4, 1920x1080, 0kbps, 0.0fps, 0.0s)
[2022-02-07 16:34:58.830][Warn][109157][rh46sb82][4] origin disconnected, retry, error code=1007 : recv message : recv interlaced message : read basic header : basic header requires 1 bytes : read bytes : read
thread [109157][rh46sb82]: ingest() [src/app/srs_app_edge.cpp:320][errno=4]
thread [109157][rh46sb82]: recv_message() [src/protocol/srs_rtmp_stack.cpp:372][errno=4]
thread [109157][rh46sb82]: recv_interlaced_message() [src/protocol/srs_rtmp_stack.cpp:859][errno=4]
thread [109157][rh46sb82]: read_basic_header() [src/protocol/srs_rtmp_stack.cpp:954][errno=4]
thread [109157][rh46sb82]: grow() [src/protocol/srs_protocol_stream.cpp:162][errno=4]
thread [109157][rh46sb82]: read() [src/protocol/srs_service_st.cpp:514][errno=4]
[2022-02-07 16:34:59.056][Trace][109157][4s1606j8] Hybrid cpu=1.00%,14MB, cid=11,1, timer=63,0,0, clock=0,49,1,0,0,0,0,0,0, free=1, objs=(pkt:0,raw:0,fua:0,msg:60,oth:0,buf:0)
[2022-02-07 16:35:01.830][Trace][109157][5rz6t681] cleanup when unpublish
[2022-02-07 16:35:01.830][Trace][109157][5rz6t681] edge change from 101 to 300 then 0 (init).
[2022-02-07 16:35:01.830][Trace][109157][5rz6t681] TCP: before dispose resource(RtmpConn)(0x13656d0), conns=1, zombies=0, ign=0, inz=0, ind=0
[2022-02-07 16:35:01.830][Warn][109157][5rz6t681][4] client disconnect peer. ret=1007
[2022-02-07 16:35:01.830][Trace][109157][mp510936] TCP: clear zombies=1 resources, conns=1, removing=0, unsubs=0
[2022-02-07 16:35:01.830][Trace][109157][5rz6t681] TCP: disposing #0 resource(RtmpConn)(0x13656d0), conns=1, disposing=1, zombies=0
[2022-02-07 16:35:04.056][Trace][109157][4s1606j8] Hybrid cpu=1.00%,14MB, cid=11,1, timer=63,0,0, clock=0,49,1,0,0,0,0,0,0, free=1, objs=(pkt:0,raw:0,fua:0,msg:60,oth:0,buf:0)
[2022-02-07 16:35:09.056][Trace][109157][4s1606j8] Hybrid cpu=0.00%,14MB, cid=11,1, timer=63,0,0, clock=0,49,1,0,0,0,0,0,0, free=1, objs=(pkt:0,raw:0,fua:0,msg:60,oth:0,buf:0)
^C[2022-02-07 16:35:09.424][Trace][109157][jp1i16y3] sig=2, user terminate program, fast quit
[2022-02-07 16:35:10.060][Trace][109157][jp1i16y3] cleanup for quit signal fast=1, grace=0
[2022-02-07 16:35:10.060][Warn][109157][jp1i16y3][11] main cycle terminated, system quit normally.
[2022-02-07 16:35:10.160][Trace][109157][jp1i16y3] srs disposed
[2022-02-07 16:35:10.160][Trace][109157][jp1i16y3] srs terminated
  1. SRS configuration is as follows (Config):

origin.cluster.serverA.conf

# the config for srs origin-origin cluster
# @see https://github.com/ossrs/srs/wiki/v3_EN_OriginCluster
# @see full.conf for detail config.

listen              19350;
max_connections     1000;
daemon              off;
srs_log_tank        console;
pid                 ./objs/origin.cluster.serverA.pid;
http_api {
    enabled         on;
    listen          9090;
}
vhost __defaultVhost__ {
    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       127.0.0.1:9091 127.0.0.1:9092;
    }
}

origin.cluster.serverB.conf

# the config for srs origin-origin cluster
# @see https://github.com/ossrs/srs/wiki/v3_EN_OriginCluster
# @see full.conf for detail config.

listen              19351;
max_connections     1000;
daemon              off;
srs_log_tank        console;
pid                 ./objs/origin.cluster.serverB.pid;
http_api {
    enabled         on;
    listen          9091;
}
vhost __defaultVhost__ {
    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       127.0.0.1:9090 127.0.0.1:9092;
    }
}

origin.cluster.serverC.conf

# the config for srs origin-origin cluster
# @see https://github.com/ossrs/srs/wiki/v3_EN_OriginCluster
# @see full.conf for detail config.

listen              19352;
max_connections     1000;
daemon              off;
srs_log_tank        console;
pid                 ./objs/origin.cluster.serverC.pid;
http_api {
    enabled         on;
    listen          9092;
}
vhost __defaultVhost__ {
    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       127.0.0.1:9090 127.0.0.1:9091;
    }
}

origin.cluster.edge.conf

# the config for srs origin-edge cluster
# @see https://github.com/ossrs/srs/wiki/v3_EN_OriginCluster
# @see full.conf for detail config.

listen              1935;
max_connections     1000;
pid                 objs/edge.pid;
daemon              off;
srs_log_tank        console;
vhost __defaultVhost__ {
    cluster {
        mode            remote;
        origin          127.0.0.1:19351 127.0.0.1:19350;
    }
}

Replay

How to replay bug?

  1. Open a shell window and navigate to the decompressed directory of SRS. Execute ./objs/srs -c ./conf/origin.cluster.serverA.conf.
  1. Open another shell window and navigate to the decompressed directory of SRS. Execute ./objs/srs -c ./conf/origin.cluster.serverB.conf.
  2. Open another shell window and navigate to the decompressed directory of SRS. Execute ./objs/srs -c ./conf/origin.cluster.serverC.conf.
  3. Open another shell window and navigate to the decompressed directory of SRS. Execute ./objs/srs -c ./conf/origin.cluster.edge.conf.
  4. Open CMD on your local computer and use FFmpeg to push the stream to the serverC node. Run the command: ffmpeg -i rtsp://admin:[email protected]:554/h264/ch1/main/av_stream -c copy -f flv rtmp://192.168.1.185:19352/live/test01.
  5. Open CMD on your local computer and use ffplay to pull and play the stream from the edge node. Run the command: ffplay -i rtmp://192.168.1.185:1935/live/test01.
  6. After the stream is playing normally, stop pulling the stream. Within 3 seconds, repeat step 6 to pull the stream again. This will reproduce the issue, and SRS logs will show an error. (The reason for using ffplay to pull the stream in step 6 is that it allows for quick repetition of the action within a short time. VLC operations would be slower.)

Expected Behavior (Expect)

> Describe your expectation
Please describe what you expect to happen.

Expectation in the source cluster environment
I expect that the edge streaming nodes, in the source cluster environment, can successfully pull streams just like the nodes in the previous standalone deployment, regardless of the time of pulling the stream (assuming the stream is being pushed).

TRANS_BY_GPT3

@wr-ning
Copy link
Author

wr-ning commented Feb 7, 2022

(Business Scenario Supplement:

  1. There is a video preview list page in the project, and a video preview detail page. The list page has 6 small video streaming windows playing. When you click on a small video, you will enter the detail page, which has more content to display, including video preview, content introduction, operations, etc.
  2. Because the process of entering the detail page triggers the above problem, the video instance is destroyed when leaving the list page, which means the connection with SRS is disconnected. When entering the detail page, the video instance is recreated, which means the connection with SRS is established again.
  3. The video streams in the project are all pulled from the edge nodes in SRS.)

Please make sure to maintain the markdown structure.

TRANS_BY_GPT3

@winlinvip winlinvip changed the title CLUSTER: 边缘节点向源站集群内某节点拉流正常断开后的3秒内再次拉流报错:serve error code=2020 : service cycle : rtmp: stream service : rtmp: create consumer : play edge : state is stopping CLUSTER: 边缘节点向源站集群内某节点拉流正常断开后的3秒内再次拉流报错 Feb 7, 2022
@winlinvip winlinvip changed the title CLUSTER: 边缘节点向源站集群内某节点拉流正常断开后的3秒内再次拉流报错 CLUSTER: 向源站集群内某节点拉流,断开后3秒内再次拉流报错 Feb 7, 2022
@winlinvip
Copy link
Member

winlinvip commented Feb 7, 2022

This seems to be a retry strategy problem, belonging to the boundary conditions that need optimization. You can reduce the retry interval a bit.

However, no matter what, when edge sourcing, it is necessary to consider destroying the flow of sourcing to avoid a large number of flows sourcing. Therefore, there will always be a time window where playback is prohibited, which cannot be avoided regardless of optimization efforts.

A better way to bypass this issue is to have the client support multiple edges and retry several times in case of failure. This approach can effectively solve the problem.

Of course, the retry interval for the edges is worth optimizing.

TRANS_BY_GPT3

@winlinvip winlinvip self-assigned this Feb 7, 2022
@winlinvip winlinvip added the Enhancement Improvement or enhancement. label Feb 7, 2022
@winlinvip winlinvip added this to the 5.0 milestone Feb 7, 2022
@wr-ning
Copy link
Author

wr-ning commented Feb 8, 2022

First of all, thank you for patiently answering.
Seeing this issue tagged with "5.0 milestone" instantly made me feel like it's still far away.

Today, I tried following your suggestions and attempted to use flv.js. Since it only allows listening through events and executing reconnection, I had to try it out.
During testing, there weren't any issues for now. I will observe the results once it's deployed online.

TRANS_BY_GPT3

@jinleileiking

This comment was marked as off-topic.

@wr-ning
Copy link
Author

wr-ning commented Feb 9, 2022

Thank you. I have read #2215 and also #2707. Indeed, from the discussion, it seems that this issue arises every few months.
The process of modifying the code, recompiling, and repackaging is a bit costly for us, who are working on other languages.

For now, we can only temporarily solve this problem by following the "listen and reconnect" approach mentioned above. We will try other solutions when we have a better one.
#2901 (comment)

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Feb 10, 2022

Um, client reconnection is definitely necessary. Don't rely solely on server improvements. Stability is a system engineering concern, and of course, the client needs to be considered.

In general, there is always a way to bypass common problems. Solutions are always more abundant than problems.

TRANS_BY_GPT3

@winlinvip winlinvip added the good first issue Easy to fix issues, good for newcomers label Feb 12, 2022
@mingyang0921
Copy link

mingyang0921 commented Apr 15, 2022

I added an event that allows for notifications and waiting. Additionally, I am considering adding a configuration file to enable or disable this mode.
We have already updated our own server using this method. So far, there haven't been any issues.
@winlinvip

srs_error_t SrsPlayEdge::on_client_play()
{
    srs_error_t err = srs_success;
    
    // start ingest when init state.
    if (state == SrsEdgeStateInit) {
        state = SrsEdgeStatePlay;
        err = ingester->start();
    } else if (state == SrsEdgeStateIngestStopping) {
    	srs_cond_wait(ingester_wait);
		srs_trace("on_client_play srs_cond_wait.");
		if(state == SrsEdgeStateInit){
			state = SrsEdgeStatePlay;
			err = ingester->start();
		}else if(state == SrsEdgeStateIngestStopping){
			return srs_error_new(ERROR_RTMP_EDGE_PLAY_STATE, "state is stopping 1.");
		}else{
			return srs_error_new(ERROR_RTMP_EDGE_PLAY_STATE, "state is stopping 2.");
		}
    }
    
    return err;
}

void SrsPlayEdge::on_all_client_stop()
{
    // when all client disconnected,
    // and edge is ingesting origin stream, abort it.
    if (state == SrsEdgeStatePlay || state == SrsEdgeStateIngestConnected) {
        SrsEdgeState pstate = state;
        state = SrsEdgeStateIngestStopping;
		
		srs_trace("on_all_client_stop begin. ");
        ingester->stop();

        state = SrsEdgeStateInit;
        srs_trace("edge change from %d to %d then %d (init).", pstate, SrsEdgeStateIngestStopping, state);
        srs_cond_signal(ingester_wait);
		srs_trace("on_all_client_stop srs_cond_signal");
        return;
    }
}

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Oct 10, 2022

Sorry for keeping everyone waiting for so long, I have resolved it in 4.0.267.

The solution of waiting for notification is good (although there is an even simpler solution). 👍

Hahaha, everyone actually gave quite a few solutions, but the best one is still SRS, so we don't need to use menstrual patches anymore. 😄

I checked and the retry time is around 3 seconds. The reason is that when the last playback is triggered, it will stop pulling the stream, and this stop coroutine operation takes a considerable amount of time. Other operations are completed in milliseconds.

[2022-10-10 08:01:45.207] Debug: Start to stop edge coroutine
[2022-10-10 08:01:48.207] Debug: Start to stop edge upstream
[2022-10-10 08:01:48.208] Debug: Start to unpublish source
[2022-10-10 08:01:48.209] Debug: Edge stopped

Looking at the code, it is waiting for a fixed 3 seconds, so the mechanism here is not reasonable.

void SrsPlayEdge::on_all_client_stop() {
    if (state == SrsEdgeStatePlay || state == SrsEdgeStateIngestConnected) {
        state = SrsEdgeStateIngestStopping;
        ingester->stop();
}

void SrsEdgeIngester::stop() {
    trd->stop();
    upstream->close();
    source->on_unpublish();
}

srs_error_t SrsEdgeIngester::cycle() {
    while (true) {
        do_cycle(); // Interrupted by stop.
        srs_usleep(SRS_EDGE_INGESTER_CIMS); // 3s
    }
}

A more reasonable approach would be to improve the thread stopping mechanism. The problem is that when the thread is stopped, it interrupts do_cycle(), but the outer sleep is not interrupted (it sleeps again after the interruption). We just need to solve this problem in a simple way.

After the improvement, the stop operation is completed within 2ms, which is generally not triggered by normal user operations.

[2022-10-10 08:07:07.455] Debug: Start to stop edge coroutine
[2022-10-10 08:07:07.457] Debug: Edge stopped

Although condition variables or delayed release are also solutions to this problem, re-releasing the object (somewhat like restarting) is the simplest solution, and the state of the object is also very clear and simple, so the problem is minimized.

Note: Some friends hope that we can implement delayed release, but in fact, there is no perceptible difference in testing compared to re-pulling the stream. However, delayed release of the stream requires a timer, which can make it more complicated. Moreover, delayed release of the stream will cause the stream to be pulled even when no one is playing it, which can also cause issues in certain scenarios.

TRANS_BY_GPT3

@mingyang0921
Copy link

mingyang0921 commented Oct 19, 2022

@winlinvip This modification is relatively simple, but if the edge nodes are attacked, it may bring down the origin server. This contradicts the purpose of our original servers, doesn't it?

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Dec 25, 2022

Preventing attacks is a separate solution, and this retry interval is not designed for preventing attacks.

TRANS_BY_GPT3

@winlinvip winlinvip changed the title CLUSTER: 向源站集群内某节点拉流,断开后3秒内再次拉流报错 CLUSTER: Pulling stream from a node within the origin server cluster, if disconnected and attempting to pull stream again within 3 seconds, an error will be reported. Jul 28, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improvement or enhancement. good first issue Easy to fix issues, good for newcomers TransByAI Translated by AI/GPT.
Projects
None yet
Development

No branches or pull requests

4 participants