Two-way audio based on SIP #928

Open
nanosonde opened this issue Dec 8, 2020 · 32 comments

Comments

@nanosonde

@Sunoo @longzheng
After reading the two-way audio issue, I think it is worth creating a separate follow-up issue that covers only video doorbells that provide two-way audio based on SIP.
(See former discussion here: #738 (comment))

Example devices:

Video is very often just implemented as an MJPEG or H.264 stream via HTTP/RTSP.
I guess there are also SIP video doorbells which offer video as part of the SIP session.
In the former case, the video is normally completely separate from the audio part, which goes via SIP/RTP.
In the latter case, I would assume that the video+audio uses the SIP early media feature to show video+audio before actually picking up the call (during SIP RINGING).

I think that the homebridge-ring plugin could probably serve as a good starting point, as it shows how to implement the SIP client based on @kirm's sip.js lib. It should then be easy to extract the relevant SDP from the SIP INVITE media negotiation.

Remark:
Of course, there are a few SIP apps out there (like linphone) which could also somehow cover the use case and which also offer Apple's VoIP notification feature to receive calls even if the app is not in the foreground.
However, this feature request is to use SIP video doorbells as HomeKit video doorbells without any additional app. Only homebridge will talk to the SIP video doorbell.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 15, 2020
@Sunoo
Collaborator

Sunoo commented Dec 15, 2020

I’m really not sure if SIP is appropriate for this plugin. If it was to be added, it would be a ways down the line.

I also don’t have any cameras that support SIP, so development and testing would be a bit difficult.

@github-actions github-actions bot removed the stale label Dec 15, 2020
@nanosonde
Author

nanosonde commented Dec 17, 2020

After investigating a bit more and looking at various HomeKit projects, I came to the conclusion that it really is out of scope for this plugin.

So I think I will go this way:

  • install something like linphone-nogtk to establish and receive SIP calls to/from the SIP video doorbell
  • configure the SIP client to use ALSA loopback devices or similar to get the full-duplex audio on two different loopback cards
  • use this plugin with two-way audio: give each of the two FFMPEG processes (one per direction) its own loopback device

Do you think that this is feasible?

@Sunoo
Collaborator

Sunoo commented Dec 17, 2020

Seems like it should be, yes. If you write up your results, I'll be happy to point people towards that if they want to do a similar thing.

@nanosonde
Author

Ok, I will close it for now.

@nanosonde
Author

nanosonde commented Dec 17, 2020

Ah, sorry. One question that should fit in the scope of this plugin.

Could you please provide a "returnAudioTarget" command line for FFMPEG which just sends the FFMPEG output to a local sound card?

@nanosonde nanosonde reopened this Dec 17, 2020
@Sunoo
Collaborator

Sunoo commented Dec 17, 2020

Sure, I have one in my notes, I'll dig it up this afternoon.

@nanosonde
Author

nanosonde commented Dec 23, 2020

@Sunoo
In the meantime I have read a lot about ffmpeg/gstreamer RTP, SRTP with AVP(F)/SAVP(F), ALSA, PulseAudio, SIP, baresip and so on.

My approach using an ALSA loopback device and some SIP client (I use baresip) looks quite promising in my first experiments.

I have used this ffmpeg config as a starting point:

ffmpeg -f mjpeg -r 15 -i "http://192.168.10.22:8080/?action=stream" \
 -f alsa -i hw:1,1 \
 -vcodec libx264 -x264-params keyint=25:min-keyint=25 -f rawvideo -preset ultrafast -tune zerolatency -payload_type 99 -ssrc 16132552 -an -sn -dn -flags global_header \
 -f rtp "rtp://192.168.10.101:58536?rtcpport=58537&localrtcpport=58537&localrtpport=58536&pkt_size=1316" \
 -acodec libfdk_aac -profile:a aac_eld -flags +global_header -payload_type 100 -ssrc 17132553 -ar 48000 -ac 2 -vn -sn -dn \
 -f rtp "rtp://192.168.10.101:58538?rtcpport=58539&localrtcpport=58539&localrtpport=58538&pkt_size=1316"

I have loaded the ALSA loopback module: sudo modprobe snd-aloop

I then installed baresip-core on Ubuntu.
In the baresip config under ~/.baresip/config I set up the audio config like this:

audio_player		alsa,hw:1,0
audio_source		alsa,hw:1,0
audio_alert		alsa,hw:1,0
ausrc_srate		48000
auplay_srate		48000
ausrc_channels		2
auplay_channels         2

So baresip will play and record the SIP audio on the one and only loopback device. The device is full-duplex, so playback and capture work simultaneously.

The ffmpeg command line from above receives on hw:1,1 what baresip plays and streams it via RTP.
BTW: I use VLC without SRTP for testing at the moment.

To test "return audio" I played an MP3 file: mpg123 -a hw:1,1 test.mp3
The doorbell played back the audio without any problems.
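
The device pairing used above (baresip on hw:1,0, ffmpeg and mpg123 on hw:1,1) follows from how snd-aloop cross-connects its two devices: what is played on device 0 of a loopback card can be captured on device 1 of the same subdevice, and vice versa. A minimal sketch of that mapping, where `loopback_peer` is a hypothetical helper name and not part of any tool mentioned here:

```python
def loopback_peer(device: str) -> str:
    """Return the snd-aloop device cross-connected to `device`.

    Accepts "hw:CARD,DEV" or "hw:CARD,DEV,SUBDEV", where DEV is 0 or 1.
    """
    prefix, _, rest = device.partition(":")
    parts = rest.split(",")
    if prefix != "hw" or len(parts) not in (2, 3) or parts[1] not in ("0", "1"):
        raise ValueError(f"not a loopback-style device name: {device!r}")
    # Device 0 and device 1 of the same card (and subdevice) form a pair.
    parts[1] = "1" if parts[1] == "0" else "0"
    return "hw:" + ",".join(parts)


# baresip plays to hw:1,0, so ffmpeg has to capture from its peer.
print(loopback_peer("hw:1,0"))
```

The same lookup works in the other direction: audio written to hw:1,1 (as with the mpg123 test) is what baresip records from hw:1,0.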

So what does all that mean for this plugin?
What I would require is a PRE and a POST hook before/after the FFMPEG invocation to be able to set everything up and tear it down again.
Especially when using ALSA loopback, it is important that "problematic programs" open the loopback device FIRST so that they can freely set the sample rate, number of channels, sample format, etc. Any other user of the loopback device (ffmpeg, in our case) would have to "live" with the configured settings. However, this is not a problem for ffmpeg, as it can convert audio to whatever is required.

Do you think you could add config options for pre/post scripts that get executed around the FFMPEG invocation?

@Sunoo
Collaborator

Sunoo commented Dec 23, 2020

This is not the first use case that’s come to me that could use pre or post execution jobs. I have some ideas as to how to implement that somewhat cleanly. I’ll probably work on it after Christmas. There is another version I need to push before I dive into that, but that one shouldn’t be too hard.

@github-actions github-actions bot added stale and removed stale labels Dec 30, 2020
@homebridge-plugins homebridge-plugins deleted a comment from github-actions bot Dec 30, 2020
@Sunoo
Collaborator

Sunoo commented Mar 2, 2021

@nanosonde Would something like one of these options resolve your use case? #929 (comment)

I’m still giving some thought to how best to handle this sort of thing.

@nanosonde
Author

@Sunoo

I have read your suggested options.

The issue we should consider for options 2 and 3 is that we need some kind of handshake BEFORE the actual ffmpeg process is started. This is required because it must be possible to set up loopback devices that ffmpeg will use when it is started afterwards.
So the plugin would have to wait for some ACK before it proceeds to start ffmpeg, maybe with some default timeout in case the external script does not work properly.

If I had to choose, I would go with MQTT instead of HTTP. I have a broker running anyway.
I guess that people who need an advanced setup with external scripts should be able to handle the broker requirement.

@nanosonde
Author

@startuml
Plugin -> Script: Request Prepare Resource
Script--> Plugin : ACK

Plugin --> Plugin : Use resource (e.g. audio device as input device in ffmpeg)

Plugin -> Script: Request Shutdown Resource
Script--> Plugin : ACK
@enduml
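
The sequence in the diagram could be sketched roughly as follows. This is only an illustration of the handshake idea, not how the plugin works: `run_with_hooks` and its parameters are hypothetical names, the hook exiting with status 0 stands in for the ACK message, and the MQTT or HTTP transport discussed above is left out entirely.

```python
import subprocess
import sys


def run_with_hooks(prepare_cmd, main_cmd, shutdown_cmd, ack_timeout=2.0):
    """Run the prepare hook, then the main process, then the shutdown hook.

    Returns (acked, main_returncode).
    """
    acked = False
    try:
        # Assumption: a clean exit of the hook stands in for the ACK.
        result = subprocess.run(prepare_cmd, timeout=ack_timeout)
        acked = result.returncode == 0
    except subprocess.TimeoutExpired:
        # No ACK in time (subprocess.run kills the hook on timeout);
        # carry on so the stream is not delayed indefinitely.
        pass

    # "Use resource": in the real setup this would be the ffmpeg process.
    main_returncode = subprocess.run(main_cmd).returncode

    # Shutdown hook: started but not waited on, nothing depends on it.
    subprocess.Popen(shutdown_cmd)
    return acked, main_returncode


# Stand-in commands so the sketch runs anywhere Python does.
noop = [sys.executable, "-c", "pass"]
fake_ffmpeg = [sys.executable, "-c", "raise SystemExit(0)"]
print(run_with_hooks(noop, fake_ffmpeg, noop))
```

A hook that only needs to launch a SIP client in the background and return would fit comfortably inside a short timeout like this one.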


@Sunoo
Collaborator

Sunoo commented Mar 2, 2021

Hmm, good point on the ACK, hadn’t thought about that. If I did wait for a response from the script, there would probably have to be a fairly short timeout.

Also, some scripts that hook into this would likely have no reason to delay the stream, but I suppose either a configuration setting or just documentation that they should ACK immediately in that case should solve that.

Just thinking out loud a bit, but if it would just be two way audio that would need the ACK, I wonder if there is a reasonable way to start sending return audio towards your script and just have it pick that up and start working with it as soon as possible. This would have the best user experience, since loading the video wouldn’t be delayed, it may just take a second or two for return audio to start working after it loads. Configuring the two way audio setting in the plugin to point at a FIFO or something could be the solution for that.
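
To make the FIFO idea a little more concrete, here is a minimal sketch of the consuming side, assuming a Unix system. The writer thread only stands in for whatever process would deliver the return audio, and the file name `return_audio.fifo` and the helper names are hypothetical.

```python
import os
import tempfile
import threading


def read_fifo(path, chunk_size=4096):
    """Yield chunks from a FIFO until the writer closes its end."""
    # open() blocks here until some writer opens the FIFO.
    with open(path, "rb") as fifo:
        while True:
            chunk = fifo.read(chunk_size)
            if not chunk:
                break
            yield chunk


def demo():
    """Push some bytes through a temporary FIFO and count what arrives."""
    fifo_path = os.path.join(tempfile.mkdtemp(), "return_audio.fifo")
    os.mkfifo(fifo_path)

    def writer():
        # Stand-in for the process that would write return audio.
        with open(fifo_path, "wb") as fifo:
            fifo.write(b"\x00" * 10000)

    thread = threading.Thread(target=writer)
    thread.start()
    total = sum(len(chunk) for chunk in read_fifo(fifo_path))
    thread.join()
    os.remove(fifo_path)
    return total


print(demo())
```

The reader simply blocks until audio starts flowing, which matches the idea of the script picking up return audio as soon as it is able to, without holding up the video.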

@nanosonde
Author

nanosonde commented Mar 2, 2021

What I would like to do is use an existing SIP command line client to handle the SIP communication with two-way audio, as I do not want to maintain the SIP part.

When the PrepareResourceRequest comes in, I would like to start the SIP command line client, return immediately and send the ACK to the plugin. This makes sure that the client has already grabbed the corresponding ALSA devices.
In parallel, the SIP client starts initiating the SIP call while the plugin can already start the ffmpeg processing to get the video stream as early as possible.

So the timeout could really be fairly small I think. Just enough time to start another linux process which opens some device files.

@nanosonde
Author

BTW: I am not sure if we need an ACK during shutdown. It will be called AFTER the ffmpeg process is finished. So no delay is necessary here.

@Sunoo
Collaborator

Sunoo commented Mar 2, 2021

I totally understand not wanting to maintain a SIP implementation, that's not my idea of fun. I'm going to keep thinking on this. Perhaps trying to come up with a one-size-fits-all solution as I have been isn't worth it. Though delaying video until the ACK also allows for the potential need to set some process up to pull video from as well.

I'm not sure how patient HomeKit is when waiting for frames, I'll have to do some testing at some point (might be the same ~22 seconds that it waits for snapshots). Obviously any delay negatively impacts user experience though, and should be avoided where possible.

Also, I agree, no ACK is likely required on shutdown, as there is no reason (and in many cases, no ability) to delay stopping the stream.

@spbroot

spbroot commented Oct 12, 2021

Maybe this will be useful to someone.
I use this option to implement two-way audio with SIP intercom https://github.com/spbroot/sipdoorbell (Homebridge-camera-ffmpeg + Baresip + ALSA loopback).

@nanosonde
Author

> Maybe this will be useful to someone. I use this option to implement two-way audio with SIP intercom https://github.com/spbroot/sipdoorbell (Homebridge-camera-ffmpeg + Baresip + ALSA loopback).

Which SIP intercom are you using?

@Sunoo
Collaborator

Sunoo commented Oct 12, 2021

@spbroot Nice, if you share how you did your setup somewhere, I can add the instructions to the project site.

@nanosonde
Author

nanosonde commented Oct 12, 2021

> @spbroot Nice, if you share how you did your setup somewhere, I can add the instructions to the project site.

Couldn't the HTTP calls to answer and hangup be executed as part of the pre- and post-hooks when running the FFMPEG process?

So I think it is enough if the camera-ffmpeg plugin just calls a webhook before and after executing FFMPEG without waiting for a reply.

The audio loopback device can always be opened by FFMPEG. The video stream is always there for most SIP intercoms where the video is independent from the audio part.

@spbroot

spbroot commented Oct 14, 2021

> > Maybe this will be useful to someone. I use this option to implement two-way audio with SIP intercom https://github.com/spbroot/sipdoorbell (Homebridge-camera-ffmpeg + Baresip + ALSA loopback).
>
> Which SIP intercom are you using?

Hi. I am using an analog intercom with a SIP converter connected to it.

@spbroot

spbroot commented Oct 14, 2021

> @spbroot Nice, if you share how you did your setup somewhere, I can add the instructions to the project site.

Ok, I will do it.

@spbroot

spbroot commented Oct 14, 2021

> > @spbroot Nice, if you share how you did your setup somewhere, I can add the instructions to the project site.
>
> Couldn't the HTTP calls to answer and hangup be executed as part of the pre- and post-hooks when running the FFMPEG process?
>
> So I think it is enough if the camera-ffmpeg plugin just calls a webhook before and after executing FFMPEG without waiting for a reply.
>
> The audio loopback device can always be opened by FFMPEG. The video stream is always there for most SIP intercoms where the video is independent from the audio part.

Hi.
I would also like to control SIP calls purely through the HomeKit functionality, but for this I need to come up with a way to interact with the plugin.
At first I had the idea of tracking the creation of the FFMPEG process that is launched on the system with the parameters of my device. This is not an option for me, though, because I have several Apple TVs in my house that notify about the doorbell, and they request video, thereby starting the FFMPEG process whenever the doorbell rings.
I think the best option is to establish a SIP connection when the TALK button is pressed, and to disconnect it when the button is pressed again.
So far, I have only been able to implement this by parsing the Homebridge logs with "Two-way FFmpeg Debug Logging" enabled, but this is a bad solution. (I have updated the information with a new script that does this.)

If the ability to execute external scripts is added to the plugin in the future, it would be nice to also execute external commands when the TALK button is turned on and off (if possible).
It would also be nice to be able to query the plugin via HTTP for a given device (something like http://homebridge:8080/status?Doorbell) and receive a response with the status (is the TALK button pressed, etc.), so that everyone can parse out the parameters and state they need.
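
For readers curious what the log-parsing workaround might look like, here is a rough sketch. It is not the script from the sipdoorbell repository. It assumes the `[time] [plugin] [camera] message` line layout seen in Homebridge output such as `[8/3/2023, 11:22:14 PM] [Camera FFmpeg] [Doorbell] [Two-way] FFmpeg exited with code: null and signal: SIGKILL (Forced)`, and it only recognizes that exit message, because the line logged when TALK is pressed is not shown anywhere in this thread. `parse_line` and `is_two_way_exit` are hypothetical names.

```python
import re

# [timestamp] [plugin name] [camera name] rest-of-message
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<plugin>[^\]]+)\] \[(?P<camera>[^\]]+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def is_two_way_exit(entry, camera):
    """True if the entry reports the two-way FFmpeg process of `camera` exiting."""
    return (
        entry is not None
        and entry["camera"] == camera
        and entry["message"].startswith("[Two-way] FFmpeg exited")
    )


line = (
    "[8/3/2023, 11:22:14 PM] [Camera FFmpeg] [Doorbell] "
    "[Two-way] FFmpeg exited with code: null and signal: SIGKILL (Forced)"
)
print(is_two_way_exit(parse_line(line), "Doorbell"))
```

Matching on free-form log text is brittle, since any change to the wording of a message silently breaks the detection, which is presumably why an explicit hook or status endpoint would be preferable.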

@mrMiimo

mrMiimo commented Oct 29, 2022

@Sunoo can we have a version that supports SIP calls (the pull request is still open)?
Or how do I install the fork that contains it?
Thanks a lot!

@Sunoo
Collaborator

Sunoo commented Oct 29, 2022

I’ll try to get that merged soon, I can’t truly test it myself, but I suppose it must work.

@longzheng
Contributor

> I’ll try to get that merged soon, I can’t truly test it myself, but I suppose it must work.

Is there a SIP doorbell you'd be interested in installing? Maybe we can chip in one for you.

@Sunoo
Collaborator

Sunoo commented Oct 30, 2022

I’d be open to installing one, not sure what’s even available as far as those go to be honest.

@mrMiimo

mrMiimo commented Oct 30, 2022

> I’ll try to get that merged soon, I can’t truly test it myself, but I suppose it must work.

is there a way I can test it? maybe if you can create a branch ...

@stephanlinke87

Willing to test this! :) I do have a DoorBird that can initiate SIP calls upon ringing. Their API documentation has the SIP stuff starting at page 33 :)

@edelmaca

This one would be really nice. I'm using a 2N IP Verso 2. That is capable of initiating SIP calls as well.

@VCTGomes

VCTGomes commented Aug 4, 2023

Well, I finally managed to connect my Hikvision doorbell with SIP to Baresip. It's registering and even ringing (so far I have only tested calling from the indoor panel to the phone).

However, if I try to open the live video in HomeKit, I get the following error:

[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] non-existing PPS 0 referenced
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] non-existing PPS 0 referenced
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] decode_slice_header error
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] no frame!
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] non-existing PPS 0 referenced
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] non-existing PPS 0 referenced
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] decode_slice_header error
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] no frame!
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] non-existing PPS 0 referenced
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] non-existing PPS 0 referenced
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] decode_slice_header error
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [error] no frame!
[8/3/2023, 11:22:11 PM] [Camera FFmpeg] [Doorbell] [h264 @ 0x562a76a025c0] [verbose] Reinit context to 1920x1088, pix_fmt: yuvj420p
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [warning] Guessed Channel Layout for Input Stream #0.1 : mono
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [info] Input #0, rtsp, from 'rtsp://admin:[email protected]:554/Streaming/channels/101':
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [info]   Metadata:
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [info]     title           : Media Presentation
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [info]   Duration: N/A, start: 0.000000, bitrate: N/A
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [info]   Stream #0:0: Video: h264 (Baseline), 1 reference frame, yuvj420p(pc, bt709, progressive, left), 1920x1080 (1920x1088), 30 fps, 30 tbr, 90k tbn
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [info]   Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] ALSA lib confmisc.c:165:(snd_config_get_card) Cannot get card index for Loopback
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [alsa @ 0x562a76a76380] [error] cannot open audio device sipdoorbell_main (No such device)
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] [error] sipdoorbell_main: Input/output error
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] FFmpeg exited with code: 1 and signal: null (Error)
[8/3/2023, 11:22:12 PM] [Camera FFmpeg] [Doorbell] Stopped video stream.
[8/3/2023, 11:22:14 PM] [Camera FFmpeg] [Doorbell] [Two-way] FFmpeg exited with code: null and signal: SIGKILL (Forced)

I already added the ALSA configuration in both places (/usr/share/alsa/alsa.conf and /etc/asound.conf).

Where is my mistake?
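
One thing that may be worth ruling out, given the "Cannot get card index for Loopback" line in the log: ALSA could not find a sound card with the id Loopback, which would be consistent with the snd-aloop module (loaded earlier in this thread with sudo modprobe snd-aloop) not being loaded on that machine. This is only a guess from the log, not a confirmed diagnosis. The sketch below checks the text of /proc/asound/cards for such a card. The sample text is illustrative and not taken from the reporter's system, and `card_ids` and `has_loopback` are hypothetical names.

```python
import re

# Matches the first line of each card entry, e.g. " 1 [Loopback       ]: ..."
CARD_RE = re.compile(r"^ *\d+ +\[([^\s\]]+) *\]:", re.MULTILINE)


def card_ids(cards_text):
    """Return the card ids listed in the text of /proc/asound/cards."""
    return CARD_RE.findall(cards_text)


def has_loopback(cards_text):
    """True if a card with the id 'Loopback' is present."""
    return "Loopback" in card_ids(cards_text)


# Illustrative sample only; real output depends on the machine.
SAMPLE = (
    " 0 [PCH            ]: HDA-Intel - HDA Intel PCH\n"
    "                      HDA Intel PCH at 0xf7f10000 irq 33\n"
    " 1 [Loopback       ]: Loopback - Loopback\n"
    "                      Loopback 1\n"
)
print(has_loopback(SAMPLE))

# On a real system one would read the file instead:
#   with open("/proc/asound/cards") as f:
#       print(has_loopback(f.read()))
```

If no Loopback card is listed, a device name such as sipdoorbell_main that is defined on top of that card in the ALSA configuration cannot be opened, whatever the configuration files contain.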

@jmnovak50

jmnovak50 commented Aug 4, 2023 via email

9 participants