[Feature] Matter Casting support for "Audio Player Architecture" with a new "Casting Audio Player device type" and "Audio Player endpoint" ("Casting Audio Player" and adding "Basic Audio Player") + look into adding multi-room music streaming? #31389

Open
Hedda opened this issue Jan 12, 2024 · 12 comments

Comments

@Hedda

Hedda commented Jan 12, 2024

Feature description

I hope it is OK to submit this large feature request here as I do not have the skills or resources to implement this myself. As such, this is just a feature suggestion meant as an open letter to Matter members for discussion, and not a feature proposal from me.

To summarize: this is a feature request asking you and others to consider a separate "Audio Player Architecture" for Matter, with a new "Casting Audio Player device type" and "Audio Player endpoint" ("Casting Audio Player" and "Basic Audio Player"), including speaker setup, multi-room groups, and advanced control from the manufacturers' Matter Casting apps.

That is, Matter needs support for universal audio-only casting as a standard for music services and other streaming audio sources, cast to speaker-only devices (i.e. devices such as smart speakers that are not designed to handle video playback at all).

An example similar to tv-casting-app is also needed, but for audio-only casting of music, so perhaps a speaker-casting-app?

Perhaps also an example audio player app similar to tv-app but for music playback (preferably multi-room synchronization)?

By the way, off-topic but FYI: there is currently an Audio Output Cluster but no Audio Input Cluster in the chip data model, though there is a generic Media Input Cluster. Likewise, there is only a generic Media Playback Cluster and no audio-only Audio Playback Cluster that could have an optimized pipeline for high-quality music playback.

Anyway, I think there is a need for a proper "Audio Player Architecture", and I see no reason why it could not initially be based on the existing "Video Player Architecture" for Matter. A new "Casting Audio Player device type" is needed as well, which could be based on the existing "Casting Video Player device type". However, I think some requirements will have to differ between a video player and an audio player meant for Hi-Fi music playback. As such, I think you should aim to design an architecture primarily for music playback that works for a combination of "smart speaker", "home audio", and "high fidelity", which I understand may have different but still fairly similar use cases when it comes to voice control versus music playback.

While some video-specific features could be removed if basing it on the existing "Video Player Architecture", I think it would be preferable to also extend a dedicated "Audio Player Architecture" with some audio-specific features to optimize for home audio setups with Hi-Fi quality amplifiers and speakers designed for music playback, and not solely for embedded smart speakers.

An alternative could be to redesign and rename the "Video Player Architecture" into a more generic "Media Player Architecture"?

An example feature is real-time audio synchronization between different smart speakers to allow for synchronized multi-room audio playback of music on several Matter Audio Casting enabled speakers installed in the same home, (also known as distributed audio system). This needs support for "Audio Group", usually named "Speaker Group" and perhaps also "Audio Zone" ("Speaker Zone" or area). Preferably also need to have separate volume controls for each speaker and/or zone to compensate for differences in apparent volume due to room size and shape as well as speaker products used in different rooms.

https://en.wikipedia.org/wiki/Multi-room_audio
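As a rough illustration of what synchronized multi-room playback involves, here is a minimal Python sketch. It is not taken from any Matter specification; the `Speaker` class, its fields, and `local_start_times` are hypothetical names, and it assumes each speaker's clock offset and output latency have already been measured by some other mechanism. A coordinator picks one emission time slightly in the future, and each speaker converts it to its own clock:

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    # All fields are illustrative assumptions, not Matter attributes.
    name: str
    clock_offset_ms: float    # speaker clock minus coordinator clock
    output_latency_ms: float  # time between starting playback and sound being emitted


def local_start_times(speakers, now_ms, lead_time_ms=200.0):
    """Return, per speaker, the local-clock time at which to start playback
    so that every speaker emits sound at the same coordinator time."""
    target_ms = now_ms + lead_time_ms  # common emission time on the coordinator clock
    return {
        s.name: target_ms + s.clock_offset_ms - s.output_latency_ms
        for s in speakers
    }


speakers = [
    Speaker("kitchen", clock_offset_ms=12.0, output_latency_ms=40.0),
    Speaker("living-room", clock_offset_ms=-5.0, output_latency_ms=25.0),
]
starts = local_start_times(speakers, now_ms=1000.0)
print(starts)
```

A real system would also need continuous clock drift correction during playback; this sketch only covers the common start.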

Multi-room audio:

  • Single source, single zone = Scenario: One music app streaming to a single audio player/speaker.
  • Single source, multiple zones = Scenario: One music app streaming to an audio group containing multiple audio players/speakers.
  • Multiple source, multiple zones = Scenario: Several music apps stream to separate audio players/speakers in the same home.
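To make the three scenarios above more concrete, here is a minimal Python sketch of how audio groups with per-speaker volume compensation might be modelled. All names (`AudioPlayer`, `AudioGroup`, `classify`, `volume_trim_db`) are hypothetical and only for illustration; nothing here is an existing Matter cluster or device type.

```python
from dataclasses import dataclass, field


@dataclass
class AudioPlayer:
    name: str
    # Per-speaker offset to compensate for room size or speaker differences.
    volume_trim_db: float = 0.0


@dataclass
class AudioGroup:
    name: str
    players: list = field(default_factory=list)
    group_volume_db: float = -20.0

    def effective_volumes(self):
        """Group volume plus each player's individual trim."""
        return {p.name: self.group_volume_db + p.volume_trim_db for p in self.players}


def classify(sessions):
    """sessions maps a source (e.g. a music app) to the AudioGroup it streams to."""
    if not sessions:
        raise ValueError("no active sessions")
    if len(sessions) > 1:
        return "multiple source, multiple zones"
    (group,) = sessions.values()
    if len(group.players) > 1:
        return "single source, multiple zones"
    return "single source, single zone"


downstairs = AudioGroup(
    "downstairs",
    [AudioPlayer("kitchen", volume_trim_db=3.0), AudioPlayer("hall", volume_trim_db=-2.0)],
)
bedroom = AudioGroup("bedroom", [AudioPlayer("bedroom")])

print(downstairs.effective_volumes())
print(classify({"music-app": downstairs}))
print(classify({"music-app": downstairs, "podcast-app": bedroom}))
```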

Another argument for a separate audio-only architecture, with an audio player specialized for just music playback, is that it could run with even fewer resources on constrained devices.

Background: The existing Matter specification does feature a "Video Player Architecture" with a "Casting Video Player device type" and "Video Player endpoint" ("Casting Video Player" and "Basic Video Player"). What looks to be currently missing, but is directly related, is a "pure" audio architecture with a Matter Casting Audio Player device type, and maybe an Audio Input cluster as well.

Product use case: A client/server design that works well for music/audio apps and smart speaker products, plus products with audio line-in (i.e. devices designed purely for audio output and/or input that are normally used just for music playback, including multi-room sound systems). These will sometimes, but not always, include microphone input for a voice assistant. The point is that these are products that lack any kind of video output, such as televisions and/or smart control displays/screens.

The main problem to solve: There are many different audio streaming protocols for commercial use, from basic to audiophile-class audio quality, and there are even more music streaming services around today that do not support all of those. Having plenty of proprietary, closed-source solutions from different vendors means fragmentation: audio players/receivers and music services that do not communicate with one another, and no way for users to control all of their music from a single interface or stream the audio to different ecosystems at the same time.

I think that, other than the obvious smart speakers with voice assistants, another real-world market and use case for pure audio players is high-quality speakers and Hi-Fi grade sound systems for music playback, whether used as single-point or multi-room sound systems. Their primary audience would probably be users streaming music from different commercial music service apps like Amazon Music, Spotify, SiriusXM, Pandora, Tidal, Qobuz, Deezer, YouTube Music, and Apple Music, as well as additional audio-streaming services for other types of content (for example Amazon Audible for audiobooks), if and when they add support for the "Matter Casting" streaming protocol for audio to their apps.

If implemented, please be sure to include support for the concept of so-called "speakerless devices", meaning audio-output dongles (with TOSLINK and Phono AUX-out or line-out ports for external speakers and sound systems from third parties), such as Google's original "Chromecast Audio" product which enables adding Google Cast audio player capability to any third-party speaker / sound system, as well as "Amazon Echo Input", "Amazon Echo Link", "Echo Link Amp" which similarly also adds AUX output to third-party speaker / sound system (but Amazon Echo products also have embedded voice assistant via built-in microphones).

https://en.wikipedia.org/wiki/Amazon_Echo#Speakerless_devices

https://en.wikipedia.org/wiki/Chromecast#Chromecast_Audio

A popular example of Hi-Fi audio streamer products without a built-in voice assistant is the WiiM series from Linkplay Technology:

A new product idea to consider accommodating in a new audio-only architecture is the concept of audio-input dongles. That is, audio-streaming server dongles with "line-in" and/or "microphone" input ports that basically work as stand-alone soundcards on the network and act as embedded audio digitizer appliances. Such a device would be an audio-only "Matter Casting Client" that can stream to any chosen "Audio Player endpoint", which can be either a single endpoint or a group of endpoints (audio group) for multi-room music playback. This would allow a user to connect any legacy audio source, like an LP record player (phonograph turntable), cassette deck, or CD player (for Audio CDs) to such an audio-input dongle and stream that audio to any "Matter Casting" enabled audio player.

As far as I know there are no commercial products on the market, but check out this "Vinyl Cast" app as a proof-of-concept:

Platform

all

Platform Version(s)

No response

Anything else?

Perhaps an existing Matter group member would be willing to contribute their existing technology solutions as a base for audio grouping and synchronized multi-room audio support? If not the whole thing, then perhaps parts of the specifications or patents on software for related technologies.

Amazon Alexa features multi-room music support:

Google has "Google Cast" (Chromecast Audio) which supports multi-room audio with grouping of speakers and multiroom synchronized playback so maybe they could be convinced to contribute components?

Apple features multiroom support for AirPlay 2 audio streaming:

DTS Play-Fi is a premium wireless audio ecosystem for whole-home music and TV audio, supporting low-latency and high-resolution 24-bit/192kHz lossless streaming, and sub-millisecond playback accuracy synchronization technology

Espressif ESP-ADF (Espressif Audio Development Framework) does support ESP Multi-Room Music, but not synchronized on its own?

Sonos is perhaps the largest on the market for multi-room audio speaker systems, and is now at least a member of the CSA:

IKEA of Sweden AB currently has a partnership with Sonos to make Wi-Fi speakers with multi-room audio support:

Yamaha MusicCast (Yamaha is not yet a member of the CSA); however, Yamaha MusicCast proves the need for high-fidelity quality:

Roon Ready (Roon’s RAAT streaming technology by RoonLabs); not a CSA member, but it again proves that interoperability is needed:

BluOS is a wireless hi-res multi-room platform that lets you manage all your music and stream it to any BluOS Enabled player using a phone, tablet, or computer. BluOS is an operating system that manages and controls all your music. They were the 2023 "Mark of Excellence" winner of Consumer Technology Association Smart Home Division.

HEOS (HEOS® Built-in) from Denon is multi-room speaker technology built-in to newer audio equipment from Denon:

There are also other open-source and closed-source solutions for multi-room audio synchronisation. Examples:

Snapcast

SlimProto & SliMP3 protocols for Logitech Squeezebox players (for Logitech Media Server, a.k.a. LMS/SlimServer, SqueezeCenter)

Strobe audio

Music Player Daemon (MPD)

PS: FYI, possibly related: last year Google won against Sonos in a patent infringement lawsuit about multi-room audio groups:

https://www.engadget.com/google-brings-back-smart-speaker-grouping-after-sonos-lawsuit-victory-081200931.html

@bzbarsky-apple
Contributor

@decenzo please take a look.

@chrisdecenzo
Contributor

Thanks for the suggestion. Please join Matter so you can volunteer to lead this effort!

@Hedda
Author

Hedda commented Jan 16, 2024

Sorry, I do not have the capacity for that myself. Would think that Amazon and/or Google might be best suited to look into this?

Again, both Amazon and Google have competing smart speakers with their own technology already implementing these features.

Apple also has the technology + use case with AirPlay and their HomePod series, but not as sure they would lead such a project(?).

Is there someone from Amazon who leads Matter's matching "Video Player Architecture" + "Casting Video Player" development?

@pgregorr-amazon or @sharadb-amazon could you maybe refer Amazon leads to look into and consider this feature request?

Could this audio-only casting track perhaps be tackled there as an extension and continued part of that video casting project?

For reference, Amazon is driving "Matter Casting" for their video playback devices, and they also have many smart speakers:

https://www.aboutamazon.com/news/devices/amazon-ces-2024-announcements

https://www.theverge.com/2021/12/9/22824559/matter-tv-streaming-devices-smart-home-casting-protocol-support

https://www.theverge.com/2024/1/9/24030324/amazon-matter-casting-echo-show-fire-tv-prime-video

https://www.streamtvinsider.com/video/amazon-drops-matter-casting-capabilities-panasonic-and-fire-tv-os-partnership-ces

https://9to5google.com/2024/01/09/amazon-fire-tv-matter-casting/

https://www.aftvnews.com/matter-casting-is-coming-to-fire-tvs-and-the-echo-show-15-an-industry-first-by-amazon/

https://www.aftvnews.com/new-matter-casting-video-player-for-fire-tv-gets-certified/

@Hedda changed the title from [Feature] "Audio Player Architecture" with a new "Casting Audio Player device type" and "Audio Player endpoint" ("Casting Audio Player" and adding "Basic Audio Player") + look into adding multi-room audio support? to [Feature] Matter Casting support for "Audio Player Architecture" with a new "Casting Audio Player device type" and "Audio Player endpoint" ("Casting Audio Player" and adding "Basic Audio Player") + look into adding multi-room music streaming? on May 21, 2024
@Hedda
Author

Hedda commented May 21, 2024

Please join Matter so you can volunteer to lead this effort!

@marcelveldt as "the Matter guy at Nabu Casa" (and Home Assistant), perhaps this is something that you and the ESPHome developers at Nabu Casa would be interested in helping architect and develop for the Matter project? I would think this functionality might be very relevant to your roadmap given Home Assistant's recent announcements about both your "Music Assistant" and the Open Home Foundation, plus Home Assistant's voice assistant work, which are separate things that I believe all align in spirit with this concept at a high level?

PS: I also read that Nabu Casa is developing its own ESPHome-based smart-speaker (and/or smart-display) hardware. For such devices to also work as music streaming audio players (A/V-receiver endpoints) without your "Music Assistant" integration acting as middleware, I am guessing you will eventually want to add cross-ecosystem support for some kind of standardized audio streaming protocol like Matter Casting with audio player endpoint features?

@github-actions github-actions bot added the linux label May 21, 2024
@Hedda
Author

Hedda commented Jun 16, 2024

Any feedback or input on these feature request ideas about Matter Casting support for audio-only streaming and multi-room support for synchronized music stream playback in multiple rooms?

https://community.home-assistant.io/t/matter-casting-matter-casting-client-support-in-home-assistant-cast-new-upcoming-open-protocol-standard-for-local-video-and-audio-streaming/671645

@chrisdecenzo
Contributor

There is an effort within Matter right now to define use cases for audio players / smart speakers. Please join us!

@nagyrobi

nagyrobi commented Jun 17, 2024

A new product idea to consider accommodation for in a new audio-only architecture would also include support for the concept of audio-input dongles.

Regarding audio inputs, there's a need for something much simpler: the possibility to play local audio sources through the smart device. You don't want separate speakers for your TV and various players, do you? Television sets' built-in speakers generally suck; you'd love to hear TV sound on your new speakers, but they lack a TosLink / ARC input or some RCA line-ins.
You don't necessarily need to stream the sound of the TV through the network to the other rooms, but what you do need is for the sound of the speakers to remain in sync with the picture, so delay is added. Pretty simple to accomplish in hardware, actually:

https://forum.raspiaudio.com/t/suggestion-for-espmuse-multiple-analog-inputs/401
esphome/feature-requests#1750
esphome/feature-requests#1751
sle118/squeezelite-esp32#227

Product designers have to start thinking in hardware too, and not take a software-only approach.

Sonos soundbars have this; they can learn the Volume Up/Down IR commands of the TV remote (already feasible with ESPHome). Nowadays TVs can be set to output audio only through TosLink / HDMI ARC, and the volume can be adjusted from the TV's remote.

The selling point here would be to have this speaker/preamp/player dongle integrated with the HA system and have e.g. announcements mute/dim the TV sound and restore it afterwards. It should also support multiple sources, such as acting as a USB soundcard, aux line-ins, and stream sources, and handle them all the same way. When you turn on the TV, it should change to the TV sound source automatically.

Check the links above with POCs and use cases explained.
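For illustration only, the "keep speaker sound in sync with the picture" idea can be sketched in software as a fixed delay line on the local audio input. This is a minimal Python sketch, not code from any of the linked projects; `DelayLine` and its parameters are hypothetical names, and the required delay value is assumed to be known (e.g. measured or user-configured).

```python
from collections import deque


class DelayLine:
    """Delays a stream of samples by a fixed amount, padding the start with silence."""

    def __init__(self, delay_ms, sample_rate_hz=48000):
        self.delay_samples = round(delay_ms * sample_rate_hz / 1000)
        # Pre-fill with silence so the first outputs are zeros.
        self._buf = deque([0] * self.delay_samples)

    def process(self, samples):
        out = []
        for s in samples:
            self._buf.append(s)
            out.append(self._buf.popleft())
        return out


# Tiny sample rate chosen only so the effect is easy to see: 3 ms at 1 kHz = 3 samples.
line = DelayLine(delay_ms=3, sample_rate_hz=1000)
first = line.process([1, 2, 3, 4, 5])
second = line.process([6])
print(first, second)
```

State is kept between calls, so audio can be fed in blocks of any size.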

@jonsmirl
Contributor

jonsmirl commented Jul 4, 2024

There is an effort within Matter right now to define use cases for audio players / smart speakers. Please join us!

@chrisdecenzo There are groups of developers (myself included) who work on Matter projects but can't participate in CHIP because our employers won't join the CSA for various reasons. CHIP would benefit from having an 'Invited Expert' membership (like the W3C has) allowing individuals caught in this situation to participate. 'Invited Expert' would allow access to Slack and the draft spec with no voting ability.

@Hedda
Author

Hedda commented Aug 6, 2024

There is an effort within Matter right now to define use cases for audio players / smart speakers.

Related suggestion: what do you think of using the new "ReSpeaker Lite" as a reference hardware platform for developing this?

They have two "ReSpeaker Lite" kit variants that include their "ReSpeaker Lite Voice Assistant Kit" board, which combines an XMOS XU-316 AI (xCORE XU316) sound and audio processor microcontroller for advanced voice processing with an Espressif ESP32-S3 SoC for running application firmware and WiFi/BLE connectivity. Most interestingly, it features an onboard standard 3.5mm stereo audio output jack for connecting any external speakers or external audio amplifier for Hi-Fi playback:

In their demos, however, they only show off its microphone array for voice and do not show any media playback for music:

Similar xCORE chips from XMOS are compatible with popular voice assistants and are already supported in Amazon Alexa Voice Service (AVS) Development Kit solutions, and I believe they are even used in some Amazon Echo products and maybe other existing smart speaker products too?

PS: ESP32 + XMOS xCORE hardware specifications of the "ReSpeaker Lite" will be very similar to the upcoming Home Assistant voice-kit hardware that Nabu Casa is currently developing on as a reference platform for Home Assistant Voice Assistant hardware running ESPHome firmware:

@jonsmirl
Contributor

jonsmirl commented Aug 6, 2024

Don't waste your time with the proprietary chips, we wasted a year on them. Instead use a generic ARM CPU running Linux. Amazon has an Alexa qualified hot word processing library for ARM if you can convince them to give it to you. Another alternative is to use TensorFlow and an ARM CPU with an NPU. Google around github and you'll find example code. Allwinner and Rockchip make low cost CPUs specifically designed for this use case.

@Hedda
Author

Hedda commented Aug 6, 2024

Off-topic but FYI, even if XMOS is proprietary hardware their chips are very popular and have open-source compatible libraries.

As far as I can tell, the complete source code for XMOS's xcore-voice firmware is available on GitHub under the sln_voice repo:

More information about that in their user-guide for their XK-VOICE-L71:

By the way, I stumbled on this new "voice-kit" GitHub repository where ESPHome firmware developers are developing new and improved components for I2S audio, including XMOS output and input support, as well as their own ESP32-native media player with support for FLAC, WAV/PCM, MP3, etc. for the upcoming Home Assistant voice-kit hardware platform from Nabu Casa (which, as mentioned, will be based on a combination of an XMOS xCORE chip and an ESP32-S3):

It is cutting-edge, but so far they have already added features/functions and improvements/enhancements to ESPHome, such as:

  • New: Nabu Media Player - new "nabu" media player from Nabu Casa running natively on ESP32
    • Music Assistant streams work (both mp3 and flac), but since it requires resampling, the audio quality isn't great
  • New: Added support for FLAC files
  • New: Added a proper WAV decoder (that parses WAV headers with LIST, INFO, etc. chunks.)
  • New: Initial support for playing back local files
  • New: Playback Control for the VoiceKit
  • New: Added an is_paused condition for media players.
  • New: Add Click to Converse to button
  • New: LED animation
  • New: Scripts for controlling LEDs
  • New: Update Button Behaviour for the Voice kit
  • New: Dial Volume Control
  • New: Timer basic implementation
  • New: Added HTTP(s) OTA updates
  • New: Added Buttons for force ota update.
  • New: Software Mute Switch
  • Improvement: A basic resampler adjusts sample rates
  • Improvement: Configurable output sample rate (for experimental 48kHz XMOS firmware)
  • Improvement: The DAC mute state is read on boot
  • Improvement: volume/mute control via the DAC (the wheel works for increasing/decreasing volume)
  • Improvement: Logs what element failed if the pipeline breaks
  • Improvement: Fails gracefully if the incoming stream can't be processed
  • Improvement: Differentiate between user facing LED Ring and Internal LED ring
  • Point external component to dev branch
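As a side note on the resampling items in the list above, here is a minimal Python sketch of the simplest kind of sample-rate conversion, linear interpolation. This is only an assumption-laden illustration and is not the ESPHome/voice-kit implementation; `resample_linear` is a hypothetical helper. Naive approaches like this one are cheap but can audibly degrade quality compared to a properly filtered resampler.

```python
def resample_linear(samples, src_rate_hz, dst_rate_hz):
    """Resample a mono list of float samples using linear interpolation."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate_hz / src_rate_hz)
    step = src_rate_hz / dst_rate_hz  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        i0 = min(int(pos), last)   # sample at or before the position
        i1 = min(i0 + 1, last)     # next sample, clamped at the end
        frac = pos - i0
        out.append(samples[i0] * (1.0 - frac) + samples[i1] * frac)
    return out


# Doubling the rate of a short ramp (rates chosen small for readability).
upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], src_rate_hz=2, dst_rate_hz=4)
print(upsampled)
```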

They also have many inline TODO comments in the code there, if anyone is interested in helping them:

https://github.com/search?q=repo%3Aesphome%2Fvoice-kit%20todo&type=code

Note! Be aware that many comments there indicate that most of the new stuff is not yet stable.

@andy31415 andy31415 removed the linux label Oct 24, 2024
@Hedda
Author

Hedda commented Nov 7, 2024

There is an effort within Matter right now to define use cases for audio players / smart speakers.

@chrisdecenzo any updates on Matter Casting use cases to add music playback via dedicated audio players and smart speakers?

I read that Matter 1.4 seems to be adding messaging for Matter Casting, but again it only focuses on video and smart displays.

Looks at least like HiFi Streamers / Music Streamers (network audio players) are trending and becoming more popular as products.

Again, there are no open standards/protocols for multi-room audio (distributed audio) systems made for home multiroom audio.


6 participants