feat: implement automatic driver radio transcription #124
base: develop
Conversation
@kyujin-cho is attempting to deploy a commit to the f1-dash Team on Vercel. A member of the Team first needs to authorize it.
Quick question: don't we already have the audio file when it is sent via the live socket? Couldn't we just use that, or am I missing something?
@SpatzlHD AFAIK the pathname to the audio file - not the actual file - is all the data the client receives via SSE.
But there has to be an audio file somewhere to play it, right?
Thanks a lot for this PR. I also wanted to implement this directly into f1-dash after seeing someone from the community Discord do it with a local Python server.
I was thinking about doing it with Rust and WebAssembly, but also using the Whisper model. Not sure if performance would be any better.
Please take a look at my comments. The CORS issue also seems weird to me, because we can play the audio with no CORS problem.
Also, the build is currently failing because of webkitAudioContext not existing in the types.
dash/src/app/(nav)/settings/page.tsx
Outdated
const transcriptionStorage = localStorage.getItem("transcription");
const transcriptionSettings: TranscriptionSettings = transcriptionStorage
	? JSON.parse(transcriptionStorage)
	: { enableTranscription: false, whisperModel: "" };

setEnableTranscription(transcriptionSettings.enableTranscription);
setTranscriptionModel(transcriptionSettings.whisperModel);
Maybe a separate Context for either transcription or settings in general would be better than adding it to the mode one, as that one is primarily used for the swishy thingy in the top right.
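For illustration, a dedicated context might look roughly like this (the `TranscriptionContext` name and shape below are hypothetical, not code from this PR):

```tsx
// Hypothetical sketch: a settings context dedicated to transcription,
// kept separate from the mode context used by the top-right toggle.
import { createContext, useContext, useState, type ReactNode } from "react";

type TranscriptionSettings = {
	enableTranscription: boolean;
	whisperModel: string;
};

type TranscriptionContextType = TranscriptionSettings & {
	setSettings: (settings: TranscriptionSettings) => void;
};

const TranscriptionContext = createContext<TranscriptionContextType | undefined>(undefined);

export const TranscriptionProvider = ({ children }: { children: ReactNode }) => {
	const [settings, setSettings] = useState<TranscriptionSettings>({
		enableTranscription: false,
		whisperModel: "",
	});

	return (
		<TranscriptionContext.Provider value={{ ...settings, setSettings }}>
			{children}
		</TranscriptionContext.Provider>
	);
};

export const useTranscription = () => {
	const ctx = useContext(TranscriptionContext);
	if (!ctx) throw new Error("useTranscription must be used inside TranscriptionProvider");
	return ctx;
};
```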
dash/src/app/(nav)/settings/page.tsx
Outdated
<select value={transcriptionModel} onChange={(s) => {
	setTranscriptionModel(s.target.value);
	handleTranscriptionSettingUpdate("model", s.target.value);
}}>
	<option value="distil-whisper/distil-small.en">High Quality</option>
	<option value="Xenova/whisper-base">Balanced</option>
	<option value="Xenova/whisper-tiny">Low Latency</option>
</select>
I'd prefer to either style this a bit or even use the already installed headless library to create a completely custom f1-dash dropdown.
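For reference, a rough sketch of what a dropdown built on the headless library could look like (component names from @headlessui/react; the styling classes are placeholders, not the PR's final markup):

```tsx
// Rough sketch, not the PR's actual code: the native <select> replaced
// with a Headless UI Listbox so it can be styled like the rest of f1-dash.
import { Listbox } from "@headlessui/react";

const models = [
	{ value: "distil-whisper/distil-small.en", label: "High Quality" },
	{ value: "Xenova/whisper-base", label: "Balanced" },
	{ value: "Xenova/whisper-tiny", label: "Low Latency" },
];

type Props = { value: string; onChange: (value: string) => void };

export const ModelSelect = ({ value, onChange }: Props) => (
	<Listbox value={value} onChange={onChange}>
		<Listbox.Button className="rounded-lg bg-zinc-800 p-2">
			{models.find((m) => m.value === value)?.label ?? "Select model"}
		</Listbox.Button>
		<Listbox.Options className="rounded-lg bg-zinc-900 p-1">
			{models.map((m) => (
				<Listbox.Option key={m.value} value={m.value} className="cursor-pointer p-1">
					{m.label}
				</Listbox.Option>
			))}
		</Listbox.Options>
	</Listbox>
);
```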
Just replaced the vanilla select component with the headless one, but I really have no idea about extra styling (sorry, I am a backend engineer :p), so I'll just leave the rest up to you.
No worries, I will do the styling :)
const audioRef = useRef<HTMLAudioElement | null>(null);
const intervalRef = useRef<NodeJS.Timeout | null>(null);

const [playing, setPlaying] = useState<boolean>(false);
const [duration, setDuration] = useState<number>(10);
const [progress, setProgress] = useState<number>(0);

const transcriptionElement = useMemo(() => {
Why is useMemo used here? I am not too familiar with it, so I am genuinely interested.
Basically it is there to minimize DOM re-rendering: in this case we save the cost of re-rendering by recomputing transcriptionElement only when transcription changes. But it is my own coding style, so it is totally up to you to leave it as is or just lift the useMemo().
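For illustration, the pattern under discussion looks roughly like this (the JSX body is a placeholder, not the PR's actual markup):

```tsx
// Recompute the element only when `transcription` changes; updates to
// unrelated state (playing, duration, progress) reuse the memoized value.
const transcriptionElement = useMemo(() => {
	if (!transcription) return null;
	return <p className="text-sm text-zinc-500">{transcription}</p>;
}, [transcription]);
```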
		</motion.li>
	);
}

const SkeletonTranscription = () => {
	const animateClass = "h-6 animate-pulse rounded-md bg-zinc-800";
Seems a bit tall; either do one or two thinner ones, please.
dash/src/lib/constants.ts
Outdated
function mobileTabletCheck() {
	// https://stackoverflow.com/questions/11381673/detecting-a-mobile-browser
	let check = false;
	(function (a: string) {
		if (
			/(android|bb\d+|meego).+mobile|avantgo|bada\/|blackberry|blazer|compal|elaine|fennec|hiptop|iemobile|ip(hone|od)|iris|kindle|lge |maemo|midp|mmp|mobile.+firefox|netfront|opera m(ob|in)i|palm( os)?|phone|p(ixi|re)\/|plucker|pocket|psp|series(4|6)0|symbian|treo|up\.(browser|link)|vodafone|wap|windows ce|xda|xiino|android|ipad|playbook|silk/i.test(
				a,
			) ||
			/1207|6310|6590|3gso|4thp|50[1-6]i|770s|802s|a wa|abac|ac(er|oo|s\-)|ai(ko|rn)|al(av|ca|co)|amoi|an(ex|ny|yw)|aptu|ar(ch|go)|as(te|us)|attw|au(di|\-m|r |s )|avan|be(ck|ll|nq)|bi(lb|rd)|bl(ac|az)|br(e|v)w|bumb|bw\-(n|u)|c55\/|capi|ccwa|cdm\-|cell|chtm|cldc|cmd\-|co(mp|nd)|craw|da(it|ll|ng)|dbte|dc\-s|devi|dica|dmob|do(c|p)o|ds(12|\-d)|el(49|ai)|em(l2|ul)|er(ic|k0)|esl8|ez([4-7]0|os|wa|ze)|fetc|fly(\-|_)|g1 u|g560|gene|gf\-5|g\-mo|go(\.w|od)|gr(ad|un)|haie|hcit|hd\-(m|p|t)|hei\-|hi(pt|ta)|hp( i|ip)|hs\-c|ht(c(\-| |_|a|g|p|s|t)|tp)|hu(aw|tc)|i\-(20|go|ma)|i230|iac( |\-|\/)|ibro|idea|ig01|ikom|im1k|inno|ipaq|iris|ja(t|v)a|jbro|jemu|jigs|kddi|keji|kgt( |\/)|klon|kpt |kwc\-|kyo(c|k)|le(no|xi)|lg( g|\/(k|l|u)|50|54|\-[a-w])|libw|lynx|m1\-w|m3ga|m50\/|ma(te|ui|xo)|mc(01|21|ca)|m\-cr|me(rc|ri)|mi(o8|oa|ts)|mmef|mo(01|02|bi|de|do|t(\-| |o|v)|zz)|mt(50|p1|v )|mwbp|mywa|n10[0-2]|n20[2-3]|n30(0|2)|n50(0|2|5)|n7(0(0|1)|10)|ne((c|m)\-|on|tf|wf|wg|wt)|nok(6|i)|nzph|o2im|op(ti|wv)|oran|owg1|p800|pan(a|d|t)|pdxg|pg(13|\-([1-8]|c))|phil|pire|pl(ay|uc)|pn\-2|po(ck|rt|se)|prox|psio|pt\-g|qa\-a|qc(07|12|21|32|60|\-[2-7]|i\-)|qtek|r380|r600|raks|rim9|ro(ve|zo)|s55\/|sa(ge|ma|mm|ms|ny|va)|sc(01|h\-|oo|p\-)|sdk\/|se(c(\-|0|1)|47|mc|nd|ri)|sgh\-|shar|sie(\-|m)|sk\-0|sl(45|id)|sm(al|ar|b3|it|t5)|so(ft|ny)|sp(01|h\-|v\-|v )|sy(01|mb)|t2(18|50)|t6(00|10|18)|ta(gt|lk)|tcl\-|tdg\-|tel(i|m)|tim\-|t\-mo|to(pl|sh)|ts(70|m\-|m3|m5)|tx\-9|up(\.b|g1|si)|utst|v400|v750|veri|vi(rg|te)|vk(40|5[0-3]|\-v)|vm40|voda|vulc|vx(52|53|60|61|70|80|81|83|85|98)|w3c(\-| )|webc|whit|wi(g |nc|nw)|wmlb|wonu|x700|yas\-|your|zeto|zte\-/i.test(
				a.substr(0, 4),
			)
		)
			check = true;
	})(
		navigator.userAgent ||
			navigator.vendor ||
			("opera" in window && typeof window.opera === "string"
				? window.opera
				: ""),
	);
	return check;
}
Not a huge fan of this regex stuff; is there any other way?
Also, the function does not belong in the constants file but rather in a separate file, and please make it an arrow function.
Detecting the device type helps us estimate the approximate amount of RAM the device has, which is the critical concern when loading the model. But I totally agree with your take on the chunky implementation; what do you think about just replacing the whole logic with the ua-parser-js library?
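For reference, a minimal sketch of what the ua-parser-js replacement could look like (not code from this PR):

```ts
// Sketch: replace the regex check with ua-parser-js. getDevice().type is
// "mobile" or "tablet" for those devices and undefined for desktop browsers.
import { UAParser } from "ua-parser-js";

export const isMobileOrTablet = (): boolean => {
	const deviceType = new UAParser().getDevice().type;
	return deviceType === "mobile" || deviceType === "tablet";
};
```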
Co-authored-by: slowlydev <[email protected]>
Abstract

This patch adds a new feature which displays a transcript of every driver radio message.

What's changed

live-backend

/api/audio API added. As F1TV's live timing CDN (https://livetiming.formula1.com/static) does not permit cross-origin requests, every call to obtain the speech file has to be proxied through the backend. Since routing every request for the file can marginally increase the traffic burden of the live-backend (and also risks a potential IP ban from the F1TV CDN), I have decided to make this API an optional feature, which can be opted into by defining the ENABLE_AUDIO_FETCH environment variable when loading the server process.
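For illustration, a client would call the proxy along these lines (the query-parameter name below is an assumption for the sketch; the actual route shape is defined by the PR's live-backend changes):

```ts
// Hypothetical client-side helper; endpoint and parameter names are
// assumptions, not the PR's actual code.
async function fetchRadioAudio(cdnPath: string): Promise<Blob> {
	// The backend forwards the request to livetiming.formula1.com/static,
	// sidestepping the CDN's missing CORS headers.
	const res = await fetch(`/api/audio?path=${encodeURIComponent(cdnPath)}`);
	if (!res.ok) throw new Error(`audio proxy returned ${res.status}`);
	return res.blob();
}
```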
dash

A transcription pipeline has been added. The pipeline accepts sampled audio data and then infers the transcription with the help of Transformers.js and OpenAI's Whisper model. There are loads of Whisper-based models, but based on my experience, I have made three of them available as options in this project (check dash/src/app/(nav)/settings/page.tsx). Those options are labeled High Quality, Balanced, and Low Latency as it stands. That said, only the computational resources of the client browser are consumed when executing the pipeline; the API backend takes no part in the process.