Is using both audio and speech redundant? #87

Open
miumiuc opened this issue Nov 30, 2024 · 2 comments

Comments

@miumiuc

miumiuc commented Nov 30, 2024

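According to the video-salmonn code, the speech branch converts the speech signal into a mel spectrogram and then extracts features with the Whisper encoder, which consists of convolution layers followed by a Transformer. The audio branch converts the speech signal into fbank features and then extracts features with the BEATs encoder, which is also a Transformer. Is there any difference in meaning between the audio features and the speech features extracted this way?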
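To make the two front ends described above concrete, here is a minimal sketch that only counts how many feature positions each encoder would see. The helper names are hypothetical, and the constants (16 kHz audio, 30 s chunks and a stride-2 convolution for Whisper; 25 ms / 10 ms fbank framing, 128 mel bins and 16x16 patches for BEATs) are the commonly published defaults for those models, assumed here rather than taken from the video-salmonn code.

```python
# Hypothetical helpers; constants are assumed public defaults for Whisper and
# BEATs, not values read from the video-salmonn repository.

SAMPLE_RATE = 16_000  # both front ends are assumed to take 16 kHz mono audio


def whisper_encoder_positions(chunk_seconds=30, hop=160, conv_stride=2):
    """Whisper pads or trims audio to a fixed chunk, builds a log-mel
    spectrogram, then halves the time axis with a stride-2 convolution."""
    mel_frames = chunk_seconds * SAMPLE_RATE // hop
    return mel_frames // conv_stride


def beats_patches(num_samples, win=400, hop=160, mel_bins=128, patch=16):
    """BEATs computes Kaldi-style fbank frames (no edge padding), then cuts
    the (time, mel) map into non-overlapping patch x patch tiles."""
    if num_samples < win:
        return 0
    fbank_frames = 1 + (num_samples - win) // hop
    return (fbank_frames // patch) * (mel_bins // patch)


if __name__ == "__main__":
    ten_seconds = 10 * SAMPLE_RATE
    print("Whisper encoder positions:", whisper_encoder_positions())
    print("BEATs patches for 10 s:", beats_patches(ten_seconds))
```

Under these assumptions the Whisper branch yields a sequence over time only, while the BEATs branch yields patches that tile both time and frequency, so the two encoders do not see the same input layout even when fed the same waveform.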

@TCL606
Member

TCL606 commented Dec 2, 2024

It is only because Whisper and BEATs accept different input features.

@miumiuc
Author

miumiuc commented Dec 2, 2024

> It is only because Whisper and BEATs accept different input features.

Then why is the audio data processed in two different ways here (Whisper and BEATs)?
