Add method to disable speech plugin from python #213

Open
wants to merge 1 commit into
base: master

Conversation

ShawnBuckley

Adds the ability to shut down the speech recognition plugin via Python.

The problem I ran into when adding push-to-talk voice recognition was that speech was still being recognized and queued internally while the push-to-talk button was not held. As soon as the button was pressed, all previously spoken phrases that should not have been recognized would fire.

This change would allow a user to configure speech recognition in a push to talk fashion like this:

if joystick[0].getDown(0):
  if speech.said("form on me"):
    pass  # execute keyboard macro
else:
  speech.disable()

Rapidly enabling and disabling the speech recognition engine doesn't seem to have any performance impact.

@AndersMalmgren
Owner

Hi, there must be a softer way of disabling speech recognition than disposing the whole thing?

@ShawnBuckley
Author

Sure. The goal I'd like to shoot for is to have speech.said not return true for statements that were said before speech.said was called. I'm going to look at other ways of accomplishing that. It would be nice not to have the recognition engine running while the user has disabled/paused recognition, but that is not the important feature.
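To make the unwanted behaviour concrete, here is a minimal, self-contained sketch of the latching semantics as they are described in this thread. `LatchedSpeech` and `on_recognized` are hypothetical names for illustration; this is a toy model, not the plugin's actual code.

```python
class LatchedSpeech(object):
    """Toy model of the behaviour described above (assumption, not FreePIE code)."""

    def __init__(self):
        # phrase -> True once heard; stays True until said() reads it
        self._results = {}

    def on_recognized(self, phrase):
        # Stand-in for the engine's recognition callback.
        if phrase in self._results:
            self._results[phrase] = True

    def said(self, phrase):
        # The first call registers the phrase; every call reads and clears the latch.
        heard = self._results.get(phrase, False)
        self._results[phrase] = False
        return heard


speech = LatchedSpeech()
speech.said("form on me")            # script registers the phrase
speech.on_recognized("form on me")   # spoken while the button is NOT held
# Many iterations later the button is pressed and the script polls again:
stale = speech.said("form on me")    # the old phrase fires now
fresh = speech.said("form on me")    # the latch was cleared by the read above
```

In this model `stale` is true even though nothing was said while the button was held, which is the case the PR is trying to eliminate.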

@MarijnS95
Contributor

MarijnS95 commented Mar 28, 2022

It seems the recognition engine is specifically programmed to look for specific words, i.e. after speech.said() is initially called. Perhaps this API needs to be reworked so that words/grammars can be removed from the recognition engine again with UnloadGrammar(Grammar) instead of disposing the whole thing, and so that recognition is stopped through RecognizeAsyncStop() when no active grammars are loaded.

This could either be through some speech.stop("form on me") or with an object representing the Grammar:

form_on_me = None
if joystick[0].getDown(0):
  if form_on_me is None:
    form_on_me = speech.recognize("form on me")
  elif form_on_me.said():
    # execute keyboard macro
    # Or stop(). What if it's said again? You usually don't want the macro
    # to be executed every time FreePIE iterates the script.
    form_on_me.reset()
else:
  form_on_me and form_on_me.stop()
  form_on_me = None

Could also have a .start() function and keep it initialized through speech.recognize just once.
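A rough sketch of how such a handle object could behave, written as plain Python against a fake engine. `FakeEngine`, `Phrase`, `load`, `unload` and `on_recognized` are all hypothetical names; the real engine calls (UnloadGrammar, RecognizeAsyncStop) are only referenced in comments.

```python
class FakeEngine(object):
    """Hypothetical stand-in for the recognition engine."""

    def __init__(self):
        self.grammars = set()
        self.running = False

    def load(self, text):
        self.grammars.add(text)
        self.running = True          # roughly: start async recognition

    def unload(self, text):
        self.grammars.discard(text)  # roughly: UnloadGrammar(grammar)
        if not self.grammars:
            self.running = False     # roughly: RecognizeAsyncStop()


class Phrase(object):
    """Handle for one phrase, as floated in the comment above."""

    def __init__(self, engine, text):
        self._engine = engine
        self._text = text
        self._heard = False
        engine.load(text)

    def on_recognized(self):
        # Stand-in for the engine reporting a match for this phrase.
        self._heard = True

    def said(self):
        return self._heard

    def reset(self):
        self._heard = False

    def stop(self):
        self._engine.unload(self._text)
        self._heard = False


engine = FakeEngine()
form_on_me = Phrase(engine, "form on me")
form_on_me.on_recognized()
heard_once = form_on_me.said()
form_on_me.reset()
heard_after_reset = form_on_me.said()
form_on_me.stop()
```

With only one phrase loaded, `stop()` leaves the fake engine with no grammars, so it also stops running.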

@ShawnBuckley
Author

Another way of doing it could be to have an active bool within the speech class that causes results from the recognition engine to be ignored. This would require only a minimal change to the Python API, allow a push-to-talk style of recognition, and avoid disposing of the recognition engine just to pause recognition.

It could look something like this:

speech.enable_recognition(joystick[0].getDown(0))
if speech.said("form on me"):
  pass  # execute keyboard macro
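As a sketch of what the active flag might do internally, the toy class below drops recognition events while the flag is off. `GatedSpeech` and `on_recognized` are hypothetical names, and the class is an assumption about the approach, not the code in this PR.

```python
class GatedSpeech(object):
    """Toy model of the 'active bool' idea (assumption, not the PR's code)."""

    def __init__(self):
        self._enabled = True
        self._results = {}

    def enable_recognition(self, enabled):
        self._enabled = bool(enabled)

    def on_recognized(self, phrase):
        # Stand-in for the engine callback: results are ignored while disabled.
        if not self._enabled:
            return
        if phrase in self._results:
            self._results[phrase] = True

    def said(self, phrase):
        heard = self._results.get(phrase, False)
        self._results[phrase] = False
        return heard


speech = GatedSpeech()
speech.said("form on me")              # register the phrase

speech.enable_recognition(False)       # button released
speech.on_recognized("form on me")     # spoken now: dropped
speech.enable_recognition(True)        # button pressed
fired_from_old_speech = speech.said("form on me")

speech.on_recognized("form on me")     # spoken while the button is held
fired_while_held = speech.said("form on me")
```

Only the phrase spoken while recognition is enabled reaches the script. The engine itself keeps running in this variant; the flag only filters its results.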

@ShawnBuckley ShawnBuckley force-pushed the disable-speech-recognition-support branch from b21a376 to ce3ccf7 Compare March 28, 2022 14:43
@ShawnBuckley
Author

Made the change to support the example above. The plugin can still be fully stopped via the Python interface, but stopping it is no longer needed to pause the recognition engine for push-to-recognize.

@AndersMalmgren
Owner

That seems much better!
I think we should remove stop. It makes no sense to have the control to dispose the API from Python.

More so since there is no corresponding start method, and no other plugin has exposed start/stop methods.

@MarijnS95
Contributor

Instead of Stop(), should there perhaps be a Reset() method to get rid of all programmed grammars and mark all of recognizerResults as false (or empty the dict entirely)?

Currently the following seems possible: speech.said() is called once, the text is then recognized, but .said() is never called again to check it and set Result back to false. After the user disables and re-enables the recognizer, they will still receive true from speech.said() the next time it is called. But perhaps that is intended.

In addition, this enable function could use RecognizeAsyncStop() mentioned above to save some resources while FreePIE is ignoring results?
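The edge case can be reproduced with a small model. Everything here is hypothetical (`GatedSpeech`, `reset`, `stale_after_toggle`); it only mirrors the semantics described in this thread, with a `reset()` that marks every result as false.

```python
class GatedSpeech(object):
    """Hypothetical model: an enable flag plus reset(), not the plugin's code."""

    def __init__(self):
        self.enabled = True
        self.results = {}

    def on_recognized(self, phrase):
        if self.enabled and phrase in self.results:
            self.results[phrase] = True

    def said(self, phrase):
        heard = self.results.get(phrase, False)
        self.results[phrase] = False
        return heard

    def reset(self):
        # Mark every known phrase as not heard.
        for phrase in self.results:
            self.results[phrase] = False


def stale_after_toggle(reset_on_enable):
    speech = GatedSpeech()
    speech.said("form on me")            # register the phrase
    speech.on_recognized("form on me")   # heard while enabled, never read
    speech.enabled = False               # user disables recognition...
    speech.enabled = True                # ...and later re-enables it
    if reset_on_enable:
        speech.reset()
    return speech.said("form on me")


without_reset = stale_after_toggle(False)
with_reset = stale_after_toggle(True)
```

Without the reset, the unread result survives the disable/enable cycle; with it, the first `said()` after re-enabling returns false.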

@AndersMalmgren
Owner

> Instead of Stop(), should there perhaps be a Reset() method to get rid of all programmed grammars and mark all of recognizerResults as false (or empty the dict entirely)?

> Currently the following seems possible: speech.said() is called once, the text is then recognized, but .said() is never called again to check it and set Result back to false. After the user disables and re-enables the recognizer, they will still receive true from speech.said() the next time it is called. But perhaps that is intended.

> In addition, this enable function could use RecognizeAsyncStop() mentioned above to save some resources while FreePIE is ignoring results?

It was made so it would be as simple as possible for a non-programmer.

@AndersMalmgren
Owner

Maybe in DoBeforeNextExecute we should take all results, cache them until the next DoBeforeNextExecute, and clear the dictionary. This way you only get one "frame" worth of results.
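A minimal sketch of that per-frame snapshot idea, assuming a hook that runs once before each script iteration. `FrameSpeech`, `on_recognized` and `do_before_next_execute` are hypothetical Python names; the real callbacks would arrive from another thread and need locking, which is left out here.

```python
class FrameSpeech(object):
    """Toy model of per-frame results (assumption, not the plugin's code)."""

    def __init__(self):
        self._pending = set()          # filled by recognition callbacks
        self._current = frozenset()    # what said() sees during this frame

    def on_recognized(self, phrase):
        self._pending.add(phrase)

    def do_before_next_execute(self):
        # Publish everything heard since the last frame, then start over.
        self._current = frozenset(self._pending)
        self._pending.clear()

    def said(self, phrase):
        return phrase in self._current


speech = FrameSpeech()
speech.on_recognized("form on me")

speech.do_before_next_execute()            # frame 1 begins
frame1_first = speech.said("form on me")
frame1_second = speech.said("form on me")  # same answer within one frame

speech.do_before_next_execute()            # frame 2 begins, nothing new heard
frame2 = speech.said("form on me")
```

In this model a result is visible for exactly one frame, and reading it does not clear it, so several `said()` calls in the same frame agree with each other.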

@AndersMalmgren
Owner

That might be a better solution because then you don't even need the enable feature.


@MarijnS95
Contributor

> It was made so it would be as simple as possible for a non-programmer.

For the record I wasn't commenting on the original design at all!

Just evaluating a bunch of scenarios and thinking up how we can make this as simple yet predictable as possible.

> Maybe in DoBeforeNextExecute we should take all results, cache them until the next DoBeforeNextExecute, and clear the dictionary. This way you only get one "frame" worth of results.

Yeah, I think I like resetting it directly after a frame instead of resetting it only when speech.said is called. However, that may have severe implications when speech.said is called inside if blocks, and might break existing scripts that rely on the current semantics.
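To illustrate the kind of script that would change behaviour, the sketch below compares the two semantics for a script that only polls inside an if block on some frames. `fired_frames` and its parameters are hypothetical, and both modes are simplified models rather than the plugin's implementation.

```python
def fired_frames(heard_per_frame, polls_per_frame, latched):
    """Return the frame indices on which a polling script would fire.

    heard_per_frame: True if the phrase was recognized just before that frame.
    polls_per_frame: True if the script reaches its said() call on that frame.
    latched: True for the current semantics, False for per-frame results.
    """
    flag = False
    fired = []
    for frame, (heard, polls) in enumerate(zip(heard_per_frame, polls_per_frame)):
        if latched:
            flag = flag or heard   # stays set until a said() call reads it
        else:
            flag = heard           # only this frame's result survives
        if polls:
            if flag:
                fired.append(frame)
            # Reading clears the latch; in per-frame mode the flag is
            # overwritten at the start of the next frame anyway.
            flag = False
    return fired


heard = [True, False, False]   # phrase spoken before frame 0
polls = [False, False, True]   # the if block is only entered on frame 2

current_semantics = fired_frames(heard, polls, latched=True)
per_frame_semantics = fired_frames(heard, polls, latched=False)
```

Under the latched model the script fires on frame 2; under the per-frame model it never fires. That is what push-to-talk wants, but it is a behaviour change for scripts that rely on the latch.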

@AndersMalmgren
Owner

> > It was made so it would be as simple as possible for a non-programmer.

> For the record I wasn't commenting on the original design at all!

> Just evaluating a bunch of scenarios and thinking up how we can make this as simple yet predictable as possible.

> > Maybe in DoBeforeNextExecute we should take all results, cache them until the next DoBeforeNextExecute, and clear the dictionary. This way you only get one "frame" worth of results.

> Yeah, I think I like resetting it directly after a frame instead of resetting it only when speech.said is called. However, that may have severe implications when speech.said is called inside if blocks, and might break existing scripts that rely on the current semantics.

Yep, it's for sure a breaking change. Might be worth it, though.

@ShawnBuckley ShawnBuckley force-pushed the disable-speech-recognition-support branch from ce3ccf7 to c3813c0 Compare March 30, 2022 22:21
@ShawnBuckley
Author

Slimmed the PR down to just the speech.enableRecognition change.

@ShawnBuckley
Author

@AndersMalmgren would there be anything else you'd like to see in this PR before it can be pulled?

@AndersMalmgren
Owner

> @AndersMalmgren would there be anything else you'd like to see in this PR before it can be pulled?

I want to do a POC on frame capture of speech input. I think it's a cleaner approach. The current state of the plugin is counterintuitive.


3 participants