# Dragonfly Grammars for Coding by Voice
[![N|Solid](https://gitlab.com/uploads/-/system/project/avatar/13627082/dragonfly-161745.png?width=64)](https://gitlab.com/onecybernomad/dragonfly-grammars)
A simple way to set up voice coding on Linux and Windows, using Kaldi on Linux and WSR (Windows Speech Recognition) on Windows.
_Grammars available for Python, JavaScript, React.js, HTML, CSS, Java, and C#_
# Set up an installation
- Clone this repo using Git
- If you haven't installed Dragonfly yet, run one of these commands
-- For Linux
```sh
pip install 'dragonfly2[kaldi]'
```
-- For Windows
```sh
pip install dragonfly2
```
- Install a Kaldi model from (https://github.com/daanzu/kaldi-active-grammar/releases)
-- I recommend kaldi_model_daanzu_20211030-biglm.zip
-- If you are feeling adventurous you can train your own model (http://jrmeyer.github.io/asr/2016/12/15/DNN-AM-Kaldi.html)
-- Download the model and place it in the directory where you cloned the project.
- On Linux you may need to install some extra software
```sh
sudo apt install wmctrl xdotool xsel
```
- You may now try it out
```sh
python kaldi_module_loader_plus.py
```
- On some Linux distributions you may need to use this command instead
```sh
python3 kaldi_module_loader_plus.py
```
- On Windows you may need to use the Python launcher, `py`
```sh
py -3.6 kaldi_module_loader_plus.py
```
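If the loader fails to start on Linux, it can help to confirm that the helper tools from the `apt install` step are actually on your `PATH`. The following is a minimal sketch, not part of this repo: the function name `missing_tools` is hypothetical, and it only checks the three tools named above.

```python
import shutil

# Helper tools named in the Linux setup step of this README.
LINUX_TOOLS = ("wmctrl", "xdotool", "xsel")


def missing_tools(tools=LINUX_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
        print("Try: sudo apt install " + " ".join(missing))
    else:
        print("All helper tools found.")
```

Save it under any name you like and run it with the same Python interpreter you use for the loader.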
## How to use
- Once it is running, say "wake up". If everything is working, you should hear a voice prompt stating that it's awake.
- Say "enable [language of your choice]". You should hear a voice prompt stating that the language has been activated.
- You can combine some grammars; just be aware that some may have conflicting rules
-- For example, I often combine HTML and JavaScript
- You can move the cursor by saying "up", "down", "left", or "right" followed by the number of times you want it to move
- You can combine commands, for example "shift" + "direction" + "number of moves" or "ctrl" + "up/down" + "number of moves"
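To make the cursor commands above more concrete, here is a rough sketch of how a phrase such as "shift left 3" could be turned into a key specification. It is an illustration only and not the code used by these grammars: the function `to_key_spec` is hypothetical, the count is assumed to arrive as digits, and the output format (`s-left:3`) is modeled on the modifier-key:repeat strings that Dragonfly's `Key` action accepts, as far as I understand that format.

```python
# Spoken modifier -> one-letter prefix (assumed: s = shift, c = ctrl).
MODIFIERS = {"shift": "s", "ctrl": "c"}
DIRECTIONS = {"up", "down", "left", "right"}


def to_key_spec(phrase):
    """Turn '[modifier] direction [count]' into a spec like 's-left:3'.

    Note: this sketch allows "ctrl" with any direction; it does not
    restrict it to up/down as described in the list above.
    """
    words = phrase.lower().split()

    # Optional leading modifier ("shift" or "ctrl").
    prefix = ""
    if words and words[0] in MODIFIERS:
        prefix = MODIFIERS[words.pop(0)] + "-"

    # Required direction.
    if not words or words[0] not in DIRECTIONS:
        raise ValueError("expected a direction in %r" % phrase)
    direction = words.pop(0)

    # Optional repeat count, defaulting to a single move.
    count = int(words.pop(0)) if words else 1
    if words or count < 1:
        raise ValueError("unexpected input in %r" % phrase)

    return "%s%s:%d" % (prefix, direction, count)


if __name__ == "__main__":
    for spoken in ("down 5", "shift left 3", "ctrl up 2"):
        print(spoken, "->", to_key_spec(spoken))
```

In a real grammar the speech engine, not string splitting, would recognize the words and the number; the sketch only shows how the three parts of a combined command fit together.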
## Tech
This project uses:
- [Python] - An awesome and powerful interpreted programming language
- [Dragonfly2] - A speech recognition framework that lets you code by voice
- [Festival] - A text-to-speech (TTS) engine
- [Kaldi] - An open-source speech recognition toolkit
- WSR - Windows Speech Recognition, the speech recognition engine built into Microsoft Windows
- [Tkinter](https://docs.python.org/3/library/tkinter.html) - Python's standard GUI toolkit
## License
MIT
[//]: # (These are reference links used in the body of this note and get stripped out when the markdown processor does its job. There is no need to format nicely because it shouldn't be seen. Thanks SO - http://stackoverflow.com/questions/4823468/store-comments-in-markdown-syntax)
[Python]: <https://www.python.org/>
[Festival]: <https://www.cstr.ed.ac.uk/projects/festival/>
[Dragonfly2]: <https://github.com/dictation-toolbox/dragonfly>
[Kaldi]: <https://www.kaldi-asr.org/>