
Commit

Merge pull request #191 from jhj0517/feature/add-api
Add fastapi script
jhj0517 authored Dec 16, 2024
2 parents ca1f04d + 1146e9b commit a8b0006
Showing 50 changed files with 1,698 additions and 104 deletions.
45 changes: 0 additions & 45 deletions .github/workflows/ci-shell.yml

This file was deleted.

64 changes: 61 additions & 3 deletions .github/workflows/ci.yml
@@ -13,8 +13,7 @@ on:
- intel-gpu

jobs:
build:

test:
runs-on: ubuntu-latest
strategy:
matrix:
@@ -40,4 +39,63 @@ jobs:
run: pip install -r requirements.txt pytest jiwer

- name: Run test
run: python -m pytest -rs tests
run: python -m pytest -rs tests

test-backend:
runs-on: ubuntu-latest
strategy:
matrix:
python: ["3.10", "3.11", "3.12"]

env:
DEEPL_API_KEY: ${{ secrets.DEEPL_API_KEY }}
TEST_ENV: true

steps:
- name: Clean up space for action
run: rm -rf /opt/hostedtoolcache

- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python }}

- name: Install git and ffmpeg
run: sudo apt-get update && sudo apt-get install -y git ffmpeg

- name: Install dependencies
run: pip install -r backend/requirements-backend.txt pytest pytest-asyncio jiwer

- name: Run test
run: python -m pytest -rs backend/tests

test-shell-script:
runs-on: ubuntu-latest
strategy:
matrix:
python: [ "3.10", "3.11", "3.12" ]

steps:
- name: Clean up space for action
run: rm -rf /opt/hostedtoolcache

- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python }}

- name: Install git and ffmpeg
run: sudo apt-get update && sudo apt-get install -y git ffmpeg

- name: Execute Install.sh
run: |
chmod +x ./Install.sh
./Install.sh
- name: Execute start-webui.sh
run: |
chmod +x ./start-webui.sh
timeout 60s ./start-webui.sh || true
35 changes: 34 additions & 1 deletion .github/workflows/publish-docker.yml
@@ -6,7 +6,7 @@ on:
- master

jobs:
build-and-push:
build-and-push-webui:
runs-on: ubuntu-latest

steps:
@@ -38,3 +38,36 @@ jobs:

- name: Log out of Docker Hub
run: docker logout

build-and-push-backend:
runs-on: ubuntu-latest

steps:
- name: Clean up space for action
run: rm -rf /opt/hostedtoolcache

- name: Log in to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}

- name: Checkout repository
uses: actions/checkout@v3

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Set up QEMU
uses: docker/setup-qemu-action@v3

- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
file: ./backend/Dockerfile
push: true
tags: ${{ secrets.DOCKER_USERNAME }}/whisper-webui-backend:latest

- name: Log out of Docker Hub
run: docker logout
13 changes: 6 additions & 7 deletions .gitignore
@@ -2,13 +2,12 @@
*.png
*.mp4
*.mp3
.idea/
.pytest_cache/
venv/
modules/ui/__pycache__/
**/.env
**/.idea/
**/.pytest_cache/
**/venv/
**/__pycache__/
outputs/
modules/__pycache__/
models/
modules/yt_tmp.wav
configs/default_parameters.yaml
__pycache__/
configs/default_parameters.yaml
23 changes: 11 additions & 12 deletions README.md
@@ -31,6 +31,9 @@ If you wish to try this on Colab, you can do it in [here](https://colab.research
1. https://huggingface.co/pyannote/speaker-diarization-3.1
2. https://huggingface.co/pyannote/segmentation-3.0

### Pipeline Diagram
![Transcription Pipeline](https://github.com/user-attachments/assets/1d8c63ac-72a4-4a0b-9db0-e03695dcf088)

# Installation and Running

- ## Running with Pinokio
@@ -81,7 +84,7 @@ Please follow the links below to install the necessary software:

After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**

### Automatic Installation
### Installation Using the Script Files

1. git clone this repository
```shell
@@ -104,19 +107,14 @@ According to faster-whisper, the efficiency of the optimized whisper model is as
If you want to use an implementation other than faster-whisper, use the `--whisper_type` arg and the repository name.<br>
Read [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for more info about CLI args.

## Available models
This is the VRAM usage table for the original Whisper models.
If you want to use a fine-tuned model, manually place the model files in the `models/Whisper/` subdirectory corresponding to the implementation.

| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~32x |
| base | 74 M | `base.en` | `base` | ~1 GB | ~16x |
| small | 244 M | `small.en` | `small` | ~2 GB | ~6x |
| medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
| large | 1550 M | N/A | `large` | ~10 GB | 1x |
Alternatively, if you enter a Hugging Face repo id (e.g. [deepdml/faster-whisper-large-v3-turbo-ct2](https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2)) in the "Model" dropdown, it will be automatically downloaded into the directory.

![image](https://github.com/user-attachments/assets/76487a46-b0a5-4154-b735-ded73b2d83d4)

`.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
# REST API
If you're interested in deploying this app as a REST API, please check out [/backend](https://github.com/jhj0517/Whisper-WebUI/tree/master/backend).

## TODO🗓

@@ -126,7 +124,8 @@ This is Whisper's original VRAM usage table for models.
- [x] Integrate with insanely-fast-whisper
- [x] Integrate with whisperX ( Only speaker diarization part )
- [x] Add background music separation pre-processing with [UVR](https://github.com/Anjok07/ultimatevocalremovergui)
- [ ] Add fast api script
- [x] Add fast api script
- [ ] Add CLI usages
- [ ] Support real-time transcription for microphone

### Translation 🌐
2 changes: 1 addition & 1 deletion app.py
@@ -255,7 +255,7 @@ def launch(self):
files_audio = gr.Files(type="filepath", label=_("Upload Audio Files to separate background music"))
dd_uvr_device = gr.Dropdown(label=_("Device"), value=self.whisper_inf.music_separator.device,
choices=self.whisper_inf.music_separator.available_devices)
dd_uvr_model_size = gr.Dropdown(label=_("Model"), value=uvr_params["model_size"],
dd_uvr_model_size = gr.Dropdown(label=_("Model"), value=uvr_params["uvr_model_size"],
choices=self.whisper_inf.music_separator.available_models)
nb_uvr_segment_size = gr.Number(label="Segment Size", value=uvr_params["segment_size"],
precision=0)
35 changes: 35 additions & 0 deletions backend/Dockerfile
@@ -0,0 +1,35 @@
FROM debian:bookworm-slim AS builder

RUN apt-get update && \
apt-get install -y curl git python3 python3-pip python3-venv && \
rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/* && \
mkdir -p /Whisper-WebUI

WORKDIR /Whisper-WebUI

COPY backend/ backend/

RUN python3 -m venv venv && \
. venv/bin/activate && \
pip install -U -r backend/requirements-backend.txt


FROM debian:bookworm-slim AS runtime

RUN apt-get update && \
apt-get install -y curl ffmpeg python3 && \
rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

WORKDIR /Whisper-WebUI

COPY . .
COPY --from=builder /Whisper-WebUI/venv /Whisper-WebUI/venv

VOLUME [ "/Whisper-WebUI/models" ]
VOLUME [ "/Whisper-WebUI/outputs" ]
VOLUME [ "/Whisper-WebUI/backend" ]

ENV PATH="/Whisper-WebUI/venv/bin:$PATH"
ENV LD_LIBRARY_PATH=/Whisper-WebUI/venv/lib64/python3.11/site-packages/nvidia/cublas/lib:/Whisper-WebUI/venv/lib64/python3.11/site-packages/nvidia/cudnn/lib

ENTRYPOINT ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
80 changes: 80 additions & 0 deletions backend/README.md
@@ -0,0 +1,80 @@
# Whisper-WebUI REST API
REST API for Whisper-WebUI. Documentation is auto-generated when the app is deployed.
<br>[Swagger UI](https://github.com/swagger-api/swagger-ui) is available at `app/docs` (the root URL redirects there), and [Redoc](https://github.com/Redocly/redoc) is available at `app/redoc`.

# Setup and Installation

The installation steps below assume that you are in the root directory of Whisper-WebUI.

1. Create a `.env` file at `backend/configs/.env`
```
HF_TOKEN="YOUR_HF_TOKEN FOR DIARIZATION MODEL (READ PERMISSION)"
DB_URL="sqlite:///backend/records.db"
```
`HF_TOKEN` is used to download the diarization model, and `DB_URL` indicates where your database file is located; it is stored in `backend/` by default (see the side note after these steps for pointing it elsewhere).

2. Install the dependencies
```
pip install -r backend/requirements-backend.txt
```

3. Deploy the server with `uvicorn` or another ASGI server.
```
uvicorn backend.main:app --host 0.0.0.0 --port 8000
```
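
A side note on step 1: the default `DB_URL` is a `sqlite:///` URL, which looks SQLAlchemy-style, so pointing the records database somewhere else should presumably just be a matter of changing the path. This is an assumption about the backend, so verify it against its configuration handling.
```
# Hypothetical example: keep the records DB at an absolute path instead of backend/
DB_URL="sqlite:////data/whisper-webui/records.db"
```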

## Configuration
You can set some server configurations in [config.yaml](https://github.com/jhj0517/Whisper-WebUI/blob/feature/add-api/backend/configs/config.yaml).
<br>For example, the initial model size for Whisper, or the cleanup frequency and TTL for cached files.
<br>If an endpoint generates and saves files, all of its output files are stored in the `cache` directory; for example, the separated vocal/instrument files from `/bgm-separation` are saved there.
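
For orientation only, a hypothetical sketch of what such a configuration might contain — the key names below are assumptions rather than the actual schema, so refer to the linked `config.yaml` for the real keys:
```yaml
# Hypothetical sketch -- key names are assumptions, check the real config.yaml.
whisper:
  model_size: large-v2            # initial model loaded on startup
cache:
  cleanup_frequency_seconds: 3600 # how often cached outputs are swept
  ttl_seconds: 86400              # how long cached outputs are kept before deletion
```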

## Docker
The Dockerfile should be built from the root directory of Whisper-WebUI.

1. git clone this repository
```
git clone https://github.com/jhj0517/Whisper-WebUI.git
```
2. Map the volume paths in `docker-compose.yaml` to your local paths (see the sketch after these steps)
https://github.com/jhj0517/Whisper-WebUI/blob/1dd708ec3844dbf0c1f77de9ef5764e883dd4c78/backend/docker-compose.yaml#L12-L15
3. Build the image
```
docker compose -f backend/docker-compose.yaml build
```
4. Run the container
```
docker compose -f backend/docker-compose.yaml up
```

5. You can then read the docs at `localhost:8000` (the default port set in `docker-compose.yaml`) and run your own tests.
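
As a rough sketch of the volume mapping in step 2 — the service name and host paths below are placeholders, and the container paths simply mirror the `VOLUME` declarations in `backend/Dockerfile` — follow the linked `docker-compose.yaml` lines rather than this sketch:
```yaml
# Hypothetical sketch -- adjust to the actual backend/docker-compose.yaml.
services:
  whisper-webui-backend:
    volumes:
      - /path/to/local/models:/Whisper-WebUI/models    # model cache on the host
      - /path/to/local/outputs:/Whisper-WebUI/outputs  # generated outputs on the host
```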


# Architecture

![diagram](https://github.com/user-attachments/assets/37d2ab2d-4eb4-4513-bb7b-027d0d631971)

The response can be obtained through [the polling API](https://docs.oracle.com/en/cloud/saas/marketing/responsys-develop/API/REST/Async/asyncApi-v1.3-requests-requestId-get.htm).
Each task is stored in the DB whenever it is queued or updated by the process.

When the client first sends the `POST` request, the server returns an `identifier` to the client that can be used to track the status of the task. The task status is updated by the processes, and once the task is completed, the client can finally obtain the result.

The client needs to implement the polling itself; here is an example for a Python client:
```python
import time

import httpx

# `fetch_task` is assumed to GET the task status for a given identifier
# (a sketch of it follows below).
def wait_for_task_completion(identifier: str,
                             max_attempts: int = 20,
                             frequency: int = 3) -> httpx.Response:
    """
    Polls the task status every `frequency` seconds until it is completed,
    failed, or `max_attempts` is reached.
    """
    attempts = 0
    while attempts < max_attempts:
        task = fetch_task(identifier)
        status = task.json()["status"]
        if status == "COMPLETED":
            return task  # the JSON body carries the task's "result"
        if status == "FAILED":
            raise Exception("Task polling failed")
        time.sleep(frequency)
        attempts += 1
    return None
```
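
The helper `fetch_task` is not part of the snippet above. As a rough, non-authoritative sketch of the full client flow — the endpoint paths (`/task/{identifier}`, `/transcription`), the upload field name `file`, and the `identifier` key in the response are all assumptions, so check the generated Swagger docs for the real routes and payloads:
```python
import httpx

BASE_URL = "http://localhost:8000"  # assumed local deployment from the steps above


def fetch_task(identifier: str) -> httpx.Response:
    # Hypothetical status endpoint -- verify the actual path in the Swagger docs.
    return httpx.get(f"{BASE_URL}/task/{identifier}")


def submit_transcription(audio_path: str) -> str:
    # Hypothetical transcription endpoint -- verify the actual path and form fields.
    with open(audio_path, "rb") as f:
        response = httpx.post(f"{BASE_URL}/transcription", files={"file": f})
    response.raise_for_status()
    return response.json()["identifier"]  # identifier used for polling


if __name__ == "__main__":
    # Reuses `wait_for_task_completion` from the snippet above.
    task_id = submit_transcription("sample.wav")
    completed = wait_for_task_completion(task_id)
    if completed is not None:
        print(completed.json()["result"])
```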
Empty file added backend/__init__.py
Empty file.
Empty file.

