#165 Fixed audio generation in Windows OS issue: Normalize path separators for cross-platform compatibility
souzatharsis · Nov 8, 2024 · 752f190
1 parent 94a8224
Showing 6 changed files with 11 additions and 7 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,6 @@
 # Changelog
 
-## [0.3.1] - 2024-11-07
+## [0.3.3] - 2024-11-08
 
 ### Breaking Changes
 - Loading images from 'path' has been removed for security reasons. Please specify images by passing an 'url'.
@@ -15,6 +15,9 @@
 - Start TESTIMONIALS.md
 - Add apps using Podcastfy to README.md
 
+### Fixed
+- #165 Fixed audio generation in Windows OS issue: Normalize path separators for cross-platform compatibility
+
 ## [0.2.3] - 2024-10-15
 
 ### Added

diff --git a/README.md b/README.md
@@ -72,9 +72,12 @@ This sample collection is also [available at audio.com](https://audio.com/thatup
 ## Updates 🚀
 
 ### v0.3.0+ release
+- Generate podcasts from input topic using real-time internet search
 - Integrate with 100+ LLM models (OpenAI, Anthropic, Google etc) for transcript generation
 - Integrate with Google's Multispeaker TTS model for high-quality audio generation
 
+See [CHANGELOG](CHANGELOG.md) for more details.
+
 ## Quickstart 💻
 
 ### Prerequisites
@@ -108,8 +111,6 @@ python -m podcastfy.client --url <url1> --url <url2>
 
 - [CLI](usage/cli.md)
 
-- [Docker Image](usage/docker.md)
-
 - [How to](usage/how-to.md)
 
 Experience Podcastfy with our [HuggingFace](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo) 🤗 Spaces app. (Note: This UI app is less extensively tested than the Python package.)

diff --git a/podcastfy/__init__.py b/podcastfy/__init__.py
@@ -1,2 +1,2 @@
 # This file can be left empty for now
-__version__ = "0.3.1"  # or whatever version you're on
+__version__ = "0.3.3"  # or whatever version you're on
diff --git a/podcastfy/text_to_speech.py b/podcastfy/text_to_speech.py
@@ -134,7 +134,7 @@ def _generate_audio_segments(self, text: str, temp_dir: str) -> List[str]:
             for speaker_type, content in [("question", question), ("answer", answer)]:
                 temp_file = os.path.join(
                     temp_dir, f"{idx}_{speaker_type}.{self.audio_format}"
-                )
+                ).replace('\\', '/')  # Normalize path separators for cross-platform compatibility
                 voice = provider_config.get("default_voices", {}).get(speaker_type)
                 model = provider_config.get("model")
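
The separator normalization in the hunk above can be reproduced on any host OS. The following is a minimal sketch, not the project's code: `segment_path` is a hypothetical helper, and `ntpath.join` stands in for what `os.path.join` does on Windows so the example behaves the same on Linux and macOS. The `pathlib` variant is an alternative technique that this commit does not use.

```python
import ntpath
from pathlib import PureWindowsPath


def segment_path(temp_dir: str, idx: int, speaker_type: str, audio_format: str) -> str:
    """Hypothetical helper: join a temp-file path Windows-style, then normalize it."""
    # ntpath.join mimics os.path.join on Windows, regardless of the host OS.
    joined = ntpath.join(temp_dir, f"{idx}_{speaker_type}.{audio_format}")
    # Same normalization as the commit: backslashes become forward slashes.
    return joined.replace("\\", "/")


windows_style = segment_path("C:\\Temp\\podcast", 0, "question", "mp3")
print(windows_style)

# Alternative (not what the commit does): let pathlib convert the separators.
via_pathlib = PureWindowsPath("C:\\Temp\\podcast", "0_question.mp3").as_posix()
print(via_pathlib)
```

One caveat of the plain `str.replace` approach: on POSIX systems a backslash is a legal filename character, so a path that contains a literal backslash would be altered as well. For generated temp-file names such as `0_question.mp3` this should not matter.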
 

diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "podcastfy"
-version = "0.3.1"
+version = "0.3.3"
 description = "An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI"
 authors = ["Tharsis T. P. Souza"]
 license = "Apache-2.0"

diff --git a/usage/conversation_custom.md b/usage/conversation_custom.md
@@ -187,7 +187,7 @@ creativity: 0.7
 - The `word_count` is a target, and the AI may generate more or less than the specified word count. Low word counts are more likely to generate high-level discussions, while high word counts are more likely to generate detailed discussions.
 - The `output_language` defines both the language of the transcript and the language of the audio. Here's some relevant information:
   - Bottom-line: non-English transcripts are good enough but non-English audio is work-in-progress.
-  - Transcripts are generated using Google's Gemini 1.5 Pro, which supports 100+ languages by default.
+  - Transcripts are generated using Google's Gemini 1.5 Pro by default, which supports 100+ languages. Other user-defined models may or may not support non-English languages.
   - Audio is generated using `openai` (default), `elevenlabs`, `gemini`,or `edge` TTS models. 
     - The `gemini`(Google) TTS model is English only.
     - The `openai` TTS model supports multiple languages automatically, however non-English voices still present sub-par quality in my experience.