#165 Fixed audio generation in Windows OS issue: Normalize path separators for cross-platform compatibility
souzatharsis committed Nov 8, 2024
1 parent 94a8224 commit 752f190
Showing 6 changed files with 11 additions and 7 deletions.
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,6 @@
# Changelog

-## [0.3.1] - 2024-11-07
+## [0.3.3] - 2024-11-08

### Breaking Changes
- Loading images from 'path' has been removed for security reasons. Please specify images by passing an 'url'.
@@ -15,6 +15,9 @@
- Start TESTIMONIALS.md
- Add apps using Podcastfy to README.md

### Fixed
- #165 Fixed audio generation in Windows OS issue: Normalize path separators for cross-platform compatibility

## [0.2.3] - 2024-10-15

### Added
5 changes: 3 additions & 2 deletions README.md
@@ -72,9 +72,12 @@ This sample collection is also [available at audio.com](https://audio.com/thatup
## Updates 🚀

### v0.3.0+ release
- Generate podcasts from input topic using real-time internet search
- Integrate with 100+ LLM models (OpenAI, Anthropic, Google etc) for transcript generation
- Integrate with Google's Multispeaker TTS model for high-quality audio generation

See [CHANGELOG](CHANGELOG.md) for more details.

## Quickstart 💻

### Prerequisites
@@ -108,8 +111,6 @@ python -m podcastfy.client --url <url1> --url <url2>

- [CLI](usage/cli.md)

- [Docker Image](usage/docker.md)

- [How to](usage/how-to.md)

Experience Podcastfy with our [HuggingFace](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo) 🤗 Spaces app. (Note: This UI app is less extensively tested than the Python package.)
2 changes: 1 addition & 1 deletion podcastfy/__init__.py
@@ -1,2 +1,2 @@
# This file can be left empty for now
-__version__ = "0.3.1" # or whatever version you're on
+__version__ = "0.3.3" # or whatever version you're on
2 changes: 1 addition & 1 deletion podcastfy/text_to_speech.py
@@ -134,7 +134,7 @@ def _generate_audio_segments(self, text: str, temp_dir: str) -> List[str]:
             for speaker_type, content in [("question", question), ("answer", answer)]:
                 temp_file = os.path.join(
                     temp_dir, f"{idx}_{speaker_type}.{self.audio_format}"
-                )
+                ).replace('\\', '/') # Normalize path separators for cross-platform compatibility
                 voice = provider_config.get("default_voices", {}).get(speaker_type)
                 model = provider_config.get("model")

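The one-line fix in `_generate_audio_segments` can be reproduced outside Podcastfy. The sketch below is not taken from the repository: it uses the standard library's `ntpath` module, which behaves like `os.path.join` on Windows regardless of the host OS, and then applies the same `.replace('\\', '/')` call as the commit. The temp directory, `idx`, `speaker_type`, and `audio_format` values, and the helper name `normalize_separators`, are hypothetical. The commit message does not say which downstream step failed on backslash paths, so this only shows what the normalization does, not why it was needed.

```python
import ntpath
from pathlib import PureWindowsPath


def normalize_separators(path: str) -> str:
    """Replace every backslash with a forward slash (same technique as the commit)."""
    return path.replace("\\", "/")


# Hypothetical values standing in for the loop variables in _generate_audio_segments.
temp_dir = r"C:\Users\me\AppData\Local\Temp\tmpabc"
idx, speaker_type, audio_format = 0, "question", "mp3"

# ntpath.join joins with backslashes, as os.path.join does on Windows.
windows_path = ntpath.join(temp_dir, f"{idx}_{speaker_type}.{audio_format}")
normalized = normalize_separators(windows_path)

print(windows_path)  # C:\Users\me\AppData\Local\Temp\tmpabc\0_question.mp3
print(normalized)    # C:/Users/me/AppData/Local/Temp/tmpabc/0_question.mp3

# pathlib produces the same forward-slash form for Windows-style paths.
print(PureWindowsPath(windows_path).as_posix())
```

Windows file APIs generally accept forward slashes as separators, so the normalized path still resolves there. One trade-off of a blanket `str.replace`: on POSIX systems a backslash is a legal filename character, so it would be rewritten too. `pathlib.Path(temp_file).as_posix()` is an alternative the commit did not use, whose backslash conversion only happens on Windows.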
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "podcastfy"
-version = "0.3.1"
+version = "0.3.3"
description = "An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI"
authors = ["Tharsis T. P. Souza"]
license = "Apache-2.0"
2 changes: 1 addition & 1 deletion usage/conversation_custom.md
@@ -187,7 +187,7 @@ creativity: 0.7
- The `word_count` is a target, and the AI may generate more or less than the specified word count. Low word counts are more likely to generate high-level discussions, while high word counts are more likely to generate detailed discussions.
- The `output_language` defines both the language of the transcript and the language of the audio. Here's some relevant information:
- Bottom-line: non-English transcripts are good enough but non-English audio is work-in-progress.
-- Transcripts are generated using Google's Gemini 1.5 Pro, which supports 100+ languages by default.
+- Transcripts are generated using Google's Gemini 1.5 Pro by default, which supports 100+ languages. Other user-defined models may or may not support non-English languages.
- Audio is generated using `openai` (default), `elevenlabs`, `gemini`,or `edge` TTS models.
- The `gemini`(Google) TTS model is English only.
- The `openai` TTS model supports multiple languages automatically, however non-English voices still present sub-par quality in my experience.