Speech Note 4.8.0

Linux Desktop

Video presentation of all new features: https://www.youtube.com/watch?v=ww6skKOOzZ8

Changes:

General
- Case-sensitive matching in Rules
User Interface
- Speech Note has been translated into Arabic, Catalan, Spanish, Turkish and Canadian French.
- Command line option and DBus API for exporting synthesized speech to an audio file instead of playing it aloud. Use --output-file together with start-reading-clipboard or start-reading-text actions.
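As a rough illustration of the export option above, the following Python sketch assembles such a command line. Only --output-file and the two action names come from these notes; the executable name dsnote, the --action flag used to pass the action, and the file name speech.wav are assumptions made for illustration and should be checked against the application's --help output.

```python
import shlex

# Actions that the notes say can be combined with --output-file.
EXPORT_ACTIONS = {"start-reading-clipboard", "start-reading-text"}


def build_export_command(output_file, action="start-reading-clipboard",
                         executable="dsnote"):
    """Return an argv list that asks Speech Note to write speech to a file.

    Assumptions (not stated in the release notes): the executable is
    called "dsnote" and the action is passed with an "--action" flag.
    """
    if action not in EXPORT_ACTIONS:
        raise ValueError(f"--output-file is not documented for action: {action}")
    return [executable, "--action", action, "--output-file", output_file]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_export_command("speech.wav")))
```

The command is printed rather than executed; once the assumed names are verified, the list can be passed to subprocess.run.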
Speech to Text
- New CrisperWhisper model for the FasterWhisper engine. CrisperWhisper is designed for fast, precise, and verbatim speech recognition with accurate word-level timestamps. Unlike the original Whisper, which tends to omit disfluencies and follows more of an intended transcription style, CrisperWhisper aims to transcribe every spoken word exactly as it is, including fillers, pauses, stutters and false starts. The CrisperWhisper model is enabled only for English and German.
- New KBLab Whisper models for Swedish. The National Library of Sweden has released fine-tuned STT models trained on its library collections. The models have significantly improved accuracy compared to regular Whisper models.
- FUTO Whisper models. New models used in the FUTO mobile keyboard app.
- Using an existing note as the initial context in decoding. This has the potential to improve transcription quality and reduce the "hallucination" problem. If you observe a degradation in quality, turn off the Use note as context option.
- Option to pause listening while processing. This option can be useful when Listening mode is Always on. By default, listening continues even when a piece of audio data is being processed. Using this option, you can temporarily pause listening for the duration of processing.
- Option to play an audible tone when starting and stopping listening
Text to Speech
- Kokoro TTS engine. Kokoro is a compact yet powerful open-source multilingual TTS engine. Despite its modest size (trained on less than 100 hours of audio), it delivers impressive results. Kokoro voices are enabled for: English, Chinese, Japanese, Hindi, Italian, French, Spanish and Portuguese.
- F5-TTS engine. F5-TTS provides exceptional voice cloning capabilities. The currently enabled model works with English and Chinese. F5-TTS works best with CUDA acceleration. CPU-only processing can be very slow.
- Parler-TTS engine. Parler-TTS can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.). The speaker's characteristics are defined by a text description (prompt). To use Parler-TTS models, you need to configure a Text voice profile. This can be done in the Voice profiles menu. Parler-TTS primarily supports English, but a multilingual model for French, Spanish, Portuguese, Polish, German, Dutch and Italian is also included. Currently, the multilingual model produces speech of rather poor quality that is not entirely usable. Parler-TTS works best with CUDA acceleration. CPU-only processing can be very slow.
- S.A.M. TTS engine. S.A.M. is a small speech synthesizer designed for the Commodore 64. It features a robotic voice that evokes a strong sense of nostalgia. The S.A.M. voice is available in English only.
- Normalize audio option. Use this option to enable or disable audio volume normalization. The volume is normalized independently for each sentence, which can make the volume level vary noticeably from one sentence to the next. Disable this option if you observe this problem.
- New Piper voices for Dutch, Finnish, German and Luxembourgish
- New RHVoice voice for Spanish
- Updated RHVoice voice for Czech
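The Normalize audio note above says that per-sentence normalization can produce uneven volume. The notes do not describe the algorithm Speech Note actually uses, so the sketch below assumes simple peak normalization purely for illustration: two sentences with the same typical sample level end up at very different levels because one of them contains a single loud spike.

```python
import statistics


def peak_normalize(samples, target=1.0):
    """Scale samples so the largest absolute value equals target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def body_level(samples):
    """Median absolute sample value, a crude stand-in for perceived loudness."""
    return statistics.median(abs(s) for s in samples)


# Two hypothetical sentences with the same typical level (0.2);
# the second one contains a single loud spike (1.0).
steady = [0.2, -0.2, 0.2, -0.2]
spiky = [0.2, -0.2, 1.0, -0.2]

# Each sentence is normalized on its own, as described in the notes.
per_sentence = [peak_normalize(s) for s in (steady, spiky)]

if __name__ == "__main__":
    for name, samples in zip(("steady", "spiky"), per_sentence):
        print(name, round(body_level(samples), 3))
```

In this toy case the steady sentence is amplified five times while the spiky one is left unchanged, so the two no longer match in level. This is the kind of sentence-to-sentence jump that disabling the option avoids.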
Translator
- New models: English to Chinese, English to Arabic, Arabic to English, English to Korean, English to Japanese
Accessibility (Wayland)
- Support for Insert into active window under Wayland. Using the start-listening-active-window or start-listening-translate-active-window actions, you can insert the decoded text directly into whichever window is currently in focus. Previously this feature worked only under X11; it is now also supported under Wayland. For the actions to work, the ydotool daemon must be installed and running. If you are using Flatpak, also make sure that the application has permission to access the ydotool daemon socket file.
- Support for Global keyboard shortcuts under Wayland. Global keyboard shortcuts allow you to start or stop listening and reading using the keyboard even when the application is not active (e.g. minimized or in the background). Until now, this capability was only available under X11. Integration with XDG Desktop Portal has now been added, making global keyboard shortcuts possible under Wayland as well. For the shortcuts to work, your desktop environment has to support the GlobalShortcuts interface of the XDG Desktop Portal service. Right now, GlobalShortcuts is only supported in KDE Plasma and the latest GNOME.
Flatpak
- Python support enabled in Tiny and ARM packages. Python libraries are not included in the Tiny or ARM packages, but with the Location of Python libraries option you can set an external directory that contains the libraries. Make sure that the Flatpak application has permission to access this directory.
- Flatpak runtime update to version 5.15-24.08

Sailfish OS

Changes:

User Interface
- Speech Note has been translated into Arabic, Catalan, Spanish, Turkish and Canadian French.
Speech to Text
- New KBLab Whisper models for Swedish. The National Library of Sweden has released fine-tuned STT models trained on its library collections. The models have significantly improved accuracy compared to regular Whisper models.
- FUTO Whisper models. New models used in the FUTO mobile keyboard app.
- Using an existing note as the initial context in decoding. This has the potential to improve transcription quality and reduce the "hallucination" problem. If you observe a degradation in quality, turn off the Use note as context option.
- Option to pause listening while processing. This option can be useful when Listening mode is Always on. By default, listening continues even when a piece of audio data is being processed. Using this option, you can temporarily pause listening for the duration of processing.
- Option to play an audible tone when starting and stopping listening
Text to Speech
- S.A.M. TTS engine. S.A.M. is a small speech synthesizer designed for the Commodore 64. It features a robotic voice that evokes a strong sense of nostalgia. The S.A.M. voice is available in English only.
- Normalize audio option. Use this option to enable or disable audio volume normalization. The volume is normalized independently for each sentence, which can make the volume level vary noticeably from one sentence to the next. Disable this option if you observe this problem.
- New Piper voices for Dutch, Finnish, German and Luxembourgish
- New RHVoice voice for Spanish
- Updated RHVoice voice for Czech
Translator
- New models: English to Chinese, English to Arabic, Arabic to English, English to Korean, English to Japanese
