Dependencies: `einops>=0.8.1`, `einx>=0.3.0`, `omegaconf>=2.3.0`, `soundfile>=0.12.1`, `torch>=2.0.0`, `transformers>=4.30.0`, `huggingface_hub>=0.19.0`, `torchaudio>=0.13.0`, `librosa>=0.10.0`, `sounddevice>=0.4.6`, `scipy>=1.10.0`, `numpy>=1.24.0`, `soxr>=0.3.5`


ComfyUI-SparkTTS is a custom ComfyUI node implementation of SparkTTS, a text-to-speech system that uses large language models (LLMs) to generate accurate, natural-sounding speech.
ComfyUI-SparkTTS provides the following main functionalities:
In ComfyUI-Manager, search for `Comfyui-SparkTTS` and install it, then install `requirements.txt` in the ComfyUI-SparkTTS folder:
```bash
./ComfyUI/python_embeded/python -m pip install -r requirements.txt
```
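The command above assumes the Python interpreter bundled with the portable ComfyUI build at `./ComfyUI/python_embeded/python`; other setups may need a different interpreter. The sketch below is a hypothetical helper (`pick_comfy_python` is not part of ComfyUI-SparkTTS) that, as an assumption about what you want, falls back to `python3` on `PATH` when the embedded interpreter is absent.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of ComfyUI-SparkTTS): print the interpreter to use.
# Prefers the embedded interpreter at <root>/ComfyUI/python_embeded/python, the path used
# in the install command above; ASSUMES python3 on PATH is an acceptable fallback.
pick_comfy_python() {
  local root="${1:-.}"
  local embedded="$root/ComfyUI/python_embeded/python"
  if [ -x "$embedded" ]; then
    printf '%s\n' "$embedded"
  else
    printf '%s\n' "python3"
  fi
}
```

Run from the same directory as the original command, `"$(pick_comfy_python .)" -m pip install -r requirements.txt` performs the same install step with whichever interpreter the helper selects.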
Alternatively, clone the repository into the `custom_nodes` folder:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-SparkTTS
```
Then install `requirements.txt` in the ComfyUI-SparkTTS folder:
```bash
./ComfyUI/python_embeded/python -m pip install -r requirements.txt
```
Alternatively, install with comfy-cli. Make sure comfy-cli is installed (`pip install comfy-cli`), and run `comfy install` first if you don't have ComfyUI installed. Then install ComfyUI-SparkTTS with the following command:
```bash
comfy node registry-install Comfyui-Spark-TTS
```
Then install `requirements.txt` in the ComfyUI-SparkTTS folder:
```bash
./ComfyUI/python_embeded/python -m pip install -r requirements.txt
```
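Before the first run, you may want to see whether the SparkTTS model files are already in place under `ComfyUI/models/TTS/SparkTTS/`. A minimal sketch, assuming that a non-empty folder means the model is present; the helper name `sparktts_model_present` is hypothetical, and the check does not validate individual model files.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: succeed (exit status 0) if the model folder exists and is non-empty.
# Defaults to the path named in this README; pass another path as the first argument.
# ASSUMPTION: "non-empty" is treated as "model present"; file contents are not checked.
sparktts_model_present() {
  local dir="${1:-ComfyUI/models/TTS/SparkTTS}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]
}
```

For example, run `sparktts_model_present && echo "model found" || echo "model folder missing or empty"` from the directory that contains `ComfyUI/`.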
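The model is downloaded automatically to `ComfyUI/models/TTS/SparkTTS/` the first time you use the custom node. If you download the model manually, place it in the `ComfyUI/models/TTS/SparkTTS/` folder.

A voice creation node creates a customized voice from adjustable parameters.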
Inputs:
- `text`: Text to synthesize.
- `gender`: Gender of the voice (`female` or `male`).
- `pitch`: Pitch level of the voice (`very_low`, `low`, `moderate`, `high`, `very_high`).
- `speed`: Speed level of the voice (`very_low`, `low`, `moderate`, `high`, `very_high`).
- `batch_texts` (optional): Additional texts for better control over pacing and intonation.

Outputs:
- `audio`: Generated audio with the customized voice.

A voice cloning node clones a voice from a reference audio sample.
Inputs:
- `text`: Text to synthesize with the cloned voice.
- `reference_audio`: The audio sample to clone the voice from.
- `reference_text`: Transcript of the reference audio, used to improve cloning quality.
- `max_tokens`: Maximum number of tokens to generate, which caps the length of the generated speech.
- `batch_texts` (optional): Additional texts for better control over pacing and intonation.

Outputs:
- `audio`: Generated audio with the cloned voice.

Another voice cloning node clones a voice from reference audio with additional control over pitch and speed.
Inputs:
- `text`: Text to synthesize with the cloned voice.
- `reference_audio`: The audio sample to clone the voice from.
- `reference_text`: Transcript of the reference audio, used to improve cloning quality.
- `pitch`: Pitch level of the voice.
- `speed`: Speed level of the voice.
- `max_tokens`: Maximum number of tokens to generate, which caps the length of the generated speech.
- `batch_texts` (optional): Additional texts for better control over pacing and intonation.

Outputs:
- `audio`: Generated audio with the cloned voice.

A recording node lets you record audio directly.
Inputs:
- `recording`: Set to `True` to start recording audio.
- `recording_duration`: Recording duration in seconds.
- `sample_rate`: Audio sample rate in Hz.
- `noise_threshold`: Noise reduction threshold.
- `smoothing_kernel_size`: Size of the kernel used to smooth the audio signal.

Outputs:
- `audio`: Recorded audio data.

Check the `example_workflows` directory for example workflows.
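The amount of audio the recording node captures follows from two of its inputs: the number of samples is `recording_duration` (seconds) times `sample_rate` (Hz). A minimal sketch of that arithmetic, assuming mono audio, whole-number inputs, and 32-bit float samples (the storage format is an assumption, not something this README specifies); `recording_samples` and `recording_bytes` are hypothetical helper names.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: number of samples in a mono recording.
# samples = recording_duration (seconds) * sample_rate (Hz); integer inputs only.
recording_samples() {
  echo $(( $1 * $2 ))
}

# Hypothetical helper: approximate size in bytes,
# ASSUMING 32-bit float samples (4 bytes each) and a single channel.
recording_bytes() {
  echo $(( $(recording_samples "$1" "$2") * 4 ))
}

# Example: a 5-second recording at 16000 Hz
echo "samples: $(recording_samples 5 16000)"   # samples: 80000
echo "bytes:   $(recording_bytes 5 16000)"     # bytes:   320000
```

Doubling either `recording_duration` or `sample_rate` doubles the number of samples the node has to hold and process.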
SparkTTS currently supports the following languages:
GPL-3.0 License