descript-audio-codec huggingface-hub numpy pydantic soundfile torch torchaudio triton
Make Dia available in ComfyUI.
A TTS model capable of generating ultra-realistic dialogue in one pass.
Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. Dia generates highly realistic dialogue directly from a transcript. You can condition the output on audio, which enables control over emotion and tone. The model can also produce nonverbal sounds such as laughter, coughing, and throat clearing.
cd ComfyUI/custom_nodes
git clone https://github.com/Yuan-ManX/ComfyUI-Dia.git
cd ComfyUI-Dia
pip install -r requirements.txt
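After the install step, it can be useful to confirm that the dependencies actually landed in the Python environment ComfyUI uses. The following is a minimal sketch, not part of the project itself: the helper name `find_missing` is hypothetical, and the package names are copied from the dependency list at the top of this document on the assumption that it mirrors requirements.txt.

```python
from importlib import metadata

# Distribution names copied from the dependency list above
# (assumed to match the project's requirements.txt).
REQUIRED = [
    "descript-audio-codec",
    "huggingface-hub",
    "numpy",
    "pydantic",
    "soundfile",
    "torch",
    "torchaudio",
    "triton",
]


def find_missing(packages):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in packages:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are installed.")
```

Run it with the same interpreter that launches ComfyUI; a different interpreter may report a different set of installed packages.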
Pretrained model checkpoints – The model weights are hosted on Hugging Face. The model currently supports English generation only.