ComfyUI_FL-SongGen

★ 59

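Song generation, vocal synthesis, and style transfer nodes for ComfyUI
ComfyUI nodes built on Tencent's SongGeneration (LeVo) model that turn lyrics into complete songs (vocals plus accompaniment), with support for style transfer, presets, automatic model downloads, and output up to 4 minutes 30 seconds.
💡 Generates complete songs with vocals and accompaniment from sectioned lyrics and a style reference.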
🍴 13 Forks · 💻 Python · 🔄 2026-01-24
📦 requirements.txt
torch>=2.0.0
torchaudio>=2.0.0
omegaconf>=2.3.0
transformers>=4.37.0
einops>=0.8.0
einops-exts>=0.0.4
diffusers>=0.27.0
huggingface-hub>=0.25.0
librosa>=0.11.0
soundfile>=0.12.0
numpy>=1.24.0
safetensors>=0.4.0
filelock>=3.13.0
openunmix>=1.3.0
alias-free-torch>=0.0.6
descript-audio-codec>=1.0.0
julius>=0.2.7
lameenc>=1.8.1
k-diffusion>=0.1.1
vector-quantize-pytorch>=1.14.0
x-transformers>=2.0.0
packaging>=21.0
📄 README

FL Song Gen

AI-powered song generation nodes for ComfyUI based on Tencent’s SongGeneration (LeVo) model. Generate complete songs with vocals and instrumentals from lyrics.

[SongGeneration](https://github.com/AslpLab/SongGeneration)

[Patreon](https://www.patreon.com/Machinedelusions)


Features

  • Full Song Generation – Complete songs with vocals and instrumentals
  • Dual-Track Output – Separate vocal and background music tracks
  • Lyrics-to-Song – Structured lyrics with sections (verse, chorus, bridge, intro, outro)
  • Style Transfer – Use reference audio to guide the musical style
  • Text Descriptions – Control gender, timbre, genre, emotion, and BPM
  • Auto Style Presets – Quick generation with Pop, Rock, Jazz, and more
  • Long-Form Generation – Up to 4 minutes 30 seconds per song
  • Automatic Downloads – Models download automatically on first use

    Nodes

    | Node | Description |
    |------|-------------|
    | Model Loader | Load SongGeneration model with memory options |
    | Lyrics Formatter | Build properly formatted lyrics from sections |
    | Description Builder | Create style descriptions from components |
    | Generate | Main generation with text conditioning |
    | Style Transfer | Generate using reference audio for style |
    | Auto Style | Generate with preset style prompts |


    Installation

    ComfyUI Manager (Recommended)

    Search for “FL Song Gen” in ComfyUI Manager and install.

    Manual Installation

    cd ComfyUI/custom_nodes
    git clone https://github.com/filliptm/ComfyUI_FL-SongGen.git
    cd ComfyUI_FL-SongGen
    pip install -r requirements.txt


    Models

    Models download automatically on first use to ComfyUI/models/songgen/.

    | Model | Max Duration | VRAM | Languages |
    |-------|:------------:|:----:|-----------|
    | songgeneration_base | 2m 30s | 10-16 GB | Chinese |
    | songgeneration_base_new | 2m 30s | 10-16 GB | Chinese, English |
    | songgeneration_base_full | 4m 30s | 12-18 GB | Chinese, English |
    | songgeneration_large | 4m 30s | 22-28 GB | Chinese, English |

    Note: Each VRAM range runs from low-memory mode (lower figure) to normal mode (higher figure). Enable low_mem in the Model Loader to reduce VRAM usage.
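The table can be turned into a rule of thumb for choosing a variant. The sketch below is a hypothetical helper (`Variant`, `VARIANTS`, and `pick_variant` are not part of FL Song Gen). It assumes the lower VRAM figure is the minimum for low_mem mode and the upper figure the minimum for normal mode, and returns the first listed variant that fits the requested length, language, and VRAM.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Variant:
    name: str
    max_seconds: int   # "Max Duration" column, in seconds
    vram_low_mem: int  # lower VRAM figure (GB), assumed to mean low_mem mode
    vram_normal: int   # upper VRAM figure (GB), assumed to mean normal mode
    english: bool      # whether English lyrics are supported


# Values copied from the model table, ordered smallest to largest.
VARIANTS = [
    Variant("songgeneration_base", 150, 10, 16, False),
    Variant("songgeneration_base_new", 150, 10, 16, True),
    Variant("songgeneration_base_full", 270, 12, 18, True),
    Variant("songgeneration_large", 270, 22, 28, True),
]


def pick_variant(vram_gb: float, seconds: int, english: bool = True,
                 prefer_largest: bool = False) -> Optional[Tuple[str, bool]]:
    """Return (model name, low_mem flag) for the first variant that fits, or None."""
    candidates = reversed(VARIANTS) if prefer_largest else VARIANTS
    for v in candidates:
        if seconds > v.max_seconds or (english and not v.english):
            continue  # song too long for this variant, or language unsupported
        if vram_gb >= v.vram_normal:
            return v.name, False  # enough VRAM for normal mode
        if vram_gb >= v.vram_low_mem:
            return v.name, True   # fits only with low_mem enabled
    return None
```

For example, `pick_variant(12, 120)` returns `("songgeneration_base_new", True)`: a two-minute English song on a 12 GB card needs low_mem. With the default ordering the helper never reaches songgeneration_large, because songgeneration_base_full covers the same durations and languages with less memory; pass `prefer_largest=True` to try the larger models first.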


    Quick Start

  • Add the FL Song Gen Model Loader node and select a model variant
  • Add the FL Song Gen Lyrics Formatter node to build your lyrics
  • Add the FL Song Gen Description Builder node to set the style (optional)
  • Connect them to the FL Song Gen Generate node
  • Connect the outputs to audio save/preview nodes

    Prompting Guide

    Getting the best results requires understanding how to format lyrics and descriptions properly.

    Lyrics Format

    Basic Structure

    Lyrics use section tags separated by " ; " (space, semicolon, space), and phrases within a section are separated by periods (.):

    [intro-short] ; [verse] First line. Second line. Third line ; [chorus] Chorus line one. Chorus line two ; [outro-short]

    Structure Labels

    Instrumental sections (no lyrics):

    | Tag | Duration | Description |
    |-----|:--------:|-------------|
    | [intro-short] | ~0-10s | Short instrumental intro |
    | [intro-medium] | ~10-20s | Medium instrumental intro |
    | [inst-short] | ~0-10s | Short instrumental break |
    | [inst-medium] | ~10-20s | Medium instrumental break |
    | [outro-short] | ~0-10s | Short instrumental outro |
    | [outro-medium] | ~10-20s | Medium instrumental outro |

    Lyrical sections (lyrics required):

    | Tag | Description |
    |-----|-------------|
    | [verse] | Verse – typically tells the story |
    | [chorus] | Chorus – the catchy, repeated hook |
    | [bridge] | Bridge – contrasting part before final chorus |

    Formatting Rules

  • Sections are separated by " ; " (a semicolon with a space on each side)
  • Phrases within a section are separated by periods (.)
  • Each period marks a phrase/line break
  • Do NOT add lyrics to instrumental tags
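These rules are mechanical enough to apply in code. The following is a minimal sketch, assuming you assemble sections in Python before pasting the result into a text input. `format_lyrics` is a hypothetical helper, not the Lyrics Formatter node's implementation. It rejects lyrics on instrumental tags, strips trailing periods from each line, and joins everything with the required separators.

```python
INSTRUMENTAL_TAGS = {
    "intro-short", "intro-medium",
    "inst-short", "inst-medium",
    "outro-short", "outro-medium",
}
LYRICAL_TAGS = {"verse", "chorus", "bridge"}


def format_lyrics(sections):
    """Join (tag, lines) pairs into one lyrics string.

    Instrumental tags must have no lines; lyrical tags need at least one.
    Periods inside a line are left alone and would create extra phrases.
    """
    parts = []
    for tag, lines in sections:
        if tag in INSTRUMENTAL_TAGS:
            if lines:
                raise ValueError(f"[{tag}] is instrumental and must not contain lyrics")
            parts.append(f"[{tag}]")
        elif tag in LYRICAL_TAGS:
            # Trim whitespace and trailing periods so each line is exactly one phrase.
            phrases = [p for p in (line.strip().rstrip(". ") for line in lines) if p]
            if not phrases:
                raise ValueError(f"[{tag}] needs at least one line of lyrics")
            parts.append(f"[{tag}] " + ". ".join(phrases))
        else:
            raise ValueError(f"unknown section tag: [{tag}]")
    # Sections are joined with space-semicolon-space.
    return " ; ".join(parts)
```

Called with `intro-short` (no lines), a three-line verse, a two-line chorus, and `outro-short` (no lines), it reproduces the basic-structure example string exactly.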
    Complete Song Example

    [intro-short] ; [verse] These faded memories of us. I can't erase the tears you cried before. Unchained this heart to find its way. My peace won't beg you to stay ; [chorus] Like a fool begs for supper. I find myself waiting for her. Only to find the broken pieces of my heart. That was needed for my soul to love again ; [inst-short] ; [verse] Silhouettes where you once stood. Life's rhythm changed its beat for good. Numb to whispers we once knew. My path won't circle back to you ; [chorus] Like a fool begs for supper. I find myself waiting for her. Only to find the broken pieces of my heart. That was needed for my soul to love again ; [outro-short]
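A hand-written song like this one is easy to break with a missing tag or a stray lyric after an instrumental tag. As a hypothetical pre-flight check (not part of FL Song Gen), the sketch below splits on the " ; " separator, reports problems, and counts phrases per lyrical section so verses can be kept roughly balanced. It assumes tags are lowercase and that sections are always separated by exactly " ; ".

```python
import re

INSTRUMENTAL = re.compile(r"(intro|inst|outro)-(short|medium)")
LYRICAL = {"verse", "chorus", "bridge"}
SECTION = re.compile(r"\[([a-z-]+)\]\s*(.*)", re.DOTALL)


def check_lyrics(text: str) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for i, chunk in enumerate(text.split(" ; ")):
        match = SECTION.fullmatch(chunk.strip())
        if not match:
            problems.append(f"section {i}: no [tag] at start of {chunk.strip()[:30]!r}")
            continue
        tag, body = match.group(1), match.group(2).strip()
        if INSTRUMENTAL.fullmatch(tag):
            if body:
                problems.append(f"section {i}: [{tag}] is instrumental but has lyrics")
        elif tag in LYRICAL:
            if not body:
                problems.append(f"section {i}: [{tag}] has no lyrics")
        else:
            problems.append(f"section {i}: unknown tag [{tag}]")
    return problems


def phrase_counts(text: str) -> list:
    """List (tag, number of period-separated phrases) for each lyrical section."""
    counts = []
    for chunk in text.split(" ; "):
        match = SECTION.fullmatch(chunk.strip())
        if match and match.group(1) in LYRICAL:
            phrases = [p for p in match.group(2).split(".") if p.strip()]
            counts.append((match.group(1), len(phrases)))
    return counts
```

On the complete example above, `check_lyrics` finds no problems and `phrase_counts` reports four phrases in every verse and chorus.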

    Style Descriptions

    Format

    "gender, timbre, genre, emotion, instruments, the bpm is X"

    All dimensions are optional and can be combined in any order.

    Available Options

    | Dimension | Options |
    |-----------|---------|
    | Gender | male, female |
    | Timbre | dark, bright, warm, soft, rock |
    | Genre | pop, rock, jazz, hip hop, R&B, folk, electronic, blues, country, classical, soul, reggae, k-pop |
    | Emotion | sad, happy, emotional, angry, uplifting, romantic, melancholic, intense |
    | Instruments | See list below |
    | BPM | the bpm is 120 (use this exact phrase format) |
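The format and options above can be encoded as a small builder. This is a hypothetical sketch (not the Description Builder node's code). It assumes the option lists in the table are exhaustive for gender, timbre, genre, and emotion, treats instruments as free text, and always emits the tempo using the exact required phrase. It uses the fixed order of the format string, although the document allows any order.

```python
GENDERS = {"male", "female"}
TIMBRES = {"dark", "bright", "warm", "soft", "rock"}
GENRES = {"pop", "rock", "jazz", "hip hop", "R&B", "folk", "electronic",
          "blues", "country", "classical", "soul", "reggae", "k-pop"}
EMOTIONS = {"sad", "happy", "emotional", "angry", "uplifting",
            "romantic", "melancholic", "intense"}


def build_description(gender=None, timbre=None, genre=None, emotion=None,
                      instruments=None, bpm=None):
    """Assemble a comma-separated style description; every part is optional."""
    parts = []
    for value, allowed, label in ((gender, GENDERS, "gender"),
                                  (timbre, TIMBRES, "timbre"),
                                  (genre, GENRES, "genre"),
                                  (emotion, EMOTIONS, "emotion")):
        if value is None:
            continue  # dimension left out
        if value not in allowed:
            raise ValueError(f"unsupported {label}: {value!r}")
        parts.append(value)
    if instruments:
        parts.append(instruments)  # free text, e.g. "piano and drums"
    if bpm is not None:
        parts.append(f"the bpm is {int(bpm)}")  # exact phrase required
    return ", ".join(parts)
```

`build_description("female", "warm", "pop", "emotional", "piano and drums", 120)` returns the first example description below verbatim; omitting an argument simply drops that attribute.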

    Common Instrument Combinations

  • piano and drums
  • guitar and drums
  • synthesizer and piano
  • acoustic guitar and piano
  • piano and strings
  • guitar and synthesizer
  • piano and saxophone
  • electric guitar and drums
  • synthesizer and drums
  • acoustic guitar and drums
    Example Descriptions

    female, warm, pop, emotional, piano and drums, the bpm is 120

    male, dark, hip hop, sad, synthesizer and drums

    female, bright, jazz, romantic, piano and saxophone, the bpm is 90

    male, rock, intense, electric guitar and drums, the bpm is 140

    Auto Style Presets

    When using Auto Style mode, select from these presets:

    | Preset | Description |
    |--------|-------------|
    | Pop | Modern pop music |
    | R&B | Rhythm and blues |
    | Dance | Electronic dance music |
    | Jazz | Jazz style |
    | Folk | Folk/acoustic |
    | Rock | Rock music |
    | Chinese Style | Modern Chinese pop |
    | Chinese Tradition | Traditional Chinese music |
    | Chinese Opera | Chinese opera style |
    | Metal | Heavy metal |
    | Reggae | Reggae style |
    | Auto | Let the model choose |

    Style Transfer (Reference Audio)

    Use a 10-second reference audio clip to guide the musical style:

  • Only the first 10 seconds of the audio will be used
  • Using the chorus section of a reference song works best
  • Influences: genre, instrumentation, rhythm, and voice characteristics
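Because only the first 10 seconds of the reference are used, it can help to cut the clip yourself so those seconds come from the chorus. This is a minimal NumPy sketch, assuming the audio is held as a (channels, samples) array. `reference_clip` is a hypothetical helper, and the chorus start time has to be found by ear.

```python
import numpy as np


def reference_clip(waveform: np.ndarray, sample_rate: int,
                   start_s: float = 0.0, length_s: float = 10.0) -> np.ndarray:
    """Return `length_s` seconds of audio starting at `start_s`.

    `waveform` is assumed to be shaped (channels, samples); the slice keeps
    every channel and only shortens the time axis.
    """
    start = int(round(start_s * sample_rate))
    end = start + int(round(length_s * sample_rate))
    if start < 0 or end > waveform.shape[-1]:
        raise ValueError("requested window falls outside the audio")
    return waveform[..., start:end]
```

For a file on disk, you could load it with `soundfile.read(path, always_2d=True)`, pass the transposed array (soundfile returns frames by channels), and save the result with `soundfile.write` before loading it into ComfyUI as the reference audio.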
    Combining with Descriptions

    You can optionally provide a text description alongside reference audio to further guide the generation. This can be useful to:

  • Specify voice gender when the reference audio is ambiguous
  • Add specific emotions or timbres
  • Set a specific BPM
    Note: If the description conflicts with the reference audio style, results may be unpredictable. Use complementary descriptions for best results.

    Tips for Better Results

    Lyrics Tips

  • Keep phrases natural and singable
  • Use repetition strategically, especially in the chorus
  • Match syllable counts roughly between verses
  • Use emotionally evocative language
    Description Tips

  • Use commas to separate attributes
  • Stick to predefined tags for best results
  • Don’t overload with too many conflicting descriptors
  • BPM must use the exact format: the bpm is X
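Since the BPM phrase must match exactly, a prompt written as "120 bpm" or "BPM: 120" may not be read as a tempo. The sketch below is a hypothetical pre-processing step (not part of FL Song Gen) that rewrites a few common tempo spellings into "the bpm is X". The set of patterns it recognizes is an assumption, and any comma-separated attribute containing a tempo is replaced as a whole.

```python
import re

# Tempo spellings that do not match the required phrase (assumed examples).
BPM_PATTERNS = [
    re.compile(r"\b(\d{2,3})\s*bpm\b", re.IGNORECASE),             # "120 bpm", "120bpm"
    re.compile(r"\bbpm\s*[:=]?\s*(\d{2,3})\b", re.IGNORECASE),      # "bpm: 120", "BPM 120"
    re.compile(r"\btempo\s*(?:of\s*)?(\d{2,3})\b", re.IGNORECASE),  # "tempo 120", "tempo of 120"
]
CANONICAL = re.compile(r"the bpm is \d{2,3}")


def normalize_bpm(description: str) -> str:
    """Rewrite tempo attributes as "the bpm is X"; other attributes are kept as-is."""
    out = []
    for part in (p.strip() for p in description.split(",")):
        if not part:
            continue  # drop empty attributes left by doubled commas
        if CANONICAL.fullmatch(part.lower()):
            out.append(part.lower())  # already correct; only normalize case
            continue
        for pattern in BPM_PATTERNS:
            match = pattern.search(part)
            if match:
                part = f"the bpm is {match.group(1)}"
                break
        out.append(part)
    return ", ".join(out)
```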
    General Tips

  • Start with shorter songs to test your prompts
  • The base_new model is recommended for English lyrics
  • Enable low_mem mode if you’re running low on VRAM
  • Instrumental sections help create natural song flow

    Requirements

    | Requirement | Specification |
    |-------------|---------------|
    | Python | 3.10+ |
    | CUDA | 11.8+ (for GPU acceleration) |
    | RAM | 16 GB minimum (32 GB+ recommended) |
    | VRAM | 10-28 GB (depends on model) |

    Note: CPU-only mode is supported but very slow. Mac MPS may have limited support.
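To check a machine against this table before downloading several gigabytes of weights, a short script like the following can help. It is a hypothetical sketch. It checks the Python version, the CUDA version PyTorch was built against (`torch.version.cuda`), and the total memory of GPU 0. It does not check system RAM, and the 10 GB default threshold is simply the lowest VRAM figure in the table.

```python
import importlib.util
import sys


def check_environment(min_vram_gb: float = 10.0) -> dict:
    """Compare this machine with the Requirements table; returns a report dict."""
    report = {
        "python": ".".join(str(n) for n in sys.version_info[:3]),
        "python_ok": sys.version_info >= (3, 10),
        "torch": None,
        "cuda": None,
        "cuda_ok": False,
        "vram_gb": None,
        "vram_ok": False,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch not installed yet: run pip install -r requirements.txt
    import torch

    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # None on CPU-only builds
    if report["cuda"]:
        major, minor = (int(x) for x in report["cuda"].split(".")[:2])
        report["cuda_ok"] = (major, minor) >= (11, 8)
    if torch.cuda.is_available():
        total = torch.cuda.get_device_properties(0).total_memory  # bytes
        report["vram_gb"] = round(total / 1024 ** 3, 1)
        report["vram_ok"] = report["vram_gb"] >= min_vram_gb
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```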


    License

    Apache 2.0


    Credits

    Based on SongGeneration (LeVo) by Tencent AI Lab.