ComfyUI_Qwen2-Audio-7B-Instruct-Int4

ComfyUI_Qwen2-Audio-7B-Instruct-Int4
★ 16

音频理解多模态指令式生成Int4量化
在 ComfyUI 中整合 Qwen2-Audio-7B-Instruct-Int4,支持音频与文本查询并生成字幕、描述或指令式回复,便于多模态工作流调用。
💡 将语音或文字查询转为字幕、描述或问答回复,用于多模态工作流。
🍴 2 Forks💻 JavaScript🔄 2025-04-02
📦
网盘下载
复制链接后前往夸克网盘下载
https://pan.quark.cn/s/a1f1f564f19c
📦 requirements.txt
torch
huggingface_hub
bitsandbytes
librosa
transformers>=4.45.0
Chat_with_text_workflow preview
Chat_with_audio_workflow preview
📄 README

ComfyUI_Qwen2-Audio-7B-Instruct-Int4

This is an implementation of Qwen2-Audio-7B-Instruct-Int4 by ComfyUI, including support for text-based queries and audio queries to generate captions or responses.


Basic Workflow

  • Text-based Query: Users can submit textual queries to request information or generate descriptions. For instance, a user might input a description like “What is the meaning of life?”
  • Audio Query: When a user uploads an audio file, the system can analyze the content and generate a detailed caption or a summary of the entire audio. For example, “Tell me what you hear in this audio clip.”
  • Installation

  • Install from ComfyUI Manager (search for Qwen2)
  • Download or git clone this repository into the ComfyUI\custom_nodes\ directory and run:
  • pip install -r requirements.txt

    Download Models

    All the models will be downloaded automatically when running the workflow if they are not found in the ComfyUI\models\prompt_generator\ directory.