ComfyUI_Qwen2-Audio-7B-Instruct-Int4

★ 16

音频理解多模态指令式生成Int4量化

在 ComfyUI 中整合 Qwen2-Audio-7B-Instruct-Int4，支持音频与文本查询并生成字幕、描述或指令式回复，便于多模态工作流调用。

💡 将语音或文字查询转为字幕、描述或问答回复，用于多模态工作流。

🍴 2 Forks💻 JavaScript🔄 2025-04-02

🔗 GitHub 原文

📦

网盘下载

复制链接后前往夸克网盘下载

https://pan.quark.cn/s/a1f1f564f19c

📦 requirements.txt

torch
huggingface_hub
bitsandbytes
librosa
transformers>=4.45.0

📄 README

ComfyUI_Qwen2-Audio-7B-Instruct-Int4

This is an implementation of Qwen2-Audio-7B-Instruct-Int4 by ComfyUI, including support for text-based queries and audio queries to generate captions or responses.

Basic Workflow

Text-based Query: Users can submit textual queries to request information or generate descriptions. For instance, a user might input a description like “What is the meaning of life?”

Audio Query: When a user uploads an audio file, the system can analyze the content and generate a detailed caption or a summary of the entire audio. For example, “Tell me what you hear in this audio clip.”

Installation

Install from ComfyUI Manager (search for Qwen2)

Download or git clone this repository into the ComfyUI\custom_nodes\ directory and run:

pip install -r requirements.txt

Download Models

All the models will be downloaded automatically when running the workflow if they are not found in the ComfyUI\models\prompt_generator\ directory.