ComfyUI_pixtral_large

ComfyUI_pixtral_large
★ 21

多模态理解大规模模型多语言OCR高分辨率批量处理
在ComfyUI中集成Mistral的Pixtral Large(124B),提供多模态图像理解、128K上下文、批量高分辨率处理、多语言OCR与可调参数。
💡 在ComfyUI中批量分析高分辨率图片并生成多语言描述
🍴 5 Forks💻 Python🔄 2025-07-21
📦
网盘下载
复制链接后前往夸克网盘下载
https://pan.quark.cn/s/8f9eee5e2cdb
📦 requirements.txt
requests
Pillow
pixtral large 1
pixtral large 3
📄 README

ComfyUI Pixtral Large Extension

A ComfyUI custom node that integrates Mistral AI’s Pixtral Large vision model, enabling powerful multimodal AI capabilities within ComfyUI. Pixtral Large is a 124B parameter model (123B decoder + 1B vision encoder) that can analyze up to 30 high-resolution images simultaneously.

Features

  • 🖼️ Process up to 30 high-resolution images in a single request
  • 🧠 Leverages Pixtral Large’s 124B parameter architecture
  • 📝 Generate detailed descriptions and analysis of images
  • 📊 Support for documents, charts, and natural images
  • 🌐 128K context window for extensive image processing
  • 🔤 Multilingual capabilities including:
  • English
  • Hebrew (עברית)
  • Arabic (العربية)
  • Chinese (中文)
  • Japanese (日本語)
  • Korean (한국어)
  • And many more languages
  • 📚 Advanced OCR in multiple languages and scripts
  • 🛠️ Customizable parameters for fine-tuned responses
  • Installation

  • Clone this repository into your ComfyUI’s custom_nodes directory:
  • cd ComfyUI/custom_nodes
    https://github.com/ShmuelRonen/ComfyUI_pixtral_large.git
  • Restart ComfyUI
  • Included Nodes

    The extension adds three powerful nodes to ComfyUI:

    1. Pixtral Large

    Main node for image analysis using Pixtral Large.

    Parameters:

  • prompt: Your query about the image(s) – can be in any supported language
  • images: Input images to analyze
  • api_key: Your Mistral AI API key
  • temperature: Response randomness (0.0-1.5)
  • maximum_tokens: Max response length (1-32768)
  • top_p: Nucleus sampling parameter (0.0-1.0)
  • Use Cases:

  • Image analysis and description
  • Document text extraction
  • Chart and graph interpretation
  • Mathematical reasoning
  • Cross-lingual image understanding
  • 2. Multi Images Input

    Specialized node for combining multiple images into a batch for analysis.

    Parameters:

  • inputcount: Number of image inputs (2-30)
  • Dynamic image inputs (generated based on inputcount)
  • Optional parameters for each image slot
  • Features:

  • Automatic batch creation
  • Support for up to 30 simultaneous images
  • Compatible with all ComfyUI image outputs
  • Maintains image quality and resolution
  • Efficient memory handling
  • Use Cases:

  • Batch document processing
  • Multiple page analysis
  • Comparative image analysis
  • Sequential image storytelling
  • Before/after image analysis
  • 3. Preview Text

    Advanced text output display node for viewing Pixtral Large results.

    Parameters:

  • text: Input text to display (automatically connected to Pixtral Large output)
  • Dynamic sizing
  • Auto-formatting
  • Features:

  • RTL language support
  • Unicode text display
  • Formatted output
  • Multi-paragraph handling
  • Supports all languages
  • Copy-paste functionality
  • Use Cases:

  • Displaying analysis results
  • Debugging outputs
  • Text verification
  • Intermediate result inspection
  • Documentation generation
  • Node Connections and Workflow Examples

    Basic Single Image Analysis

    graph LR
        A[Load Image] --> B[Pixtral Large]
        B --> C[Preview Text]

    Multi-Image Analysis

    graph LR
        A[Load Image 1] --> C[Multi Images Input]
        B[Load Image 2] --> C
        C --> D[Pixtral Large]
        D --> E[Preview Text]

    Complex Document Analysis

    graph LR
        A[Load Image 1] --> D[Multi Images Input]
        B[Load Image 2] --> D
        C[Load Image 3] --> D
        D --> E[Pixtral Large]
        E --> F[Preview Text]

    Multilingual Capabilities

    Pixtral Large offers robust multilingual support for both input and output:

    Text Recognition (OCR)

  • Recognizes text in multiple scripts and languages
  • Particularly strong with:
  • Hebrew (עברית) – including modern and historical texts
  • Latin scripts
  • CJK characters (Chinese, Japanese, Korean)
  • Arabic script
  • Cyrillic
  • And more
  • Analysis and Response

  • Can understand prompts in multiple languages
  • Provides responses in the same language as the prompt
  • Handles mixed-language content effectively
  • Accurate translation and transcription capabilities
  • Example Usage

    # Hebrew prompt example
    prompt = "תאר את התמונה בעברית"
    
    # Mixed language example
    prompt = "Analyze this image and provide the response in Hebrew (עברית)"

    Getting the free API Key

  • Visit Mistral AI and sign up or log into your account.
  • Navigate to the API section and follow the instructions to generate a new API key.
  • Once you have your API key, enter it into the node configuration as described in the setup instructions.
  • Error Handling

    Common error messages and solutions:

    Multi Images Input Errors

  • “At least 2 images are required”: Add more images to input slots
  • “Exceeded maximum image count”: Reduce number of input images to 30 or less
  • “Invalid image format”: Ensure images are in supported format
  • Pixtral Large Errors

  • “API Error”: Verify API key and internet connection
  • “Invalid prompt”: Check prompt formatting
  • “Token limit exceeded”: Reduce maximum_tokens parameter
  • Preview Text Errors

  • “Unicode decode error”: Check text encoding
  • “Display buffer full”: Reduce output size
  • Contributing

    Contributions are welcome! Please feel free to submit a Pull Request.

    License

    This project is licensed under the MIT License – see the LICENSE file for details.

    Acknowledgments

  • Thanks to Mistral AI for providing the Pixtral Large model
  • Built for the ComfyUI community
  • Version History

  • 1.0.0: Initial release
  • Full node suite implementation
  • Multi-image support
  • Multilingual capabilities including Hebrew
  • Advanced text preview features
  • Support

    If you encounter any issues or have questions:

  • Check the Issues page
  • Create a new issue if needed