ComfyUI_OmniParser

★ 39

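Screen parsing, vision GUI agent, OmniParser, dependency compatibility
Integrates OmniParser into ComfyUI to parse on-screen interfaces from vision alone, which makes it easier to build purely vision-based GUI agents. Watch for ultralytics version compatibility.
💡 Parses screen images into information that a vision-based GUI agent can understand.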
🍴 2 Forks  💻 Python  🔄 2025-03-12
📦
Netdisk download
Copy the link, then open it in Quark Netdisk to download
https://pan.quark.cn/s/c1eafc754fbb
📦 requirements.txt
#torch
easyocr
#torchvision
#supervision==0.18.0
supervision
#openai==1.3.5
#transformers
ultralytics
#azure-identity
#numpy
opencv-python
opencv-python-headless
#gradio
dill
#accelerate
#timm
#einops==0.8.0
#paddlepaddle
#paddleocr
📄 README

ComfyUI_OmniParser

Try OmniParser in ComfyUI. OmniParser is a simple screen parsing tool towards a pure vision based GUI agent.


Notice 2024/12/06

  • Because this method calls the ultralytics library, if you updated or installed ultralytics on December 4 or December 5, 2024, check whether the installed version is 8.3.41. If it is, remove it promptly. To check the version, run: pip show ultralytics
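
The version check in the notice can also be scripted. The following is a minimal sketch using only the Python standard library; the names `FLAGGED_VERSIONS`, `is_flagged`, and `check_ultralytics` are hypothetical helpers introduced here for illustration, and the flagged set contains only the version named in the notice.

```python
from importlib import metadata

# Only the version named in the notice; extend this set if you
# learn of other releases that should be avoided (assumption).
FLAGGED_VERSIONS = {"8.3.41"}


def is_flagged(version: str, flagged=FLAGGED_VERSIONS) -> bool:
    """Return True if the version string is one of the flagged releases."""
    return version.strip() in flagged


def check_ultralytics() -> str:
    """Report the installed ultralytics version and whether to remove it."""
    try:
        installed = metadata.version("ultralytics")
    except metadata.PackageNotFoundError:
        return "ultralytics is not installed"
    if is_flagged(installed):
        return f"ultralytics {installed} is flagged: run 'pip uninstall ultralytics'"
    return f"ultralytics {installed} is not the flagged version"


if __name__ == "__main__":
    print(check_ultralytics())
```

Run it with the same Python interpreter that ComfyUI uses, since a different environment may hold a different ultralytics version.
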
    1.Installation


    In the ./ComfyUI/custom_nodes directory, run the following:

    git clone https://github.com/smthemex/ComfyUI_OmniParser.git


    2.Requirements


    pip install -r requirements.txt
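
    Steps 1 and 2 can be combined in a short Python script, which may help when ComfyUI runs from its own Python environment: calling pip through `sys.executable` installs into whichever interpreter runs the script. This is a sketch under assumptions, not part of the project: it assumes `git` is on the PATH and that custom nodes live in ComfyUI's `custom_nodes` folder, and the helpers `install_commands` and `run_install` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

REPO_URL = "https://github.com/smthemex/ComfyUI_OmniParser.git"


def install_commands(comfyui_root: Path) -> list[tuple[list[str], Path]]:
    """Build (argv, working directory) pairs for the clone and pip steps."""
    custom_nodes = comfyui_root / "custom_nodes"  # assumed ComfyUI layout
    node_dir = custom_nodes / "ComfyUI_OmniParser"  # folder created by git clone
    return [
        (["git", "clone", REPO_URL], custom_nodes),
        # sys.executable targets the interpreter running this script.
        ([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], node_dir),
    ]


def run_install(comfyui_root: Path) -> None:
    """Run each step in order, stopping at the first failure."""
    for argv, cwd in install_commands(comfyui_root):
        subprocess.run(argv, cwd=cwd, check=True)
```

    To use it, call `run_install(Path("./ComfyUI"))` from the interpreter that launches ComfyUI. Because `check=True` raises on a non-zero exit code, a failed clone stops the script before pip runs.
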
    


    3.Checkpoints


    huggingface-OmniParser


    4.Example



    5.Citation


    microsoft/OmniParser

    @misc{lu2024omniparserpurevisionbased,
          title={OmniParser for Pure Vision Based GUI Agent}, 
          author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
          year={2024},
          eprint={2408.00203},
          archivePrefix={arXiv},
          primaryClass={cs.CV},
          url={https://arxiv.org/abs/2408.00203}, 
    }

    Some code from @aliencaocao