Try OmniParser in ComfyUI. OmniParser is a simple screen parsing tool towards a pure vision based GUI agent.
Notice 2024/12/06
show ultralytics
1.Installation
In the ./ComfyUI/custom_nodes directory, run the following:
git clone https://github.com/smthemex/ComfyUI_OmniParser.git
2.Requirements
pip install -r requirements.txt
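After installation, a quick check along these lines can confirm that the dependencies are importable from the Python environment ComfyUI runs in. This is only a minimal sketch: `missing_modules` is a hypothetical helper (not part of ComfyUI_OmniParser), and the module names in `ASSUMED_MODULES` are assumptions that should be adjusted to match requirements.txt.

```python
import importlib.util

# Assumed top-level import names; adjust to match requirements.txt.
# Note that the opencv-python package is imported as "cv2".
ASSUMED_MODULES = ["easyocr", "supervision", "ultralytics", "cv2", "dill"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

Run it with the same Python interpreter that launches ComfyUI, since packages installed into a different environment will be reported as missing.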
3.Checkpoints
4.Example
5.Citation
microsoft/OmniParser
@misc{lu2024omniparserpurevisionbased,
title={OmniParser for Pure Vision Based GUI Agent},
author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
year={2024},
eprint={2408.00203},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.00203},
}
Some code from @aliencaocao