

Submitted by
Style Pass
2024-07-02 09:00:04

OmniParse is a platform that ingests and parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Whether you are working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured, and ready for AI applications such as RAG, fine-tuning, and more.

✅ Completely local, no external APIs
✅ Fits in a T4 GPU
✅ Supports ~20 file types
✅ Converts documents, multimedia, and web pages to high-quality structured markdown
✅ Table extraction, image extraction/captioning, audio/video transcription, web page crawling
✅ Easily deployable using Docker and Skypilot
✅ Colab friendly
✅ Interactive UI powered by Gradio

It's challenging to process data as it comes in different shapes and sizes. OmniParse aims to be an ingestion/parsing platform where you can ingest any type of data, such as documents, images, audio, video, and web content, and get the most structured and actionable output that is GenAI (LLM) friendly.
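To make the idea of a single entry point for many input types concrete, here is a minimal sketch of how an ingestion layer might decide which kind of parser an input needs. It is not OmniParse's actual API: the `classify_source` function, the `CATEGORY_BY_EXTENSION` table, and the extensions listed are all hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Hypothetical mapping from file extension to media category.
# The extensions are illustrative, not OmniParse's supported set.
CATEGORY_BY_EXTENSION = {
    ".pdf": "document",
    ".docx": "document",
    ".png": "image",
    ".jpg": "image",
    ".mp3": "audio",
    ".wav": "audio",
    ".mp4": "video",
}


def classify_source(source: str) -> str:
    """Return a coarse media category for a file path or URL."""
    # Anything that looks like an HTTP(S) URL is treated as web content,
    # even if the URL happens to end in a file extension.
    if source.startswith(("http://", "https://")):
        return "web"
    # Match on the lowercased extension so "REPORT.PDF" and "report.pdf" agree.
    suffix = Path(source).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")
```

A real pipeline would then hand each category to a dedicated parser (for example, transcription for audio and video, or crawling for web pages) and normalize every result into the same structured output.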

Note: The server only works on Linux-based systems. This is due to certain dependencies and system-specific configurations that are not compatible with Windows or macOS.

To install OmniParse, you can use pip:
