Surya, a new open-source alternative to Tesseract for document processing, has been released. It offers accurate line-level text detection and runs on either CPU or GPU, whereas Tesseract is CPU-based, which should help when converting PDFs with complex layouts into text for LLMs. LlamaCloud has also launched, and MendableAI uses it to power its PDF parsing. Jerry Liu and LlamaIndex have launched LlamaParse; combined with LlamaIndex and RAGStack, it lets developers convert intricate PDFs into vectors within minutes. One user praised its tabular extraction and said it fills a gap between open-source PDF parsers that fall short and Adobe's much pricier offering, while asking whether it also handles vector images. LlamaIndex also shared a collection of cookbooks for a LlamaParse + AstraDB (DataStax) stack covering three steps: parsing a complex PDF with LlamaParse, indexing it into the Astra vector database, and running LlamaIndex recursive retrieval to answer questions over it.
Stack for Advanced RAG over Complex PDFs 📚: LlamaParse + @AstraDB (@DataStax) Excited to share a collection of cookbooks showing you how to 1) parse a complex PDF with LlamaParse, 2) index it into a vector database (Astra), and 3) run @llama_index recursive retrieval to answer… https://t.co/TvH5Hp62dx
Llamaparse by @llama_index helps fill a hole in the document processing space: the open source PDF parsers don’t cut it and Adobe’s one is super pricey https://t.co/9L8RYe2DwP Great tabular extraction. Does it do vector images though?
Cheers to Jerry Liu & LlamaIndex for launching LlamaParse today! With LlamaIndex and RAGStack, developers can now convert intricate PDFs into vectors within minutes. Check it out --> https://t.co/6LwLDPWBrn #LlamaParse #RAGStack #Python #GenAI #LlamaIndex
LlamaCloud is out now! We use it to power @mendableai's PDF parsing 🚀 https://t.co/Xfpn6gtvJj
Open-source Tesseract alternative released for document processing 💯 Providing accurate line-level text detection, and also Tesseract is CPU-based, and surya is CPU or GPU. ✨ Should be great for LLMs processing pdfs with complex layouts that require conversion into text… https://t.co/NLbnwFL27O