submitted by
Style Pass
2024-11-10 16:30:05

so i found myself making another RAG bot (for the 2342148th time) and, meanwhile, explaining to my juniors why we should use chunking in our RAG bots, only to realise that i would have to write chunking all over again unless i used the bloated software library X or the extremely feature-less library Y. WHY CAN I NOT HAVE SOMETHING JUST RIGHT, UGH?

🚀 Feature-rich: All the CHONKs you'd ever need
✨ Easy to use: Install, Import, CHONK
⚡ Fast: CHONK at the speed of light! zooooom
🌐 Wide support: Supports all your favorite tokenizer CHONKS
🪶 Light-weight: No bloat, just CHONK
🦛 Cute CHONK mascot: psst it's a pygmy hippo btw
❤️ Moto Moto's favorite python library

Chonkie follows the rule of minimal default installs. Read the DOCS to find the installation for your required chunker, or simply install everything if you don't want to think about it (not recommended).

Chonkie provides several chunkers to help you split your text efficiently for RAG applications. Here's a quick overview of the available chunkers:
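For anyone new to the idea, here is a rough sketch of what "chunking" means in a RAG pipeline: fixed-size pieces with a bit of overlap so context isn't cut off at the boundaries. This is NOT Chonkie's code or API. The function name `chunk_words` and its parameters are made up for illustration, and it counts whitespace-separated words where a real chunker would count tokenizer tokens.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. Hypothetical helper for
    illustration only; it is not part of Chonkie.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Assumption: words stand in for tokens to keep the sketch dependency-free.
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time

    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so we don't emit a trailing chunk made only of overlap words.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(chunk)
```

The overlap is the main design choice here: a larger overlap keeps more shared context between neighbouring chunks, at the cost of storing and embedding more repeated text.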
