Chunking
We recommend some implementations of commonly used text chunking approaches.
Text chunking is more of a micro-optimization compared to the Flow Design.
We recommend starting with naive chunking and optimizing later.
Example Python Code Samples
1. Naive (Fixed-Size) Chunking
Splits text by a fixed number of words, ignoring sentence or semantic boundaries.
It is simple to implement, but sentences are often cut mid-thought, which hurts coherence.
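A minimal sketch of fixed-size chunking by word count, as described above. The function name `fixed_size_chunk` and the default `chunk_size` of 100 are illustrative assumptions, not values taken from the original page.

```python
def fixed_size_chunk(text: str, chunk_size: int = 100) -> list[str]:
    """Split text into chunks of at most `chunk_size` words each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Whitespace tokenization: punctuation stays attached to its word.
    words = text.split()
    # Step through the word list in strides of chunk_size.
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```

Note that the last chunk may be shorter than `chunk_size`, and that the original whitespace (including newlines) is collapsed to single spaces when the words are rejoined.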
2. Sentence-Based Chunking
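Splits text on sentence boundaries and groups a fixed number of sentences into each chunk.
However, this approach might not handle very long sentences or paragraphs well.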
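One way to sketch sentence-based chunking using only the standard library. The regex splitter below is a simplifying assumption: it breaks after `.`, `!`, or `?` followed by whitespace, so it will mis-split abbreviations such as "Dr. Smith". A dedicated tokenizer (for example NLTK's `sent_tokenize`) is likely a better fit for real text. The name `sentence_chunk` and the `max_sentences` parameter are illustrative.

```python
import re


def sentence_chunk(text: str, max_sentences: int = 2) -> list[str]:
    """Group consecutive sentences into chunks of at most `max_sentences`."""
    if max_sentences <= 0:
        raise ValueError("max_sentences must be a positive integer")
    # Naive sentence splitter (assumption): split on whitespace that
    # follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]
```

Because chunk size is measured in sentences rather than words, a single very long sentence still produces a very long chunk, which is the limitation noted above.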
3. Other Chunking
Paragraph-Based: Split text by paragraphs (e.g., newlines). Large paragraphs can create big chunks.
Semantic: Use embeddings or topic modeling to chunk by semantic boundaries.
Agentic: Use an LLM to decide chunk boundaries based on context or meaning.
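Of the approaches listed above, paragraph-based chunking is the simplest to sketch without external dependencies. The example below assumes paragraphs are separated by one or more blank lines; if your text uses single newlines as paragraph breaks, the pattern would need to change. The name `paragraph_chunk` is illustrative. Semantic and agentic chunking depend on an embedding model or an LLM, so they are not sketched here.

```python
import re


def paragraph_chunk(text: str) -> list[str]:
    """Split text into paragraphs, treating blank lines as separators."""
    # A newline, optional whitespace, then another newline marks a break.
    paragraphs = re.split(r"\n\s*\n", text)
    # Drop empty pieces and trim surrounding whitespace.
    return [p.strip() for p in paragraphs if p.strip()]
```

Since chunk size follows paragraph length, a large paragraph yields a large chunk; one option is to pass oversized paragraphs through a sentence-based or fixed-size splitter afterwards.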