Chunking
BrainyFlow does NOT provide built-in chunking utilities.
Instead, we offer examples that you can implement yourself. This approach gives you more control over your project's dependencies and functionality.
We recommend some implementations of commonly used text chunking approaches.
Text chunking is more of a micro-optimization compared to Flow Design.
We recommend starting with naive chunking and optimizing later.
Naive (Fixed-Size): Splits text by a fixed number of characters (not words), ignoring sentence or semantic boundaries.
However, sentences are often cut awkwardly, losing coherence.
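A minimal sketch of fixed-size chunking by character count is shown here for illustration. The function name `fixed_size_chunk` and the `chunk_size` parameter are hypothetical and not part of BrainyFlow.

```python
def fixed_size_chunk(text: str, chunk_size: int = 100) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters.

    Hypothetical helper for illustration; it ignores word and sentence
    boundaries, so chunks may start or end mid-word.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the text in strides of chunk_size characters.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


text = "The quick brown fox jumps over the lazy dog."
chunks = fixed_size_chunk(text, chunk_size=16)
print(chunks)
```

With a 16-character limit, the second chunk ends in the middle of the word "the", which is exactly the awkward cutting described above.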
Sentence-Based: Groups a fixed number of sentences together. Requires a sentence tokenizer library.
However, it might not handle very long sentences or paragraphs well.
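The sketch below groups a fixed number of sentences per chunk. To stay dependency-free it assumes a rough regex splitter (`split_sentences`, a hypothetical stand-in) instead of a real sentence tokenizer; in practice you would likely swap in a tokenizer library such as NLTK's `sent_tokenize`. The names `sentence_chunk` and `max_sentences` are also illustrative.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Rough stand-in for a sentence tokenizer (an assumption, not a library API).

    Splits after '.', '!' or '?' when followed by whitespace, so it will
    mis-split abbreviations such as "e.g. this".
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s]


def sentence_chunk(text: str, max_sentences: int = 2) -> list[str]:
    """Group consecutive sentences into chunks of at most max_sentences each."""
    if max_sentences <= 0:
        raise ValueError("max_sentences must be a positive integer")
    sentences = split_sentences(text)
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


sample = "One. Two! Three? Four."
print(sentence_chunk(sample, max_sentences=2))
```

Because chunk size is counted in sentences rather than characters, a single very long sentence still produces a very long chunk.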
Paragraph-Based: Split text by paragraphs (e.g., newlines). Large paragraphs can create big chunks.
Semantic: Use embeddings or topic modeling to chunk by semantic boundaries.
Agentic: Use an LLM to decide chunk boundaries based on context or meaning.
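As an illustration of the paragraph-based approach listed above, here is a minimal sketch. It assumes paragraphs are separated by blank lines, and it adds a fixed-size fallback for oversized paragraphs; both choices, along with the names `paragraph_chunk` and `max_chars`, are assumptions for this example rather than anything BrainyFlow prescribes.

```python
import re


def paragraph_chunk(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into paragraphs, then cap each chunk at max_chars characters.

    Hypothetical helper: paragraphs are assumed to be separated by one or
    more blank lines. A paragraph longer than max_chars is cut into
    fixed-size pieces so no chunk exceeds the limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be a positive integer")
    # A blank line (optionally containing whitespace) marks a paragraph break.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    for paragraph in paragraphs:
        for i in range(0, len(paragraph), max_chars):
            chunks.append(paragraph[i:i + max_chars])
    return chunks


doc = "First paragraph.\n\nSecond paragraph is longer.\n\n\nThird."
print(paragraph_chunk(doc, max_chars=20))
```

The semantic and agentic approaches follow the same shape, but replace the blank-line rule with boundaries chosen by an embedding model or an LLM.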