Map Reduce
Last updated
Last updated
The MapReduce pattern is a powerful way to process large datasets by breaking down a complex task into two main phases: Map (processing individual items) and Reduce (aggregating results). BrainyFlow provides the perfect abstractions to implement this pattern efficiently, especially with its ParallelFlow
for concurrent mapping.
Trigger Node:
Role: Triggers the fan-out to Mapper Nodes from a list of items.
Implementation: A standard Node
whose post
method iterates through the input collection and calls this.trigger()
for each item, passing item-specific data as forkingData
.
Mapper Node:
Role: Takes a single item from the input collection and transforms it.
Implementation: A standard Node
that receives an individual item (often via forkingData
in its local memory) and performs a specific computation (e.g., summarizing a document, extracting key-value pairs).
Output: Stores the processed result for that item in the global Memory
object, typically using a unique key (e.g., memory.results[item_id] = processed_data
).
Reducer Node:
Role: Collects and aggregates the results from all Mapper Nodes.
Implementation: A standard Node
that reads all individual results from the global Memory
and combines them into a final output. This node is usually triggered after all Mapper Nodes have completed.
We use a ParallelFlow
for the mapping phase for performance gain.
Let's create a flow to summarize multiple text files using the pattern we just created.
This node will summarize a single file.
This node will combine all individual summaries.
This example demonstrates how to implement a MapReduce pattern using BrainyFlow, leveraging ParallelFlow
for concurrent processing of the map phase and the Memory
object for collecting results before the reduce phase.
For simplicity, these will be overly-simplified mock tools/nodes. For a more in-depth implementation, check the implementations in our cookbook for - more TypeScript examples coming soon (!).