So, let’s imagine that we want a topology that generates random messages, filters out the ones that contain the letter ‘z’, keeps aggregating the remaining messages for 3 seconds, and then prints them.
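Before wiring this up as a Storm topology, the core per-step logic can be sketched in plain Java. This is only an illustration of the rules described above (drop messages containing ‘z’, then join a batch into one printable line); the class and method names are hypothetical, not part of any Storm API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PipelineSketch {
    // Generate a random lowercase word of the given length.
    static String randomMessage(Random rng, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rng.nextInt(26)));
        }
        return sb.toString();
    }

    // Keep only the messages that do NOT contain the letter 'z'.
    static List<String> filter(List<String> messages) {
        List<String> kept = new ArrayList<>();
        for (String m : messages) {
            if (!m.contains("z")) {
                kept.add(m);
            }
        }
        return kept;
    }

    // Aggregate a batch of messages into a single printable line.
    static String aggregate(List<String> messages) {
        return String.join(" | ", messages);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            batch.add(randomMessage(rng, 5));
        }
        // In the real topology, the batch would be the tuples
        // collected during a 3-second window.
        System.out.println(aggregate(filter(batch)));
    }
}
```

In the actual topology, each of these steps becomes its own component: a spout generates the messages, one bolt filters them, and another bolt aggregates and prints them.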
Apache Storm is a free and open-source distributed realtime computation system that provides fault tolerance, scalability, and guaranteed data processing, and it is extremely good at processing unbounded streams of data. Storm can integrate with many databases and queuing systems (SQL and NoSQL databases, RabbitMQ, Kafka, and others). You can find more information here: https://storm.apache.org/
Before starting the implementation, some Apache Storm definitions must be clear. Its data model consists of two elements:
To implement a topology, which is nothing more than a set of spouts and bolts that represents the logic of a real-time Storm application, a structure must be defined, like the following one for the topology we are going to develop.
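As a rough sketch of what that structure looks like in code, a topology is assembled with Storm’s `TopologyBuilder`, which chains the spout and bolts together. The component classes (`RandomMessageSpout`, `FilterBolt`, `AggregatorBolt`) are assumed names for the pieces this article builds, not classes shipped with Storm, and the groupings shown are just one reasonable choice.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class MessageTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout that generates a random message every second.
        builder.setSpout("random-message-spout", new RandomMessageSpout());

        // Bolt that drops messages containing the letter 'z'.
        builder.setBolt("filter-bolt", new FilterBolt())
               .shuffleGrouping("random-message-spout");

        // Bolt that aggregates messages for 3 seconds and prints them.
        builder.setBolt("aggregator-bolt", new AggregatorBolt())
               .shuffleGrouping("filter-bolt");

        // Run the topology in-process for local testing.
        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("message-topology", new Config(),
                                   builder.createTopology());
            Thread.sleep(60_000);
        }
    }
}
```

`shuffleGrouping` distributes tuples evenly across the downstream bolt’s tasks; for this simple single-stream example any grouping would work, since no partitioning by key is needed.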
Even though a spout can be used to read from external sources and can be reliable (a reliable spout can replay tuples that failed to be processed by Storm), for this example our spout will internally generate messages, using a RandomMessageUtil, every second, and it will be unreliable (the spout doesn’t replay tuples, since it uses a fire-and-forget mechanism to emit them).
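A minimal sketch of such an unreliable spout, extending Storm’s `BaseRichSpout`, could look like the following. `RandomMessageUtil.next()` stands in for the message-generating helper mentioned above (its actual name and signature are assumptions); emitting the tuple without a message id is what makes the spout unreliable, since Storm then has no way to track or replay it.

```java
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomMessageSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Throttle emission to roughly one message per second.
        Utils.sleep(1000);
        String message = RandomMessageUtil.next(); // assumed helper
        // No message id is passed to emit(), so this is a
        // fire-and-forget (unreliable) emission: Storm will not
        // ack, fail, or replay this tuple.
        collector.emit(new Values(message));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }
}
```

To make the spout reliable instead, you would pass a unique message id as the second argument to `emit` and implement the `ack` and `fail` callbacks so failed tuples can be re-emitted.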