posted 4 years ago
I have an extremely large volume of JSON data that has to be processed in batch. Each record contains a customer, an equipmentId, and an array of values for various attributes. Each customer has multiple pieces of equipment, and each piece of equipment has its own array. The values are evaluated against threshold conditions stored in a database for some of the attributes of a given customer's equipment. The result is true or false depending on whether the condition passes. If the result is true, a REST API is called.
Mapper: Read the input and emit key-value pairs as follows:
Key to be emitted from mapper : combination of customerId and equipmentId.
Value to be emitted from mapper: the array of attributes.
Reducer: For each key (customerId and equipmentId combination), do the following:
Fetch the list of conditions for that key from the database and compute the result (true/false). If the result is true, call the REST API.
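To make the proposed flow concrete, here is a minimal in-memory sketch in plain Python. It does not use the Hadoop API; it only mimics the map, shuffle, and reduce steps described above. The field names (customerId, equipmentId, values), the THRESHOLDS dictionary standing in for the database, the "value exceeds a maximum" condition, and the notify callback standing in for the REST call are all assumptions made for illustration.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for the threshold table in the database:
# (customerId, equipmentId) -> {attribute name: maximum allowed value}
THRESHOLDS = {
    ("c1", "e1"): {"temperature": 80.0},
    ("c1", "e2"): {"pressure": 30.0},
}


def map_record(line):
    """Mapper: parse one JSON record and emit (key, value).

    The key is the (customerId, equipmentId) pair; the value is the
    record's attribute values (modelled here as a dict, an assumption).
    """
    record = json.loads(line)
    key = (record["customerId"], record["equipmentId"])
    return key, record["values"]


def shuffle(pairs):
    """Group mapper output by key, as the framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_key(key, value_sets, thresholds, notify):
    """Reducer: look up the conditions for this key once, then evaluate them.

    Returns True if any attribute value exceeds its threshold, and in that
    case calls notify(key), the placeholder for the REST API call.
    """
    conditions = thresholds.get(key, {})  # one lookup per key, not per record
    breached = any(
        attr in values and values[attr] > limit
        for values in value_sets
        for attr, limit in conditions.items()
    )
    if breached:
        notify(key)
    return breached
```

In this sketch the conditions are looked up once per key inside the reducer, which matches the design in the question. In a real job, the database lookup and the REST call would both be external side effects of the reducer, so a retried reduce task could repeat them; that is something to account for when judging whether the design is right.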
Is this the right way to implement this Hadoop MapReduce program?
Thanks.