How to log current batch number and increment it in pipeline executor when using row grouping? #6566
Hello Hop Community, I have a use case where the source system gives me records in huge batches. The target system accepts a smaller batch size, so I need to chunk the source records into smaller batches, with the chunk/batch size equal to the maximum allowed by the target system. E.g. the source gives 1,000 records and the target accepts 200 records, so I need to create 5 batches as follows: tgt_batch_1.json, tgt_batch_2.json, etc. I was able to create smaller batches of the data using the Pipeline Executor's row grouping, as explained in …. How can I do this?
Your best bet would be using some transforms before entering the Pipeline Executor loop:
The output of the Calculator will be your batch number, and you can also use it to group the rows for the Pipeline Executor ;) Hope this helps :)
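The transforms themselves are not shown in this thread. The arithmetic below is a minimal sketch, assuming they produce a 1-based row counter that the Calculator turns into a batch number by integer division. The function names `batch_number` and `batch_file_name` are hypothetical, used only to illustrate the calculation and the `tgt_batch_N.json` naming from the question.

```python
def batch_number(row_num: int, batch_size: int) -> int:
    """Return the 1-based batch number for a 1-based row number (hypothetical helper)."""
    if row_num < 1 or batch_size < 1:
        raise ValueError("row_num and batch_size must both be >= 1")
    # Rows 1..200 -> batch 1, rows 201..400 -> batch 2, and so on.
    return (row_num - 1) // batch_size + 1


def batch_file_name(batch: int, prefix: str = "tgt_batch_") -> str:
    """Build the target file name for a batch, e.g. tgt_batch_3.json (hypothetical naming)."""
    return f"{prefix}{batch}.json"


if __name__ == "__main__":
    batch_size = 200  # max records accepted by the target system in the example
    batches: dict[int, list[int]] = {}
    # 1,000 source rows, numbered from 1 as a sequence transform would number them.
    for row in range(1, 1001):
        batches.setdefault(batch_number(row, batch_size), []).append(row)
    for batch, rows in batches.items():
        print(batch_file_name(batch), len(rows))
```

With 1,000 rows and a batch size of 200, this groups the rows into 5 batches of 200, matching the example in the question. Because every row in a group shares the same batch number, the same field can serve both as the grouping key and as the value to log or use in the file name.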
Hi @ossDataEngineer,
I had a quick look at your pipelines/workflow.
First, you are not implementing CURRENT_BATCH_COUNT as described in the previous post. The value of the counter must come in from the child pipeline, not be calculated in the grandchild one. If you need to use it as a "variable", it has to be passed as a parameter to the grandchild pipeline.
Regarding SUCCESS_RECORD_COUNT and FAILED_RECORD_COUNT, the behaviour is consistent with the fact that you're not returning the variable to the workflow after setting it, so it is reset at each iteration. To get what you need, I would proceed like this: