How are tasks assigned to ForEach iterations?

The Lookup-ForEach pattern is common in Azure Data Factory (ADF). How are items produced by the Lookup allocated to the ForEach’s workers, the number of which is controlled by Batch Count?

Method 1

They are allocated round-robin, in the order they are produced by the Lookup. I can find no documentation that confirms this, but it is what I observe, and I can reproduce it reliably with a simple example.

To a new pipeline I added an array variable, a ForEach, and inside the ForEach a Wait (pipeline JSON is included at the end). The array variable feeds the items of the ForEach. Although the question mentions Lookup, the result is the same when an array variable is used. The Wait’s duration is determined by the array’s values, simply to spread the iterations over time and make them easier to observe.

I ran the pipeline and collected the ADF output. I used the Duration to calculate an End time (shown below in mm:ss). I added three further columns, one for each "worker". To work out which worker executed which iteration, I followed the Start-End chains.

Task       Start                         Duration  End    A  B  C
========== ============================  ========  =====  =  =  =
9 Wait1    2021-06-01T05:17:39.6100287Z  00:01:32  19:11        c
8 Wait1    2021-06-01T05:17:09.3290518Z  00:01:32  18:41     b
7 Wait1    2021-06-01T05:16:40.6095309Z  00:01:32  18:12  a
6 Wait1    2021-06-01T05:16:07.9623704Z  00:01:32  17:39        c
5 Wait1    2021-06-01T05:15:37.7842636Z  00:01:32  17:09     b
4 Wait1    2021-06-01T05:15:07.9421207Z  00:01:32  16:39  a
3 Wait1    2021-06-01T05:15:06.3028589Z  00:01:02  16:08        c
2 Wait1    2021-06-01T05:15:06.3028589Z  00:00:32  15:38     b
1 Wait1    2021-06-01T05:15:06.2872305Z  00:00:02  15:08  a
  ForEach1 2021-06-01T05:15:05.896518Z   00:04:06
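
The End column above can be reproduced from Start and Duration. The following is a minimal sketch in Python, not part of the original analysis; the helper name `end_mmss` is hypothetical, and it assumes the End values were truncated (not rounded) to whole seconds, which is consistent with every row of the table.

```python
def end_mmss(start_time_of_day, duration):
    """Return Start + Duration as mm:ss, truncated to whole seconds.

    start_time_of_day: "HH:MM:SS.fffffff" (time part of the ADF Start value)
    duration:          "HH:MM:SS" (the ADF Duration value)
    """
    h, m, s = start_time_of_day.split(":")
    dh, dm, ds = duration.split(":")
    total = (int(h) * 3600 + int(m) * 60 + float(s)
             + int(dh) * 3600 + int(dm) * 60 + int(ds))
    # int() drops the fractional seconds; the hour is discarded for display.
    minutes, secs = divmod(int(total), 60)
    return f"{minutes % 60:02d}:{secs:02d}"


print(end_mmss("05:15:06.2872305", "00:00:02"))  # task 1 -> 15:08
print(end_mmss("05:15:07.9421207", "00:01:32"))  # task 4 -> 16:39
```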

I arbitrarily assigned task 1 to worker A. It finished at 15:08, so I looked for a task that started at that time (within the precision of the reported durations). That is task 4, so it also belongs to worker A, and so on down the chain. Tasks 2 and 3 start their own chains in the same way. The resulting pattern is that of round-robin allocation.
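
The chain-following step can also be done mechanically. The sketch below is one possible way to do it in Python, using the Start and Duration values from the table. The function name `follow_chains` and the one-second matching tolerance are assumptions introduced for illustration; the tolerance stands in for "within the precision of the reported durations".

```python
# (task, start time of day, reported duration in seconds), copied from the table
RUN = [
    (1, "05:15:06.2872305", 2),
    (2, "05:15:06.3028589", 32),
    (3, "05:15:06.3028589", 62),
    (4, "05:15:07.9421207", 92),
    (5, "05:15:37.7842636", 92),
    (6, "05:16:07.9623704", 92),
    (7, "05:16:40.6095309", 92),
    (8, "05:17:09.3290518", 92),
    (9, "05:17:39.6100287", 92),
]


def seconds(time_of_day):
    """Convert "HH:MM:SS.fffffff" to seconds since midnight."""
    h, m, s = time_of_day.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def follow_chains(run, tolerance=1.0):
    """Group tasks into per-worker chains by matching each Start to a prior End."""
    workers = []  # each worker: {"end": computed end of its last task, "tasks": [...]}
    for task, start_text, duration in sorted(run, key=lambda r: (seconds(r[1]), r[0])):
        start = seconds(start_text)
        # A task continues a chain if it starts when that chain's last task ended.
        match = next((w for w in workers if abs(w["end"] - start) <= tolerance), None)
        if match is None:
            # No chain ends at this time, so the task opens a new worker.
            match = {"end": 0.0, "tasks": []}
            workers.append(match)
        match["tasks"].append(task)
        match["end"] = start + duration
    return [w["tasks"] for w in workers]


print(follow_chains(RUN))  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The three chains correspond to the A, B and C columns of the table.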

Applying the same analysis to much more complicated workloads, with many more tasks, larger Batch Counts and varying durations, produces the same evidence for round-robin allocation. For example, when serving 21 tasks with a Batch Count of 20, the 21st task never starts until the 1st one completes.
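
To see why the 21-task case is telling, here is a small simulation, offered as a sketch and not as a description of ADF's internals. It compares round-robin allocation with a hypothetical "first free worker" policy, which is included only for contrast. The function names and the durations (one slow first item, twenty quick ones) are made up for illustration.

```python
import heapq


def round_robin_starts(durations, batch_count):
    """Item i is bound to worker i % batch_count; each worker runs its items in order."""
    free_at = [0] * batch_count
    starts = []
    for i, duration in enumerate(durations):
        worker = i % batch_count
        starts.append(free_at[worker])
        free_at[worker] += duration
    return starts


def first_free_starts(durations, batch_count):
    """Hypothetical alternative: each item goes to whichever worker frees up first."""
    free_at = [0] * batch_count
    heapq.heapify(free_at)
    starts = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest time any worker is available
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


# Hypothetical workload: the 1st item takes 100 s, the other 20 take 10 s each.
durations = [100] + [10] * 20

print(round_robin_starts(durations, 20)[20])  # 100: the 21st item waits for the 1st
print(first_free_starts(durations, 20)[20])   # 10: it would start as soon as any worker is free
```

Under round-robin the 21st item cannot start before the 1st one completes, even though nineteen other workers are idle by then, which is the behaviour described above.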

{
    "name": "ForEach RoundRobin Demo",
    "properties": {
        "activities": [
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@variables('Intervals')",
                        "type": "Expression"
                    },
                    "batchCount": 3,
                    "activities": [
                        {
                            "name": "Wait1",
                            "type": "Wait",
                            "dependsOn": [],
                            "userProperties": [],
                            "typeProperties": {
                                "waitTimeInSeconds": {
                                    "value": "@item()",
                                    "type": "Expression"
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "variables": {
            "Intervals": {
                "type": "Array",
                "defaultValue": [
                    1,
                    30,
                    60,
                    180,
                    180,
                    180,
                    180,
                    180,
                    180
                ]
            }
        },
        "annotations": []
    }
}

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
