Paralleling
Paralleling, in general, refers to the technique of running multiple tasks or operations simultaneously. Parallel execution takes advantage of multi-core processors to speed up the whole process. This approach is especially useful for I/O-bound tasks, time-consuming tasks, and large datasets. PowerShell supports it out of the box.
Types of Paralleling in PowerShell
In PowerShell 7 there are two commonly used approaches to paralleling:
- Job
- ForEach-Object -Parallel
The first one is the well-known approach to simultaneous processing. It runs a block of code in the background.
The second one is the new kid on the block. It adds parallelism to ForEach-Object with a single parameter, -Parallel. Consequently, the code stays readable and existing code is quite simple to refactor. Additionally, one more parameter, -ThrottleLimit, caps how many iterations run at the same time, which is more than welcome in critical systems where resource consumption is limited.
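For comparison, the job-based approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the names $jobs and $results are made up for the example, and Start-Sleep stands in for real work.

```powershell
# Start one background job per item; each job runs in its own process.
$jobs = foreach ($n in 1..3) {
    Start-Job -ScriptBlock {
        param($Number)
        Start-Sleep -Seconds 1      # placeholder for real work (assumption)
        "Job result: $Number"
    } -ArgumentList $n
}

# Wait for all jobs, collect their output, then clean up.
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results
```

Compared with ForEach-Object -Parallel, this needs explicit steps to start, wait for, receive from, and remove the jobs.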
Case: ForEach-Object -Parallel
Let me look at the second type, because it is new and really useful.
The typical syntax:
    $object | ForEach-Object { code_block }
An example with output:
    $numbers = @("One", "Two", "Three", "Four")
    $numbers | ForEach-Object { "Number: $_!" }
Output:
    Number: One!
    Number: Two!
    Number: Three!
    Number: Four!
Here one question comes up: what about the case when the code inside the loop does not just print a variable but performs a time-consuming task that waits for I/O (disk, network) or needs a bit more memory?
    $startTime = Get-Date
    $webs = @("https://www.google.com/", "https://www.facebook.com/", "https://twitter.com/", "https://stackoverflow.com/")
    $webs | ForEach-Object {
        "Get web: $_!"
        Invoke-RestMethod -Uri $_ -Method get -MaximumRetryCount 5
    }
    $executionTime = (Get-Date) - $startTime
    $executionTime.TotalSeconds
In my case it took 2.2057898 seconds. That is not much, but it is noticeable. Fetching a single page is fast enough to feel like real time, but when a script performs its operations step by step, those small delays add up. At the end of the day there is a lag. Paralleling to the rescue!
Code
The code is quite simple. Only one word is enough.
    $object | ForEach-Object -Parallel { code_block }
Below is the refactored code. Surprisingly, there is only one change, one word: -Parallel
    $startTime = Get-Date
    $webs = @("https://www.google.com/", "https://www.facebook.com/", "https://twitter.com/", "https://stackoverflow.com/")
    $webs | ForEach-Object -Parallel {
        "Get web: $_!"
        Invoke-RestMethod -Uri $_ -Method get -MaximumRetryCount 5
    }
    $executionTime = (Get-Date) - $startTime
    $executionTime.TotalSeconds
And the result is: 0.9028436 sec
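The same effect can be reproduced without network access. The sketch below is a stand-in rather than the measurement above: Start-Sleep simulates waiting for I/O, the variable names are illustrative, and exact timings will differ from machine to machine.

```powershell
$items = 1..4

# Sequential: each item waits 1 second, so roughly 4 seconds in total.
$sequential = Measure-Command {
    $items | ForEach-Object { Start-Sleep -Seconds 1 }
}

# Parallel: the waits overlap, so the total should be much closer to 1 second.
$parallel = Measure-Command {
    $items | ForEach-Object -Parallel { Start-Sleep -Seconds 1 }
}

"Sequential: {0:N2} s" -f $sequential.TotalSeconds
"Parallel:   {0:N2} s" -f $parallel.TotalSeconds
```

Measure-Command is used here instead of subtracting two Get-Date values; both give a TimeSpan, so TotalSeconds reads the same way.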
Pool
When there is a long list to process, the parallel execution presented above can lead to resource depletion (memory, CPU), which is why it's good to keep it in check. The best approach in such cases is a pool of threads. It keeps a balance between speed and reliability.
Only one change is needed to control the size of the pool: -ThrottleLimit <int>. When the parameter is omitted, ForEach-Object -Parallel uses a default limit of 5.
The code below keeps the parallelism in check. As a consequence, only 3 executions are performed at the same time.
    $startTime = Get-Date
    $webs = @("https://www.google.com/", "https://www.facebook.com/", "https://twitter.com/", "https://stackoverflow.com/")
    $webs | ForEach-Object -Parallel {
        "Get web: $_!"
        Invoke-RestMethod -Uri $_ -Method get -MaximumRetryCount 5
    } -ThrottleLimit 3
    $executionTime = (Get-Date) - $startTime
    $executionTime.TotalSeconds
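The effect of the limit is easier to see with simulated work. In this sketch (an assumption-based illustration, with Start-Sleep standing in for a slow request) six one-second tasks run with a limit of 3, so they should complete in about two waves.

```powershell
# Six simulated tasks of 1 second each, at most 3 running at once.
$elapsed = Measure-Command {
    1..6 | ForEach-Object -Parallel {
        Start-Sleep -Seconds 1      # placeholder for a slow request (assumption)
    } -ThrottleLimit 3
}

# Expect roughly 2 seconds plus startup overhead, instead of about 6 sequentially.
"Elapsed: {0:N2} s" -f $elapsed.TotalSeconds
```

Raising the limit shortens the run at the cost of more threads alive at once; lowering it does the opposite.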
Variables
One of the most challenging problems with parallel execution in PowerShell is sharing variables between contexts. Once -Parallel is used in a ForEach-Object loop, the code inside is executed in a new runspace (context) without access to the variables and functions defined outside the loop.
In the example below, the $var variable is unknown inside the loop, so that line prints an empty line.
    $var = "val"
    $numbers = @("One", "Two", "Three", "Four")
    $numbers | ForEach-Object -Parallel {
        "Number: $_!"
        $var
    }
To fix that issue, there is a mechanism to pass variables into the inner context. The $using: scope modifier is the key. Here is an example:
    $var = "val"
    $numbers = @("One", "Two", "Three", "Four")
    $numbers | ForEach-Object -Parallel {
        "Number: $_!"
        $using:var
    }
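Getting data back out of the loop is the other half of the story. One option, sketched here, is to let the script block emit values and capture the pipeline output. The names $prefix and $results are illustrative, and the output is sorted because the completion order of parallel iterations is not guaranteed.

```powershell
$prefix = "Number"
$numbers = @("One", "Two", "Three", "Four")

# Everything the script block emits is collected into $results.
$results = $numbers | ForEach-Object -Parallel {
    $p = $using:prefix      # read the outer variable into a local one
    "${p}: $_"
}

# Completion order may vary between runs, so sort before displaying.
$results | Sort-Object
```

Capturing output this way avoids writing to outer variables from several threads at once.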
The same story applies to functions. Passing them in is feasible, but it introduces a lot of complexity in the code.
    function Nice-Function {
        Param ($A)
        Write-Host "Nice: [$A]"
    }
    $funcStr = ${function:Nice-Function}.ToString()
    $numbers = @("One", "Two", "Three", "Four")
    $numbers | ForEach-Object -Parallel {
        ${function:Nice-Function} = $using:funcStr
        Nice-Function $_
    }
For me, that is too much $using and ToString(). As a consequence, readability is low. Additionally, it's quite easy to make a mistake. How to overcome that issue? Move the functions to a module and import it inside the loop context. This is just one line and all the functions are available.
    $numbers = @("One", "Two", "Three", "Four")
    $numbers | ForEach-Object -Parallel {
        Import-Module '.\Path\Where\Nice-Function\is\defined\module.psm1'
        Nice-Function $_
    }
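To try the module approach end to end, here is a self-contained sketch. It makes several assumptions that are not in the original: the module is a throwaway file named NiceFunctionDemo.psm1 in the temp folder, the function returns a string instead of calling Write-Host so that the output can be captured, and -DisableNameChecking is added only to silence the warning about the non-standard verb "Nice".

```powershell
# Write a throwaway module to the temp folder (hypothetical name and location).
$modulePath = Join-Path ([System.IO.Path]::GetTempPath()) 'NiceFunctionDemo.psm1'
@'
function Nice-Function {
    Param ($A)
    "Nice: [$A]"
}
'@ | Set-Content -Path $modulePath

$numbers = @("One", "Two", "Three", "Four")
$results = $numbers | ForEach-Object -Parallel {
    # Each iteration imports the module by its absolute path.
    Import-Module $using:modulePath -DisableNameChecking
    Nice-Function $_
}

# Completion order is not guaranteed, so sort before displaying.
$results | Sort-Object
Remove-Item -Path $modulePath
```

Passing the absolute path with $using: avoids depending on the working directory of the parallel context.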
It works like a charm!
Summary of Paralleling in PowerShell
In PowerShell, "paralleling" refers to executing multiple tasks simultaneously, taking advantage of the multi-core capabilities of modern processors. There are many ways to instruct PowerShell to run code simultaneously. One of them is ForEach-Object -Parallel, which was the subject of this post.