Paralleling in PowerShell

Paralleling, in general, means running multiple tasks or operations at the same time. Parallel execution takes advantage of multi-core processors to speed up the whole process. It is especially useful for I/O-bound tasks, time-consuming tasks, and large datasets. PowerShell supports it out of the box.

Types of Paralleling in PowerShell

In PowerShell 7 there are two commonly used approaches to paralleling:

  • Job
  • ForEach-Object -Parallel

The first one is a well-known approach to simultaneous processing. It runs a block of code in the background.

The second one is the new kid on the block. It adds parallelism to ForEach-Object with a single parameter, which keeps the code readable and makes refactoring existing code quite simple. Additionally, a second parameter limits how many script blocks run at the same time, which is more than welcome in critical systems where resource consumption has limits.
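For comparison, the job-based approach could look roughly like the sketch below. It is a minimal illustration, not code from this post: the item list and the script block are made up, and each Start-Job call runs its script block in a separate background process.

```powershell
# Start one background job per item (illustrative data, not from the post).
$jobs = "One", "Two", "Three" | ForEach-Object {
    Start-Job -ScriptBlock { param($name) "Number: $name!" } -ArgumentList $_
}

# Wait until every job has finished, collect the output, then clean up.
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job

$results
```

Compared with ForEach-Object -Parallel, this needs explicit steps for waiting, receiving output and removing the jobs.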

Case: ForEach-Object -Parallel

Let me look at the second type, because it is new and really useful.

The typical syntax:

$object | ForEach-Object {
    code_block
}

And an example with output:

$numbers = @("One", "Two", "Three", "Four")
$numbers | ForEach-Object {
    "Number: $_!"
}

Number: One!
Number: Two!
Number: Three!
Number: Four!

Here a question comes up: what if the code inside the loop is not just printing a variable, but a time-consuming task that waits for I/O (disk, network) or needs a bit more memory?

$startTime = Get-Date
$webs = @("https://www.google.com/", "https://www.facebook.com/", "https://twitter.com/", "https://stackoverflow.com/")
$webs | ForEach-Object {
    "Get web: $_!"
    Invoke-RestMethod -Uri $_ -Method Get -MaximumRetryCount 5
}
$executionTime = (Get-Date) - $startTime
$executionTime.TotalSeconds

In my case the time is 2.2057898 sec. That is not much, but it is noticeable. Fetching a single page takes some time, yet it is fast enough to feel like real time. When a script performs operations step by step, those small delays add up, and at the end of the day there is a lag. Paralleling to the rescue!

Code

The code is quite simple. One word is enough.

$object | ForEach-Object -Parallel {
    code_block
}

Code

Below is the refactored code. Surprisingly, there is only one change, one word: -Parallel

$startTime = Get-Date
$webs = @("https://www.google.com/", "https://www.facebook.com/", "https://twitter.com/", "https://stackoverflow.com/")
$webs | ForEach-Object -Parallel {
    "Get web: $_!"
    Invoke-RestMethod -Uri $_ -Method Get -MaximumRetryCount 5
}
$executionTime = (Get-Date) - $startTime
$executionTime.TotalSeconds

And the result is: 0.9028436 sec
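Network timings vary from run to run, so here is a sketch that reproduces the same effect without touching the network. It assumes Start-Sleep as a stand-in for waiting on I/O, and the item count and delay are arbitrary values chosen for illustration.

```powershell
# Four items, each "waiting for I/O" for half a second (simulated with Start-Sleep).
$items = 1..4

# Sequential: the waits add up.
$sequential = Measure-Command {
    $items | ForEach-Object { Start-Sleep -Milliseconds 500 }
}

# Parallel: the waits overlap.
$parallel = Measure-Command {
    $items | ForEach-Object -Parallel { Start-Sleep -Milliseconds 500 }
}

"Sequential: {0:N2} s" -f $sequential.TotalSeconds
"Parallel:   {0:N2} s" -f $parallel.TotalSeconds
```

The sequential run cannot finish in less than two seconds, while the parallel run should land much closer to the duration of a single wait plus some startup overhead.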

Pool

When there is a long list to process, paralleling as presented above can lead to resource depletion (memory, CPU). That's why it's good to keep paralleling in check. The best approach in such cases is a pool of threads, which keeps a balance between speed and reliability.

Only one change is needed to control the pool size: -ThrottleLimit <int>. If it is omitted, ForEach-Object -Parallel runs at most 5 script blocks at a time.

The code below keeps the parallelism in check: at most 3 script blocks are executed at the same time.

$startTime = Get-Date
$webs = @("https://www.google.com/", "https://www.facebook.com/", "https://twitter.com/", "https://stackoverflow.com/")
$webs | ForEach-Object -Parallel {
    "Get web: $_!"
    Invoke-RestMethod -Uri $_ -Method Get -MaximumRetryCount 5
} -ThrottleLimit 3
$executionTime = (Get-Date) - $startTime
$executionTime.TotalSeconds
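The effect of the limit is easier to see with predictable work. The sketch below assumes six simulated one-second tasks (Start-Sleep again stands in for real work) and a limit of 3, so the tasks cannot all run at once.

```powershell
# Six one-second tasks, but only three may run at the same time.
$elapsed = Measure-Command {
    1..6 | ForEach-Object -Parallel {
        Start-Sleep -Seconds 1
    } -ThrottleLimit 3
}

"Elapsed: {0:N2} s" -f $elapsed.TotalSeconds
```

With three slots and six tasks, the run needs at least two seconds. Raising -ThrottleLimit to 6 would let all tasks overlap, at the cost of more resources being used at once.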

Variables

One of the most challenging problems in PowerShell parallel execution is sharing variables between contexts. Once -Parallel is used in a ForEach-Object loop, the code inside runs in a separate runspace (context) without access to the variables and functions defined outside the loop.

In the example below, $var is unknown inside the loop, so that line yields an empty value.

$var = "val"
$numbers = @("One", "Two", "Three", "Four")
$numbers | ForEach-Object -Parallel {
    "Number: $_!"
    $var
}

To fix this, there is a mechanism for passing such variables into the inner context: the $using: scope modifier. Here is an example:

$var = "val"
$numbers = @("One", "Two", "Three", "Four")
$numbers | ForEach-Object -Parallel {
    "Number: $_!"
    $using:var
}
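Getting data back out of the loop works the other way round: rather than writing to an outer variable, the output of the whole pipeline can be assigned to one. The sketch below is an illustration under that assumption. The $prefix and $results names are made up for the example.

```powershell
$prefix = "Number"
$numbers = @("One", "Two", "Three", "Four")

# Everything the script block emits is collected into $results.
$results = $numbers | ForEach-Object -Parallel {
    $p = $using:prefix   # copy of the outer value, read via $using:
    "${p}: $_!"
}

# The order of parallel output is not guaranteed, so sort before displaying.
$results | Sort-Object
```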

The same applies to functions. Although passing them in is feasible, it introduces a lot of complexity into the code.

function Nice-Function {
  Param ($A)
  Write-Host "Nice: [$A]"
}
$funcStr = ${function:Nice-Function}.ToString()
$numbers = @("One", "Two", "Three", "Four")
$numbers | ForEach-Object -Parallel {
 ${function:Nice-Function} = $using:funcStr
 Nice-Function $_
}

For me that is too much $using and ToString(). As a result, readability suffers, and it is quite easy to make a mistake. How to overcome this? Move the functions to a module and import it inside the loop context. It is just one line, and all the functions become available.

$numbers = @("One", "Two", "Three", "Four")
$numbers | ForEach-Object -Parallel {
    Import-Module '.\Path\Where\Nice-Function\is\defined\module.psm1'
 Nice-Function $_
}
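To try the module approach end to end, here is a self-contained sketch. Everything in it is hypothetical: the module file name NiceModule.psm1, its location in the temp folder and the Get-NiceText function are made up for the example and stand in for a real module.

```powershell
# Write a tiny, hypothetical module to the temp folder.
$modulePath = Join-Path ([System.IO.Path]::GetTempPath()) 'NiceModule.psm1'
Set-Content -Path $modulePath -Value @'
function Get-NiceText {
    param ($A)
    "Nice: [$A]"
}
'@

# Each parallel runspace imports the module, then calls the function.
$results = "One", "Two" | ForEach-Object -Parallel {
    Import-Module $using:modulePath
    Get-NiceText $_
}

$results | Sort-Object

# Clean up the temporary module file.
Remove-Item $modulePath
```

Passing the path through $using: keeps the location in one place instead of hard-coding it inside the loop.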

It works like a charm!

Summary of Paralleling in PowerShell

In PowerShell, "paralleling" means executing multiple tasks simultaneously, taking advantage of the multi-core capabilities of modern processors. There are several ways to instruct PowerShell to run code simultaneously. One of them is ForEach-Object -Parallel, which was the subject of this post.
