
Making array lookups faster



This post is about making lookups in arrays as fast as possible. The array can have many properties or few; it really does not matter. The only requirement is something unique that identifies each row of data.
From time to time I need to make lookups fast, usually after importing a huge CSV file or something similar.



Sample data

First we have to create some dummy sample data which we can run some tests against. We will create an array of 10001 objects with a few properties. The unique property that identifies each row is called ID:


(sample data script)
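The original script is not reproduced here, so the following is only a minimal sketch of what such sample data could look like. Only the ID property and the $csvObjects name come from the post; the Name and Value properties are made up for illustration.

```powershell
# Sketch only: ID is the unique property named in the post.
# Name and Value are hypothetical filler properties.
$csvObjects = foreach ($i in 0..10000) {
    [pscustomobject]@{
        ID    = $i
        Name  = "Item$i"
        Value = $i * 2
    }
}

$csvObjects.Count   # 0..10000 yields 10001 objects
```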



How to test performance?


There are a couple of things that affect performance measurements in PowerShell. For instance, running the same Measure-Command expression several times will yield quite different results. Normally the first run is slower than the second, and the standard deviation is quite large for subsequent runs. To decrease the standard deviation, I make a static call to the .NET garbage collector with [gc]::Collect() before each run. I feel that the results are more comparable with this approach.
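As a rough sketch of this measuring approach, a small helper can collect garbage before each timed run and return the elapsed ticks. The function name Measure-Ticks and its parameters are my own assumptions, not something from the original scripts.

```powershell
# Hypothetical helper (name and parameters are assumptions):
# runs a script block several times and returns the ticks for each run.
function Measure-Ticks {
    param(
        [scriptblock]$ScriptBlock,
        [int]$Runs = 11
    )
    foreach ($run in 1..$Runs) {
        [gc]::Collect()   # force a garbage collection before each timed run
        (Measure-Command -Expression $ScriptBlock).Ticks
    }
}

# Example workload, chosen only to have something to time
$ticks = Measure-Ticks -ScriptBlock { 1..1000 | ForEach-Object { $_ * 2 } }
$ticks | Measure-Object -Average -Minimum -Maximum
```

Comparing the minimum, maximum and average across runs gives a quick feel for how much the numbers still vary.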



First contender Where-Object

There are two ways you can query an array with the Where keyword. You can pipe the array to the Where-Object cmdlet, or you can use the Where method on the array. The Where method will always be faster than the cmdlet/pipeline approach, since you avoid moving the objects through the pipeline. For our test, we will therefore use the Where method as the baseline against which we measure performance.
We are going to run 11 different queries, each finding 2 unique elements in the array. Time is measured in ticks. I have created a collection of IDs which we will use when we query the data ($CollectionOfIDs):


(Measure the Where method)
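The measuring script itself is not shown here, so this is a sketch of how such a test might look. The way $CollectionOfIDs is built (11 pairs of IDs) and the simplified sample data are assumptions for illustration.

```powershell
# Simplified sample data (assumption: only ID matters for the lookup)
$csvObjects = foreach ($i in 0..10000) {
    [pscustomobject]@{ ID = $i; Name = "Item$i" }
}

# Assumption: 11 pairs of IDs, one pair per query
$CollectionOfIDs = foreach ($n in 0..10) {
    , @(($n * 100), (10000 - $n * 100))
}

# Time each query with the Where method, in ticks
$whereTicks = foreach ($pair in $CollectionOfIDs) {
    [gc]::Collect()
    (Measure-Command {
        $found = $csvObjects.Where({ $_.ID -eq $pair[0] -or $_.ID -eq $pair[1] })
    }).Ticks
}

($whereTicks | Measure-Object -Average).Average
```

Every query has to walk all objects in the array, which is why the time per lookup grows with the size of the collection.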


That is about 85 ms on average to query the collection for two unique IDs. Baseline ready.



There is a fast knock at the door

We have a new contender and he calls himself Hashtable. He claims he can do even better than 85 ms on average. Challenge accepted.
First we need to create a hashtable representation of the $csvObjects collection/array. That should be pretty straightforward. We let the unique identifier (ID) become the key and the object itself the value:

(hashtable of csv)
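A minimal sketch of such a conversion could look like the following. The variable name $hashtable and the simplified sample data are assumptions; the idea of using ID as the key and the object as the value is from the post.

```powershell
# Simplified sample data (assumption: only ID matters for the lookup)
$csvObjects = foreach ($i in 0..10000) {
    [pscustomobject]@{ ID = $i; Name = "Item$i" }
}

# Key = unique ID, value = the object itself
$hashtable = @{}
foreach ($obj in $csvObjects) {
    $hashtable[$obj.ID] = $obj
}

$hashtable.Count      # one entry per unique ID
$hashtable[42].Name   # direct lookup by key
```

Two things to keep in mind with this sketch: index assignment silently overwrites an existing key, so a duplicate ID would replace the earlier object, and keys are type sensitive. Import-Csv produces string properties, so data loaded that way has to be looked up with the string '42' rather than the integer 42.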

Now I know you have a question. What is the performance penalty of converting that array to a hashtable? Good question, and I am happy you asked. It converts the 10000 objects into a hashtable in approximately 53 milliseconds.


I would say that is a small price to pay.
Using the same ($CollectionOfIDs) as we did for the where method, let’s run the same test against the hashtable:

(Measure the hashtable)
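Again the original script is not shown, so here is a sketch of the same test against the hashtable. As before, the construction of $CollectionOfIDs and the simplified sample data are assumptions.

```powershell
# Simplified sample data and hashtable (assumptions, as in the earlier sketches)
$csvObjects = foreach ($i in 0..10000) {
    [pscustomobject]@{ ID = $i; Name = "Item$i" }
}
$hashtable = @{}
foreach ($obj in $csvObjects) { $hashtable[$obj.ID] = $obj }

# Assumption: 11 pairs of IDs, one pair per query
$CollectionOfIDs = foreach ($n in 0..10) {
    , @(($n * 100), (10000 - $n * 100))
}

# Time each pair of key lookups, in ticks
$lookupTicks = foreach ($pair in $CollectionOfIDs) {
    [gc]::Collect()
    (Measure-Command {
        $found = $hashtable[$pair[0]], $hashtable[$pair[1]]
    }).Ticks
}

($lookupTicks | Measure-Object -Average).Average
```

A key lookup does not scan the collection, so the time per query should stay roughly flat as the number of objects grows.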


Okay, so the first one is quite slow at about 11 ms, but it then improves quite dramatically to 0.038 ms. If we use the average numbers (in ticks) to be fair, we have increased performance by a factor of about 649 (837265 / 1289).
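For reference, a tick is 100 nanoseconds, so there are 10000 ticks in a millisecond. The two averages quoted above can be converted and compared like this (the variable names are my own):

```powershell
# Average ticks quoted in the post
$whereAverageTicks = 837265
$hashAverageTicks  = 1289

# Convert ticks to milliseconds (10000 ticks per millisecond)
$whereMs = $whereAverageTicks / [timespan]::TicksPerMillisecond   # about 83.7 ms
$hashMs  = $hashAverageTicks  / [timespan]::TicksPerMillisecond   # about 0.13 ms

# Speed-up factor
$factor = [math]::Round($whereAverageTicks / $hashAverageTicks, 1)
$factor   # 649.5
```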



Implications

I have only tested this on WMF 5.1 (5.1.14393.103). To use the Where query method on arrays, you need PowerShell version 4 or later. Converting the collection to a hashtable gives you the ability to perform super fast queries. If you are querying a collection frequently, it makes sense to use a hashtable.


Code for speed if you need it, otherwise write beautiful code!

Cheers

Tore
