PSYaml

PSYaml is a simple PowerShell module that I’ve written to serialize PowerShell objects to "YAML Ain’t Markup Language" (YAML) documents and to deserialize YAML documents to PowerShell objects. It uses Antoine Aubry’s excellent YamlDotNet library.

To start, you can simply copy the PowerShell file and the manifest from the module’s home on GitHub (PSYaml) into this directory, or use the script that I provide in the next listing:

$($env:USERPROFILE)\Documents\WindowsPowerShell\Modules\PSYaml

I do it, or update it, via a script like this.

    Add-Type -assembly "system.io.compression.filesystem"
    # for the unzipping operation
    $YAMLDotNetLocation = "$($env:USERPROFILE)\Documents\WindowsPowerShell\Modules\PSYaml"
    # the location of the module
    if (!(test-path "$($YAMLDotNetLocation)\YAMLdotNet")) #if the location doesn't exist
      {New-Item -ItemType Directory -Force -Path "$($YAMLDotNetLocation)\YAMLdotNet"} #create the location
    $client = new-object Net.WebClient #get a webclient to fetch the files
    $client.Proxy.Credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials
    $client.DownloadFile('https://github.com/Phil-Factor/PSYaml/archive/master.zip',"$($YAMLDotNetLocation)\PSYAML.zip")
    if ((test-path "$($YAMLDotNetLocation)\PSYaml-master")) #delete the existing version if it exists
      { Remove-Item "$($YAMLDotNetLocation)\PSYaml-master" -recurse -force}
    [io.compression.zipfile]::ExtractToDirectory("$($YAMLDotNetLocation)\PSYAML.zip", $YAMLDotNetLocation)
    Copy-Item "$YAMLDotNetLocation\PSYaml-master\*.*" $YAMLDotNetLocation #copy it into place

Beware that you need to be clear about your execution policy before you start and check the file before you load the module. Once you are ready, you can load the module into your PowerShell session like this.
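
If you want to check both things from the console first, something along these lines should do. It is only a sketch: it assumes that the module sits in the directory given above, and it merely lists the script and manifest files so that you can open and read them before importing anything.

```powershell
# Show the execution policy that currently applies to this session
$policy = Get-ExecutionPolicy
"Effective execution policy: $policy"

# Assumption: the module was copied to the directory used earlier in this article
$moduleDir = "$($env:USERPROFILE)\Documents\WindowsPowerShell\Modules\PSYaml"
if (Test-Path $moduleDir)
{
    # List the script and manifest files so that you can inspect them before loading
    Get-ChildItem $moduleDir -Filter '*.ps*' | Select-Object Name, Length, LastWriteTime
}
else
{
    "Nothing found at $moduleDir yet"
}
```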

import-module psyaml

The first time you do it, you need to be connected to the internet so it can load the latest version of the YamlDotNet library from NuGet.

Once the module is in place and working, you can execute code like this

[ordered]@{
 Computername = $(Get-wmiobject win32_operatingsystem).csname
 OS = $(Get-wmiobject win32_operatingsystem).caption
 'Uptime (hours)' = [int]((get-date) - ([wmiclass]"").ConvertToDateTime((Get-wmiobject win32_operatingsystem).LastBootUpTime)).TotalHours
 Make = $(get-wmiobject win32_computersystem).model
 Manufacturer = $(get-wmiobject win32_computersystem).manufacturer
 'Memory (Gb)' = $(Get-WmiObject win32_computersystem).TotalPhysicalMemory/1GB -as [int]
 Processes = (Get-Process).Count
 Drives = Get-WmiObject Win32_logicaldisk|select DeviceID, description
 } |ConvertTo-YAML

to give you a YAML representation of the data that is easy to assimilate.

  Computername: 'LTPFACTOR'
  OS: 'Microsoft Windows 8.1 Enterprise'
  Uptime (hours): 21
  Make: 'Latitude E8770'
  Manufacturer: 'Dell Inc.'
  Memory (Gb): 8
  Processes: 169
  Drives:
  -
     DeviceID: 'C:'
     description: 'Local Fixed Disk'
  -
     DeviceID: 'K:'
     description: 'Network Connection'
  -
     DeviceID: 'L:'
     description: 'Network Connection'
  -
     DeviceID: 'M:'
     description: 'Network Connection'
  -
     DeviceID: 'N:'
     description: 'Network Connection'
  -
     DeviceID: 'P:'
     description: 'Network Connection'
  -
     DeviceID: 'S:'
     description: 'Network Connection'

Try it with something like Format-table and you’ll probably agree that there is a place for rendering hierarchical information in a human-oriented way.

YAML and PowerShell

When you need to use structured data in PowerShell, you have to think of writing it out (serializing it) and reading it back into an object (deserializing it). You’ll hear talk of serializing objects, but really you’re only serializing the data within them, such as properties, lists, collections, dictionaries and so on, rather than the methods. In a compiled language, an object can’t do anything for itself once it has been serialized and deserialized again: it is just a container for data. PowerShell is unusual in that it can include scripts in objects, as ScriptMethods and ScriptProperties, so it is theoretically possible to transfer both between PowerShell applications, but this is out of the scope of this article.
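
To see the point about data surviving while methods do not, here is a small sketch. It uses the built-in JSON cmdlets as a stand-in for any serializer, and the object and its Greet method are invented for the illustration.

```powershell
# Build an object that carries both data and behaviour
$person = [pscustomobject]@{ Name = 'Phil'; Job = 'Developer' }
$person | Add-Member -MemberType ScriptMethod -Name Greet -Value { "Hello, $($this.Name)" }

# Round-trip it through a structured document
$json = $person | ConvertTo-Json
$copy = $json | ConvertFrom-Json

# The original still has its method
$greeting = $person.Greet()

# The copy has the data, but the method was never written to the document
$copyHasMethod = $null -ne $copy.PSObject.Methods['Greet']
"Original says '$greeting'; copy has Name '$($copy.Name)'; copy has Greet method: $copyHasMethod"
```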

You’ve got some choice in PowerShell of how you serialize objects into structured documents, and back again. The two built-in formats are XML and JSON. I’ll be showing you how to use a third: YAML.

Why YAML?

You’d need a good reason for not using XML. It is the obvious format for juggling with data. PowerShell allows you to query it and treat it as an object. If you use XML Schemas, you have a very robust system. The downside of XML is that it is complex, arcane, and the XML documents can’t be easily read or altered by humans. It can take a long time to process.

JSON is popular because it is so simple that any language can be used to read or write it. The downside is that it doesn’t do much, and has a restricted range of datatypes. You can’t actually specify the data type of a value, for example. It isn’t an intuitive way of laying data out on the page.

YAML is a formalization of the way that we used to lay out taxonomies and forms of structured data before computers. It is easy to understand. When you start doing bulleted lists within lists, it starts to look like YAML. As far as readability goes, here is a YAML document:

- phil:
    name: Phil Factor
    job: Developer
    skills:
      - SQL
      - python
      - perl
      - pascal
- derek:
    name: Derek DBA
    job: DBA
    skills:
      - TSQL
      - fortran
      - cobol

And here is the same in JSON.

[ { "phil":
   { "name": "Phil Factor",
    "job": "Developer",
    "skills": [ "SQL", "python", "perl", "pascal" ] } },
 { "derek":
   { "name": "Derek DBA",
    "job": "DBA",
    "skills": [ "TSQL", "fortran", "cobol" ] } } ]

I haven’t really the space in this article for the XML version.

YAML is officially now a superset of JSON, and so a YAML serializer can usually be persuaded to use the JSON ‘brackety’ style if you prefer, or require, that. The PSYaml module has a function just to convert from the indented dialect of YAML to the 'Brackety' dialect aka JSON. Beware that not everything in YAML will convert to JSON so it is possible to get errors in consequence.

import-module psyaml
Convert-YAMLtoJSON @"
# Employee records
-  phil:
    name: Phil Factor
    job: Developer
    skills:
      - SQL
      - python
      - perl
      - pascal
-  derek:
    name: Derek DBA
    job: DBA
    skills:
      - TSQL
      - fortran
      - cobol
"@

which will give:

[{"phil": {"name": "Phil Factor", "job": "Developer", "skills": ["SQL", "python", "perl", "pascal"]}}, {"derek": {"name": "Derek DBA", "job": "DBA", "skills": ["TSQL", "fortran", "cobol"]}}]

YAML also allows you to specify the data type of its values explicitly. If you wish to ensure that a datatype is read correctly, and Mr and Mrs Null will agree with me on this, you can precede the value with !!float, !!int, !!null, !!timestamp, !!bool, !!binary, !!Yaml or !!str. These are the most common YAML datatypes that you are likely to come across, and any deserializer must cope with them. YAML also allows you to specify a data type that is specific to a particular language or framework, such as geographic coordinates. YAML also contains references, which refer to an existing element in the same document. So, if an element is repeated later in a YAML document, you can simply refer to the element using a short-hand name.
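
The '!!' prefix is shorthand for the 'tag:yaml.org,2002:' namespace, which is the form in which a parser reports the tag. As a rough illustration, and not as a substitute for a real YAML parser, this sketch picks the shorthand tags out of a made-up three-line document with a regular expression and expands them to their full names.

```powershell
# A made-up document that tags each value explicitly
$yaml = @'
invoice: !!int 34843
paid: !!bool false
note: !!str 2001-01-23
'@

# Find each '!!name' shorthand and expand it to the full tag
$tags = [regex]::Matches($yaml, '!!(\w+)') |
    ForEach-Object { 'tag:yaml.org,2002:' + $_.Groups[1].Value }
$tags
```

Without the !!str tag, the last value would normally be read as a date rather than as a string.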

Another advantage to YAML is that you can specify the type of set or sequence, and whether it is ordered or unordered. It is much more attuned to the rich variety of data that is around.
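
PowerShell makes the same distinction between ordered and unordered dictionaries, which is worth bearing in mind when you decide what a YAML mapping should become. A minimal sketch, with keys invented for the illustration:

```powershell
# An ordinary hashtable makes no promise about the order of its keys
$unordered = @{ zebra = 1; apple = 2; mango = 3 }

# An ordered dictionary keeps the keys in the order in which they were added
$ordered = [ordered]@{ zebra = 1; apple = 2; mango = 3 }

$orderedKeys = @($ordered.Keys) -join ', '
$unorderedKeys = @($unordered.Keys) -join ', '
"Ordered:   $orderedKeys"     # zebra, apple, mango
"Unordered: $unorderedKeys"   # the same keys, in whatever order the hashtable chooses
```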

I use YAML a great deal for documentation and for configuration settings. I started off by using PowerYAML which is a thin layer around YamlDotNet. Unfortunately, although YamlDotNet is excellent, PowerYAML hadn’t implemented any serialiser, hadn’t implemented data type tags, and couldn’t even auto-detect the data type. As it wasn’t being actively maintained, and was incompatible with the current version of the YamlDotNet library that was doing all the heavy work, I wrote my own module using YamlDotNet directly.

You merely load the module:

import-module psyaml

and you will have a number of functions that you require.

You don’t really need a special module, of course. Using YamlDotNet directly isn’t a big deal if you don’t want to bother with PSYaml. You just need to import a single library. To get hold of the latest version of YamlDotNet, you should get it from NuGet. You’d get hold of NuGet.exe and run

nuget install yamldotnet

Don’t worry about this unless you would like to work directly with YamlDotNet for special purposes. In my module, I have a function that does all this for you and allows you to keep up-to-date with the latest version of YamlDotNet.

In our simple PowerShell script we load this library

Add-Type -Path "$OurPathTo\yamldotnet.dll" #where $OurPathTo is the actual path

And we can then create some simple functions

Function YAMLSerialize
{
    [CmdletBinding()]
    param
    (
        [parameter(Position = 0, Mandatory = $true, ValueFromPipeline = $true)]
        [object]$PowershellObject
    )
    BEGIN { }
    PROCESS
    {
        # The available options are None, Roundtrip, DisableAliases, EmitDefaults,
        # JsonCompatible and DefaultToStaticType
        $Serializer = New-Object YamlDotNet.Serialization.Serializer([YamlDotNet.Serialization.SerializationOptions]::emitDefaults)
        $stringBuilder = New-Object System.Text.StringBuilder
        $stream = New-Object System.io.StringWriter -ArgumentList $stringBuilder
        $Serializer.Serialize($stream, $PowershellObject) # Serialize(System.IO.TextWriter writer, System.Object graph)
        $stream.ToString()
    }
    END { }
}

Function YAMLDeserialize
{
    [CmdletBinding()]
    param
    (
        $YamlString
    )
    $stringReader = new-object System.IO.StringReader([string]$yamlString)
    $Deserializer = New-Object -TypeName YamlDotNet.Serialization.Deserializer -ArgumentList $null, $null, $false
    $Deserializer.Deserialize([System.IO.TextReader]$stringReader)
}

This will give us the basics. Naturally, there is a lot more we can, and will, do; but this will get you started. Of course, this is all done for you in PSYaml and you can access these very functions.

Now we just want a simple YAML string to test out the plumbing.

    $YamlString =@"
    invoice: !!int 34843
    date   : 2001-01-23
    approved: yes
    bill-to: &id001
        given  : Chris
        family : Dumars
        address:
            lines: |
                458 Walkman Dr.
                Suite #292
            city    : Royal Oak
            state   : MI
            postal  : 48046
    ship-to: *id001
    product:
        - sku         : BL394D
          quantity    : 4
          description : Basketball
          price       : 450.00
        - sku         : BL4438H
          quantity    : 1
          description : Super Hoop
          price       : 2392.00
    tax  : 251.42
    total: 4443.52
    comments: >
        Late afternoon is best.
        Backup contact is Nancy
        Billsmer @ 338-4338.

"@

So let’s create a PowerShell object, and convince ourselves that the string has been read in correctly by taking the object that the deserializer produced and writing it straight back out as YAML.

YAMLSerialize (YAMLDeserialize $yamlString)

You should get the simple invoice back again. Job done? Well, possibly, but if you need to process the results in PowerShell, you may still hit problems. You’d expect, from using ConvertFrom-JSON, that this would work:

$MyInvoice=YAMLDeserialize $yamlString
$BillTo=$MyInvoice.'bill-to'

"Dispatch this to $($BillTo.given) $($BillTo.family) at the address:
 $($BillTo.address.lines)$($BillTo.address.city)
$($BillTo.address.state)
($($BillTo.address.postal))"

But it doesn’t. What is also bad is that, in the PowerShell IDE, you haven’t got the IntelliSense prompt for the object either. You want the equivalent of this to happen with YAML:

$JSONInvoice=convertFrom-JSON @'
{
  "invoice": 34843,
  "date": "\/Date(980208000000)\/",
  "approved": true,
  "bill-to": {
          "given": "Chris",
          "family": "Dumars",
          "address": {
                  "lines": "458 Walkman Dr.\nSuite #292\n",
                  "city": "Royal Oak",
                  "state": "MI",
                  "postal": 48046
                }
        },
  "ship-to": "id001",
  "product": [
          {
            "sku": "BL394D",
            "quantity": 4,
            "description": "Basketball",
            "price": 450.00
          },
          {
            "sku": "BL4438H",
            "quantity": 1,
            "description": "Super Hoop",
            "price": 2392.00
          }
        ],
  "tax": 251.42,
  "total": 4443.52,
  "comments": "Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.\n"
}
'@
$BillTo=$jsonInvoice.'bill-to'

"Dispatch this to $($BillTo.given) $($BillTo.family) at the address:
 $($BillTo.address.lines)$($BillTo.address.city)
$($BillTo.address.state)
($($BillTo.address.postal))"

…and whatever else in terms of accessing the data via dot notation that you care to try. The problem is that the YAML deserializer creates .NET objects, which is entirely correct and useful, but it is just more convenient to have PowerShell objects to make them full participants.
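
To give an idea of what "full participants" means, here is a sketch that walks a nested structure of dictionaries and lists and rebuilds it from PowerShell custom objects and arrays. The function name ConvertTo-PSObjectTree is hypothetical and is not part of PSYaml, and the input here is built from ordered dictionaries as a stand-in for whatever a deserializer might return.

```powershell
function ConvertTo-PSObjectTree
{
    param ($Node)
    if ($Node -is [System.Collections.IDictionary])
    {
        # A dictionary becomes a custom object with one property for each key
        $props = [ordered]@{ }
        foreach ($key in $Node.Keys) { $props["$key"] = ConvertTo-PSObjectTree $Node[$key] }
        return [pscustomobject]$props
    }
    if ($Node -is [System.Collections.IEnumerable] -and $Node -isnot [string])
    {
        # A list becomes an array; the leading comma stops PowerShell from unrolling it
        return , @($Node | ForEach-Object { ConvertTo-PSObjectTree $_ })
    }
    return $Node # a simple value is passed through unchanged
}

# Stand-in data, loosely based on the invoice used in this article
$raw = [ordered]@{
    'bill-to' = [ordered]@{ given = 'Chris'; address = [ordered]@{ city = 'Royal Oak' } }
    product   = @([ordered]@{ sku = 'BL394D'; quantity = 4 })
}
$invoice = ConvertTo-PSObjectTree $raw
"Dispatch this to $($invoice.'bill-to'.given) in $($invoice.'bill-to'.address.city)"
```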

Refining the Deserializing process.

Generally speaking, a good library for parsing and emitting data documents does so in two phases. The main work on a string containing XML, YAML, CSV or JSON is to create a representational model. The second phase is to turn that representational model into real data structures that are native to your computer language.

In the case of YAML, you can have several separate documents in a single YAML string, so the parser will return a representational model for every data document within the file. Each representational model consists of a number of ‘nodes’. All you need to do is to examine each node recursively to create a data object. Each node contains the basics: the style, tag and anchor. The style of the node is the way it is formatted in the document, the anchor is used where a node references another node to get its value, and the tag tells you, explicitly, what sort of data type it needs. This will be ‘omap’, ‘seq’ or ‘map’ where the node contains an ordered map, a sequence or a dictionary, or ‘float’, ‘int’, ‘null’, ‘bool’ or ‘str’ if it has a simple value. You can specify your own special data, such as coordinates, table data or whatever you wish.

A typical YAML library will parse the presentation stream and compose the Representation Graph. The final input process is to construct the native data structures from the YAML representation. The advantage of this is that you can then specify how your special data types are treated in the conversion process. Because YAML is a superset of JSON, you still have to allow untyped values, which then have to be checked to see what sort of data they contain.
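
Checking an untyped value amounts to trying a series of conversions until one succeeds. The following is a cut-down sketch of that idea and not the module’s own code: the function name Convert-ScalarText is hypothetical, it tries fewer types than a full version would, and it uses the invariant culture (my assumption) so that the results don’t depend on the regional settings of the machine.

```powershell
function Convert-ScalarText
{
    param ([string]$Text)
    $inv = [System.Globalization.CultureInfo]::InvariantCulture
    [int]$i = 0; [bool]$b = $false; [double]$d = 0; [datetime]$t = [datetime]::MinValue

    # The YAML-style booleans first; -match ignores case
    if ($Text -match '\A(?:on|yes)\z') { return $true }
    if ($Text -match '\A(?:off|no)\z') { return $false }
    # Then each candidate type in turn, from the most specific to the least
    if ([int]::TryParse($Text, [System.Globalization.NumberStyles]::Integer, $inv, [ref]$i)) { return $i }
    if ([bool]::TryParse($Text, [ref]$b)) { return $b }
    if ([double]::TryParse($Text, [System.Globalization.NumberStyles]::Float, $inv, [ref]$d)) { return $d }
    if ([datetime]::TryParse($Text, $inv, [System.Globalization.DateTimeStyles]::None, [ref]$t)) { return $t }
    return $Text # nothing else fitted, so it stays a string
}

# Try it on some of the values from the invoice
'48046', 'yes', '251.42', '2001-01-23', 'Royal Oak' | ForEach-Object {
    $typed = Convert-ScalarText $_
    '{0} -> {1}' -f $_, $typed.GetType().Name
}
```

The order of the checks matters: the integer check has to come before the floating-point check, because every integer would also pass as a double.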

Here is a routine that takes as a parameter a representational model and converts it into a PowerShell object. It is easy to check this by converting the resulting object to XML or JSON or even YAML.

function ConvertFrom-YAMLDocument
{
    [CmdletBinding()]
    param
    (
        [object]$TheNode #you pass in a node that, when you call it, will be the root node.
    )
    #initialise variables that are needed for providing the correct powershell data type for a string-based value.
    [bool]$ABool = $false; [int]$AnInt = $null; [long]$ALong = $null; [decimal]$adecimal = $null; [single]$ASingle = $null;
    [double]$ADouble = $null; [datetime]$ADatetime = '1/1/2000';

    $TheTypeOfNode = $TheNode.GetType().Name # determine this
    Write-Verbose "$TheTypeOfNode = $($theNode)" #just so see what is going on
     $Style = $TheNode.Style; $Tag = $TheNode.Tag; $Anchor = $TheNode.Anchor;
     Write-Verbose "Tag=$tag, Style=$style, Anchor=$anchor"
    if ($TheTypeOfNode -eq 'YamlDocument') #if it is the document, then call recursively with the root node
    { $TheObject = ConvertFrom-YAMLDocument $TheNode.RootNode }
    elseif ($TheTypeOfNode -eq 'YamlMappingNode') #ah mapping nodes
    {
        $TheObject = [ordered]@{ }; $theNode |
        foreach{ $TheObject.($_.Key.Value) = ConvertFrom-YAMLDocument $_.Value; }
    }
    elseif ($TheTypeOfNode -eq 'YamlScalarNode' -or $TheTypeOfNode -eq 'Object[]')
    {
        $value = "$($theNode)"
        if ($tag -eq $null)
        {
            $value = switch -Regex ($value)
            {
                # if it is one of the allowed boolean values
                '(?i)\A(?:on|yes)\z' { 'true'; break } #Deal with all the possible YAML booleans
                '(?i)\A(?:off|no)\z' { 'false'; break }
                default { $value }
            };
        };

        $TheObject =
            if ($tag -ieq 'tag:yaml.org,2002:str') { [string]$Value } #it is specified as a string
            elseif ($tag -ieq 'tag:yaml.org,2002:bool') { $Value -match '\A(?:true|on|yes)\z' } #it is specified as a boolean; a [bool] cast would turn any non-empty string, even 'false', into $true
            elseif ($tag -ieq 'tag:yaml.org,2002:float') { [double]$Value } #it is specified as a double
            elseif ($tag -ieq 'tag:yaml.org,2002:int') { [int]$Value } #it is specified as a int
            elseif ($tag -ieq 'tag:yaml.org,2002:null') { $null } #it is specified as a null
            elseif ($tag -ieq 'tag:yaml.org,2002:timestamp') {[datetime]$Value} #it is date/timestamp
            elseif ($tag -ieq 'tag:yaml.org,2002:binary') {[System.Convert]::FromBase64String($Value)}
            elseif ([int]::TryParse($Value, [ref]$AnInt)) { $AnInt } #is it a short integer
            elseif ([bool]::TryParse($Value, [ref]$ABool)) { $ABool } #is it a boolean
            elseif ([long]::TryParse($Value, [ref]$ALong)) { $ALong } #is it a long integer
            elseif ([decimal]::TryParse($Value, [ref]$ADecimal)) { $ADecimal } #is it a decimal
            elseif ([single]::TryParse($Value, [ref]$ASingle)) { $ASingle } #is it a single float
            elseif ([double]::TryParse($Value, [ref]$ADouble)) { $ADouble } #is it a double float
            elseif ([datetime]::TryParse($Value, [ref]$ADatetime)) { $ADatetime } #is it a datetime
            else { [string]$Value }
    }
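    elseif ($TheTypeOfNode -eq 'Object[]') #a raw object rather than a node; never reached as written, because the scalar branch above already catches 'Object[]'
    { $TheObject = $theNode.Value } #so you return its value
    elseif ($TheTypeOfNode -eq 'YamlSequenceNode') #a sequence, in which case you convert each child node in turn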
    { $TheObject = @(); $theNode | foreach{ $TheObject += ConvertFrom-YAMLDocument $_ } }
    else { Write-Verbose "Unrecognised token $TheTypeOfNode" }
    $TheObject
}

In order to use this, all you need to do is to load the text of the YAML document into a YAML stream.

  $stringReader = new-object System.IO.StringReader([string]$yamlString)
  $yamlStream = New-Object YamlDotNet.RepresentationModel.YamlStream
  $yamlStream.Load([System.IO.TextReader]$stringReader)
  ConvertFrom-YAMLDocument ($yamlStream.Documents[0])

So there you have it. We now wrap this last code in a function and we have a PowerShell module that we can use whenever we need to parse YAML. I won’t bother to list that here as I’ve put it on GitHub for you.

I also have added ConvertTo-YAML, because this is handy if you need plenty of control over the way that your PowerShell objects are serialized. Some of these objects are very unwieldy, with a lot of irrelevant information, and if you try serializing them without any sort of filtering, you will accidentally contribute to the Big Data crisis.
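
As an illustration of the sort of filtering I mean, this sketch counts the properties on a process object before and after Select-Object has trimmed it down. The choice of properties is arbitrary, and the final step of piping the result to ConvertTo-YAML is left as a comment because it needs the module to be loaded.

```powershell
# The object for the current PowerShell process carries a great many properties
$process = Get-Process -Id $PID
$allCount = @($process.PSObject.Properties).Count

# Keep just the properties that you actually want in the document
$slim = $process | Select-Object Name, Id, WorkingSet64
$keptCount = @($slim.PSObject.Properties).Count

"Unfiltered: $allCount properties; filtered: $keptCount"
# With PSYaml loaded, you would then serialize the trimmed object:
# $slim | ConvertTo-YAML
```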

Last but most important, I wanted a way of loading a third-party .NET library into a module from NuGet. I therefore added a function that loads the library using Add-Type, but which first checks that everything is there, and puts it in the right place if it isn’t. You can call it explicitly to check that you have the latest version of YamlDotNet. If it breaks something, you just delete the directory that it put the new version in: the module always loads the latest version that it can find in the YamlDotNet directory.

Initialize-PsYAML_Module $True

Simple Example of use

Here is a way of producing a YAML result from any SQL expression on a database

import-module psyaml
$SourceTable = 'production.location'
$Sourceinstance = 'YourInstanceName'
$Sourcedatabase = 'Adventureworks'

$SourceConnectionString = "Data Source=$Sourceinstance;Initial Catalog=$Sourcedatabase;Integrated Security=True"
$sql = "select * FROM $SourceTable"
$result = @()
try
{
    $sourceConnection = New-Object System.Data.SqlClient.SQLConnection($SourceConnectionString)
    $sourceConnection.open()
    $commandSourceData = New-Object system.Data.SqlClient.SqlCommand($sql, $sourceConnection)
    $reader = $commandSourceData.ExecuteReader()
    $Counter = $Reader.FieldCount
    while ($Reader.Read())
    {
        $tuple = @{ }
        for ($i = 0; $i -lt $Counter; $i++)
        {
        $tuple."$($Reader.GetName($i))" = "$(if ($Reader.GetFieldType($i).Name -eq 'DateTime')
                { $Reader.GetDateTime($i) }
                else { $Reader.GetValue($i) })";
        }
        $Result += $tuple
    }
    YAMLSerialize $result
}
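catch
{
    $ex = $_.Exception
    Write-Error "whilst opening source $Sourceinstance . $Sourcedatabase . $SourceTable : $($ex.Message)"
}
finally
{
    if ($reader) { $reader.Close() } #the reader won't exist if the connection failed
    if ($sourceConnection) { $sourceConnection.Close() }
}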

This will give the result (just the first three rows)

- CostRate: 0.0000
  ModifiedDate: 06/01/1998 00:00:00
  Name: Tool Crib
  Availability: 0.00
  LocationID: 1
- CostRate: 0.0000
  ModifiedDate: 06/01/1998 00:00:00
  Name: Sheet Metal Racks
  Availability: 0.00
  LocationID: 2
- CostRate: 0.0000
  ModifiedDate: 06/01/1998 00:00:00
  Name: Paint Shop
  Availability: 0.00
  LocationID: 3
#and so on...

So what is the point of all this?

Besides the fact that it is an intuitive way of representing data, one of the most important advantages of YAML over JSON is that YAML allows you to specify your data type. You don’t need to in YAML, but it can resolve ambiguity. I’ve implemented the standard YAML scalar tags of timestamp, binary, str, bool, float, int and null. If there is no scalar tag, I also examine the string to try to convert it to the right data type.

YAML also has a rather crude way of allowing you to represent relational data by means of node anchors. These have an '&' prefix. An alias node, which has a '*' prefix, can then be used to indicate additional inclusions of the anchored node. It means that you don’t have to repeat nodes in a document. You just write a node once and then refer to it by its anchor.
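
In PowerShell terms, an anchor and its alias behave rather like two keys that point at one shared object. The sketch below builds that arrangement by hand, using the bill-to and ship-to keys from the invoice. Whether a particular deserializer hands you one shared object or two equal copies is up to the library, so treat this as an analogy and not as a description of what YamlDotNet does.

```powershell
# One address, written once
$billTo = [ordered]@{ given = 'Chris'; family = 'Dumars' }

$invoice = [ordered]@{
    'bill-to' = $billTo   # plays the part of the anchored node, &id001
    'ship-to' = $billTo   # plays the part of the alias, *id001
}

# Both keys refer to the very same object
$same = [object]::ReferenceEquals($invoice['bill-to'], $invoice['ship-to'])
"Same object: $same"

# So a change made through one key is visible through the other
$invoice['bill-to'].given = 'Christine'
"Ship to: $($invoice['ship-to'].given)"
```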

I find YAML to be very useful. What really convinces me of the power of YAML is to be able to walk the representational model to do special-purpose jobs such as processing hierarchical data to load into SQL. It was at that point that I finally decided that YAML had a lot going for it as a format for data documents.