Thursday, September 3, 2015

Manage stock Windows AMIs with Ansible (part 2)

In part 1, we demonstrated the use of an AWS User Data script to set a known Administrator password, and configure WinRM on a stock Windows AMI. In part 2, we'll use this technique with Ansible to spin up Windows hosts from scratch and put them to work.

We'll assume that you've got Ansible configured properly for your AWS account (eg, boto installed, IAM credentials set up). See Ansible's AWS Guide if you need help getting this going. The examples in this post were tested against Ansible 2.0 (in alpha as of this writing); however, most of the content is applicable to Ansible 1.9. For simplicity, these samples also assume that you have a functional default VPC in your region (you should, unless you've deleted it). If you need help getting that configured, see Amazon's page on default VPCs.
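If you haven't set up credentials yet, one option boto understands is a ~/.boto config file; a minimal sketch (substitute your own keys) looks like the following. Boto also honors the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:

[Credentials]
aws_access_key_id = (your access key here)
aws_secret_access_key = (your secret key here)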

We'll build up the files throughout the post, but a gist with complete file content is available at

First, we'll set up a basic inventory that includes localhost, and define a couple of groups. The hosts we create or connect with in AWS will be added dynamically to the inventory and those groups. Create a file called hosts in your current directory, with the following contents:

localhost ansible_connection=local

[win]

[win:vars]
ansible_ssh_user=Administrator
ansible_ssh_pass={{ win_initial_password }}
ansible_ssh_port=5986
ansible_connection=winrm

Note that we're using a variable in our inventory for the password- in conjunction with a vault, that keeps the password private. We'll set that up next. Create a vault file called secret.yml in the same directory with your inventory by running:

ansible-vault create secret.yml

Assign a strong password to the vault file when prompted, then put the following contents inside it when the editor pops up (note: the default vault editor is vim- ensure it's installed, or prefix the command with EDITOR=(your editor of choice here) to use a different one):

win_initial_password: myTempPassword123!

Save and exit the editor to encrypt the vault file.

Next, we'll create a template of the User Data script we used in Part 1, so that the initial instance password can be set dynamically. Create a file called userdata.txt.j2 with the following content:

<powershell>
$admin = [adsi]("WinNT://./administrator, user")
$admin.PSBase.Invoke("SetPassword", "{{ win_initial_password }}")
Invoke-Expression ((New-Object System.Net.Webclient).DownloadString('https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1'))
</powershell>

Note that we've replaced the hardcoded password from Part 1 with the variable win_initial_password (that's being set in our vault file).

Finally, we'll create the playbook that will set up our Windows machine. Create a file called win-aws.yml; we'll build our playbook inside.

Since our first play will be talking only to AWS (from our control machine), it only needs to target localhost, and we don't need to gather facts, so we can shut that off. We'll set a play-level var for the AWS region, and load the passwords from secret.yml. The first task looks up an Amazon-owned AMI named for the OS we want to run. The version number changes frequently, and old images are often retired, so we'll wildcard that part of the name, and sort descending so that the first image in the list should be the newest. Thankfully, Amazon pads these version numbers to two digits, so an ASCII sort works here. We want the module to fail if no images are found. Last, we'll register the output from the module to a var named found_amis, so we can refer to it later. Place the following content in win-aws.yml:

- name: infrastructure setup
  hosts: localhost
  gather_facts: no
  vars:
    target_aws_region: us-west-2
  vars_files:
    - secret.yml
  tasks:
  - name: find current Windows AMI in this region
    ec2_ami_find:
      region: "{{ target_aws_region }}"
      platform: windows
      virtualization_type: hvm
      owner: amazon
      name: Windows_Server-2012-R2_RTM-English-64Bit-Base-*
      no_result_action: fail
      sort: name
      sort_order: descending
    register: found_amis

Next, we'll take the first found AMI result and set its ami_id value into a var called win_ami_id:

  - set_fact:
      win_ami_id: "{{ (found_amis.results | first).ami_id }}"

Before we can fire up our instance, we'll need to ensure that there's a security group we can use to access it (in the default VPC, in this case). The group allows inbound access on port 80 for the web app we'll set up later, port 5986 for WinRM over https, and port 3389 for RDP in case we need to log in and poke around interactively. Again, we'll register the output to a var called sg_out so we can get its ID:

  - name: ensure security group is present
    ec2_group:
      name: WinRM RDP
      description: Inbound WinRM and RDP
      region: "{{ target_aws_region }}"
      rules:
      - proto: tcp
        from_port: 80
        to_port: 80
        cidr_ip: 0.0.0.0/0
      - proto: tcp
        from_port: 5986
        to_port: 5986
        cidr_ip: 0.0.0.0/0
      - proto: tcp
        from_port: 3389
        to_port: 3389
        cidr_ip: 0.0.0.0/0
      rules_egress:
      - proto: -1
        cidr_ip: 0.0.0.0/0
    register: sg_out

Now that we know the image and security group IDs, we have everything we need to ensure that we have an instance in the default VPC:

  - name: ensure instances are running
    ec2:
      region: "{{ target_aws_region }}"
      image: "{{ win_ami_id }}"
      instance_type: t2.micro
      group_id: [ "{{ sg_out.group_id }}" ]
      wait: yes
      wait_timeout: 500
      exact_count: 1
      count_tag:
        Name: stock-win-ami-test
      instance_tags:
        Name: stock-win-ami-test
      user_data: "{{ lookup('template', 'userdata.txt.j2') }}"
    register: ec2_result

We're just passing through the target_aws_region var we set earlier, as well as the win_ami_id we looked up. From the sg_out variable that contains the output from the security group module, we pull out just the group_id value and pass that as the instance's security group. For our sample, we just want one instance to exist, so we ask for an exact_count of 1, which is enforced by the count_tag arg finding instances with the Name tag set to "stock-win-ami-test". Finally, we use an inline template render to substitute the password into our User Data script template and pass it directly to the user_data arg; that will cause our instance to set up WinRM and reset the admin password on initial bootup. Once again, we register the output to the ec2_result var, as we'll need it later to add the EC2 hosts to inventory. Once this task has run, we need some way to ensure that the instances have booted, and that WinRM is answering (which can take some time). The easiest way is to use the wait_for action, against the WinRM port:

  - name: wait for WinRM to answer on all hosts
    wait_for:
      port: 5986
      host: "{{ item.public_ip }}"
      timeout: 300
    with_items: ec2_result.tagged_instances

This task will return immediately if the instance is already answering on the WinRM port, and if not, poll it for up to 300 seconds before giving up and failing. Our next step will consume the output from the ec2 task to add the host to our inventory dynamically:

  - name: add hosts to groups
    add_host:
      name: win-temp-{{ item.id }}
      ansible_ssh_host: "{{ item.public_ip }}"
      groups: win
    with_items: ec2_result.tagged_instances

This task loops over all the instances that matched the tags we passed (whether they were created or pre-existing) and adds them to our in-memory inventory, placing them in the win group (that we defined statically in the inventory earlier). This allows us to use the group_vars we set on the win group with all the WinRM connection details, so the only values we have to supply are the host's name and its IP address (via ansible_ssh_host, so WinRM knows how to reach it). Once this task completes, we have fully-functional Windows instances that we can immediately target in another play in the same playbook (for instance, to do common configuration tasks, like resetting the password), or we could use a separate playbook run later against an ec2 dynamic inventory that targets these hosts. Let's do the former; we'll install IIS and configure a simple Hello World web app. First, let's create a web page that we'll copy over. Create a file called default.aspx with the following content:

Hello from <%= Environment.MachineName %> at <%= DateTime.UtcNow %>

Next, add the following play to the end of the playbook we've been working with:

- name: web app setup
  hosts: win
  vars_files: [ "secret.yml" ]
  tasks:
  - name: ensure IIS and ASP.NET are installed
    win_feature:
      name: AS-Web-Support

  - name: ensure application dir exists
    win_file:
      path: c:\inetpub\foo
      state: directory

  - name: ensure default.aspx is present
    win_copy:
      src: default.aspx
      dest: c:\inetpub\foo\default.aspx

  - name: ensure that the foo web application exists
    win_iis_webapplication:
      name: foo
      physical_path: c:\inetpub\foo
      site: Default Web Site

  - name: ensure that application responds properly
    uri:
      url: http://{{ ansible_ssh_host }}/foo
      return_content: yes
    register: uri_out
    delegate_to: localhost
    until: uri_out.content | search("Hello from")
    retries: 3

  - debug:
      msg: web application is available at http://{{ ansible_ssh_host }}/foo

This play targets the win group with the dynamic hosts we just added to it. We pull in our secrets file again (as the inventory will always need the password value inside). The play ensures that IIS and ASP.NET are installed with the win_feature module, creates a directory for the web application with win_file, copies the application content into that directory with win_copy, and ensures that the web application is created in IIS with win_iis_webapplication. Finally, we delegate a uri task to the local Ansible runner, and have it make up to 3 requests to the foo application, looking for the content that should be there.
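One thing this play doesn't do: the temporary password from User Data is still in place. As noted in part 1, it's good practice to rotate it early. A minimal sketch of such a task (win_new_password is a hypothetical variable you'd add to the vault; it's not part of the files above):

  - name: rotate the temporary Administrator password
    win_user:
      name: Administrator
      password: "{{ win_new_password }}"

Keep in mind that once this runs, ansible_ssh_pass in the inventory needs to supply the new value for any subsequent plays or runs.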

At this point, we've got a complete playbook that will idempotently stand up a Windows machine in AWS with a stock AMI, then configure and test a simple web application. To run it, just tell ansible-playbook where to get its inventory, what to run, and that you'll need to specify a vault password, like:

ansible-playbook -i hosts win-aws.yml --ask-vault-pass

After supplying your vault password, the playbook should run to completion, at which point you should be able to access the web application via http://(your AWS host IP)/foo/.

We've shown that it's pretty easy to use Ansible to provision Windows instances in AWS without needing custom AMIs. These techniques can be expanded to set up and deploy most any application with Ansible's growing Windows support. Give it a try for your code today! Happy automating...

Manage stock Windows AMIs with Ansible (part 1)

Ever wished you could just spin up a stock Windows AMI and manage it with Ansible directly? Linux AMIs usually have SSH enabled and private key support configured at first boot, but most stock Windows images don't have WinRM configured, and the administrator passwords are randomly assigned and only accessible via APIs several minutes post-boot. People go to some pretty awful lengths to get plug-and-play Windows instances working with Ansible under AWS, but the most common solution seems to be building a derivative AMI from an instance with WinRM pre-configured and a hard-coded Administrator password. This isn't too hard to do once, but between Amazon's frequent base AMI updates, and the need to repeat the process in multiple regions, it can quickly turn into an ongoing hassle.

Enter User Data. If you're not familiar with it, you're not alone. It's a somewhat obscure option buried in the Advanced area of the AWS instance launch UI. It can be used for many different purposes; much of the AWS documentation treats it as a mega-tag that can hold up to 16k of arbitrary data, accessible only from inside the instance. Less well-known is that scripts embedded in User Data will be executed by the EC2 Config Windows service near the end of the first boot. This allows a small degree of first-boot customization on a vanilla instance, including setting up WinRM and changing the administrator password; once those two items are completed, the instance is manageable with Ansible immediately!

We'll build up the files throughout the post, but a gist with complete file content is available at

Scripts can be embedded in User Data by wrapping them in <powershell> tags (or <script> tags for Windows batch scripts)- in this case, we'll stick to PowerShell. The following User Data script will set the local Administrator password to a known value, then download and run a script hosted in Ansible's GitHub repo to auto-configure WinRM:

<powershell>
$admin = [adsi]("WinNT://./administrator, user")
$admin.PSBase.Invoke("SetPassword", "myTempPassword123!")

Invoke-Expression ((New-Object System.Net.Webclient).DownloadString('https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1'))
</powershell>

A word of caution: User Data is accessible via http from inside the instance (through the EC2 metadata service at http://169.254.169.254/latest/user-data) without any authentication. While the following technique will get your instances quickly accessible from Ansible, DO NOT use a sensitive password (eg, your master domain admin password), as it will be visible as long as the User Data exists, and User Data requires an instance stop/start cycle to modify. Anyone/anything inside your instance that can make an http request can see the password you set with this technique. A good practice is to have one of your first Ansible tasks against your new instance change the password to a different value. Another thing to keep in mind is that the default Windows password policy is usually enabled, so the passwords you choose need to satisfy its complexity requirements.
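To see for yourself, run the following from a PowerShell prompt inside any EC2 instance- no credentials are required:

# fetch this instance's User Data from the EC2 metadata endpoint
(New-Object System.Net.WebClient).DownloadString('http://169.254.169.254/latest/user-data')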

Before we get to the Holy Grail of actually using Ansible to spin up Windows instances using this technique, let's just try it manually from the AWS Console first. Click Launch Instance, and select a Windows image, then under Configure Instance Details, expand Advanced Details at the bottom to see the User Data textbox.

Paste the script above into the textbox, then click through to Configure Security Group, and ensure that TCP ports 3389 and 5986 are open for all IPs. Continue to Review and Launch, select your private key (which doesn't make any difference now, since you know the admin password), and wait for the instance to launch. If all's well, after the instance has booted you should be able to reach RDP on port 3389, and WinRM on port 5986 with Ansible (both protocols using the Administrator password set by the script). It can often take several minutes for Windows instances set up this way to begin responding, so be patient!

Let's test this using the win_ping module with a dirt simple inventory. Create a file called hosts with the following contents:

[win]
aws-win-host ansible_ssh_host=(your aws host public IP here)

[win:vars]
ansible_ssh_user=Administrator
ansible_ssh_pass=myTempPassword123!
ansible_ssh_port=5986
ansible_connection=winrm
then run the win_ping module using Ansible, referencing this inventory file:

ansible win -i hosts -m win_ping
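You should get back a pong; the output looks roughly like this (exact formatting varies between Ansible versions):

aws-win-host | success >> {
    "changed": false,
    "ping": "pong"
}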

If you see the pong response, your AWS Windows host is fully manageable by Ansible without using a custom AMI!

In part 2, we'll show an end-to-end example of using Ansible to provision Windows AWS instances.

Monday, November 19, 2012

DHCP Failover Breaks with Custom Options

I was really itching to try out the new DHCP Failover goodies in Windows Server 2012. I ran into a couple weird issues when trying to configure it- hopefully I can save someone else the trouble.

When I tried to create the partner server relationship and configure failover, I'd get the following error: Configure failover failed. Error: 20010. The specified option does not exist.

We have a few custom scope options defined for our IP phones. Apparently, it won't propagate the custom option configuration during the partner relationship setup- you have to do it manually. I haven't found this step or error message documented anywhere in the context of failover configuration.

Since we only had one custom option, and I knew what it was, I just manually added it. If you don't know which options are custom and need to be copied over, it's not hard to figure out. In the DHCP snap-in on the primary server, right-click the IPv4 container and choose Set Predefined Options, then scroll through values in the Option Name dropdown with the keyboard arrows or mouse wheel until you see the Delete button light up (that'll be a custom value). Hit Edit and copy the values down, then in the same place on the partner server, hit Add and poke in the custom values. If you have lots of custom options, you can use netsh dhcp or PowerShell to get/set the custom option config.
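For example, a rough PowerShell sketch along these lines (the server names are placeholders, and the OptionId filter is an assumption- match it to your own custom options; note this doesn't copy vendor classes or default values):

# copy a custom option definition from the primary DHCP server to the partner
Get-DhcpServerv4OptionDefinition -ComputerName dhcp-primary |
    Where-Object { $_.OptionId -eq 200 } |
    ForEach-Object {
        Add-DhcpServerv4OptionDefinition -ComputerName dhcp-partner -OptionId $_.OptionId `
            -Name $_.Name -Type $_.Type -Description $_.Description
    }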

Once the same set of custom options exist on both servers, you can do Configure Failover as normal on the scopes and it should work fine. The values of any custom options defined under the scopes will sync up just fine.

I also had one scope where Configure Failover wasn't an option. I had imported all my scopes from a 2003 DC awhile back, so I'm guessing there was something else corrupted in the scope config- just deleting and recreating the scope fixed the problem (it was on a rarely used network, so no big deal; YMMV).

Hope this helps!

Friday, March 2, 2012

Enabling AHCI/RAID on Windows 8 after installation

UPDATE: MS has recently published a KB article on a simpler way to address this. Thanks to commenter Keymapper for the heads up!

Been playing around with Windows 8 Consumer Preview and Windows 8 Server recently. After installing, I needed to enable RAID mode (Intel ICH9R) on one of the machines that was incorrectly configured for legacy IDE mode (why is this the default BIOS setting, Dell?). In Win7, you would just ensure that the Start value for the Intel AHCI/RAID driver is set to 0 in the registry, then flip the switch in the BIOS, and all's well. Under Win8, though, you still end up with the dreaded INACCESSIBLE_BOOT_DEVICE. The answer is simple enough: it turns out they've added a new registry key underneath the driver that you'll need to tweak: StartOverride. I just deleted the entire key, but if you're paranoid, you can probably just set the named value 0 to "0".

So, the full process:

- Enable the driver you need before changing the RAID mode setting in the BIOS (for Intel stuff, the driver name is usually iaStorV or iaStorSV; others may use storahci):
-- Delete the entire StartOverride key (or tweak the value) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\(DriverNameHere)
- Reboot to BIOS setup
- Enable AHCI or RAID mode
- Profit!
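If you'd rather script the registry change than click through regedit, something like this from an elevated PowerShell prompt should do it (iaStorV here is an assumption- substitute your driver name):

# delete the StartOverride key so the driver is eligible to start at boot
Remove-Item -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\iaStorV\StartOverride' -Recurse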

Monday, January 9, 2012

Windows installation fails with missing CD/DVD driver

I was recently upgrading one of our large storage servers to a newer version of Windows and came across a really strange error during setup: "A required CD/DVD drive device driver is missing". This was really odd, since I've installed with no problems to this same class of hardware many times (and even this exact piece of hardware). Even stranger, the error popped up regardless of whether I was using the real DVD drive in the machine or our KVM's "virtual media" feature (which looks like a USB-connected DVD drive to Windows). After lots of searching and trying various things, I remembered what was different about this machine: it had about 30 storage LUNs on it for all the various disks. Hitting the "Browse..." button in the driver select dialog confirmed the problem- Windows was helpfully automounting all the LUNs and assigning drive letters to them. Since there are more than 26, it ran out of drive letters before getting around to mounting the DVD drive. Setup assumes that if it can't find the DVD drive, it must be a driver problem, hence the misleading error message. I always disable automount on our storage servers anyway (since we mount all those storage LUNs under NTFS folders, not drive letters), but you can't do that for setup without altering the boot image. The solution in this case was to hit Shift-F10 from the setup dialog to get a command prompt, then use diskpart to unassign D: from a storage LUN and reassign it to the DVD drive:
list vol
select vol X
(where X is the volume number in the list with D: assigned)
remove letter=D
(removes the drive letter)
select vol Y
(where Y is the volume number in the list that is your Windows Setup DVD)
assign letter=D
Once the setup DVD has a drive letter, you can close the command prompt and proceed with setup normally.

Monday, July 25, 2011

Dynamic client-side UI with Script#

At Earth Class Mail, we've just recently shipped a client-side UI written 100% in Script#, and we thought people might be interested in the process we used to get there. This post is just an overview of what we did- we'll supplement with more details in future posts. I should throw out some props to the team before I get too far- while my blog ended up being the home for the results, Matt Clay was the one behind most of the cool stuff that happened with this. He recognized early on that Script# represented a new way for us to do things, and drove most of the tooling I describe below. Cliff Bowman was the first to consume all the new tooling and came up with a lot of ideas on how to improve it.

Script# Background

If you haven't run across Script# before, I'm not surprised. It's a tool built by Nikhil Kothari that allows C# to compile down to Javascript. Script# has been around since at least 2007 (probably longer), but has only recently started to be really convenient to use. It lets you take advantage of many of the benefits of developing in a strongly-typed language like C# (compile-time type-checking, method signature checks, high-level refactoring) while still generating very minimal JS that looks like what you wrote originally. If you’d like more background, Matt and I talked about it recently with Scott Hanselman. You can also visit Nikhil's Script# project home.

Import Libraries

The first thing we had to do was write a couple of import libraries for JS libraries we wanted to use that weren't already in the box. Import libraries are somewhat akin to C header files that describe the shape (objects, methods, etc) of the code in the JS library so the C# compiler has something to verify your calls against. The import library does *not* generate any JS at compile-time, it's merely there to help out the C# compiler (as well as everything that goes with that- e.g., Intellisense, "Find References"). Script# ships with a jQuery import library, which hit the majority of what we needed. Since we'd previously decided to use some other libraries that didn't already have Script# import libraries (jQuery Mobile, DateBox, JSLinq), we had to whip out import libraries for them- at least for the objects and functions we needed to call from C#. This was a pretty straightforward process- only took a couple of hours to get what we needed.

Consuming WCF Services

The next challenge was to get jQuery calling our WCF JSON services. Our old JS UI had a hand-rolled service client that we'd have to keep in sync with all our operations and contracts- maintenance fail! Since our service contracts were already defined in C#, we initially tried to just "code-share" them into the Script# projects, but that proved to be problematic for a few reasons. First, the contracts were decorated with various attributes that WCF needs. Since Script# isn't aware of WCF, these attributes would need to be redeclared or #ifdef'd away to make the code compilable by Script#. It ended up not mattering, though, since our service contract signatures weren't directly usable anyway. Because XmlHttpRequest and its ilk are async APIs, our generated service client would have to supply continuation parameters to all service calls (ie, onSuccess, onError, onCancel), which would render the operation signatures incompatible. Our options were to redeclare the service contract (and impl) to be natively async pattern (so they'd have a continuation parameter declared) or to code-gen an async version of the contract interface for the client. We opted for the code-generation approach, as it allowed for various other niceties (e.g., unified eventing/error-handling, client-side partial members that aren't echoed back to the server), and settled on a hybrid C#/T4 code generator that gets wired into our build process. Now we have fully type-checked access to our service layer, and with some other optimizations that we'll talk about later, only a minimal amount of JS gets generated to support it.

jQuery templating

The next challenge was using the new jQuery template syntax. This is a pretty slick new addition to jQuery 1.5 that allows for rudimentary declarative client-side databinding. It works by generating JS at runtime from the templated markup file (very simple regex-based replacements of template tags with JS code)- the templates can reference a binding context object that allows the object's current value to appear when the markup is rendered. While it worked just fine in our new "all-managed" build environment, we had a couple of things we didn't like. The first was that template problems (unbalanced ifs, mis-spelled/missing members, etc) can't be discovered until runtime, when it's either a script error (at best) or a totally blank unrenderable UI (at worst). It also means that managed code niceties like "Find All References" won't behave correctly against code in templates, since they don't target Script#. We decided to make something with similar syntax and mechanics, but that runs at build-time and dumps out Script# code instead of JS. This way, "Find All References" still does the right thing, and we get compile-time checking of the code emitted by the templates. Just like other familiar tools, our template compiler creates a ".designer.cs" file that contains all the code and markup, which is then compiled by the Script# compiler into JS. We get a few added benefits from this approach as well. The code isn't being generated at runtime (as it is with jQuery templates), so we can detect template errors at compile-time. We were also able to add some new template tags for declarative formatting, control ID generation, and shortcutting resource access.


Resources

Next, we wanted to consume resources from .resx files using the same C# class hierarchy available in native .NET apps. Even though Script# has a little bit of resource stuff built in, Matt whipped up a simple build-time T4-based resx-to-Script# code generator that also added a few niceties (automatic enum-string mapping, extra validation).

Visual Studio/Build integration

Currently, all this stuff is wired up through pre/post-build events in Visual Studio, and some custom MSBuild work. We're looking at ways we could get it a little more tightly integrated, as well as having it work as a "custom tool" in VS2010 to allow save-time generation of some of the code rather than build-time.


Summary

Combining Script# with a bit of custom tooling allows for a new way of writing tight client-side JS that looks a lot more like ASP.NET or Winforms development, and offers many of the same conveniences. Nobody on our team wants to write naked JS any more- to the point that we're actually working on tooling to convert our existing JS codebase to Script#, so we can start to clean it up and make changes with all the C# features we've come to expect from client-side development. Obviously, some manual work will still be required to get everything moved over, but our team really believes that this is the wave of the future.

Saturday, July 23, 2011

The Road to Script#

At work, we just shipped our first major new chunk of UI in a couple of years, written 100% in Script#. We've been watching Script# for a few years as an interesting option for creating client-side UI, and it recently hit a level of functionality where we felt it was workable. It also coincided nicely with our need for a mobile UI (a standalone product that we could roll out slowly, low-risk compared to changing our bread-and-butter desktop UI).

A little history

When we first started working on a .NET rewrite of the LAMP-based Earth Class Mail (aka Remote Control Mail) in 2007, the client-side revolution was in full force. All the cool kids were eschewing server-side full page postback UI in favor of Javascript client UI. We recognized this from the start, but also had very tight shipping timelines to work under. ComponentArt had some nifty-looking products that promised the best of both worlds- server-side logic with client-side templating, data-binding, and generated Javascript. This fit perfectly with our team's server-side skillset (we didn't have any JS ninjas at the time), so we jumped on it. While we were able to put together a mostly client-side UI in a matter of months, it really didn't live up to the promise. The generated JS and ViewState was very heavy, causing the app to feel slow (Pogue and other reviewers universally complained about it). Also, the interaction between controls on the client was very brittle. Small, seemingly unimportant changes (like the order in which the controls were declared) made the difference between working code and script errors in the morass of generated JS. At the end of the day, we shipped pretty much on time, but the result was less than stellar, and we'd already started making plans for a rewrite.

V2: all JS

Fast-forward to summer of 2009, when we shipped a 25k+ line all-JS jQuery client UI rewrite (which also included an all-new WCF webHttpBinding/JSON service layer). While it was originally planned to be testable via Selenium, JSUnit, and others, things were changing too fast and the tests got dropped, so it was months of tedious manual QA to get it out the door. User reception of the new UI was very warm, and we iterated quickly to add new features. However, refactoring the new JS-based UI was extremely painful due to the lack of metadata. We mostly relied on decent structure and "Ctrl-f/Ctrl-h" when we needed to propagate a service contract change into the client. Workable, but really painful to test changes, and there were inevitably bugs that would slip through when someone did something "special" in the client code. It got to a point where we were basically afraid of the codebase, since we couldn't refactor or adjust without significant testing pain, so the client codebase stagnated somewhat.

On to Script#

We'd been watching our user-agent strings trend mobile for awhile, and this summer it finally reached a point where we needed to own a mobile UI. Our mobile story to this point consisted of users running the main browser UI on mobile devices with varying degrees of success (and a LOT of zooming), and an iPhone app that a customer wrote by reverse-engineering our JSON (we later helped him out by providing the service contracts, since he was filling a void we weren't). The question came to how would we build a new mobile UI? Bolting it to our existing JS client wasn't attractive to anyone, as it's grown unwieldy and scary, and we didn't want to risk destabilizing it with a bunch of new mobile-only code. The prospect of another mass of JS code wasn't attractive to anyone. Another ECM architect (Matt Clay) had been watching Script# for quite awhile, and it had just recently added VS2010 integrated code editing (used to be a standalone non-intellisense editor that ran inside VS2008) and C# 2.0 feature support (generics, anonymous methods). These features gave us enough pause to take another look, and after a week or so of experimentation, we decided to try and ship the mobile UI written with Script#. I'll post something soon that describes what the end result looks like.