Thursday, September 3, 2015

Manage stock Windows AMIs with Ansible (part 2)

In part 1, we demonstrated the use of an AWS User Data script to set a known Administrator password, and configure WinRM on a stock Windows AMI. In part 2, we'll use this technique with Ansible to spin up Windows hosts from scratch and put them to work.

We'll assume that you've got Ansible configured properly for your AWS account (eg, boto installed, IAM credentials set up). See Ansible's AWS Guide if you need help getting this going. The examples in this post were tested against Ansible 2.0 (in alpha as of this writing), however, most of the content is applicable to Ansible 1.9. For simplicity, these samples also assume that you have a functional default VPC in your region (you should, unless you've deleted it). If you need help getting that configured, see Amazon's page on default VPCs.

We'll build up the files throughout the post, but a gist with complete file content is available at https://gist.github.com/nitzmahone/aaf4340ea8d87c7fa578.

First, we'll set up a basic inventory that includes localhost, and define a couple of groups. The hosts we create or connect with in AWS will be added dynamically to the inventory and those groups. Create a file called hosts in your current directory, with the following contents:

localhost ansible_connection=local

[win]

[win:vars]
ansible_connection=winrm
ansible_ssh_port=5986
ansible_ssh_user=Administrator
ansible_ssh_pass={{ win_initial_password }}

Note that we're using a variable in our inventory for the password- in conjunction with a vault, that keeps the password private. We'll set that up next. Create a vault file called secret.yml in the same directory with your inventory by running:

ansible-vault create secret.yml

Assign a strong password to the vault file when prompted, then put the following contents inside it when the editor pops up: [note- the default vault editor is vim- ensure it's installed, or preface the command with EDITOR=(your editor of choice here) to use a different one]:

win_initial_password: myTempPassword123!

Save and exit the editor to encrypt the vault file.

Next, we'll create a template of the User Data script we used in Part 1, so that the initial instance password can be set dynamically. Create a file called userdata.txt.j2 with the following content:

<powershell>
$admin = [adsi]("WinNT://./administrator, user")
$admin.PSBase.Invoke("SetPassword", "{{ win_initial_password }}")
Invoke-Expression ((New-Object System.Net.Webclient).DownloadString('https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1'))
</powershell>

Note that we've replaced the hardcoded password from Part 1 with the variable win_initial_password (that's being set in our vault file).

Finally, we'll create the playbook that will set up our Windows machine. Create a file called win-aws.yml; we'll build our playbook inside.

Since our first play will be talking only to AWS (from our control machine), it only needs to target localhost, and we don't need to gather facts, so we can shut that off. We'll set a play-level var for the AWS region, and load the passwords from secret.yml. The first task looks up an Amazon-owned AMI named for the OS we want to run. The version number changes frequently, and old images are often retired, so we'll wildcard that part of the name, and sort descending so that the first image in the list should be the newest. Thankfully, Amazon pads these version numbers to two digits, so an ASCII sort works here. We want the module to fail if no images are found. Last, we'll register the output from the module to a var named found_amis, so we can refer to it later. Place the following content in win-aws.yml:

- name: infrastructure setup
  hosts: localhost
  gather_facts: no
  vars:
    target_aws_region: us-west-2
  vars_files:
    - secret.yml
  tasks:
  - name: find current Windows AMI in this region
    ec2_ami_find:
      region: "{{ target_aws_region }}"
      platform: windows
      virtualization_type: hvm
      owner: amazon
      name: Windows_Server-2012-R2_RTM-English-64Bit-Base-*
      no_result_action: fail
      sort: name
      sort_order: descending
    register: found_amis

Next, we'll take the first found AMI result and set its ami_id value into a var called win_ami_id:

  - set_fact:
      win_ami_id: "{{ (found_amis.results | first).ami_id  }}"

Before we can fire up our instance, we'll need to ensure that there's a security group we can use to access it (in the default VPC, in this case). The group allows inbound access on port 80 for the web app we'll set up later, port 5986 for WinRM over https, and port 3389 for RDP in case we need to log in and poke around interactively. Again, we'll register the output to a var called sg_out so we can get its ID:

  - name: ensure security group is present
    ec2_group:
      name: WinRM RDP
      description: Inbound WinRM and RDP
      region: "{{ target_aws_region }}"
      rules:
      - proto: tcp
        from_port: 80
        to_port: 80
        cidr_ip: 0.0.0.0/0
      - proto: tcp
        from_port: 5986
        to_port: 5986
        cidr_ip: 0.0.0.0/0
      - proto: tcp
        from_port: 3389
        to_port: 3389
        cidr_ip: 0.0.0.0/0
      rules_egress:
      - proto: -1
        cidr_ip: 0.0.0.0/0
    register: sg_out

Now that we know the image and security group IDs, we have everything we need to ensure that we have an instance in the default VPC:

  - name: ensure instances are running
    ec2:
      region: "{{ target_aws_region }}"
      image: "{{ win_ami_id }}"
      instance_type: t2.micro
      group_id: [ "{{ sg_out.group_id }}" ]
      wait: yes
      wait_timeout: 500
      exact_count: 1
      count_tag:
        Name: stock-win-ami-test
      instance_tags:
        Name: stock-win-ami-test
      user_data: "{{ lookup('template', 'userdata.txt.j2') }}"
    register: ec2_result

We're just passing through the target_aws_region var we set earlier, as well as the win_ami_id we looked up. From the sg_out variable that contains the output from the security group module, we pull out just the group_id value and pass that as the instance's security group. For our sample, we just want one instance to exist, so we ask for an exact_count of 1, which is enforced by the count_tag arg finding instances with the Name tag set to "stock-win-ami-test". Finally, we use an inline template render to substitute the password into our User Data script template and pass it directly to the user_data arg; that will cause our instance to set up WinRM and reset the admin password on initial bootup. Once again, we register the output to the ec2_result var, as we'll need it later to add the EC2 hosts to inventory. Once this task has run, we need some way to ensure that the instances have booted, and that WinRM is answering (which can take some time). The easiest way is to use the wait_for action, against the WinRM port:

  - name: wait for WinRM to answer on all hosts
    wait_for:
      port: 5986
      host: "{{ item.public_ip }}"
      timeout: 300
    with_items: ec2_result.tagged_instances

This task will return immediately if the instance is already answering on the WinRM port, and if not, poll it for up to 300 seconds before giving up and failing. Our next step will consume the output from the ec2 task to add the host to our inventory dynamically:

  - name: add hosts to groups
    add_host:
      name: win-temp-{{ item.id }}
      ansible_ssh_host: "{{ item.public_ip }}"
      groups: win
    with_items: ec2_result.tagged_instances

This task loops over all the instances that matched the tags we passed (whether they were created or pre-existing) and adds them to our in-memory inventory, placing them in the win group (that we defined statically in the inventory earlier). This allows us to use the group_vars we set on the win group with all the WinRM connection details, so the only values we have to supply are the host's name and it's IP address (via ansible_ssh_host, so WinRM knows how to reach it). Once this task completes, we have fully-functional Windows instances that we can immediately target in another play in the same playbook (for instance, to do common configuration tasks, like resetting the password), or we could use a separate playbook run later against an ec2 dynamic inventory that targets these hosts. Let's do the former; we'll install IIS and configure up a simple Hello World web app. First, let's create a web page that we'll copy over. Create a file called default.aspx with the following content:

Hello from <%= Environment.MachineName %> at <%= DateTime.UtcNow %>

Next, add the following play to the end of the playbook we've been working with:

- name: web app setup
  hosts: win
  vars_files: [ "secret.yml" ]
  tasks:
  - name: ensure IIS and ASP.NET are installed
    win_feature:
      name: AS-Web-Support

  - name: ensure application dir exists
    win_file:
      path: c:\inetpub\foo
      state: directory

  - name: ensure default.aspx is present
    win_copy:
      src: default.aspx
      dest: c:\inetpub\foo\default.aspx

  - name: ensure that the foo web application exists
    win_iis_webapplication:
      name: foo
      physical_path: c:\inetpub\foo
      site: Default Web Site

  - name: ensure that application responds properly
    uri:
      url: http://{{ ansible_ssh_host}}/foo
      return_content: yes
    register: uri_out
    delegate_to: localhost
    until: uri_out.content | search("Hello from")
    retries: 3

  - debug:
      msg: web application is available at http://{{ ansible_ssh_host}}/foo

This play targets the win group with the dynamic hosts we just added to it. We pull in our secrets file again (as the inventory will always need the password value inside). The play ensures that IIS and ASP.NET are installed with the win_feature module, creates a directory for the web application with win_file, copies the application content into that directory with win_copy, and ensures that the web application is created in IIS. Finally, we delegate a uri task to the local Ansible runner, and have it make up to 3 requests to the foo application, looking for the content that should be there.

At this point, we've got a complete playbook that will idempotently stand up a Windows machine in AWS with a stock AMI, then configure and test a simple web application. To run it, just tell ansible-playbook where to get its inventory, what to run, and that you'll need to specify a vault password, like:

ansible-playbook -i hosts win-aws.yml --ask-vault-pass

After supplying your vault password, the playbook should run to completion, at which point you should be able to access the web application via http://(your AWS host IP)/foo/.

We've shown that it's pretty easy to use Ansible to provision Windows instances in AWS without needing custom AMIs. These techniques can be expanded to set up and deploy most any application with Ansible's growing Windows support. Give it a try for your code today! Happy automating...


Manage stock Windows AMIs with Ansible (part 1)

Ever wished you could just spin up a stock Windows AMI and manage it with Ansible directly? Linux AMIs usually have SSH enabled and private key support configured at first boot, but most stock Windows images don't have WinRM configured, and the administrator passwords are randomly assigned and only accessible via APIs several minutes post-boot. People go to some pretty awful lengths to get plug-and-play Windows instances working with Ansible under AWS, but the most common solution seems to be building a derivative AMI from an instance with WinRM pre-configured and a hard-coded Administrator password. This isn't too hard to do once, but between Amazon's frequent base AMI updates, and the need to repeat the process in multiple regions, it can quickly turn into an ongoing hassle.

Enter User Data. If you're not familiar with it, you're not alone. It's a somewhat obscure option buried in the Advanced area of the AWS instance launch UI. It can be used for many different purposes; much of the AWS documentation treats it as a mega-tag that can hold up to 16k of arbitrary data, accessible only from inside the instance. Less well-known is that scripts embedded in User Data will be executed by the EC2 Config Windows service near the end of the first boot. This allows a small degree of first-boot customization on a vanilla instance, including setting up WinRM and changing the administrator password; once those two items are completed, the instance is manageable with Ansible immediately!

We'll build up the files throughout the post, but a gist with complete file content is available at https://gist.github.com/nitzmahone/4271319ab8e7acf3330c.

Scripts can be embedded in User Data by wrapping them in <powershell> or <script> tags for Windows batch scripts- in this case, we'll stick to Powershell. The following User Data script will set the local Administrator password to a known value, then download and run a script hosted in Ansible's GitHub repo to auto-configure WinRM:


<powershell>
$admin = [adsi]("WinNT://./administrator, user")
$admin.PSBase.Invoke("SetPassword", "myTempPassword123!")

Invoke-Expression ((New-Object System.Net.Webclient).DownloadString('https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1'))
</powershell>


A word of caution: User Data is accessible via http from inside the instance without any authentication. While the following technique will get your instances quickly accessible from Ansible, DO NOT use a sensitive password (eg, your master domain admin password), as it will be visible as long as the User Data exists, and User Data requires an instance stop/start cycle to modify. Anyone/anything inside your instance that can make an http request to an arbitrary host can see the password you set with this technique. A good practice is to have one of your first Ansible tasks against your new instance change the password to a different value. Another thing to keep in mind is that the default Windows password policy is usually enabled, so the passwords you choose need to satisfy its complexity requirements.

Before we get to the Holy Grail of actually using Ansible to spin up Windows instances using this technique, let's just try it manually from the AWS Console first. Click Launch Instance, and select a Windows image, then under Configure Instance Details, expand Advanced Details at the bottom to see the User Data textbox.



Paste the script above into the textbox, then click through to Configure Security Group, and ensure that TCP ports 3389 and 5986 are open for all IPs. Continue to Review and Launch, select your private key (which doesn't make any difference now, since you know the admin password), and wait for the instance to launch. If all's well, after the instance has booted you should be able to reach RDP on port 3389, and WinRM on port 5986 with Ansible (both protocols using the Administrator password set by the script). It can often take several minutes for Windows instances set up this way to begin responding, so be patient!

Let's test this using the win_ping module with a dirt simple inventory. Create a file called hosts with the following contents:

aws-win-host ansible_ssh_host=(your aws host public IP here)

[win]
aws-win-host

[win:vars]
ansible_connection=winrm
ansible_ssh_port=5986
ansible_ssh_user=Administrator
ansible_ssh_pass=myTempPassword123!

then run the win_ping module using Ansible, referencing this inventory file:

ansible win -i hosts -m win_ping

If all's well, you should see the ping response, and your AWS Windows host is fully manageable by Ansible without using a custom AMI!


In part 2, we'll show an end-to-end example of using Ansible to provision Windows AWS instances.