Troubleshooting User Data scripts when creating AWS EC2 instances

When an AWS EC2 User Data script fails, you’ll see something like this in /var/log/cloud-init.log in your instance:

2018-02-03 06:08:16,536 - util.py[DEBUG]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 806, in runparts
    subp(prefix + [exe_path], capture=False)
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1847, in subp
    cmd=args)
cloudinit.util.ProcessExecutionError: Unexpected error while running command.
Command: ['/var/lib/cloud/instance/scripts/part-001']
Exit code: 127
Reason: -
Stdout: -
Stderr: -
2018-02-03 06:08:16,541 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2018-02-03 06:08:16,541 - handlers.py[DEBUG]: finish: modules-final/config-scripts-user: FAIL: running config-scripts-user with frequency once-per-instance

It tells you something failed, but not what. The trouble is that output from your User Data script does not go to cloud-init.log by default.
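One quick manual check: cloud-init saves the script on the instance at the path shown in the log, so you can SSH in and re-run it by hand with tracing to see exactly where it fails:

# The failing script is kept on disk, so re-run it with bash tracing enabled
sudo bash -x /var/lib/cloud/instance/scripts/part-001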

One of the answers in this post suggests piping your script commands and their output through logger into a separate log file, like this:

#!/bin/bash
set -x
# Copy all stdout/stderr to /var/log/user-data.log and to syslog (tagged 'user-data')
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1
echo BEGIN
date '+%Y-%m-%d %H:%M:%S'

Now, running my script with an ‘apt-get update -y’, the log looks like this:

+ echo BEGIN
BEGIN
+ date '+%Y-%m-%d %H:%M:%S'
2018-02-03 23:37:55
+ apt-get update -y
... output continues here

And further down, here’s the specific error I was looking for:

+ java -Xmx1024M -Xms1024M -jar minecraft_server.1.12.2.jar nogui

/var/lib/cloud/instance/scripts/part-001: line 11: java: command not found

My EC2 instance running the Ubuntu AMI does not have Java installed by default, so I need to install it by adding this to my User Data script:

apt-get install openjdk-8-jre-headless -y

… and now my script runs as expected.
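For reference, here’s the whole User Data script with the logging prelude and the Java install put together (the command -v guard is my own addition as a sanity check, not something cloud-init requires):

#!/bin/bash
set -x
# Copy all output to /var/log/user-data.log and syslog
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1

apt-get update -y
apt-get install openjdk-8-jre-headless -y

# Guard: stop with a clear log message if java still isn't on the PATH
command -v java >/dev/null || { echo "java not found, aborting"; exit 1; }

java -Xmx1024M -Xms1024M -jar minecraft_server.1.12.2.jar nogui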

 

AWS EC2 Pricing (Feb 2018) – per second vs hourly?

EC2 pricing at first glance seems simple: for on-demand pricing, the current prices are listed here as fractions of a dollar per hour. Here’s a quick look (below) at how these prices are listed for the first few t2 instance types. The list of different instance types is pretty extensive (check out the link for the complete list):

The page says the price is:

by the hour or second (minimum of 60 seconds) with no long-term commitments

and further down:

Pricing is per instance-hour consumed for each instance, from the time an instance is launched until it is terminated or stopped. Each partial instance-hour consumed will be billed as a full hour or per-second depending on which Amazon EC2 instances you run

The wording of that last sentence is not entirely clear. So to summarize:

  • charged by the second, minimum 60 seconds (this was a recent change, introduced in October 2017, see here)

or

  • charged as a full hour depending on which Amazon EC2 instances you run

The last part of that statement is (I think) the key. It refers to which AMI image you’re running, since some images have an hourly charge (if I’m wrong, please leave me a comment and let me know!):

If you click the Marketplace link, you’ll notice AMI images with commercial products, along with an hourly charge for usage. Ahah, there you go!
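To make the per-second billing concrete, here’s a rough back-of-the-envelope calculation. I’m assuming a t2.micro at $0.0116 per hour, the us-east-1 On-Demand rate at the time of writing (check the pricing page for current numbers):

# 5 minutes (300s) of t2.micro at $0.0116/hour, billed per second
echo "scale=6; 300 / 3600 * 0.0116" | bc    # ~ $0.00097
# anything under 60 seconds is billed at the 60-second minimum
echo "scale=6; 60 / 3600 * 0.0116" | bc     # ~ $0.00019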

Creating AWS EC2 Spot Instances with a Launch Template

With EC2, you have a huge variety of instance types to choose from, each type coming in a range of sizes from small to large:

  • General Purpose
  • Compute Optimized
  • GPU instances
  • FPGA instances
  • Memory Optimized
  • Storage Optimized

For each of these types you have a further range of options for how you choose to provision an instance, which has an impact on how the instance is priced:

  • On-demand: requested and provisioned as you need them
  • Spot Instances: spare-capacity instances auctioned at a lower price than On-Demand, but not always available. You request a price; if an instance is available at that price it is provisioned, otherwise you wait until one becomes available (see current pricing here)
  • Reserved Instances
  • Dedicated Hosts
  • Dedicated Instances

I’ve never created a Spot Instance before, and I’m curious what the steps are. As with every service on AWS, there’s more than one approach, and I’m going to look at using a Launch Template:

By creating a Launch Template you can configure a number of settings for your instance (AMI image, EC2 instance type, etc).
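As a side note, if you prefer the CLI, creating a Launch Template looks roughly like this (the template name and key name are placeholders; the AMI and instance type are the ones I’m using here):

aws ec2 create-launch-template \
    --launch-template-name my-spot-template \
    --launch-template-data '{"ImageId":"ami-41e0b93b","InstanceType":"t2.small","KeyName":"my-key"}'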

From the Request Spot Instance page in the EC2 Management Console, you can now use your Launch Template which prepopulates most of the settings for your requested Spot Instance:

Further down on this page is where you request your pricing – it defaults to buying at the cheapest available price, capped at the current On-Demand price (if the spot price rises to match the On-Demand price then there’s no saving from Spot and you might as well use On-Demand instead):

After submitting, the request shows in a submitted state:

So here’s my first error:

Repeated errors have occurred processing the launch specification “t2.small, ami-41e0b93b, Linux/UNIX (Amazon VPC), us-east-1d”. It will not be retried for at least 13 minutes. Error message: com.amazonaws.services.ec2.model.AmazonEC2Exception: Network interfaces and an instance-level subnet ID may not be specified on the same request (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameterCombination)

Network interfaces and an instance-level subnet – I did add an interface because I wanted to select the public IP option. Let’s create a new version of the template and try again.
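Reading the error again: you can give the subnet at the instance level or on a network interface, but not both. If you keep the network interface (for the public IP option), the subnet and security groups should be specified on the interface itself – in launch template JSON that looks something like this (the IDs are placeholders):

{
  "NetworkInterfaces": [
    {
      "DeviceIndex": 0,
      "AssociatePublicIpAddress": true,
      "SubnetId": "subnet-xxxxxxxx",
      "Groups": ["sg-xxxxxxxx"]
    }
  ]
}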

Now my request is in ‘pending fulfillment’ status:

Same error. One of the answers here suggests this is because I don’t have a Security Group. Ok, let’s add a Security Group to the launch template and try again.

Same error, even though I added a Security Group to my template. But then I noticed this in the Request Spot Instance options – when you select your Template, if you’ve made version updates to it, make sure you select the latest version, as it defaults to 1. In other words, I was re-submitting with the original template that I already know doesn’t work:
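For reference, new template versions can also be created from the CLI, and a request can reference the version ‘$Latest’ instead of a fixed number to avoid exactly this mistake (the template data shown is just the part being changed; the IDs are placeholders):

aws ec2 create-launch-template-version \
    --launch-template-name my-spot-template \
    --source-version 1 \
    --launch-template-data '{"SecurityGroupIds":["sg-xxxxxxxx"]}'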

Next error:

com.amazonaws.services.ec2.model.AmazonEC2Exception: The security group ‘sg-825bfe14’ does not exist in VPC ‘vpc-058f5d7c’ (Service: AmazonEC2; Status Code: 400; Error Code: InvalidGroup.NotFound)

Hmm. I’m interpreting this as my Security Group not being in the VPC my instance was assigned to, so let’s create a new VPC, and then a new Security Group in that VPC:
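The CLI equivalent of those two steps, for reference (the VPC ID is a placeholder; the CIDR block matches what I used below):

aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-security-group --group-name spot-sg \
    --description "Security group for my Spot Instances" --vpc-id vpc-xxxxxxxx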

Now create a new Template version with this new VPC and SG.

Next error:

Error message: com.amazonaws.services.ec2.model.AmazonEC2Exception: Security group sg-b756adc0 and subnet subnet-f756c7bf belong to different networks. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameter)

SG and subnet belong to different networks. Ok, getting close. Let’s take a look.

On the VPC for my SG I have: 10.0.0.0/16

On my subnet for us-east-1d I have: 10.0.0.0/24

Ah, OK. Let’s add a new subnet for us-east-1d in the new VPC, with the same 10.0.0.0/24 CIDR block, and try again.
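From the CLI, that subnet would be created with something like this (the VPC ID is a placeholder):

aws ec2 create-subnet --vpc-id vpc-xxxxxxxx \
    --cidr-block 10.0.0.0/24 --availability-zone us-east-1d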

When creating your spot request, make sure you select your VPC and subnet to match:

Ahah! Now we’re looking good and my Spot Instance is being provisioned:

Ugh, next error:

Looks like in my template I didn’t give a device name (‘missing device name’) for my EBS volume, e.g. /dev/sdb. New template version, trying again.

Next error:

Error message: com.amazonaws.services.ec2.model.AmazonEC2Exception: The parameter iops is not supported for gp2 volumes. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameterCombination)

Geesh. OK, removing the iops value from the template and trying again (it would help to have some validation on the template form).
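For reference, the EBS part of my template data now looks something like this (the volume size is a placeholder) – the DeviceName is present and there’s no Iops key, since gp2 volumes don’t accept one:

{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sdb",
      "Ebs": { "VolumeSize": 8, "VolumeType": "gp2" }
    }
  ]
}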

And now:

Fulfilled, we made it, a Spot Instance provisioned!

At this point though, my instance started without a public IP. Now that I’ve got the Security Group and Subnet issues sorted, I’ll go back to the template and add a network interface with ‘assign public IP’ selected. Rather than assigning this on the network interface though, it turns out it’s also an option on the subnet config, so I edited the subnet and enabled it there:
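That subnet setting can also be flipped from the CLI (the subnet ID is a placeholder):

aws ec2 modify-subnet-attribute --subnet-id subnet-xxxxxxxx --map-public-ip-on-launch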

And now we’re up, with a public IP! Whether my User Data init script actually did what it was supposed to is another question – I’ll look at that next!

Installing and using s3cmd to copy files to AWS S3

s3cmd is a useful tool that lets you list, put, and get objects from an AWS S3 bucket. To install:

Install Python 2.7 with:

sudo apt-get install python2.7

Install setuptools with:

sudo apt-get install python-setuptools

Download and unzip the .zip distro from the link here: http://s3tools.org/download

Install with:

sudo python2.7 setup.py install

To see options, run:

s3cmd --help

Before running the s3cmd setup, you need to create an AWS IAM user with programmatic access, to get an access key that will be used by s3cmd.

First, create a new user from the Management Console, and ensure ‘Programmatic Access’ is checked:

Create a new IAM Policy with read, write, and list actions, attach it to this user, and restrict the resource to the ARN of the S3 bucket you want to use s3cmd with:

If you want to narrow down the permissions to a minimal list, a policy like this is the minimum needed for s3cmd to work (based on answers to this question on SO):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt123456",
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    },
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::bucketname",
        "arn:aws:s3:::bucketname/*"
      ]
    }
  ]
}

Following the how-to guide here, for first-time setup, run:

s3cmd --configure

and provide your IAM user API access key and secret key, and the other values as prompted. After configuring, when prompted to test the config, the util will attempt to list all buckets; if the policy you created only allows read/write on a specific bucket, this test will fail, but that’s OK.

To confirm access to your bucket, try:

s3cmd ls s3://bucketname

and to put a file:

s3cmd put filename s3://bucketname
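And to copy a file back down, completing the list/put/get trio:

s3cmd get s3://bucketname/filename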