Creating a new AWS Lambda using the AWS CLI

It’s easy to set up and configure a new AWS Lambda with the AWS Console, but if you’re iterating on some changes and need to redeploy a few times, the AWS CLI makes this much quicker.

To create a new Lambda, assuming you have a .zip packaged up and ready to go:

aws lambda create-function --function-name example-lambda --zip-file fileb://example-lambda.zip --handler index.handler --runtime nodejs8.10 --role arn:aws:iam::account-id-here:role/lambda-role
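Once the function exists, redeploying an updated zip on each iteration is a single command (using the same function name and zip file as above):

aws lambda update-function-code --function-name example-lambda --zip-file fileb://example-lambda.zip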

Building a React frontend for my AWS Lambda Sudoku solver

Over the past few months I built an implementation of Donald Knuth’s Algorithm X using Dancing Links in Java to solve Sudoku puzzles.

This was a fascinating exercise in itself (you can read more about my experience here), but the next logical step was to package it up in a way that I could share it online.

Since I’m pursuing my AWS certifications right now, one interesting and low cost approach to host the Solver implementation is to package it as an AWS Lambda. Sudoku Solver as a Service? Done. I exposed it through AWS API Gateway. It accepts a request payload that looks like this:

{"rows":["...81.67.","..749.2.8",".6..5.1.4","1....39..","4...8...7","..69....3","9.2.3..6.","6.1.743..",".34.69..."]}

and returns a response with a solution to the submitted puzzle request like this:

{"rows":["349812675","517496238","268357194","185723946","493681527","726945813","972538461","651274389","834169752"]}

The request and response payloads are an array of Strings, where each item is the values for one row of the grid concatenated together, with ‘.’ for each unknown.
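If you want to try calling an endpoint like this yourself, a curl request would look something like this (the API Gateway URL here is just a placeholder, not the real endpoint):

curl -X POST -H "Content-Type: application/json" -d '{"rows":["...81.67.","..749.2.8",".6..5.1.4","1....39..","4...8...7","..69....3","9.2.3..6.","6.1.743..",".34.69..."]}' https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/solve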

I’m still learning React as I go, and while building this front end for my Lambda Sudoku Solver I learnt some interesting things about React and Javascript. The source for the app is shared here.

I used Flux to structure the app, so the main parts of the app are:

  • a main, high-level Container component,
  • a CellComponent that renders each cell in the Sudoku grid,
  • an Action that handles the interaction with the AWS Lambda,
  • a Store that holds the results from calling the Lambda

I don’t want to focus on the pros and cons of using React or Flux (and this is not intended to be a how-to on building an app using React) as there were some other specific issues I ran into that were interesting learning opportunities. A couple of these I already captured in separate posts, so I’ll include these links below.

Iteration 1: onChange handler per row

My first approach to maintaining the state for the display of the grid and the handler for changes to each cell was to keep it simple and have a separate array of values per row, and a separate onChange handler for each row. This is not a particularly effective way to structure this, as there’s duplication in each of the 9 handlers.

The State looked like this:

this.state = {
  row1 : [],
  row2 : [],
  row3 : [],
  row4 : [],
  row5 : [],
  row6 : [],
  row7 : [],
  row8 : [],
  row9 : []
};

And each of the handlers looked like this, one handler per row, so handleChangeRow1() through handleChangeRow9():

handleChangeRow1(index, event) {
  console.log("row 1 update: " + event.target.value);
  var updatedRow = [...this.state.row1];
  updatedRow[index] = event.target.value;
  this.setState( { row1 : updatedRow } );
}

This approach needed 9 versions of the function above, each one specifically handling updates to the state for a single row. We’ll come back to improving this later.

The interesting thing to notice at this point is that to update an array in React state, you need to make a copy of the array and then update the copy. I used the spread operator (...) to clone the array.

I rendered each row in the grid separately like this (so this approach needed 9 of these blocks):

<div>
  {
    this.state.row1.map( (cell, index) => (
      <CellComponent key={index} value={this.state.row1[index]}
        onChange={this.handleChangeRow1.bind(this, index)}/>
    ))
  }
</div>

This was my first working version of the app, at least at the point where I could track the State of the grid as a user entered or changed values in the 9×9 grid. The next step was to improve the approach.

Iteration 2: Using an array of arrays for the State

The first improvement was to the State itself, moving to an array of arrays. This is easily set up like this:

this.state = {
  grid: []
};

for (var row = 0; row < 9; row++) {
  this.state.grid[row] = [];
}
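As an aside, the same initialization can be done in a single expression, avoiding the loop and pre-filling each cell with an empty string (just an alternative, not what I used at the time):

this.state = {
  // 9 rows, each an array of 9 empty cells
  grid: Array.from({ length: 9 }, () => Array(9).fill(''))
};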

Iteration 3: One onChange handler for all rows

Instead of a handler per row, I parameterized the onChange handler so it could be reused for all rows. This is what I ended up with:

handleGridChange(row, colIndex, event) {
  console.log("row [" + row + "] col [" + colIndex + "] : " + event.target.value);
  // clone the grid, and also the row being changed, so we're not
  // mutating the arrays still referenced by the current state
  var updatedGrid = [...this.state.grid];
  updatedGrid[row] = [...updatedGrid[row]];
  updatedGrid[row][colIndex] = event.target.value;

  //call Action to send updated data to Store
  SudokuSolverAction.updatePuzzleData(updatedGrid);
}

Using .map() on each of the rows in State, I then rendered each row of the grid like this, passing the current row index and column index as params into handleGridChange():

<tr>
  {
    this.state.grid[0].map((cell, colIndex) => (
      <td key={"row0" + colIndex}>
        <CellComponent value={this.state.grid[0][colIndex]}
          onChange={this.handleGridChange.bind(this, 0, colIndex)}/>
      </td>
    ))
  }
</tr>

I’m sure there’s a way to use a nested .map() of the results of a .map() or some other clever approach to render the whole grid in a single go, but rendering each of the rows individually is an ok approach for me since there are only 9 rows. If there were many more than 9 rows I’d spend some time working on a better approach, but I’m ok with this for now.
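For what it’s worth, a nested .map() version might look something like this (a sketch of the idea, not what the app actually uses):

<table>
  <tbody>
    {
      this.state.grid.map((row, rowIndex) => (
        <tr key={"row" + rowIndex}>
          {
            row.map((cell, colIndex) => (
              <td key={"row" + rowIndex + "col" + colIndex}>
                <CellComponent value={cell}
                  onChange={this.handleGridChange.bind(this, rowIndex, colIndex)}/>
              </td>
            ))
          }
        </tr>
      ))
    }
  </tbody>
</table>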

Flux Action and Store

The Action to call the Lambda, and the Store maintaining the state of the responses, were pretty simple. You can check out the source here if you’re interested.
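To give a flavor of it without digging through the source, the Action that calls the Lambda is roughly this shape (a sketch: the method name, endpoint URL and dispatcher details here are illustrative, not the exact code):

import request from 'superagent';
import AppDispatcher from './AppDispatcher'; // the app's Flux dispatcher (illustrative)

const SudokuSolverAction = {
  solvePuzzle(grid) {
    // POST the current grid to the API Gateway endpoint for the Lambda,
    // converting each row array to a concatenated String with '.' for blanks
    request
      .post('https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/solve')
      .send({ rows: grid.map(row =>
        Array.from({ length: 9 }, (_, col) => row[col] || '.').join('')) })
      .set('Accept', 'application/json')
      .then(res => {
        // dispatch the solved puzzle to the Store
        AppDispatcher.dispatch({ type: 'PUZZLE_SOLVED', payload: res.body });
      });
  }
};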

CSS styling for the grid

One last thing to do was to style the grid so it looks like a usual Sudoku grid, with heavier vertical and horizontal lines after columns and rows 3 and 6, to divide the grid into its 3×3 squares. It took some reading to find out how to do this easily, but it turns out the CSS nth-child() pseudo-class handles this perfectly. I covered this in this post here.
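The gist of the approach, assuming the grid is rendered as a table, is along these lines (a simplified sketch; see that post for the details):

/* heavier border after every 3rd column, to outline the 3x3 squares */
td:nth-child(3n) {
  border-right: 2px solid black;
}

/* heavier border after every 3rd row */
tr:nth-child(3n) td {
  border-bottom: 2px solid black;
}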

Take a look at the app

I might move this to a more permanent home later, but if you want to check out the app, you can take a look here.

AWS Lambdas: “Process exited before completing request”

While testing a React frontend for my SudokuSolver Lambda, I kept getting this error when calling the Lambda using superagent from React:

RequestId: 35934232-xxx Process exited before completing request

Testing from Postman it completed as expected.

This error message means exactly what it says: the Lambda exited before it completed executing.

There are 2 possible paths through my SudokuSolver:

  1. The input puzzle has a single, unique solution
  2. The input has more than one possible solution

If there is more than one solution, the Solver finds the first solution and then exits. Yes, it does a System.exit(). There’s the cause of my problem. I was testing from Postman with a puzzle that has a single solution, but the test from my React app only had a couple of values in the grid, so it had many possible solutions and hit the System.exit() path, killing the JVM before the Lambda could return its response.
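The fix on the Lambda side is to return from the handler instead of calling System.exit(). As a sketch (the request/response and Solver classes here are made up for illustration, not the actual code):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class SolverHandler implements RequestHandler<PuzzleRequest, PuzzleResponse> {

    private final SudokuSolver solver = new SudokuSolver();

    @Override
    public PuzzleResponse handleRequest(PuzzleRequest request, Context context) {
        // solve the puzzle, stopping after the first solution is found
        PuzzleResponse response = solver.solve(request.getRows());
        // return normally - a System.exit() here kills the JVM before the
        // Lambda runtime can complete the request
        return response;
    }
}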

Lessons learned:

  • read and understand what the error message means. Once you understand what it’s telling you, ask how and where this applies to your code
  • when changing variables in your test, don’t change too many at one time. If possible make only one change, so if something unexpected happens you’ll know it’s a result of that change (in my case I changed my test data when moving from Postman to the React app, so I wasn’t comparing results with the same inputs. The issue was unrelated to React or superagent, it was entirely down to my test data)

Building PyTorch from source for a smaller (<50MB) AWS Lambda deployment package

I’ve been trying to deploy a Python based AWS Lambda that uses PyTorch. The problem I’ve run into is that the size of the deployment package with PyTorch and its platform-specific dependencies is far beyond the maximum size of a deployable zip that you can deploy as an AWS Lambda. Per the AWS Lambda Limits page, the maximum deployable zip is 50MB (and unzipped it needs to be less than 250MB).

I found this article which suggested building PyTorch from source on an EC2 instance running an Amazon Linux AMI, using build options to reduce the build size. I followed all the steps up to, but not including, line 65, as I don’t need torchvision.

If you’re looking for the tl;dr summary, here are the key points:

  • yes, this approach works! (although it took many hours spread over a few weeks to get to this point!)
  • the specific Amazon AMI you need is the one that’s currently in use for running AWS Lambdas (this will obviously change at some point but as of 9/3/18 this AMI works) : amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 (ami-aa5ebdd2)
  • a t2.micro EC2 instance does not have enough RAM to successfully build PyTorch. I used a t2.medium with 4GB RAM.
  • you can’t take a trained model .pt file generated with one version of PyTorch/torch and use it to generate text with a different version. The PyTorch version used for training and for generating output must be identical

Ok, beyond the tl;dr summary above, here’s my experience following the steps in this article.

At line 63: 

python setup.py install

I got this error:

Could not find /home/ec2-user/pytorch/torch/lib/gloo/CMakeLists.txt
Did you run 'git submodule update --init'?

I ran the suggested ‘git submodule update --init’ and then re-ran the setup.py script. This time it ran for a while, but ended with this error:

gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1

I spent a bunch of time trying to work out what was going on here, but decided to take a different direction: skip building Python 3.6 from source, and try recreating these steps using the Python 2.7 that is preinstalled in the Amazon Linux 2 AMI. The only part that’s slightly different is that pip is not preinstalled, so I installed it with:

sudo yum install python-pip
sudo yum install python-wheel

Then virtualenv with:

sudo pip install virtualenv

At this point I picked up the article’s steps again, from creating the virtualenv:

virtualenv ~/shrink_venv
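And remember to activate it, so the following installs go into the virtualenv:

source ~/shrink_venv/bin/activate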

After the step to build PyTorch, I now got (another) different error:

as: out of memory allocating 4064 bytes after a total of 45686784 bytes
{standard input}: Assembler messages:
{standard input}:934524: Fatal error: can't close build/temp.linux-x86_64-2.7/torch/csrc/jit/python_ir.o: Memory exhausted
torch/csrc/jit/python_ir.cpp:215:2: fatal error: error writing to -: Broken pipe

Ugh, I’m running on a t2.micro which only has 1GB RAM. Let’s stop the instance, change the instance type to a t2.medium with 4GB, and try building again.

Running free before:

$ free
              total        used        free      shared  buff/cache   available
Mem:        1009384       40468      870556         288       98360      840700
Swap:             0           0           0

And now after resizing:

$ free
              total        used        free      shared  buff/cache   available
Mem:        4040024       55004     3825940         292      159080     3780552
Swap:             0           0           0

Ok, trying again, but since we’ve restarted the instance, remembering to set the flags to minimize the build options, which was the whole reason for doing this:

$ export NO_CUDA=1
$ export NO_CUDNN=1

Next error:

error: could not create '/usr/lib64/python2.7/site-packages/torch': Permission denied

Ok, let’s run the build with sudo instead then. That fixes that.

Now I’m at a point where I can actually run the generate.py script but now I’ve got a completely different error:

/home/ec2-user/shrinkenv/lib/python2.7/site-packages/torch/serialization.py:316: SourceChangeWarning: source code of class 'torch.nn.modules.sparse.Embedding' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
  File "generate.py", line 54, in <module>
    decoder = torch.load(args.filename)
  File "/home/ec2-user/shrinkenv/lib/python2.7/site-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/home/ec2-user/shrinkenv/lib/python2.7/site-packages/torch/serialization.py", line 409, in _load
    result = unpickler.load()
AttributeError: 'module' object has no attribute '_rebuild_tensor_v2'

Searching for the last part of this error I found this post, which implies my trained model .pt file is from a different torch/pytorch version… which it most likely is, as I trained using a version installed with pip, and now I’m trying to generate with a version built from source.

Rather than spend more time on this (some articles suggested you can read the .pt model from one pytorch version and convert it, but this doesn’t seem like a trivial activity and requires writing some code to do the conversion), I’m going to train a new model with the same version I just built from source.

With that successfully done, and my Lambda handler script ready to go, it’s time to package up, so back to the final steps from the article to zip up everything built and installed so far in my virtualenv:

cd $VIRTUAL_ENV/lib/python2.7/site-packages
zip -r ~/kevinhookebot-ml-lambda-generate-py.zip *

We’re at 57MB, which is over the 50MB limit for uploading a zip directly, but ok for deploying from S3, where the limit that matters is the 250MB unzipped size. Now add char-rnn.pytorch, my generated model and Lambda handler into the same zip, and we’re at 58MB, still well within the limits for a Lambda package deployed via S3.
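Adding those extra pieces looked something like this (generatelambda.py is my handler script; the model filename is a placeholder for whatever your trained model is called):

cd ~
zip -r ~/kevinhookebot-ml-lambda-generate-py.zip char-rnn.pytorch generatelambda.py your-model.pt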

Let’s upload and deploy. Test calling the Lambda, and now we get:

Unable to import module 'generatelambda': /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /var/task/torch/lib/libshm.so)

Searching for this error I found this post, which has a link to this page listing a specific AMI version to be used when compiling dependencies for a Lambda deployment (amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2). The default Amazon Linux 2 AMI is probably not this same image (and I’d already tried the Amazon Linux AMI 2018.03.0 and ran into other issues on that one), so it looks like I need to start over (but getting close now, surely!)

Ok, new EC2 t2.medium instance with exactly the same AMI image as mentioned above. I retraced my steps and now I’m almost back at the same error as before:

error: command 'gcc' failed with exit status 1
gcc: error trying to exec 'cc1plus': execvp: No such file or directory

Searching some more for this I found this post with a solution: change the PATH to point to exactly where cc1plus is installed. Instead of 4.8.3 though, in this AMI it seems I have 4.8.5, so here are the settings I used:

$ export COMPILER_PATH="/usr/libexec/gcc/x86_64-amazon-linux/4.8.5/:$COMPILER_PATH";
$ export C_INCLUDE_PATH="/usr/lib/gcc/x86_64-amazon-linux/4.8.5/include/:$C_INCLUDE_PATH";

And then I noticed the post hadn’t included either of these when setting the new PATH, which seems like an oversight (I don’t think they make any difference if they’re not in the PATH), so I set my PATH like this, including COMPILER_PATH first:

export PATH="$COMPILER_PATH:/sbin:/bin:/usr/sbin:/usr/bin:/opt/aws/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/aws/bin:$PATH";

Now that cc1plus is in my PATH, let’s try building PyTorch again! In short, that worked.

Going back to the packaging steps, let’s zip everything up and push the zip to S3 ready to deploy.

I zipped up:

  • char-rnn.pytorch : from github, including my own runner script and lambda handler script
  • modules installed into virtualenv lib: ~/shrinkenv/lib/python2.7/site-packages/* (tqdm)
  • modules installed into virtualenv lib64: ~/shrinkenv/lib64/python2.7/site-packages/* (torch, numpy and unidecode)

At this point I used ‘aws s3 cp’ to copy the zip to S3, and configured my Lambda from the zip in S3. Set up a test to call my handler, and success!
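If you’re doing those last steps from the CLI, they look something like this (the bucket and function names are placeholders):

aws s3 cp ~/kevinhookebot-ml-lambda-generate-py.zip s3://your-bucket-name/
aws lambda update-function-code --function-name your-function-name --s3-bucket your-bucket-name --s3-key kevinhookebot-ml-lambda-generate-py.zip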