Building bots on Twitter with AWS Lambdas (and other stuff)

I’ve built a few different bots on Twitter and written several articles describing how I built them. Some of these were a few months back – once they’re up and running it’s easy to forget about them (thanks to AWS Lambda’s free tier, which comfortably covers running scheduled Tweets). This is a summary of the bots I’ve developed so far.

Looking at where I got started, my first bot was an integration with Amateur Radio 2m Packet, retweeting packets received locally to Twitter. This was my first experience working with the Twitter REST APIs and OAuth authentication, so a lot of what I learned here I reapplied to the following bots too:

For my next project, I was inspired by articles by researcher Janelle Shane, who has been training ML models to produce some hilarious results, such as weird recipes, college course names and many others. I was curious what content an ML model would generate if I extracted all of my past 4000+ Tweets from Twitter and trained a model with the content. I had many questions, such as would the content be similar in style, and is 4000 Tweets enough text to train a model? You can follow my progress in these posts:

This then led to repeating the experiment with over 10 years of my blog articles and posts collected here, which you can follow in these posts:

Next, what would it take to train my model in the cloud using AWS Sagemaker, and run using AWS Lambdas?

You can follow this bot on Twitter here: @kevinhookebot

I had fun developing @kevinhookebot – it evolved over time to support a few features, not just to retweet content from the trained ML model. Additional features added:

  • an additional Lambda that consumes the Twitter API ‘mentions’ timeline and replies with one of a number of canned responses (not generated, they’re just hard coded phrases). It checks the mentions timeline every 5 minutes, so if you reply to any of its tweets or Tweet @ the bot, it will reply to you the next time it sees a new tweet there
  • another Lambda that responds to @ mentions to the bot as if it is a text-based adventure game. Tweet ‘@kevinhookebot go north’ (or east/west/south) and the bot will respond with some generated text in the style of an adventure game. There’s no actual game to play and it doesn’t track your state, but each response is generated using @GalaxyKate’s Tracery library, which builds the text from a simple grammar that defines the structure of each reply.
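
To give a feel for how grammar-driven replies like these work, here is a minimal sketch in Python. It is not the Tracery library itself and not the bot’s actual code, just a few lines that expand `#symbol#` placeholders in the way a Tracery-style grammar does. The grammar contents and the function name `expand` are made up for illustration.

```python
import random
import re

# Hypothetical grammar, invented for this example: each key maps to a list
# of alternatives, and '#key#' inside a string refers to another rule.
GRAMMAR = {
    "origin": ["You walk #direction# and find #place#. #detail#"],
    "direction": ["north", "south", "east", "west"],
    "place": ["a dark cave", "an abandoned cabin", "a foggy clearing"],
    "detail": ["There is #item# on the ground.", "You hear a noise behind you."],
    "item": ["a rusty key", "an old map", "a brass lamp"],
}

PLACEHOLDER = re.compile(r"#(\w+)#")


def expand(text, grammar, rng):
    """Repeatedly replace each #symbol# with a random alternative for that rule."""
    while True:
        match = PLACEHOLDER.search(text)
        if match is None:
            return text
        choice = rng.choice(grammar[match.group(1)])
        text = text[:match.start()] + choice + text[match.end():]


rng = random.Random(42)  # seeded so repeated runs give the same reply
reply = expand("#origin#", GRAMMAR, rng)
print(reply)
```

Because every reply starts from the same `origin` rule, the overall sentence structure stays fixed while the details vary from one reply to the next.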

After having fun with the adventure text reply generator, I also used the Tracery library for another AWS Lambda bot that generates product/project names and tweets every 6 hours. I think it’s rather amusing, you can check it out here: @ProductNameBot

For my most recent creation I upped the ante slightly and wondered what it would take to develop a Twitter bot that played a card game. This introduced some interesting problems that I hadn’t thought about yet, like how to track the game state for each player. I captured the development in these posts here:

I have some other ideas for something I might put together soon. Stay posted for more details 🙂

MacOS Mojave spaces gestures stopped working after taking screenshot with Grab

I just took a screenshot with the updated Grab app in MacOS Mojave and my 4-finger swipe to switch Spaces desktops stopped working. I found this post describing something similar, which makes it sound like a randomly occurring issue with multitouch gestures. A reboot fixed it for me, but it was surprising how much I rely on gestures to switch desktops/apps, and when they’re not working, how do you switch?! (Ctrl-Left/Right also works as a keyboard shortcut.)

Building Redis from source on Ubuntu server 18.04

After downloading the Redis source and attempting to make on Ubuntu server 18.04, it looks like I’ve got some dependencies missing:

kev@ubuntu18-redis1:~/redis/redis-4.0.11$ make
cd src && make all
make[1]: Entering directory '/home/kev/redis/redis-4.0.11/src'
    CC Makefile.dep
    CC adlist.o
In file included from adlist.c:34:0:
zmalloc.h:50:10: fatal error: jemalloc/jemalloc.h: No such file or directory
 #include <jemalloc/jemalloc.h>
          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I found a post describing the identical issue, which is caused by the bundled dependencies not having been built. To fix:

cd deps
make hiredis jemalloc linenoise lua geohash-int
cd ..
make
sudo make install

That resulted in a clean install, ready to start redis-server !

AWS VPCs and CIDR blocks (possibly more than you’ve ever wanted to know about IP addresses, networks, subnet masks and all that stuff)

If you’re interested in firing up some AWS EC2 instances, at some point you’re going to need to know about VPCs (Virtual Private Clouds) and CIDR blocks. If you’re studying for the AWS Solution Architect certification this is also an important topic covered in the exam. There’s a whole section in the AWS docs here that covers the topic of CIDR blocks.

Unless you’re an experienced network engineer or someone who works with configuring network topologies, this might be the first time you’ve come across this concept (I had seen the CIDR notation before but didn’t know what it meant). At first it might be easy to memorize a few of the common examples and remember which represents a smaller network and which is larger, but to understand why a /24 network has fewer available addresses than a /8 network you’ll need to dig a little deeper.

First, let’s look at typical IPv4 IP addresses, networks and subnets:

IPv4 addresses are made up of 4 numbers separated by ‘.’s. Each number is 8 bits, and is referred to as an ‘octet’.

A typical address on a home network like 192.168.1.16 has a subnet mask of 255.255.255.0. This means the first 3 octets are used to represent the network, in this case 192.168.1.0. Computers with this same IP prefix are therefore part of the same network, so

  • 192.168.1.1
  • 192.168.1.2
  • up to
  • 192.168.1.254

… are all on the same network, and assuming your router and each individual computer is set up ok, each of the computers with these addresses on this same network can also see each other (without any additional routing between networks).
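
The ‘same network’ test above comes down to a bitwise AND of each address with the subnet mask. Here’s a small Python sketch of that idea using the standard library `ipaddress` module; the function names `network_of` and `same_network` are my own, and 192.168.2.16 is just an example of an address on a different network.

```python
import ipaddress


def network_of(address, mask):
    """Return the network address obtained by AND-ing the address with the mask."""
    addr_int = int(ipaddress.IPv4Address(address))
    mask_int = int(ipaddress.IPv4Address(mask))
    return ipaddress.IPv4Address(addr_int & mask_int)


def same_network(a, b, mask):
    """True if both addresses fall in the same network for the given mask."""
    return network_of(a, mask) == network_of(b, mask)


print(network_of("192.168.1.16", "255.255.255.0"))                      # 192.168.1.0
print(same_network("192.168.1.16", "192.168.1.254", "255.255.255.0"))   # True
print(same_network("192.168.1.16", "192.168.2.16", "255.255.255.0"))    # False
```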

In this range there are also some reserved IPs for special purposes:

  • 192.168.1.0 is referred to as the network itself
  • 192.168.1.255 is a broadcast address for this network
  • this leaves 192.168.1.1 through 192.168.1.254 as usable addresses in the network
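
If you want to check these reserved addresses for yourself, Python’s standard `ipaddress` module reports them directly. This is only a quick illustration for the 192.168.1.0 network used above:

```python
import ipaddress

network = ipaddress.ip_network("192.168.1.0/24")
hosts = list(network.hosts())  # excludes the network and broadcast addresses

print(network.network_address)    # 192.168.1.0
print(network.broadcast_address)  # 192.168.1.255
print(hosts[0], hosts[-1])        # 192.168.1.1 192.168.1.254
print(len(hosts))                 # 254
```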

Originally networks were divided and categorized by 8 bit boundaries and were referred to as ‘class’ based networks:

Class C network:

  • subnet mask 255.255.255.0
  • first 3 octets are the network
  • the smallest network Class, with 256 available addresses
  • only the 4th octet is available for your host addresses, 0-255, 256 available addresses, (or 1-254 ignoring 0 and 255)
  • 24 bits used for network, 8 for hosts

Class B network:

  • subnet mask 255.255.0.0
  • first 2 octets are the network
  • 3rd and 4th octets available for host addresses
  • 16 bits used for network, 16 bits for hosts

Class A network:

  • subnet mask 255.0.0.0
  • first octet is the network
  • the largest IPv4 network Class
  • 2nd, 3rd and 4th octets available for hosts
  • 8 bits used for network, 24 bits for hosts

Ok, so this summarizes Class based networks which are divided by 8 bit boundaries, of which we only have 3 options, A (largest), B and C (smallest). Now let’s look at Classless networks.
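
To put numbers on ‘largest’ and ‘smallest’, the size of each Class follows from the number of bits left over for hosts. A rough sketch of the arithmetic in Python (the names here are mine, chosen for the example):

```python
# Network bits for each of the three classful network sizes described above.
CLASS_NETWORK_BITS = {"A": 8, "B": 16, "C": 24}


def total_addresses(network_bits):
    """Number of addresses left once network_bits of the 32 are used for the network."""
    host_bits = 32 - network_bits
    return 2 ** host_bits


for name, bits in CLASS_NETWORK_BITS.items():
    print(name, 32 - bits, total_addresses(bits))
```

This prints the host bits and total address count for each Class: 16,777,216 for A, 65,536 for B and 256 for C, before taking off the network and broadcast addresses.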

Instead of restricting to 8 bit boundaries, Classless networks can use any number of the bits to represent the network and the remaining bits for the hosts. Let’s first look at the Classless Inter-Domain Routing (CIDR) notation for the same Class based networks as the first examples:

  • /24 CIDR block is the same as Class C, with subnet mask 255.255.255.0
  • /16 CIDR block is the same as Class B, with 255.255.0.0
  • /8 CIDR block is the same as Class A, with subnet mask 255.0.0.0
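
The mapping from a prefix length to a subnet mask is just ‘set that many bits from the left’. A minimal Python sketch of the conversion, with `prefix_to_mask` being a name I’ve made up for this example:

```python
def prefix_to_mask(prefix):
    """Convert a CIDR prefix length (0-32) to a dotted-decimal subnet mask."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix must be between 0 and 32")
    # Set the top 'prefix' bits of a 32-bit value, leave the rest as zero.
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    octets = [(mask >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


for prefix in (24, 16, 8):
    print(f"/{prefix} -> {prefix_to_mask(prefix)}")
```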

You might have noticed that the number in the /xx CIDR block notation is the number of bits used for the network, which therefore implies the number of bits remaining for host addresses (out of the 4 octets, or 32 bits, in total). This approach is not restricted to 8 bit boundaries though: any number of bits can be used for the combination of network and hosts, so /24, /23, /22 and any value from /32 to /1 are all valid. A /32, with all bits used for the network, and a /31, with only 1 bit remaining for host addresses, are of little use for a general-purpose network, but they are used for special purposes, e.g. a single host route or point to point network links.
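
As an example of a block that doesn’t sit on an 8 bit boundary, here’s what a /22 looks like using Python’s `ipaddress` module. The 10.0.0.0 address is an arbitrary choice for illustration.

```python
import ipaddress

# 10.0.0.0 is an arbitrary example address, not one taken from the text above.
network = ipaddress.ip_network("10.0.0.0/22")

print(network.netmask)            # 255.255.252.0
print(network.num_addresses)      # 1024
print(network.network_address)    # 10.0.0.0
print(network.broadcast_address)  # 10.0.3.255
```

With 10 bits left for hosts, the block spills over into the third octet, covering 10.0.0.x through 10.0.3.x.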

Ok, so now let’s apply this to look at AWS VPCs and CIDR blocks.

For a /24 block, we already looked at the x.x.x.0 address, which refers to the network, and x.x.x.255, the broadcast address. AWS VPC subnets reserve a further 3 IP addresses (described here) for AWS usage, x.x.x.1, x.x.x.2, and x.x.x.3, so for each subnet there are 5 IP addresses unavailable for your own hosts.

Now we’ve looked at how the address blocks are composed, it’s easy to calculate how many addresses are available in a VPC for any CIDR block. Taking /24 as an example:

  • 32 – 24 = 8 bits for the host addresses
  • 2^8 = 256
  • 256 – 5 = 251

Knowing how IP addresses are structured, how CIDR blocks define the range of possible IPs for your hosts, the purpose of the .0 and .255 addresses, and the 3 additional AWS reserved IPs, you can now calculate how many IP addresses are available to your VPC for any CIDR block.
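
The same calculation works for any prefix length, so it can be wrapped up in a couple of lines of Python. This is a sketch of the arithmetic above; the function name is mine, and I’ve picked /16 and /28 as extra examples because, as far as I know, those are the largest and smallest block sizes AWS accepts.

```python
AWS_RESERVED_PER_SUBNET = 5  # network, broadcast, plus the 3 AWS reserved addresses


def usable_addresses(prefix):
    """Addresses left for your own hosts in a subnet with the given prefix length."""
    total = 2 ** (32 - prefix)
    return total - AWS_RESERVED_PER_SUBNET


for prefix in (16, 24, 28):
    print(f"/{prefix}: {usable_addresses(prefix)}")
```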

Building PyTorch from source for a smaller (<50MB) AWS Lambda deployment package

I’ve been trying to deploy a Python based AWS Lambda that’s using PyTorch. The problem I’ve run into is that the size of the deployment package with PyTorch and its platform specific dependencies is far beyond the maximum size of a zip that you can deploy as an AWS Lambda. Per the AWS Lambda Limits page, the maximum deployable zip is 50MB (and unzipped it needs to be less than 250MB).
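
It’s handy to be able to check a package against both limits before uploading it. Here’s a small Python sketch that does that with the standard `zipfile` module; the function name `check_package` and the demo file names are made up, and counting a MB as 1024 * 1024 bytes is my assumption.

```python
import os
import tempfile
import zipfile

MB = 1024 * 1024  # assumption: the limits are counted in units of 1024 * 1024 bytes
MAX_ZIPPED_BYTES = 50 * MB     # limit quoted above for the deployable zip
MAX_UNZIPPED_BYTES = 250 * MB  # limit quoted above for the unzipped contents


def check_package(zip_path):
    """Compare the zipped and unzipped size of a package against both limits."""
    zipped = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        unzipped = sum(info.file_size for info in archive.infolist())
    return {
        "zipped_bytes": zipped,
        "unzipped_bytes": unzipped,
        "zip_ok": zipped <= MAX_ZIPPED_BYTES,
        "unzipped_ok": unzipped <= MAX_UNZIPPED_BYTES,
    }


# Demo: build a tiny throwaway zip so the check can be run anywhere.
with tempfile.TemporaryDirectory() as workdir:
    demo_zip = os.path.join(workdir, "demo.zip")  # hypothetical file name
    with zipfile.ZipFile(demo_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("handler.py", "print('hello')\n" * 100)
    result = check_package(demo_zip)

print(result)
```

To check a real package, call `check_package` with the path to your own zip instead of the demo file.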

I found this article which suggested to build PyTorch from source in an Amazon AMI EC2, using build options to reduce the build size. I followed all steps up to but not including line 65 as I don’t need torchvision.

If you’re looking for the tl;dr summary, here are the key points:

  • yes, this approach works! (although it took many hours spread over a few weeks to get to this point!)
  • the specific Amazon AMI you need is the one that’s currently in use for running AWS Lambdas (this will obviously change at some point but as of 9/3/18 this AMI works) : amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 (ami-aa5ebdd2)
  • a t2.micro EC2 instance does not have enough RAM to successfully build PyTorch. I used a t2.medium with 4GB RAM.
  • you can’t take a trained model .pt file generated from a different version of PyTorch/torch and use it to generate text using a different version. The PyTorch version for training and generating output must be identical

Ok, beyond the tl;dr summary above, here’s my experience following the steps in this article.

At line 63: 

python setup.py install

I got this error:

Could not find /home/ec2-user/pytorch/torch/lib/gloo/CMakeLists.txt
Did you run 'git submodule update --init'?

I ran the suggested ‘git submodule update’ and then re-ran the setup.py script and now it ran for a while but ended with error:

gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1

I spent a bunch of time trying to work out what was going on here, but decided to take a different direction: skip building Python 3.6 from source, and try recreating these steps using the Python 2.7 that is preinstalled in the Amazon Linux 2 AMI. The only part that is slightly different is that pip is not preinstalled, so I installed it with:

sudo yum install python-pip
sudo yum install python-wheel

Then virtualenv with:

sudo pip install virtualenv

At this point I picked up the steps from creating the virtualenv:

virtualenv ~/shrink_venv

After the step to build pytorch, now I’ve got (another) different error:

as: out of memory allocating 4064 bytes after a total of 45686784 bytes
{standard input}: Assembler messages:
{standard input}:934524: Fatal error: can't close build/temp.linux-x86_64-2.7/torch/csrc/jit/python_ir.o: Memory exhausted
torch/csrc/jit/python_ir.cpp:215:2: fatal error: error writing to -: Broken pipe

Ugh, I’m running in a t2.micro that only has 1GB ram. Let’s stop the instance, change the instance type to a t2.medium with 4GB and let’s try building again.

Running free before:

$ free
total        used        free      shared  buff/cache   available
Mem:        1009384       40468      870556         288       98360      840700
Swap:             0           0           0

And now after resizing:

$ free
total        used        free      shared  buff/cache   available
Mem:        4040024       55004     3825940         292      159080     3780552
Swap:             0           0           0
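
As a quick sanity check on the two outputs above, the ‘total’ column of `free` is in kibibytes by default, so it converts to GiB like this. A small Python sketch, with function names of my own choosing:

```python
def mem_total_kib(free_output):
    """Extract the 'total' column of the Mem: row from `free` output (in KiB)."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return int(fields[1])
    raise ValueError("no 'Mem:' row found")


def kib_to_gib(kib):
    """Convert kibibytes to gibibytes."""
    return kib / (1024 * 1024)


# The two Mem: rows copied from the outputs above.
before = "Mem:        1009384       40468      870556         288       98360      840700"
after = "Mem:        4040024       55004     3825940         292      159080     3780552"

print(round(kib_to_gib(mem_total_kib(before)), 2))  # 0.96
print(round(kib_to_gib(mem_total_kib(after)), 2))   # 3.85
```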

Ok, trying again, but since we’ve rebooted the instance, remembering to set the flags to minimize the build options which was the whole reason we were doing this:

$ export NO_CUDA=1
$ export NO_CUDNN=1

Next error:

error: could not create '/usr/lib64/python2.7/site-packages/torch': Permission denied

Ok, let’s run the build with sudo instead then. That fixes that.

Now I’m at a point where I can actually run the generate.py script but now I’ve got a completely different error:

/home/ec2-user/shrinkenv/lib/python2.7/site-packages/torch/serialization.py:316: SourceChangeWarning: source code of class 'torch.nn.modules.sparse.Embedding' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
  File "generate.py", line 54, in <module>
    decoder = torch.load(args.filename)
  File "/home/ec2-user/shrinkenv/lib/python2.7/site-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/home/ec2-user/shrinkenv/lib/python2.7/site-packages/torch/serialization.py", line 409, in _load
    result = unpickler.load()
AttributeError: 'module' object has no attribute '_rebuild_tensor_v2'

Searching for the last part of this error found this post, which implies my trained model .pt file is from a different torch/pytorch version … which it most likely is as I trained using a version installed with pip, and now I’m trying to generate with a version built from source.

Rather than spend more time on this (some articles suggested you can read the .pt model from one pytorch version and convert it, but this doesn’t seem like a trivial activity and requires writing some code to do the conversion), so I’m going to train a new model with the same version I just built from source.
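
One way to catch this kind of mismatch earlier would be to record the version used for training next to the model file and compare it before loading. This isn’t something I did at the time, just a sketch of the idea in Python. The file names and version strings below are made up, and in real use you would pass in `torch.__version__` rather than a hard coded string.

```python
import json
import os
import tempfile


def write_version_file(model_path, framework_version):
    """Record the framework version next to the model, e.g. model.pt.version.json."""
    with open(model_path + ".version.json", "w") as handle:
        json.dump({"framework_version": framework_version}, handle)


def check_version_file(model_path, framework_version):
    """Raise if the recorded training version differs from the running version."""
    with open(model_path + ".version.json") as handle:
        trained_with = json.load(handle)["framework_version"]
    if trained_with != framework_version:
        raise RuntimeError(
            f"model trained with {trained_with}, but running {framework_version}"
        )


# Demo with made-up version strings; in real use you would pass torch.__version__.
with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "model.pt")  # hypothetical file name
    write_version_file(model_path, "0.4.1")
    check_version_file(model_path, "0.4.1")  # same version: no error
    try:
        check_version_file(model_path, "0.3.1")
        mismatch_detected = False
    except RuntimeError as error:
        mismatch_detected = True
        print(error)
```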

Now that’s successfully done, I have my Lambda handler script ready to go, and ready to package up, so back to the final steps from the article to zip up everything built and installed so far in my virtualenv:

cd $VIRTUAL_ENV/lib/python2.7/site-packages
zip -r ~/kevinhookebot-ml-lambda-generate-py.zip *

We’re at 57MB, so looking ok so far (although larger than 50MB?). Now add char-rnn.pytorch, my generated model and Lambda handler into the same zip, and we’re now at 58MB, so well within the 250MB limit for a Lambda package deployed via S3.

Let’s upload and deploy. Test calling the Lambda, and now we get:

Unable to import module 'generatelambda': /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /var/task/torch/lib/libshm.so)

Searching for this error I found this post which has a link to this page which lists a specific AMI version to be used when compiling dependencies for a Lambda deployment (amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2). Picking the default Amazon Linux 2 AMI is probably not this same image (and I already tried the Amazon Linux AMI 2018.03.0 and ran into other issues on that one) so looks like I need to start over (but getting close now, surely!)

Ok, new EC2 t2.medium instance with exactly the same AMI image as mentioned above. Retraced my steps and now I feel I’m almost back at the same error as before:

error: command 'gcc' failed with exit status 1
gcc: error trying to exec 'cc1plus': execvp: No such file or directory

Searching some more for this I found this post with a solution to change the PATH to point exactly where cc1plus is installed. Instead of 4.8.3 in this AMI though it seems I have 4.8.5, so here’s the settings I used:

$ export COMPILER_PATH="/usr/libexec/gcc/x86_64-amazon-linux/4.8.5/:$COMPILER_PATH";
export C_INCLUDE_PATH="/usr/lib/gcc/x86_64-amazon-linux/4.8.5/include/:$C_INCLUDE_PATH";

And then I noticed in the post they hadn’t included either of these in setting the new PATH which seems like an oversight (I don’t think these will make any difference if they are not in the PATH), so I set my path like this, including COMPILER_PATH first:

export PATH="$COMPILER_PATH:/sbin:/bin:/usr/sbin:/usr/bin:/opt/aws/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/aws/bin:$PATH";

Now cc1plus is in my path, let’s try building pytorch again! In short, that worked.
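
Before kicking off a long build, it can save time to confirm that cc1plus can actually be found. A rough Python sketch using the standard `shutil.which`; the helper name `find_tool` is mine, and the directory is the one from the COMPILER_PATH setting above.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Look for an executable on PATH, searching any extra directories first."""
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


# Directory taken from the COMPILER_PATH setting above.
gcc_dir = "/usr/libexec/gcc/x86_64-amazon-linux/4.8.5/"
location = find_tool("cc1plus", extra_dirs=[gcc_dir])
print(location if location else "cc1plus not found; the C++ build step would fail")
```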

Going back to the packaging steps, let’s zip everything up and push the zip to S3 ready to deploy.

I zipped up:

  • char-rnn.pytorch : from github, including my own runner script and lambda handler script
  • modules installed into virtualenv lib: ~/shrinkenv/lib/python2.7/site-packages/* (tqdm)
  • modules installed into virtualenv lib64: ~/shrinkenv/lib64/python2.7/site-packages/* (torch, numpy and unidecode)

At this point I used ‘aws s3 cp’ to copy the zip to s3, and configured my Lambda from the zip in s3. Set up a test to call my handler, and success!

Adding a cheap SSD to my 2008 Mac Pro

Windows 10 on my 2008 Mac Pro maxes out the disk i/o while booting, checking for updates and doing whatever it does after startup. Add Steam and Origin launching at boot, and disk i/o sits at 100% for several minutes after boot. My Windows 10 disk up until now has been a cheap Hitachi/HGST 7200rpm 500GB HDD.

I boot Windows 10 on my Mac Pro only for occasional gaming, so I haven’t been overly eager to install an SSD. It wasn’t until these recent SSD deals with 480GB for as low as $65 that I decided to pick one up.

I’m aware that the 2008 Mac Pro only has a SATA2 disk controller by default so won’t be able to take advantage of the maximum SATA3 SSD speeds (max 600MB/s), but even at SATA2 bandwidth (max 300MB/s) the i/o will still be multiple times faster than what a 7200rpm magnetic disk is capable of.
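
To put the two interface limits in perspective, here’s the best-case arithmetic in Python. The 3000MB size is an arbitrary example, not a measurement, and real-world throughput will be lower than the interface maximum.

```python
def transfer_seconds(size_mb, rate_mb_per_s):
    """Best-case time to move size_mb at the given interface rate."""
    return size_mb / rate_mb_per_s


SATA2_MAX = 300  # MB/s, from the text above
SATA3_MAX = 600  # MB/s, from the text above
size_mb = 3000   # arbitrary example size, not a measurement

print(transfer_seconds(size_mb, SATA2_MAX))  # 10.0
print(transfer_seconds(size_mb, SATA3_MAX))  # 5.0
```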

For the last couple of magnetic 2.5″ disks I added, I used a cheap $5 2.5 to 3.5″ 3d printed bracket from Amazon. While it works and holds the disks in place, it’s not sturdy enough to get the drives inserted into the SATA slots when you push the drive sled into the machine. You need to reach under to find the back of the drive and give it a push, then it seats into the slot. I decided to try a Sabrent metal bracket for the SSD. When it arrived I realized I had already used one of these in the past when installing an SSD into a 2012 MacBook Pro. These are pretty sturdy and work well:

$5 3d printed adapter on left, Sabrent adapter on right

A few notes as reminders to myself on the install:

  • Windows 10 will not install from the ISO burnt to a USB flash drive, no matter whether you set it up from Windows 10, MacOS, or Linux. I tried multiple times, and it will not boot. Strangely, MacOS will boot and install from a USB flash drive just fine.
  • Windows 10 will not install to a fresh, blank HD or SSD if there are other disks already in the Mac Pro. Remove all the other disks, leaving just the target disk for Windows 10. Boot from DVD, complete the install, then insert all the other disks back after completing the install

My HDDs with most uptime hours

I keep a few old HDDs around as scratch disks for installing random stuff. I realized a couple of them that I’ve been using fairly regularly in my Mac Pro are pretty old, so I took a look at the SMART stats (smartctl) to see how old they actually are, and what their stats and uptime look like:

WD Caviar Blue 500GB – this drive came installed in my 2008 Mac Pro when I bought it used. I’ve no idea if it was an original disk in the machine or added later, but it’s still chugging along with no errors and over 3.7 years uptime:

32,830 uptime hours
0 read error rate
SMART health: PASSED

Hitachi Deskstar 3.5″ 7200rpm P7K500 250GB – I have 2 of these disks that I used in a Linux server as a RAID1 pair when I used to self host my website from home. Still no errors and over 5 years uptime so far:

45,082 uptime hours
0 read error rate
SMART health: PASSED
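
The ‘years’ figures for both drives are just the power-on hours divided by the hours in a year. In Python, ignoring leap days:

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def hours_to_years(hours):
    """Convert power-on hours to years of continuous uptime."""
    return hours / HOURS_PER_YEAR


print(round(hours_to_years(32830), 1))  # 3.7
print(round(hours_to_years(45082), 1))  # 5.1
```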

I understand that both of these are on borrowed time and I don’t use these for anything critical, but it’s interesting to see how long some disks last. On the other end of the spectrum I’ve also had several disks fail within a year, and one (a Quantum Fireball I think) failed within a couple of weeks, but it’s interesting to compare the lifetimes and failures from a number of disks over time.

Learning Golang (part 1)

A few random notes from my initial attempts learning Golang.

Compile and run:

go run source.go

Build executable:

go build source.go

Structure:

// declares the package this file belongs to
package packagename

Package main defines a standalone executable:

package main

Import required packages:

import "packagename"

Semicolons are not required unless there’s more than one statement on a line

Functions:

func functionName() {
    // code
}

Arguments passed to an app are accessible via the slice os.Args. os.Args[0] contains the name of the app itself.

Ok, let’s try my first hello world in Eclipse with the Goclipse plugin installed:

import (
    "fmt"
)
func main() {
    fmt.Println("hello!")
}

I get this error:

Ok, first lesson, a package is required, so added:

package main

Creating a Run As… config in Eclipse for my project and then attempting to Run again gave me this useful message:

Ok, so I moved my source into an app folder (/src/main) but this gave me additional errors. At this point I’ve errors about $GOPATH:

Looking through the Project properties, this dialog with this option adds the Project location into the required GOPATH:

Now my first app runs successfully!

Observations about common IT technologies in 1988-89

Sometime around 1988-1989 I did some part-time data entry work for an IT Recruitment Agency that my Dad worked for. Tucked away in some papers I found these two sheets listing a range of different programming languages and other in-demand software packages/systems at the time. From memory, I think this list was what I used to code each job applicant’s tech skills as they were entered into the agency’s recruitment CV/resume database.

There are many interesting things about this list from 30 years ago. The first that caught my attention is how many of the tech skills on this list are no longer in use today, and some I’ve never even heard of since.

The second point that’s interesting is how many technologies and languages we commonly use today are not even on this list, meaning they were developed and introduced at some point after 1989. Every web technology in common use today was introduced after this point – HTML, CSS, JavaScript and any of the various popular JavaScript libraries, all introduced at some point after 1989.

Even other popular languages and frameworks/platforms, Visual Basic, Java, .NET, Ruby, PHP … all introduced after 1989.

This reinforces the fact that commonly used IT technologies come and go pretty quick, and what’s common today can easily be replaced with something else tomorrow. If you’re planning to stay in IT for the long run (longer than a few years), be prepared to keep your skills up to date, stay flexible and adapt to change.

Installing RabbitMQ rpm on RHEL 5.11

Rough notes for installing RabbitMQ on RHEL5.11.

Install the EPEL repo (not sure if this is needed for the RPM install or not):

curl -O http://archives.fedoraproject.org/pub/archive/epel/5/x86_64/epel-release-5-4.noarch.rpm
rpm -ivh epel-release-5-4.noarch.rpm
yum update

Install the erlang dependency (from answer here):

wget -O /etc/yum.repos.d/epel-erlang.repo http://repos.fedorapeople.org/repos/peter/erlang/epel-erlang.repo
yum install erlang

Download the noarch version of RabbitMQ:

wget http://www.rabbitmq.com/releases/rabbitmq-server/v3.1.1/rabbitmq-server-3.1.1-1.noarch.rpm

Note: on my initial install of RHEL 5.11 I couldn’t wget or curl any https based sites, as I’d get SSL connection/certificate errors. I downloaded the rpm on another machine and scp’d it up to my server.

Install the rpm:

rpm -i rabbitmq-server-3.1.1-1.noarch.rpm

Enable the admin console:

rabbitmq-plugins enable rabbitmq_management

Delete the default guest user:

rabbitmqctl delete_user guest

Create a new admin user and give it the administrator tag:

rabbitmqctl add_user newuserid password
rabbitmqctl set_user_tags newuserid administrator

Start/stop/restart the server:

/sbin/service rabbitmq-server start
/sbin/service rabbitmq-server stop
/sbin/service rabbitmq-server restart

Additional notes:

The generic Linux version wouldn’t start up for me and gave this error, so I found the working RPM above instead:

{"init terminating in do_boot",{undef,[{rabbit_prelaunch,start,[]},{init,start_it,1},{init,start_em,1}]}}
init terminating in do_boot ()