Using AWS SageMaker to train a model to generate text (part 1)

If you’ve followed any of my recent posts, you’ll know I have been using RNN models to generate text. The model is trained on my previous tweets and the text of all my previous blog posts, and the generated text feeds a Twitter bot: @kevinhookebot

The trouble I have right now is that the scripts and generated models run using Lua, and although I could install this on an EC2 instance, I don’t want to pay for an instance being up 100% of the time. Currently, when I generate a new batch of text for my Twitter bot, I start up a local server running the scripts and the model, generate new text, and then stage it to DynamoDB to get picked up by the bot the next time it’s scheduled to run. With the machine learning services that AWS provides, there has to be something I can use out of the box that would automate these steps.

Let’s take a look at using AWS SageMaker.

First I created a SageMaker notebook with a new role, to access S3 buckets with ‘sagemaker’ in the name.

Then I created an S3 bucket – sagemaker-kevinhooke-ml – and uploaded a copy of my data file (all my previous posts from this blog, concatenated into a single file).
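I did the upload through the console, but the same step can be scripted. Here is a minimal Python sketch, assuming boto3 is installed and credentials are configured. The object key train/posts.txt and the local file name are hypothetical; only the bucket name comes from this post.

```python
def upload_corpus(s3_client, local_path,
                  bucket="sagemaker-kevinhooke-ml",
                  key="train/posts.txt"):
    """Upload the concatenated posts file and return its S3 URI.

    s3_client is expected to be a boto3 S3 client; the key is a
    hypothetical name chosen for this example.
    """
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   print(upload_corpus(boto3.client("s3"), "posts.txt"))
```

Passing the client in as a parameter keeps the function easy to try out with a stand-in object before pointing it at a real bucket.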

Next I created a new Training Job.

You need to pick an algorithm for the training, and there’s a selection of provided algorithms for different purposes. To generate new text ‘in the style of’ the text that I’m going to train the model with, the ‘Sequence2Sequence’ algorithm looks like it does what I need.

On completing the Training Job, I got this error:

Ok, so let’s change the instance type. I picked the smallest of the instances before:

And it looks like you can’t change the type on the Notebook. So let’s create a new Notebook. Looking at the instance types, the ones with GPU support are on the large side, so let’s pick the smallest of the options and try again.

At this point I realized the instance type it’s talking about is for the training job not the notebook, and it’s specified here:

So let’s pick one of the GPU types and try again.
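To make it clearer where the instance type lives, here is a rough Python sketch of the request that boto3’s SageMaker client takes for create_training_job. The instance type sits under ResourceConfig, which belongs to the training job and not the notebook. Everything here is an assumption for illustration: ml.p2.xlarge is just one example of a GPU type, the image URI and role ARN are placeholders, and the single ‘train’ channel is not necessarily what the algorithm actually requires.

```python
def build_training_job_request(job_name, image_uri, role_arn, bucket,
                               instance_type="ml.p2.xlarge"):
    """Return keyword arguments for a SageMaker create_training_job call.

    All values are placeholders; check the algorithm docs for the
    channels and hyperparameters it really needs.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,    # region-specific algorithm image
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",        # illustrative channel layout only
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/train/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        # The training instance type is set here, per job
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   request = build_training_job_request(
#       "my-job", "<algorithm-image-uri>", "<role-arn>",
#       "sagemaker-kevinhooke-ml")
#   boto3.client("sagemaker").create_training_job(**request)
```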

First training job is running:

Next error:

Hmm, off to do some reading in the docs to see what’s needed to run this training job. The docs here describe what’s needed for the sequence2sequence algorithm and I’m clearly missing some steps, so I’m pausing here and will come back with an update later.

Installing and Configuring Atlassian Confluence with MySQL in Docker Containers

Atlassian Confluence is already available as a Docker Image from the Docker Hub but you still need to provide a database instance for a production setup. Let’s build a docker-compose file to create a container from this image together with a container running MySQL.

First, per the docs on the Docker Hub page, create an external folder /data/confluence that will get mounted as a volume by the container.

This is my first version to get this working (keep reading for a refined version that includes a JDBC driver):

[code]

version: '3'
services:
  confluence:
    image: atlassian/confluence-server
    restart: always
    volumes:
      - /data/confluence:/var/atlassian/application-data/confluence
    ports:
      - 8090:8090
      - 8091:8091
  confl-mysql:
    build: ./mysql
    restart: always
    environment:
      - MYSQL_RANDOM_ROOT_PASSWORD=yes
      - MYSQL_DATABASE=confluence
      - MYSQL_USER=confluence
      - MYSQL_PASSWORD=your-password
[/code]

After hitting your-ip:8090 for the first time, you can pick the ‘My own database’ option:

To connect to a MySQL db you need to drop a MySQL JDBC driver into /opt/atlassian/confluence/confluence/WEB-INF/lib so at this point we’ve got a couple of options. We could either copy the JDBC driver into the running container (but since containers are ephemeral we’d lose this change if we started a new container from the image), or take a step back and rebuild the image including the driver.

The right thing to do would be to rebuild a custom image including the driver. So let’s do that.

Download the MySQL Connector driver from here.

Let’s commit it into our project and add a new Dockerfile to build a modified version of the official Confluence image, which is just these two lines:

[code]

FROM atlassian/confluence-server
COPY mysql-connector-java-5.1.46.jar /opt/atlassian/confluence/confluence/WEB-INF/lib

[/code]

Update the docker-compose file to build this new image instead of using the provided one from Docker Hub. Replace:

[code]

image: atlassian/confluence-server

[/code]

with

[code]

build: ./confl-mysql

[/code]

(or whatever you named the folder containing the above Dockerfile)

Now when we start up this container and hit the app, the JDBC driver is recognized and we’re on to the next config page for our database connection params:

Entering our credentials and pressing Test, we’ve got an error about the default encoding:

To address this, the Confluence setup docs here describe editing the my.cnf file in MySQL, or alternatively I could pass the settings as startup params. The MySQL docs have a chapter on configuring and running MySQL in Docker, and this Q&A on Stack Overflow describes passing the optional params in a command section in your docker-compose file.

My first attempt was to add this:

[code]

  confl-mysql:
    build: ./mysql
    restart: always
    command: character-set-server=utf8 collation-server=utf8_bin
[/code]

but the syntax was not quite right yet, resulting in the container getting stuck in a restart loop, with this error appearing in the container logs:

/usr/local/bin/docker-entrypoint.sh: line 202: exec: character-set-server=utf8: not found

Reading docs for the command option, the command in the docker-compose file needs to be the command to start the app in the container as well as the optional params. So now I’m here:

[code]

  confl-mysql:
    build: ./mysql
    restart: always
    command: [mysqld, --character-set-server=utf8 --collation-server=utf8_bin]
[/code]

Now we’re getting closer. Logs from my MySQL container are now showing:

ERROR: mysqld failed while attempting to check config

command was: "mysqld --character-set-server=utf8 --collation-server=utf8_bin --verbose --help"

mysqld: Character set 'utf8 --collation-server=utf8_bin' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file

Some Googling made me realize each of the params needs to be comma separated, so the next update is:

[code]
  confl-mysql:
    build: ./mysql
    restart: always
    command: [mysqld, --character-set-server=utf8, --collation-server=utf8_bin]
[/code]

and now we’ve got both containers starting up. The list of params should be updated to add all the optional params listed in the Confluence MySQL set up docs, otherwise you’ll get an error for each missing param. The complete list is:

command: [mysqld, --character-set-server=utf8, --collation-server=utf8_bin, --default-storage-engine=INNODB, --max_allowed_packet=256M, --innodb_log_file_size=2GB, --transaction-isolation=READ-COMMITTED, --binlog_format=row]
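To see why the second attempt failed, it helps to remember that each element of the command list becomes exactly one argument to the process. The following Python sketch is a simplified model of that (it is not how mysqld really parses its options, and parse_long_options is a made-up helper), but it reproduces the odd character set name from the error message above.

```python
def parse_long_options(argv):
    """Split '--name=value' arguments into a dict, one entry per argument."""
    options = {}
    for arg in argv[1:]:                  # argv[0] is the program name
        if arg.startswith("--"):
            # Split on the first '=' only, as a typical option parser would
            name, _, value = arg[2:].partition("=")
            options[name] = value
    return options


# Second attempt: both options squeezed into ONE list element
broken = ["mysqld", "--character-set-server=utf8 --collation-server=utf8_bin"]

# Third attempt: one list element per option
fixed = ["mysqld", "--character-set-server=utf8", "--collation-server=utf8_bin"]

print(parse_long_options(broken))
# {'character-set-server': 'utf8 --collation-server=utf8_bin'}
print(parse_long_options(fixed))
# {'character-set-server': 'utf8', 'collation-server': 'utf8_bin'}
```

In the broken version the whole string after the first ‘=’ ends up as the character set name, which matches what MySQL complained about.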

… and my VM has run out of disk space, so time to expand my disk. Back shortly.

Ok, back. Restarted and now we’re in business:

Complete config and now the containers are up!

Changing a GitLab Runner from ‘Locked to a Project’ to Shared

I have a GitLab Runner assigned to a project that I’d like to share with another similar project. Currently it looks like this:

Pressing the small edit icon, I can see these options:

I want to reuse this same runner, so I unchecked the ‘Lock to current projects’ checkbox.

Now if I go to the CI/CD settings for my other project I can see the runner is available, so I click ‘enable for this project’.

Now my Pending Job that was triggered after my first push to my repo has kicked in and is being deployed to my test Docker server. Cool.
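I did all of this through the UI, but the same two steps could probably be scripted against the GitLab REST API. This Python sketch only builds the requests and does not send them. It assumes API v4, a personal access token with enough rights, and that you know the runner and project ids; the host, ids and token in the usage comments are placeholders.

```python
import json
import urllib.request


def build_unlock_request(base_url, runner_id, token):
    """Build PUT /runners/:id with locked=false (equivalent to unchecking the lock)."""
    return urllib.request.Request(
        url=f"{base_url}/api/v4/runners/{runner_id}",
        data=json.dumps({"locked": False}).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="PUT",
    )


def build_enable_request(base_url, project_id, runner_id, token):
    """Build POST /projects/:id/runners to enable the runner for another project."""
    return urllib.request.Request(
        url=f"{base_url}/api/v4/projects/{project_id}/runners",
        data=json.dumps({"runner_id": runner_id}).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )


# Usage (placeholders throughout; sending is left commented out):
#   unlock = build_unlock_request("https://gitlab.example.com", 6, "<token>")
#   enable = build_enable_request("https://gitlab.example.com", 42, 6, "<token>")
#   urllib.request.urlopen(unlock)
#   urllib.request.urlopen(enable)
```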

Setting up a Raspberry Pi SD card with some Amateur Radio related apps

Gert KK6ZGA asked if I could set up an SD card for her Raspberry Pi including a few Amateur Radio apps. Rather than just install a bunch of random apps and hand it back, I thought it may be useful to document the setup steps as a reference in case anyone else is interested in doing something similar.

First step, installing an OS – the SD card was blank, so I downloaded Raspbian from here: https://www.raspberrypi.org/downloads/raspbian/

… and then wrote the .img to the SD card with the dd util (notes on how to do this here). If you’re on Windows, there are utilities you can download to help you burn an image to an SD card; Google for help with these.

My LG monitor doesn’t recognize the HDMI output from the Pi unless you tweak the settings in config.txt to boost the output signal. I’ve covered this before here.

Booting up for the first time, the keyboard is configured by default for the en_GB locale and UK keyboard layout, which makes it difficult to find some symbols like ‘$’ on a US keyboard, so I have notes on how to switch this to en_US using raspi-config here.

For future reference, I have a number of other Raspberry Pi related getting started posts here.

Raspbian by default is configured to boot to a graphical desktop and to logon automatically with the default userid/password (pi / raspberry) – you should change your password on first boot. You can change this option in raspi-config too if you’d like to boot to a shell, or require logon at boot.

After first boot and the initial setup above, the list of apps I thought would be useful to install is most of what I covered in my July 2016 presentation at one of our RCARS club meetings on using a Raspberry Pi with amateur radio. Here’s each of the apps I installed and how to start/use them:

  • Installed xlog:
    • sudo apt-get install xlog
    • To start, double-click the icon on the desktop
  • Installed cqrlog:
    • sudo apt-get install cqrlog
    • To start, double-click the icon on the desktop

There are many things you can do with RTL-SDR (you’ll need an RTL-SDR dongle to take advantage of these), so here are a couple of examples. Most of these are command line only, from the Terminal, which you can open from the desktop here:

  • dump1090 receives and decodes ADS-B transponder signals on 1090 MHz from airplanes flying overhead (depending on your antenna, within about a 100 mile radius). There are a couple of different modes to run it in.

‘Interactive’ mode is started like this from a terminal, first ‘cd dump1090’ then:

./dump1090 --interactive

You’ll see a display like this that updates every second, showing decoded info from received ADS-B transponder signals:

‘Net’ mode displays the received signals via a webpage. You’ll need the Pi to be on a network, either wired or wifi, and you’ll need to know your Pi’s IP address (which you can find by running ‘ifconfig’ in a Terminal). Run this with:

./dump1090 --net --quiet

And then point a browser at your Pi’s IP address on port 8080 (e.g. assuming your IP is 192.168.1.75, http://192.168.1.75:8080) and you’ll see the received signals plotted like this:

Received signals that include latitude and longitude info are plotted on the map; other signals with no location info are displayed in the table on the right.

Other things you can do with the rtl-sdr utils: rtl_fm lets you tune to a specific frequency and demodulate FM. If you have speakers attached to the Pi’s audio output, you can pipe the decoded data to aplay to listen to FM signals, like this (scroll to the right for the whole command):

rtl_fm -f 96.9M -M wbfm -s 200000 -r 48000 | aplay -r 48k -f S16_LE

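This tunes the RTL-SDR to 96.9 MHz (-f 96.9M), demodulates wideband FM (-M wbfm) at a sample rate of 200000 samples per second (-s 200000), resamples the output to 48000 Hz (-r 48000), and pipes the audio (‘|’) into aplay, which plays it as signed 16-bit little-endian samples (-f S16_LE) at the matching 48 kHz rate. Take a look at the RTL-SDR docs here for more info on the options.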
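If you’d rather start the same pipeline from a script, for example to switch stations, here is a rough Python sketch. It assumes rtl_fm and aplay are installed and on the PATH. The function names are made up for this example, and it passes the audio rate to aplay as 48000 rather than the 48k shorthand used in the command above.

```python
import subprocess


def build_fm_pipeline(freq="96.9M", sample_rate=200000, audio_rate=48000):
    """Return the two argument lists for the rtl_fm | aplay pipeline."""
    rtl_fm_cmd = ["rtl_fm", "-f", freq, "-M", "wbfm",
                  "-s", str(sample_rate), "-r", str(audio_rate)]
    # aplay must read at the same rate rtl_fm resamples to
    aplay_cmd = ["aplay", "-r", str(audio_rate), "-f", "S16_LE"]
    return rtl_fm_cmd, aplay_cmd


def play(freq="96.9M"):
    """Run rtl_fm and pipe its output into aplay until aplay exits."""
    rtl_fm_cmd, aplay_cmd = build_fm_pipeline(freq)
    tuner = subprocess.Popen(rtl_fm_cmd, stdout=subprocess.PIPE)
    player = subprocess.Popen(aplay_cmd, stdin=tuner.stdout)
    tuner.stdout.close()   # so rtl_fm notices if aplay goes away
    try:
        player.wait()
    finally:
        tuner.terminate()


# Usage (needs an RTL-SDR dongle plugged in, so it is left commented out):
#   play("96.9M")
```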

Hopefully this is a few things to get you started 🙂