Using AWS Organizations for learning and personal projects

If you work on many projects deployed to AWS over time, it can become difficult to track which resources are where and what relates to what. Tagging can help a lot, and so can regions: for example, I can deploy one project to us-west-1 and another to us-west-2.

Another idea is to take advantage of multiple AWS accounts and manage them as an Organization. There’s no additional cost for each account or for setting up the Organization; you still only pay for the resources you use.
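Once the Organization exists, member accounts can be created from the management account in the console, or with the CLI, for example (the email and account name here are placeholders):

aws organizations create-account --email project-a@example.com --account-name project-a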

Now that you have multiple accounts to segregate the various projects or other things you’re working on, instead of logging out of one account and into another and switching back and forth, you can assume a role within the other account via ‘Switch Roles’ in the Account dropdown. This option is only visible if you are signed in as an IAM user, not as the root user.

Before you get to this step, in the account you want to switch to, create a new IAM role with the permissions you need, and in the Trust section add the account id of the account you want to assume the role from. Then complete the fields in the Switch Roles form and enter the ARN of the role.
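The Trust section ends up as a standard cross-account trust policy along these lines, where 111111111111 is a placeholder for the id of the account you’ll be switching from:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}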

After the first time you’ve used this switch role feature, you’ll see the role in the Account dropdown to reuse later.
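The same role can also be assumed from the AWS CLI by adding a profile to ~/.aws/config (the account id and role name below are placeholders for the role you created):

[profile other-account]
role_arn = arn:aws:iam::222222222222:role/switch-role-example
source_profile = default

Any command can then run against the other account by adding --profile, e.g. aws s3 ls --profile other-account.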

Running aitextgen model training in a Docker container

I’m setting up a way to run text generation model training jobs on demand with aitextgen, and the first approach I’m looking at is running the training in a Docker container. Later I may move this to an AWS service like ECS, but this is my first step.

I’ve built a Docker image with the following Dockerfile:

# Start from the Amazon Linux base image
FROM amazonlinux
RUN yum update -y
# Install Python 3 (on Amazon Linux 2 this also provides pip3)
RUN yum install -y python3
RUN pip3 install aitextgen
# Copy in the training data and the training/generation scripts
ADD source-file-for-fine-tuning.txt .
ADD generate.py .
ADD train.py .

... and then built my image with:

docker build -t aitextgen .

I then run a container, passing in the command I want to run, in this case ‘python3 train.py’:

docker run --volume /data/trained_model:/trained_model:rw -d aitextgen sh -c "cd / && python3 train.py && mv aitextgen.tokenizer.json /trained_model"

I’m also attaching a bind mount where the model output is written during the run, and -d runs the container in the background. The last step in the run command moves the tokenizer file to the mounted EBS volume so it can be reused for generation.
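train.py itself isn’t shown here; the basic shape of it, adapted from the aitextgen README demo (the training parameters below are illustrative, not necessarily what I used), is:

# train.py: fine-tune a small GPT-2 model on a local text file
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import GPT2ConfigCPU
from aitextgen import aitextgen

file_name = "source-file-for-fine-tuning.txt"

# Train a custom BPE tokenizer on the text; this writes aitextgen.tokenizer.json
train_tokenizer(file_name)
tokenizer_file = "aitextgen.tokenizer.json"

# GPT2ConfigCPU is a small GPT-2 config suited to CPU training
config = GPT2ConfigCPU()
ai = aitextgen(tokenizer_file=tokenizer_file, config=config)

# Encode the text into a dataset and train; checkpoints are saved to the
# trained_model folder (the bind mount above, since the cwd is /) as it runs
data = TokenDataset(file_name, tokenizer_file=tokenizer_file, block_size=64)
ai.train(data, batch_size=8, num_steps=50000)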

To generate text from the model, run:

docker run --volume /data/trained_model:/trained_model:rw -d aitextgen sh -c "cd / && python3 generate.py"
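generate.py then loads the model and tokenizer back from the mounted folder; a minimal sketch (paths assume the bind mount above):

# generate.py: load the fine-tuned model and print some samples
from aitextgen import aitextgen

ai = aitextgen(model_folder="/trained_model",
               tokenizer_file="/trained_model/aitextgen.tokenizer.json")
ai.generate(n=5, max_length=100)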

Mounting an EBS volume inside an EC2 instance

If you provision and attach additional EBS volumes to an EC2 instance, they don’t get mounted by default.

The boot EBS volume is usually /dev/xvda1. Additional EBS volumes show up as /dev/xvdb, /dev/xvdc, and so on.
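You can confirm the device names with lsblk, which lists each block device and where (if anywhere) it is mounted:

lsblk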

First format the new volume:

sudo mkfs -t ext4 /dev/xvdb
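Note that mkfs destroys anything already on the volume. If you’re not sure whether it holds data, check it first with sudo file -s /dev/xvdb: an unformatted volume reports just ‘data’, while a formatted one shows its filesystem type.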

Make a mount point directory, e.g. sudo mkdir /data, then mount the volume with:

sudo mount /dev/xvdb /data

Now you should see the new volume available:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  432K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/xvda1       20G  8.8G   12G  44% /
tmpfs           798M     0  798M   0% /run/user/1000
/dev/xvdb       7.8G   36M  7.3G   1% /data

Add a line to /etc/fstab so the volume is mounted on startup:

/dev/xvdb /data ext4 defaults,nofail 0 2
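One caveat: device names like /dev/xvdb aren’t guaranteed to stay the same across stops and starts, so the AWS docs recommend identifying the volume by UUID in /etc/fstab instead. Find the UUID with sudo blkid, then use a line like this (the UUID below is a placeholder):

UUID=aebf131c-6957-451e-8d34-59f3e754c26b /data ext4 defaults,nofail 0 2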

These steps are from multiple places, mainly answers to this question.

AWS CloudFormation example for S3 bucket

Typical CloudFormation template for an S3 bucket with ‘block all public access’ enabled:

Resources:
  S3BucketExample:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: s3-bucket-name
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
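
To create the stack from this template (assuming it’s saved as template.yaml):

aws cloudformation deploy --stack-name s3-bucket-example --template-file template.yaml

Note that BucketName must be globally unique across all of AWS, so s3-bucket-name needs to be replaced with something unique.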