Running aitextgen model training in a Docker container

I’m setting up an approach to run text generation model training jobs on demand with aitextgen, and the first approach I’m looking at is to run the training in a Docker container. Later I may move this to an AWS service like ECS, but this is my first step.

I’ve built a Docker image with the following dockerfile:

FROM amazonlinux
RUN yum update -y
RUN yum install -y python3
RUN pip3 install aitextgen
ADD source-file-for-fine-tuning.txt .

.. and then built my image with:

docker build -t aitextgen .

I then run a container passing in the cmd I want to run, in this case ‘python3’:

docker run --volume /data/trained_model:/trained_model:rw -d aitextgen sh -c "cd / && python3 && mv aitextgen.tokenizer.json /trained_model"

I’m also attaching a bind point where the model output is being written to during the run, and -d to run the container in the background. The last step in the run command copies the token file to the mounted EBS volume so it can be reused by the generation.

To generate text from the model, run:

docker run --volume /data/trained_model:/trained_model:rw -d aitextgen sh -c "cd / && python3"

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.