Troubleshooting User Data scripts when creating AWS EC2 instances

When an AWS EC2 User Data script fails, you’ll see something like this in /var/log/cloud-init.log in your instance:

2018-02-03 06:08:16,536 - util.py[DEBUG]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 806, in runparts
    subp(prefix + [exe_path], capture=False)
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1847, in subp
    cmd=args)
cloudinit.util.ProcessExecutionError: Unexpected error while running command.
Command: ['/var/lib/cloud/instance/scripts/part-001']
Exit code: 127
Reason: -
Stdout: -
Stderr: -
2018-02-03 06:08:16,541 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2018-02-03 06:08:16,541 - handlers.py[DEBUG]: finish: modules-final/config-scripts-user: FAIL: running config-scripts-user with frequency once-per-instance

It tells you something failed, but not what. The trouble is that output from your User Data script does not go to cloud-init.log by default.
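One quick manual check: cloud-init saves the script on the instance at the path shown in the log, so you can SSH in and re-run it by hand with tracing to see exactly where it fails:

# The failing script is kept on disk, so re-run it with bash tracing enabled
sudo bash -x /var/lib/cloud/instance/scripts/part-001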

One of the answers in this post suggests piping your script commands and their output through logger into a separate log file, like this:

#!/bin/bash
set -x
# Copy all stdout/stderr to /var/log/user-data.log and to syslog (tagged 'user-data')
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1
echo BEGIN
date '+%Y-%m-%d %H:%M:%S'

Now, running my script with an ‘apt-get update -y’, the log looks like this:

+ echo BEGIN
BEGIN
+ date '+%Y-%m-%d %H:%M:%S'
2018-02-03 23:37:55
+ apt-get update -y
... output continues here

And further down, here’s the specific error I was looking for:

+ java -Xmx1024M -Xms1024M -jar minecraft_server.1.12.2.jar nogui

/var/lib/cloud/instance/scripts/part-001: line 11: java: command not found

My EC2 instance running the Ubuntu AMI does not have Java installed by default, so I need to install it by adding this to my User Data script:

apt-get install openjdk-8-jre-headless -y

… and now my script runs as expected.
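For reference, here’s the whole User Data script with the logging prelude and the Java install put together (the command -v guard is my own addition as a sanity check, not something cloud-init requires):

#!/bin/bash
set -x
# Copy all output to /var/log/user-data.log and syslog
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1

apt-get update -y
apt-get install openjdk-8-jre-headless -y

# Guard: stop with a clear log message if java still isn't on the PATH
command -v java >/dev/null || { echo "java not found, aborting"; exit 1; }

java -Xmx1024M -Xms1024M -jar minecraft_server.1.12.2.jar nogui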

 

AWS EC2 Pricing (Feb 2018) – per second vs hourly?

EC2 pricing at first glance seems simple: for on-demand pricing, the current prices are listed here as fractions of a dollar per hour. Here’s a quick look (below) at how these prices are listed for the first few t2 instance types. The list of different instance types is pretty extensive (check out the link for the complete list):

The page says the price is:

by the hour or second (minimum of 60 seconds) with no long-term commitments

and further down:

Pricing is per instance-hour consumed for each instance, from the time an instance is launched until it is terminated or stopped. Each partial instance-hour consumed will be billed as a full hour or per-second depending on which Amazon EC2 instances you run

The wording of that last sentence is not entirely clear. So to summarize:

  • charged by the second, minimum 60 seconds (this was a recent change, introduced in October 2017, see here)

or

  • charged as a full hour depending on which Amazon EC2 instances you run

The last part of that statement is (I think) the key. It refers to which AMI image you’re running, since some images have an hourly charge (if I’m wrong, please leave me a comment and let me know!):

If you click the Marketplace link, you’ll notice AMI images with commercial products, along with an hourly charge for usage. Ahah, there you go!
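To make the per-second billing concrete, here’s a rough back-of-the-envelope calculation. I’m assuming a t2.micro at $0.0116 per hour, the us-east-1 On-Demand rate at the time of writing (check the pricing page for current numbers):

# 5 minutes (300s) of t2.micro at $0.0116/hour, billed per second
echo "scale=6; 300 / 3600 * 0.0116" | bc    # ~ $0.00097
# anything under 60 seconds is billed at the 60-second minimum
echo "scale=6; 60 / 3600 * 0.0116" | bc     # ~ $0.00019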

Creating AWS EC2 Spot Instances with a Launch Template

With EC2, you have a huge variety of instance types to choose from, each type coming in a range of sizes from small to large:

  • General Purpose
  • Compute Optimized
  • GPU instances
  • FPGA instances
  • Memory Optimized
  • Storage Optimized

For each of these types you have a further range of options for how you choose to provision an instance, which has an impact on how the instance is priced:

  • On-demand: requested and provisioned as you need them
  • Spot Instances: spare-capacity instances auctioned at a lower price than On-Demand, but not always available. You request a price; if an instance is available at that price it is provisioned, otherwise you wait until one becomes available (see current pricing here)
  • Reserved Instances
  • Dedicated Hosts
  • Dedicated Instances

I’ve never created a Spot Instance before, and I’m curious what the steps are. As with every service on AWS, there’s more than one approach, and I’m going to look at using a Launch Template:

By creating a Launch Template you can configure a number of settings for your instance (AMI image, EC2 instance type, etc).
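As a side note, if you prefer the CLI, creating a Launch Template looks roughly like this (the template name and key name are placeholders; the AMI and instance type are the ones I’m using here):

aws ec2 create-launch-template \
    --launch-template-name my-spot-template \
    --launch-template-data '{"ImageId":"ami-41e0b93b","InstanceType":"t2.small","KeyName":"my-key"}'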

From the Request Spot Instance page in the EC2 Management Console, you can now use your Launch Template which prepopulates most of the settings for your requested Spot Instance:

Further down on this page is where you request your pricing – it defaults to buying at the cheapest available price, capped at the current On-Demand price (if the spot price rises to match the On-Demand price then there’s no saving from Spot and you might as well use On-Demand instead):

After submitting, the request shows in a submitted state:

So here’s my first error:

Repeated errors have occurred processing the launch specification “t2.small, ami-41e0b93b, Linux/UNIX (Amazon VPC), us-east-1d”. It will not be retried for at least 13 minutes. Error message: com.amazonaws.services.ec2.model.AmazonEC2Exception: Network interfaces and an instance-level subnet ID may not be specified on the same request (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameterCombination)

Network interfaces and an instance-level subnet – I did add an interface because I wanted to select the public IP option. Let’s create a new version of the template and try again.
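Reading the error again: you can give the subnet at the instance level or on a network interface, but not both. If you keep the network interface (for the public IP option), the subnet and security groups should be specified on the interface itself – in launch template JSON that looks something like this (the IDs are placeholders):

{
  "NetworkInterfaces": [
    {
      "DeviceIndex": 0,
      "AssociatePublicIpAddress": true,
      "SubnetId": "subnet-xxxxxxxx",
      "Groups": ["sg-xxxxxxxx"]
    }
  ]
}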

Now my request is in ‘pending fulfillment’ status:

Same error. One of the answers here suggests this is because I don’t have a Security Group. Ok, let’s add a Security Group to the launch template and try again.

Same error, even though I added a Security Group to my template. But then I noticed this in the Request Spot Instance options – when you select your Template, if you’ve made version updates to it, make sure you select the latest version, as it defaults to 1. In other words, I was re-submitting with the original template that I already know doesn’t work:
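For reference, new template versions can also be created from the CLI, and a request can reference the version ‘$Latest’ instead of a fixed number to avoid exactly this mistake (the template data shown is just the part being changed; the IDs are placeholders):

aws ec2 create-launch-template-version \
    --launch-template-name my-spot-template \
    --source-version 1 \
    --launch-template-data '{"SecurityGroupIds":["sg-xxxxxxxx"]}'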

Next error:

com.amazonaws.services.ec2.model.AmazonEC2Exception: The security group ‘sg-825bfe14’ does not exist in VPC ‘vpc-058f5d7c’ (Service: AmazonEC2; Status Code: 400; Error Code: InvalidGroup.NotFound)

Hmm. I’m interpreting this as my Security Group not being in the VPC my instance was assigned to, so let’s create a new VPC, and then a new Security Group in that VPC:
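The CLI equivalent of those two steps, for reference (the VPC ID is a placeholder; the CIDR block matches what I used below):

aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-security-group --group-name spot-sg \
    --description "Security group for my Spot Instances" --vpc-id vpc-xxxxxxxx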

Now create a new Template version with this new VPC and SG.

Next error:

Error message: com.amazonaws.services.ec2.model.AmazonEC2Exception: Security group sg-b756adc0 and subnet subnet-f756c7bf belong to different networks. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameter)

SG and subnet belong to different networks. Ok, getting close. Let’s take a look.

On the VPC for my SG I have: 10.0.0.0/16

On my subnet for us-east-1d I have: 10.0.0.0/24

Ah, OK. Let’s add a new subnet for us-east-1d in the new VPC, with the same 10.0.0.0/24 CIDR block, and try again.
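From the CLI, that subnet would be created with something like this (the VPC ID is a placeholder):

aws ec2 create-subnet --vpc-id vpc-xxxxxxxx \
    --cidr-block 10.0.0.0/24 --availability-zone us-east-1d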

When creating your spot request, make sure you select your VPC and subnet to match:

Ahah! Now we’re looking good and my Spot Instance is being provisioned:

Ugh, next error:

Looks like in my template I didn’t give a device name (‘missing device name’) for my EBS volume, e.g. /dev/sdb. New template version, trying again.

Next error:

Error message: com.amazonaws.services.ec2.model.AmazonEC2Exception: The parameter iops is not supported for gp2 volumes. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameterCombination)

Geesh. OK, removing the iops value from the template and trying again (it would help to have some validation on the template form).
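For reference, the EBS part of my template data now looks something like this (the volume size is a placeholder) – the DeviceName is present and there’s no Iops key, since gp2 volumes don’t accept one:

{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sdb",
      "Ebs": { "VolumeSize": 8, "VolumeType": "gp2" }
    }
  ]
}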

And now:

Fulfilled, we made it, a Spot Instance provisioned!

At this point though, my instance started without a public IP. Now that I’ve got the Security Group and Subnet issues sorted, I’ll go back to the template and add a network interface with ‘assign public IP’ selected. Rather than assigning this on the network interface though, it turns out it’s also an option on the subnet config, so I edited the subnet and enabled it there:
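That subnet setting can also be flipped from the CLI (the subnet ID is a placeholder):

aws ec2 modify-subnet-attribute --subnet-id subnet-xxxxxxxx --map-public-ip-on-launch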

And now we’re up, with a public IP! Whether my User Data init script actually did what it was supposed to is another question – I’ll look at that next!

Installing and using s3cmd to copy files to AWS S3

s3cmd is a useful tool that lets you list, put, and get objects from an AWS S3 bucket. To install:

Install Python 2.7 with:

sudo apt-get install python2.7

Install setuptools with:

sudo apt-get install python-setuptools

Download and unzip the .zip distro from the link here: http://s3tools.org/download

Install with:

sudo python2.7 setup.py install

To see options, run:

s3cmd --help

Before running the s3cmd setup, you need to create an AWS IAM user with programmatic access, to get an access key that will be used by s3cmd.

First, create a new user from the Management Console, and ensure ‘Programmatic Access’ is checked:

Create a new IAM Policy with read, write, and list actions, attach it to this user, and restrict the resource to the ARN of the S3 bucket you want to use s3cmd with:

If you want to narrow down the permissions to a minimal list, a policy like this is the minimum needed for s3cmd to work (based on answers to this question on SO):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt123456",
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    },
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::bucketname",
        "arn:aws:s3:::bucketname/*"
      ]
    }
  ]
}

Following the how-to guide here, for first-time setup, run:

s3cmd --configure

and provide your IAM user API access key and secret key, and the other values as prompted. After configuring, when prompted to test the config, the util will attempt to list all buckets; if the policy you created only allows read/write on a specific bucket, this test will fail, but that’s OK.

To confirm access to your bucket, try:

s3cmd ls s3://bucketname

and to put a file:

s3cmd put filename s3://bucketname
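And to copy a file back down, completing the list/put/get trio:

s3cmd get s3://bucketname/filename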