Cloning an existing VM on VMware ESXi using command line vmkfstools

Apparently vCenter Server provides the ability to clone VMs via the Client, but not if you’re just using ESXi and managing your host directly with the web client. It is possible, however, to clone a VM’s disk using the vmkfstools command-line utility, as described in this post.

Here’s a summary of the steps:

Enable SSH from the ESXi web console: Host / Manage / Services

In my case I wanted to create a copy of an existing CentOS7 VM. SSH into your ESXi host, then:

vmkfstools -i CentOS7-1/CentOS7-1_0.vmdk CentOS7-2/CentOS7-2.vmdk -d thin
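As far as I know, vmkfstools does not create the destination folder for you, so the CentOS7-2 directory needs to exist before the copy runs. Here is a minimal sketch that wraps the command above in a small helper. The function name clone_vmdk and the datastore path /vmfs/volumes/datastore1 are assumptions for illustration, and it assumes the source VM is powered off so its disk is not locked.

```shell
# Hypothetical helper: clone a VMDK as a thin-provisioned copy.
# $1 = source descriptor .vmdk, $2 = destination .vmdk
clone_vmdk() {
    src="$1"
    dst="$2"
    if [ ! -f "$src" ]; then
        echo "source disk not found: $src" >&2
        return 1
    fi
    # Make sure the destination folder exists before copying
    mkdir -p "$(dirname "$dst")" || return 1
    # Same command as above: -i clones the disk, -d thin sets the format
    vmkfstools -i "$src" "$dst" -d thin
}

# Example (run from the datastore directory; the path is an assumption):
#   cd /vmfs/volumes/datastore1
#   clone_vmdk CentOS7-1/CentOS7-1_0.vmdk CentOS7-2/CentOS7-2.vmdk
```

Checking that the source exists first avoids leaving an empty destination folder behind when the path is mistyped.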

Next, create a new VM as normal, but on the Customize Settings dialog, press the X on the right to delete the disk created by the new VM wizard.

Then press ‘Add new disk’, select ‘Existing hard disk’, and point to the copy of the VM disk that you created with the vmkfstools command.

Credit to this post for the tip to configure using an existing disk.

Big data analysis, machine learning, and publicly available data sets

I’ve been meaning to take a look at some Big Data analysis tools for a while, particularly Apache Spark and deeplearning4j. If I’m going to use Spark to ingest a large dataset, I thought it would be worthwhile to first write a regular Java app to crunch some numbers on a dataset as a benchmark. Looking around for publicly available datasets, I’ve known for a while that Project Gutenberg offers the texts of many classic novels. I wondered what it would take to do a simple word count on all the words in a typical novel.

It turns out a typical novel, say Alice in Wonderland, is actually pretty small, at around 150KB. That’s not ‘big’ at all in today’s meaning of ‘big data’; in fact it’s trivial. Anyway, I wrote a simple Java app to count word occurrences and then order them by number of occurrences; you can see my code here. I didn’t attempt to optimize the code at all, as this was my first attempt at writing a word count app. The surprising thing is how quickly it executes: on my i7 MacBook Pro with an SSD, it completes the count and sort in 100ms. I was hoping for something with more significant number crunching than this, so clearly I need to set my sights on larger data sets.
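As a quick sanity check on the numbers, the same count-and-sort can be approximated with a shell pipeline. This is not the Java app mentioned above, just a rough equivalent under simple assumptions: words are runs of letters, counting is case-insensitive, and alice.txt is a placeholder file name.

```shell
# Print "count word" lines for a text file, most frequent first.
# Words are runs of letters, lowercased; "don't" splits into "don" and "t".
word_counts() {
    tr -cs '[:alpha:]' '\n' < "$1" |   # one word per line
        tr '[:upper:]' '[:lower:]' |   # case-insensitive counting
        sed '/^$/d' |                  # drop a possible leading blank line
        sort | uniq -c |               # count identical words
        sort -rn                       # highest counts first
}

# Example (alice.txt is a placeholder file name):
#   word_counts alice.txt | head -n 10
```

Tokenizing rules differ between implementations, so the counts will only match another program’s output if both split words the same way.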

If you Google ‘public big data sets’ you’ll find many collections, for example this list. Some of these are collections of publicly available data; others are data shared by organizations who are asking the community for input on analyzing it. The Yelp data set is an interesting example of the latter: they offer 5.79GB of JSON data for researchers to analyze and provide feedback on in a ‘Dataset Challenge’. Almost 6GB of data is significantly larger than my 150KB, so if I’m going to do anything interesting with Spark this might be a good place to start.

Data set downloaded, off I go 🙂

Installing ESXi Guest Tools on CentOS 7

Following the instructions here, edit /etc/yum.repos.d/vmware-tools.repo and add:

[vmware-tools]
name = VMware Tools
baseurl = http://packages.vmware.com/packages/rhel7/x86_64/
enabled = 1
gpgcheck = 1
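If you’d rather not open an editor, the same file can be written from the shell. This is a small sketch; the function name vmware_tools_repo is made up for illustration, and the contents are exactly the lines shown above.

```shell
# Hypothetical helper: print the repo definition shown above to stdout.
vmware_tools_repo() {
    cat <<'EOF'
[vmware-tools]
name = VMware Tools
baseurl = http://packages.vmware.com/packages/rhel7/x86_64/
enabled = 1
gpgcheck = 1
EOF
}

# Example: write the file as root
#   vmware_tools_repo | sudo tee /etc/yum.repos.d/vmware-tools.repo
```

Piping through sudo tee is used because a plain shell redirect would run without root permissions.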

To install:

sudo yum install open-vm-tools-deploypkg

On my freshly installed CentOS 7, this gave the error:

[kev@unknown000C2960F639 ~]$ sudo yum upgrade
Loaded plugins: fastestmirror
You have enabled checking of packages via GPG keys. This is a good thing.
However, you do not have any GPG public keys installed. You need to download
the keys for packages you wish to install and install them.
You can do that by running the command:
    rpm --import public.gpg.key
Alternatively you can specify the url to the key you would like to use
for a repository in the 'gpgkey' option in a repository section and yum
will install it for you.
For more information contact your distribution or package provider.
Problem repository: vmware-tools

The message doesn’t say which key file to import, but this page mentions that the keys are in this location: /etc/pki/rpm-gpg

So to import,

sudo rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
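To confirm the import actually took effect, you can list the keys rpm knows about afterwards. The helper below is a sketch: the name import_rpm_key is hypothetical, and it defaults to the CentOS 7 key path used in the command above.

```shell
# Hypothetical helper: import an RPM signing key, then list the keys rpm knows.
# $1 = key file (defaults to the CentOS 7 key path used above)
import_rpm_key() {
    key="${1:-/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7}"
    if [ ! -r "$key" ]; then
        echo "key file not readable: $key" >&2
        return 1
    fi
    sudo rpm --import "$key" || return 1
    # Imported keys show up as gpg-pubkey pseudo-packages
    rpm -q gpg-pubkey
}

# Example:
#   import_rpm_key
```

If rpm -q gpg-pubkey prints at least one gpg-pubkey entry, the key database is no longer empty.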

Then I could update:

sudo yum update

Then install the vmware-tools:

sudo yum install open-vm-tools-deploypkg