MongoDB aggregation query with a $sort

I have an aggregation query that counts documents grouped by a property. It is part of the http://www.spotviz.info app that I’m working on: the query retrieves the number of spots uploaded per Amateur Radio callsign.

By default the results are not in any particular order (that I can see, but maybe I don’t have enough test data to be able to tell), so I wanted to add a $sort stage.

Here’s the initial aggregation query:

db.Spot.aggregate(
[
{ $group : { _id : {"spotterCallsign" : "$spotter"},
count : {$sum : 1},
firstSpot : {$min : "$spotReceivedTimestamp"},
lastSpot : {$max : "$spotReceivedTimestamp"} }
}
] )

To add the $sort to the aggregation pipeline, just add another document for $sort following the $group, like this:

{ $sort : { "count" : -1 } }

The full query now looks like:

db.Spot.aggregate(
[
{ $group : { _id : {"spotterCallsign" : "$spotter"},
count : {$sum : 1},
firstSpot : {$min : "$spotReceivedTimestamp"},
lastSpot : {$max : "$spotReceivedTimestamp"} }
},
{ $sort : { "count" : -1 } }
] )

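Building this query with the Java API is easy: add another DBObject for the $sort document to the List containing all the stage documents ($group, $sort) in the pipeline. Note that this Java version groups on "$spotter" directly and names the count field totalSpots rather than count, so the $sort refers to totalSpots: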

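import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import java.util.Arrays;
import java.util.List;

// db is an existing com.mongodb.DB instance
DBCollection col = db.getCollection("Spot");

// $group: one result per spotter, with first/last timestamps and a count
DBObject groupFields = new BasicDBObject("_id", "$spotter");
groupFields.put("firstSpot", new BasicDBObject("$min", "$spotReceivedTimestamp"));
groupFields.put("lastSpot", new BasicDBObject("$max", "$spotReceivedTimestamp"));
groupFields.put("totalSpots", new BasicDBObject("$sum", 1));
DBObject group = new BasicDBObject("$group", groupFields);

// $sort: highest count first (-1 = descending)
List<DBObject> pipeline = Arrays.asList(group,
    new BasicDBObject("$sort", new BasicDBObject("totalSpots", -1)));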

AggregationOutput output = col.aggregate(pipeline);
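
To make the pipeline semantics concrete, here is a rough in-memory sketch in plain Java of what the $group stage followed by the $sort stage computes. It does not touch MongoDB at all. The Spot and Summary classes, the summarize method and the sample callsign "n0call" are assumptions made up for this illustration, not part of SpotViz or the driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of $group (count, min, max per spotter) followed by
// $sort on the count, descending. All types here are hypothetical.
final class GroupAndSortSketch {

    static final class Spot {
        final String spotter;
        final long receivedMillis;

        Spot(String spotter, long receivedMillis) {
            this.spotter = spotter;
            this.receivedMillis = receivedMillis;
        }
    }

    static final class Summary {
        final String spotter;
        int totalSpots;
        long firstSpot = Long.MAX_VALUE;
        long lastSpot = Long.MIN_VALUE;

        Summary(String spotter) {
            this.spotter = spotter;
        }
    }

    static List<Summary> summarize(List<Spot> spots) {
        Map<String, Summary> bySpotter = new LinkedHashMap<>();
        for (Spot spot : spots) {
            Summary summary = bySpotter.computeIfAbsent(spot.spotter, Summary::new);
            summary.totalSpots++;                                                   // {$sum: 1}
            summary.firstSpot = Math.min(summary.firstSpot, spot.receivedMillis);   // {$min: ...}
            summary.lastSpot = Math.max(summary.lastSpot, spot.receivedMillis);     // {$max: ...}
        }
        List<Summary> result = new ArrayList<>(bySpotter.values());
        // {$sort: {totalSpots: -1}}: largest count first
        result.sort((a, b) -> Integer.compare(b.totalSpots, a.totalSpots));
        return result;
    }

    public static void main(String[] args) {
        List<Spot> spots = Arrays.asList(
            new Spot("kk6dct", 1000L),
            new Spot("n0call", 2000L),
            new Spot("n0call", 3000L));
        for (Summary s : summarize(spots)) {
            System.out.println(s.spotter + " " + s.totalSpots + " " + s.firstSpot + " " + s.lastSpot);
        }
    }
}
```

In the real query the database does this work; the sketch is only meant to show which values end up in each result document and in what order.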

MongoDB aggregation queries for ‘counts per day’ (part 1)

I need a MongoDB query to retrieve document counts per day to feed a heatmap display (using https://kamisama.github.io/cal-heatmap/), for my Amateur Radio received signals historical visualization service, SpotViz.

The data to feed Cal-heatmap looks like this:

{
"946721039":4,
"946706853":2,
"946706340":7,
...
}

What’s interesting about this data structure is the property name is variable, and I’m not sure how to project a result into a property name in a MongoDB query. I asked this question on StackOverflow: “Return a computed value as field name in MongoDB query?” – so far I haven’t had any answers or suggestions, so I’m not sure this is possible.
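
If the query itself cannot produce the variable property names, one option is to build them in application code once the per-day counts have come back. The sketch below is only an assumption about how that could look, not what SpotViz does: HeatmapData, toCalHeatmap and the input map (day timestamp in epoch millis mapped to a count) are hypothetical names. It divides each millis value by 1000 to get the seconds-since-1970 key shown above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reshapes per-day counts into a map keyed by
// epoch seconds (as strings), the shape shown in the sample data above.
final class HeatmapData {

    // countsByMillis: timestamp in epoch millis -> document count (assumed input shape)
    static Map<String, Integer> toCalHeatmap(Map<Long, Integer> countsByMillis) {
        // Sort by timestamp so the output order is predictable
        Map<Long, Integer> sorted = new TreeMap<>(countsByMillis);
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<Long, Integer> entry : sorted.entrySet()) {
            long epochSeconds = entry.getKey() / 1000L; // millis -> seconds
            out.put(Long.toString(epochSeconds), entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, Integer> counts = new HashMap<>();
        counts.put(946721039000L, 4);
        counts.put(946706853000L, 2);
        counts.put(946706340000L, 7);
        System.out.println(toCalHeatmap(counts));
    }
}
```

The resulting map can then be serialized to JSON by whatever library the application already uses.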

There doesn’t seem to be a way to do exactly what I need, so my next challenge was how to group documents per day (ignoring the time part of a date), and return a count per day.

I started with a working aggregation query from the shell, and then implemented it using the MongoDB Java API. The challenge with this query is that there doesn’t seem to be an out-of-the-box feature for matching documents on the date alone, ignoring the time portion of a Date value. What I need is the equivalent of ‘find counts of documents grouped by the same day’. The catch is to group docs not by identical yyyy/MM/dd hh:mm:ss values, but by identical yyyy/MM/dd values only.

Since the aggregation operators $year, $month and $dayOfMonth extract the year, month and day values from a date, they could be used to get the result I need (the counts per day), but this format doesn’t give me the property name for each count as seconds since 1/1/1970, e.g. “946721039”.

A query using this approach would look like this:

db.Spot.aggregate(
[
  {$match: {spotter: "kk6dct"}},
  {$group: { _id : {
    year:{$year:"$spotReceivedTimestamp"},
    month:{$month:"$spotReceivedTimestamp"},
    day:{$dayOfMonth:"$spotReceivedTimestamp"}
    },
    count:{$sum: 1 }
  }
}
])

… this approach follows a suggestion from this SO post.

This approach to group the document counts by day is good, but it doesn’t return the docs in the format I need with each day represented by seconds since 1/1/1970.

A better approach would be to group by millis for the date, and return that value. Converting a date in mongo to another format, however, seems to be somewhat challenging. I spent probably far too much time trying to work out a query to do this, getting close but still not what I wanted, and ended up with this rather complex query:

db.Spot.aggregate(
[
  {$match: {spotter: "kk6dct"}},
  {$group: {
    _id : {
      yearval:{$year:"$spotReceivedTimestamp"},
      monthval:{$month:"$spotReceivedTimestamp"},
      dayval:{$dayOfMonth:"$spotReceivedTimestamp"},
      "h" : { "$hour" : "$spotReceivedTimestamp" },
      "m" : { "$minute" : "$spotReceivedTimestamp" },
      "s" : { "$second" : "$spotReceivedTimestamp" },
      "ml" : { "$millisecond" : "$spotReceivedTimestamp" }
    },
    count:{$sum: 1 }
  }},
  {$project : {
    "date" : {
      "$subtract" : [
        "$spotReceivedTimestamp",
        { "$add" : [
          "$ml",
          { "$multiply" : [ "$s", 1000 ] },
          { "$multiply" : [ "$m", 60, 1000 ] },
          { "$multiply" : [ "$h", 60, 60, 1000 ] }
        ] }
      ]
    }
  }}
])

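What I was attempting to do with this approach was to use the $project stage to subtract the $hour, $minute, $second and $millisecond values, converted to millis, from each timestamp, leaving just the millis value of the yyyy/MM/dd date with the time part removed. This is about as close as I got, but I couldn’t get the math to work, or at least convert between types so the calculations would work the way I wanted. One problem with the query as written is that after a $group stage only _id and the accumulator fields (here count) are passed on, so $spotReceivedTimestamp no longer exists in the $project stage, and the time parts would have to be referenced as $_id.h, $_id.m and so on rather than $h, $m.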

My next attempt was based on the suggestion in this SO post. This is a much simpler approach to the problem – my new query looks like this:

db.Spot.aggregate( [
  {$match: {spotter: "kk6dct"}},
  {"$group": {
    "_id": {
      "$subtract": [
        { "$subtract": [ "$spotReceivedTimestamp", new Date("1970-01-01") ] },
        { "$mod": [ { "$subtract": [ "$spotReceivedTimestamp", new Date("1970-01-01") ] }, 1000 * 60 * 60 * 24 ] }
      ]
    },
    count:{$sum: 1 }
  }
}
])

If I try and break this down into words, then what I’m doing is:

– for date x, calculate millis since 1/1/1970 (the epoch date)

– subtract from this the number of millis since the start of that day (this is the millis since 1/1/1970 mod the number of millis in a day, i.e. the remainder after dividing one by the other)

… the result is the millis of each date at midnight, i.e. excluding the time part.
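
The same arithmetic can be checked outside MongoDB. The following is a minimal Java sketch of the subtract-and-mod steps above, assuming timestamps at or after 1/1/1970 and treating "midnight" as midnight UTC, since epoch millis carry no time zone. DayTruncation and truncateToDayMillis are names made up for this example.

```java
import java.time.Instant;

// Mirrors the $subtract / $mod expression from the query:
// millisSinceEpoch - (millisSinceEpoch mod millisPerDay)
final class DayTruncation {

    static final long MILLIS_PER_DAY = 1000L * 60 * 60 * 24;

    // Assumes epochMillis >= 0 (dates on or after 1970-01-01)
    static long truncateToDayMillis(long epochMillis) {
        long millisIntoDay = epochMillis % MILLIS_PER_DAY; // time-of-day part
        return epochMillis - millisIntoDay;                // midnight UTC of the same day
    }

    public static void main(String[] args) {
        long spot = 946721039000L; // 2000-01-01T10:03:59Z
        long day = truncateToDayMillis(spot);
        System.out.println(Instant.ofEpochMilli(day)); // prints 2000-01-01T00:00:00Z
        System.out.println(day / 1000L);               // prints 946684800
    }
}
```

Every timestamp that falls on the same UTC day maps to the same value, which is what makes it usable as the $group key.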

Ok, almost there! How I then took this query and converted it to the MongoDB Java Driver API is coming in part 2.

MongoDB usage notes (2)

Continuing prior notes from here.

Drop a database:

use databasename
db.dropDatabase()

List distinct property values across all docs:

db.collectionname.distinct("propertyname")

Remove docs in a collection – param is a doc query. If empty doc, removes all docs:

db.collectionname.remove({})

Date range query using $gte and $lt for a range:

db.collectionname.find({ "timestamp" : { "$gte" : ISODate("2014-07-08T19:50"), "$lt" : ISODate("2014-07-08T19:52")  } })

[in progress]

MongoDB Java Driver notes

A few notes on using the MongoDB Java Driver API:

Getting a connection:

MongoClient client = new MongoClient("localhost", 27017);
DB db = client.getDB("test");

Get a collection from connected db:

DBCollection collection = db.getCollection("example");

Find all docs in collection and iterate through results:

DBCursor c = collection.find();
try {
   while(c.hasNext()) {
       System.out.println(c.next());
   }
} finally {
   c.close();
}

Simple findOne query, matching a doc with propertyname = value:

DBObject result = collection.findOne(new BasicDBObject("propertyname", "value"));

Find all, sort on property ‘example’ ascending (1=asc, -1=desc), and limit to 10 results:

DBCursor c = collection.find()
    .sort(new BasicDBObject("example", 1))
    .limit(10);

Serialize results of a cursor to JSON (tip from here):

// JSON here is com.mongodb.util.JSON; serialize is a static method, so no instance is needed
String jsonString = JSON.serialize(c);