Skip to content

MongoDB Queries in a Nutshell

Emidio Piermarini edited this page Jan 18, 2018 · 27 revisions

Introduction

This wikipage will attempt to get you the 'low hanging fruit' in with MongoDB. With the following commands, you can do 80% of what a typical BH engineer will have to do with their data. If you want further information, please see MongoDB's online documentation: MongoDB Reference Manual.

As you will see, MongoDB's computer language is fairly easy to learn. Whilst it isn't extremely human readable, it is certainly more straightforward than things like nested loops. All of MongoDB's language is based on JSON syntax. It is worth looking a bit at the sytanx before jumping into Mongo Queries. Simply speaking, JSON script will usually look like this:

{"Key": Value} or {"Mongo Query Name": Stuff to Do}

Ok, now let's do some examples. You'll need to do the following:

  1. Open an instance of MongoShell (Details Here)
  2. Open Robo3T (Details Here)

Important Queries

You can pick and choose any query to use, but one that will be pretty much necessary for the average engineering task will be aggregate:

aggregate

The $aggregate function that executes a list of queries one after the other. So let's break that down with what we already know:

Queries = JSON objects Queries = Programmatic Sieves Programmatic Sieves = Filters for us to get to the #DataWeWant

Therefore, by the transitive propery, the aggregate function takes in a list of JSONs and spits out the the subset of data from the database that we care about a.k.a the #DataWeWant.

Aggregate(database,List of JSONs) = #DataWeWant from database

When you use the grasshopper mongo tools, this is what is going on behind the scenes. Recalling that all queries are JSON objects, we should expect to see that the aggregate command takes in a set of JSON objects. To create this list, we will separate each query by a comma:

Aggregate(database,JSON1,JSON2,JSON3,...,JSON9999999) = #DataWeWant from database

Ok, so how do we actually execute this on our Mongo Database? Well, you could use the MongoShell. Or you could use the GrasshopperToolkit tools. But one of the easiest mediums to debug you code is Robo3T. This is because Robot3T has a skosh of intellesense that can help you understand where you have syntax errors in your aggregate command. Recall that if you execute an aggregate command in Robo3T and it works, you can feed the exact same list of queries into Grasshopper using the Mongo Components and it will yield the same result. Therefore, Robo3T is your best friend when you are just getting started with Mongo.

Let's set up our aggregate function. First, we open a collection:

As you can see, when I open my collection, the default command Robot3T throws in is the find() command. If we want to play with $aggregate commands, we need to remove this:

db.getCollection('Model A Forces Database').find({}) --------> db.getCollection('Model A Forces Database')

And in its place add the aggregate command:

db.getCollection('Model A Forces Database').aggregate([JSON1,JSON2])

As you can see, the syntax has a set of [] that enlose the JSON queries. In Robot3T, it will be most convenient to to set up the aggregate command with adding some returns to make the list more human readable:

In what follows, you will learn some common queries that will replace the JSON#'s in the picture above.

THING THAT WILL ANNOY YOU ALERT: To execute any command in Robot3T you must use the F5 key not the ENTER key. This goes against everything you've ever known.....so be prepared to a bit of cussing when instead of Robot3T executing your code, it just returns to the next line of the commandline instead.

So here are some of the best queries to throw inside an aggregate function:

$addFields

If you wish to AUGMENT your data, i.e. add parameters to it, this is your query.

In the example above, you can see that we've been able to append a new property to all our documents in the collection. But most likely, you'll want the arguement of $addFields to be another function which will add data to each document on a case by case basis:

So as you can see here, we're using a new function inside of the $addFields function called $in. The $in function can help use test if a certain property is part of a provided list (here we're asking "For this specific document, does the property Bar_Number equal either 5353 or 5454). As you can see, this allows us to write specific criteria for each document and in this case, we can use that information as the input to the next query in the aggregate (as we'll show in $match)

$match

If you wish to GRAB A SUBSET OF YOUR DATABASE based on certain criteria, i.e. filter all data items based a conditional criteria, this is your query.

Here's a simple example of this:

But you really cannot see what actually happened here. What would be good is to see how many database documents match this criteria. We can find this out by adding a little command at the end of the aggregate pipeline ( { $group: { _id: null, count: { $sum: 1 } } } ----- don't worry, you'll get this soon):

50,658 documents out of the 28,873,446 total documents in the database (and in under 12 seconds to boot). Recall, thats enough data to max out 28 excel tabs in excel.

We can always add to this as well:

Obviously, the more layers we add, the less we get back (and it takes a bit longer too, but still, pretty fast).

We can even use the output of our $addFeilds as part of this:

$project

If you wish to GET ONLY CERTAIN ASPECTS OF YOUR DATA but for all of your database, i.e. I have a million items that all have properties X, Y, and Z, but I only care out the X and Y values, this is your query.

{$project: {FX: 1}}

$group

If you wish to TRANSFORM THE STRUCTURE OF YOUR DATA, i.e. instead of having 10,000 lists of 2 items, having 2 lists of 10,000 items, this is your query.

Clone this wiki locally