Dwight's blog
"Schemaless"

In the NoSQL world it is common to talk about schemaless databases or data models.

It would be more precise to say “dynamic schema”.  In MongoDB there are databases, a system catalog of collections, documents within collections, and explicitly declared indexes on each collection.  The big difference is that “columns”, or rather fields in the document data model, are not predeclared.  Each field/value in a document is dynamic and can be present or missing.  Each value also has a datatype, so the model isn’t typeless; rather, typing is dynamic, or what some might call duck typing.
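To make “fields are not predeclared, but every value has a type” concrete, here is a minimal JavaScript sketch (plain Node, no MongoDB involved) that treats a collection as an array of objects shaped like the persons documents used in this post and reports, after the fact, which fields appear and with which types. The helpers typeName and describeFields are hypothetical illustrations, and the type labels are coarser than real BSON types.

```javascript
// Sketch only: a "collection" here is just an array of plain objects.
// Fields may be present or absent per document, and every value has a
// runtime type, so the schema can be observed after the fact.

// Hypothetical helper: a rough type label (real BSON types are finer-grained,
// e.g. BSON distinguishes 32-bit ints, 64-bit ints and doubles).
function typeName(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string", "number", "boolean", "object", ...
}

// Hypothetical helper: for each field, which types occur and in how many docs.
function describeFields(docs) {
  const types = new Map(); // field -> Set of type labels
  const counts = new Map(); // field -> number of docs containing it
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      if (!types.has(field)) types.set(field, new Set());
      types.get(field).add(typeName(value));
      counts.set(field, (counts.get(field) || 0) + 1);
    }
  }
  const summary = {};
  for (const [field, seen] of types) {
    summary[field] = { types: [...seen].sort(), presentIn: counts.get(field) };
  }
  return summary;
}

const persons = [
  { name: "jane", age: 25 },
  { name: "julie", age: 28, likes: "baseball" },
  { name: "ben", age: 30, likes: ["math", "baseball"] },
];

console.log(describeFields(persons));
```

Nothing had to be declared up front; the “schema” is simply whatever the documents contain, and likes shows up as present in only some documents and with more than one type.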

Here’s an example in the mongo shell.  Suppose we have a couple of documents:

> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "ben", "age" : 30 }

We could then add a new person with an extra attribute:

> db.persons.insert({name:'julie',age:28,likes:'baseball'})
> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "ben", "age" : 30 }
{ "name" : "julie", "age" : 28, "likes" : "baseball" }

No “alter table” is necessary.  This is very helpful with agile development methodologies.

We can take this a step further, however.  The value of a field need not be consistent from document to document.  In practice, it is very common for the contents of a collection to be homogeneous, but we have the option.  For example, suppose we want to add “likes” for ben, but ben likes a couple of things.  What to do?

> db.persons.update({name:'ben'},{$set:{likes:['math','baseball']}})
> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "julie", "age" : 28, "likes" : "baseball" }
{ "name" : "ben", "age" : 30, "likes" : [ "math", "baseball" ] }
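Because likes can now be missing, a single string, or an array, code that reads these documents has to cope with every shape. A minimal JavaScript sketch of one common approach, normalizing on read, follows; likesOf is a hypothetical application helper, not part of MongoDB or its drivers.

```javascript
// Hypothetical application helper (not a MongoDB API): "likes" may be
// missing, a single string, or an array, so normalize it to an array.
function likesOf(person) {
  const value = person.likes;
  if (value === undefined || value === null) return []; // field absent
  return Array.isArray(value) ? value : [value]; // wrap a lone scalar
}

const persons = [
  { name: "jane", age: 25 },
  { name: "julie", age: 28, likes: "baseball" },
  { name: "ben", age: 30, likes: ["math", "baseball"] },
];

for (const person of persons) {
  console.log(person.name, likesOf(person));
}
```

Normalizing at the read boundary keeps the rest of the application free of shape checks, while the stored documents stay as flexible as the data model allows.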

Things work out particularly elegantly in this example: even though one likes value is an array and the other a string, we can still run interesting queries across both.  This is because when a query matches on a field whose value is an array, MongoDB looks inside the array:

> db.persons.find({likes:'baseball'})
{ "name" : "julie", "age" : 28, "likes" : "baseball" }
{ "name" : "ben", "age" : 30, "likes" : [ "math", "baseball" ] }
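The matching rule can be modeled in a few lines. This is a simplified JavaScript sketch, not MongoDB’s implementation: it assumes flat field names and scalar, non-null query values, and the names fieldMatches and findMatching are hypothetical.

```javascript
// Simplified model of equality matching, e.g. find({ likes: "baseball" }).
// Assumptions: flat (non-dotted) field names and scalar, non-null query
// values. Real MongoDB also matches whole arrays, nested documents, nulls
// against missing fields, and supports operators; none of that is modeled.

// A field matches if it equals the wanted value, or, when the field holds an
// array, if any element equals the wanted value.
function fieldMatches(fieldValue, wanted) {
  if (Array.isArray(fieldValue)) {
    return fieldValue.some((element) => element === wanted);
  }
  return fieldValue === wanted;
}

// Hypothetical stand-in for a collection query: every query field must match.
function findMatching(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, wanted]) =>
      fieldMatches(doc[field], wanted)
    )
  );
}

const persons = [
  { name: "jane", age: 25 },
  { name: "julie", age: 28, likes: "baseball" },
  { name: "ben", age: 30, likes: ["math", "baseball"] },
];

console.log(findMatching(persons, { likes: "baseball" }).map((p) => p.name));
// prints [ 'julie', 'ben' ]
```

jane has no likes field at all, so her document simply fails the comparison rather than causing an error.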

Likewise, we can index the field; when the value is an array, MongoDB indexes each element separately (a “multikey” index):

> db.persons.ensureIndex( { likes : 1 } )
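The “one index entry per array element” idea can be illustrated with a toy in-memory index. This JavaScript sketch uses a Map purely for illustration (real MongoDB indexes are B-trees over BSON values), and buildMultikeyIndex is a hypothetical name.

```javascript
// Toy multikey index: each array element (or the scalar value itself)
// becomes its own key pointing back at the document. Real MongoDB indexes
// are B-trees over BSON values; a Map is used here only for illustration.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // key -> positions of matching docs
  docs.forEach((doc, position) => {
    // Simplification: skip docs without the field. (A regular MongoDB index
    // records such docs under a null key instead.)
    if (!(field in doc)) return;
    const value = doc[field];
    const keys = Array.isArray(value) ? new Set(value) : new Set([value]);
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(position);
    }
  });
  return index;
}

const persons = [
  { name: "jane", age: 25 },
  { name: "julie", age: 28, likes: "baseball" },
  { name: "ben", age: 30, likes: ["math", "baseball"] },
];

const likesIndex = buildMultikeyIndex(persons, "likes");
// An index lookup replaces a scan: "baseball" -> julie and ben.
const hits = (likesIndex.get("baseball") || []).map((i) => persons[i].name);
console.log(hits);
```

Because julie’s string and ben’s array elements land in the same key space, one lookup serves both document shapes.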

All very handy and useful.  But you might ask, “won’t my data get rather dirty with no schema constraints?”  I had this concern when we started, and assumed we would add some constraint rules later when needed.  Oddly, so far there hasn’t been much demand for that feature.  Empirically, the data doesn’t seem to get too noisy.
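For readers who still want some guard rails, a few constraint rules can live in application code. The JavaScript sketch below is purely hypothetical (personRules and checkDocument are illustrative names, and nothing in this post describes MongoDB enforcing such rules); it checks required fields and allowed types while letting unlisted fields through, so the schema stays dynamic.

```javascript
// Hypothetical, application-side rule set: field -> { required, types }.
// These rules and checkDocument() are illustrative only.
const personRules = {
  name: { required: true, types: ["string"] },
  age: { required: true, types: ["number"] },
  likes: { required: false, types: ["string", "array"] },
};

function typeName(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Returns a list of human-readable problems; an empty list means "ok".
// Fields not mentioned in the rules are allowed, keeping the schema dynamic.
function checkDocument(doc, rules) {
  const problems = [];
  for (const [field, rule] of Object.entries(rules)) {
    if (!(field in doc)) {
      if (rule.required) problems.push(`${field}: missing`);
      continue;
    }
    const type = typeName(doc[field]);
    if (!rule.types.includes(type)) {
      problems.push(`${field}: unexpected type ${type}`);
    }
  }
  return problems;
}

console.log(checkDocument({ name: "ben", age: 30, likes: ["math", "baseball"] }, personRules));
console.log(checkDocument({ name: "sam", age: "thirty" }, personRules));
```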

One other very important note: the dynamic schema is not just for developer friendliness!  There is another good reason for it.  Imagine changing the schema in a database cluster of 2,000 servers: applying that change to global state consistently across every server would be tricky.  One goal here is to store very big data sets, and “alter table” is probably not going to fly with billions or trillions of documents.

P.S. For compactness, the examples above omit the _id field that MongoDB (or its driver) automatically adds to every document.
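Every MongoDB document has an _id; when the application does not supply one, one is generated for it, typically by the driver and otherwise by the server. A minimal JavaScript sketch of that “fill in if missing” step follows. withId and its id format are hypothetical: real ObjectIds are 12-byte values that begin with a 4-byte timestamp, and this sketch does not reproduce that format.

```javascript
// Hypothetical sketch of "add an _id if the document lacks one" before an
// insert. The id format below (hex milliseconds + counter) is made up for
// illustration; it is NOT the real 12-byte ObjectId format.
let insertCounter = 0;

function withId(doc) {
  if ("_id" in doc) return doc; // caller-supplied _id is kept as-is
  insertCounter += 1;
  const generated = `${Date.now().toString(16)}-${insertCounter}`;
  return { _id: generated, ...doc }; // new object; original doc untouched
}

console.log(withId({ name: "jane", age: 25 }));
console.log(withId({ _id: 7, name: "ben", age: 30 }));
```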

P.P.S. Dynamic schema is not unique to MongoDB; some other products in the space do it too.  Of course, I’m biased: this is my favorite.
