Dwight's blog
Bonus - 15 years of CD buying, and now they are automagically in the cloud without me doing anything.

Schema Design Tactics : Grouping Small Entities for Greater Cache Efficiency

This is a somewhat standard pattern that would apply to most databases, including MongoDB, but I think it is worth mentioning.  Imagine you have a large number of documents/records that are small — perhaps 100 bytes in size for example, but regardless, smaller than the caching page size used by the database.  The problem is, if that document is hot, its entire page will be held in the page cache, and that could be wasteful as other objects in the page are not frequently used.

As an example consider the diagram above.  Each black rectangle represents a page in the page cache.  This might be 4KB for example.  Suppose each object/record is 512 bytes in size.  So eight fit per page.

So in the first page shown, if the first yellow document is “hot” and frequently accessed, that page will tend to stay in RAM.  Yet this is somewhat wasteful as it represents only 1/8th of the memory in that page.

Suppose though that there is some correlation among documents that are accessed at the same time.  Maybe they are all for the same hash tag, or have some other property which results in there being a high probability that if one is accessed in a given time quantum, another will be.  If we could group them, we would get greater cache efficiency and system performance.

Figure (2) highlights this.  In (2), the dark blue documents from (1) are striped to highlight them.  If we could group these, we might achieve better performance.  Once grouped, our data on disk might look more like figure (3).  (Note: figure 3 is only partially drawn; nothing is smaller there, just made more contiguous.)

So how to do this in practice?  Let’s take an example.  Suppose we have temperature measurements, for different devices, over time.  We anticipate wanting to look up these measurements for a given device and a given time range.  We could start with a measurements collection like this:

{ device:<x>, when:<time>, temp:<tmp> }
…

And perhaps index on the device field:

db.measurements.ensureIndex({device:1})

or perhaps a compound index:

db.measurements.ensureIndex({device:1,when:1})

However our documents are very small and might exhibit something similar to the picture at the top of this page.  Instead we could group measurements under larger documents:

{ device:<d>,
  temps : [
    { when : <time>, temp : <temp> },
    …
  ]
}

But what if there are a lot (e.g. millions) of measurements for one device?  That has two problems.  First, in MongoDB there is a maximum document size of 16MB.  Second, even if that limit weren’t there, if we wanted a small set of measurements, maybe from one day of the year, we would be pulling in a lot of data on the server (and/or client) when we only needed a small portion.  In this situation, a better tactic is not to put all the data for the device in one document, but to “chunk” it.  This could be done by putting N subitems in each document and then starting a new one, or by using some other grouping notion, such as putting all measurements for a time period in one document.  For the latter we could have something like this:

{ device:<d>,
  day:<date>,
  temps : [
    { when : <time>, temp : <temp> },
    …
  ]
}
…
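
Before indexing, it may help to see how readings would land in these day buckets.  Here is a minimal sketch in mongo shell syntax, assuming the device/day schema above (the specific device name, date, and reading are made up for illustration): an upsert that pushes each new measurement into its device/day document, creating that document the first time a reading arrives for the day.

// Append one reading to its device/day bucket.  The upsert creates the
// bucket document the first time a measurement arrives for that day.
db.measurements.update(
  { device : "ABC", day : ISODate("2013-05-01") },
  { $push : { temps : { when : ISODate("2013-05-01T10:15:00Z"), temp : 22.4 } } },
  { upsert : true }
)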

We might then index on device and day:

db.measurements.ensureIndex({device:1,day:1})

We can then grab all the measurements for one device for a given day, or set of days, easily with a query.  For example:

db.measurements.find( {
  device:"ABC",
  day:{$gte:ISODate("2013-05-01"),
          $lte:ISODate("2013-05-07")} } )
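
If we want to double check that this query uses the compound index, a quick sketch is to append explain() in the shell (the exact output fields vary by version, but an index scan over {device:1, day:1} should show up, e.g. as a BtreeCursor on device_1_day_1):

// Ask the server how it plans to run the query; an index scan on
// {device:1, day:1} indicates the compound index is being used.
db.measurements.find( {
  device:"ABC",
  day:{$gte:ISODate("2013-05-01"),
          $lte:ISODate("2013-05-07")} } ).explain()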

Note that if we want only a portion of a day’s measurements, we would have to pull them out client side from the returned array, or we could use map/reduce or the aggregation framework to do that.
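
As a rough sketch of the aggregation framework route, assuming the device/day schema above (the device name, day, and time window are made up for illustration), one could unwind the temps array and then match on the embedded when field:

// Pull only the readings between 9am and noon on one day for one device.
db.measurements.aggregate( [
  { $match : { device : "ABC", day : ISODate("2013-05-01") } },
  { $unwind : "$temps" },
  { $match : { "temps.when" : { $gte : ISODate("2013-05-01T09:00:00Z"),
                                $lt  : ISODate("2013-05-01T12:00:00Z") } } }
] )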
I suppose if a product supports clustered indexes, that would be a possible approach too.  MongoDB does not currently; perhaps that is a blog post for another day.  The grouped approach has another nice property though: we now have far fewer index entries than we had before.  That said, don’t prematurely optimize; if your collection is small, don’t worry about it — unless the grouped structure is the better schema for the use case, period.
And while we have been focusing on RAM cache efficiency above, the changes also mean fewer iops for disk I/O.

Why MongoDB is (somewhat) feature-heavy

MongoDB (which I work on) is at the feature-rich end of the spectrum of NoSQL products.  Not compared to Oracle, but compared to say, Amazon Dynamo.

This was intentional. The goal was to create something general purpose that can be used, with ease, to handle a reasonably broad swath of use cases. And sometimes you need features to do that.  Secondary indexes, sorting, unique key constraints, things like that…things traditional databases have.  That is to say, the goal with the project wasn’t to make something scalable that handles a particular use case, but something scalable that works with a reasonably broad set of use cases.  That philosophy has driven a lot of what the product has ended up being.

For example, suppose we are writing a content management system and we are going to store documents and allow voting on the documents.  No one should be allowed to vote more than once. In the example below we remember who has voted in an array. What we are doing is “for document 700, increment votes by 1 and add joe to the voters list; if joe has already voted do nothing.”  Note that the statement executes atomically — there is no way that joe could vote twice.

db.docs.update( { _id: 700, voters : { $ne : 'joe' } },
                { $inc : { votes : 1 },
                  $push : { voters : 'joe' } }
              )

A small point, but I like the example.  And for me personally, I find it clean enough that I would be tempted to use mongo for the system even if I only needed one server.

btw the example above is in mongo shell syntax but can be done the same way from any programming language.
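
One practical follow-up, sketched here rather than part of the original example: since the update simply matches nothing when joe has already voted, the application can tell whether the vote was counted by checking how many documents were affected (older shells expose this via getLastError; newer ones return a WriteResult from update()).

// After the conditional update, see how many documents were modified;
// 0 means joe had already voted (or document 700 does not exist).
db.docs.update( { _id : 700, voters : { $ne : 'joe' } },
                { $inc : { votes : 1 }, $push : { voters : 'joe' } } )
var affected = db.getLastErrorObj().n;
if ( affected == 0 )
    print( "vote not counted - joe already voted, or no such document" );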

I’m sorry, I can’t answer your call right now, my phone is garbage collecting.
Dear Linksys,

Please do this.

Imagine if by default all wifi access points published a “Public <rand#>” SSID in addition to doing their normal stuff.

- This public SSID is given < 10% of the total bandwidth available to the access point.

- It’s outside my real network’s NAT zone and (if there were any) firewall, so there is no security implication for me.

If all access points do this, the world is a nicer place.  10% is de minimis; I don’t mind that being used for free on my connection as a public service.

Technically there might be a terms of use issue with, say, my cable modem provider.  But if the % is low enough, surely everyone can just say it’s all good.

There are several things in the new gmail UI that I like but several that leave me scratching my head; see image (click to expand).  Mainly, the switch from text buttons to icons.  Can you tell effortlessly which icon is archive, which is spam, and which is move?  I can’t.  I suppose translation is then some extra work but it was already in place.

Also note how the back button and reply button are basically the same.

I wouldn’t blame a programmer for these — I’d likely get them wrong myself, but a company of Google’s size can have some human interface gurus.

One thing they have always gotten right is keeping the UIs clean and without much adornment.

data centers are too reliable (sort of)

Long ago in the early days of DoubleClick (really early) we built a data center out of our mail room.  We filled it to the brim.  It would get very warm and we used up all the power in the building.

Obviously, this is not ideal.  We then bought some hosting from a class 1 data center provider: raised floors, redundant cooling, power generators, UPS room, etc.  We had hardware running in both locations.

Something strange then happened.  The “real” data center had far more outages over the next year than the mail room.

Not much has changed.  Data centers are not all that reliable.  4 nines at best. Yet failures are rare enough that most software systems don’t cut over well. They are rare enough that there isn’t enough testing of failovers, and rare enough that one doesn’t always get around to all the extra work this would require anyway.

4 nines, yet we want more.  5 nines would be nice — but can you really get there?  And at what cost?  There is another option: 3 nines!  That is to say, either make your data center really, really reliable, or don’t bother at all.

Let’s call this approach a “Redundant Array of Inexpensive Data Centers” (RAIDC). 

  • + Cheaper. Forget the generators, forget the UPS room, forget multiple egress, forget everything fancy. Just let it go down. We have more data centers, and we know how to fail over.
  • + We can then actually fail over successfully.  Failures then happen often enough that we really have to make the failover work.
  • - We do then have to build a system that can actually fail over to a new data center.  That requires work and cost.  But didn’t you want that anyway?
I am old

"Did you get my email?"

This question leads to madness.

c++ : don’t use pair<>

I’m sure I’ll get some argument on this, but don’t use pair<>, except when you are using something that already requires it, such as STL classes and STL algorithms.

Instead of pair<A,B>, why not

struct X {
  A a;
  B b;
};

The above might not look so different but how about pair<int,int> versus

struct Point {
  int x, y;
};

More descriptive.  Writing generic template code becomes more difficult with named structs, which is why pair<> is there in the first place, but often one doesn’t need that level of genericity in an app.  So take the better readability.

Where are the SSDs in the Cloud?

SSD storage technology is massively useful.  However, it seems to be unavailable in cloud computing offerings (at least the ones I know about).

This seems an obvious gap in those solutions.  SSDs are easily deployed in one’s own data center, but not in the cloud.  That’s not good — cloud computing makes sense.

There is an opportunity here for someone in this space to be first mover and thereby gain some market share.

Google 2 step authentication - way too unwieldy

I have tried the Google two step authentication, and it is…a pain.  With several browsers and a couple of operating systems on my laptop, it asks me to verify via SMS for every single one of them.  Not to mention on my other computers.

There’s a real need for this — if your account is compromised, your email contains lots of info about other sites you use, such as ecommerce sites.  They likely have the same password, so they are compromised too.  You might even have some credit card telltales in your mailbox if you aren’t careful.

And compromises are easy.  With a botnet, millions of automated dictionary attack attempts can be made and are hard to stop.

So this is very much needed, but this implementation has way too much friction.  There are better options and/or tweaks that are needed:

  • For example, show a photo and I click on a specific spot on the photo as part of authentication.  That’s easy to remember.
  • Or ask me a question to which I know the answer such as “what was your first pet”.
  • Don’t make me reverify over and over when I’m on the same IP address (it seems to do this, and there is an argument for that: on a large company’s network everything may look like one IP, but in reality most threats in that situation are external anyway).

Some of these variants aren’t quite as strong but it is key that a large percentage of users have the feature on, and they simply will not as-is.

All storage except multimedia will be SSDs soon

It seems the cost today is about $2 / GB and dropping at a good clip.

The Intel 320 drives are interesting, as historically on low-cost SSDs write iops have been lower than read iops (albeit still very fast compared to spinning disks).  The numbers below are the Intel specs.

Random 4KB Reads

  • 40GB up to 30,000 IOPS
  • 80GB up to 38,000 IOPS
  • 120GB up to 38,000 IOPS
  • 160GB up to 39,000 IOPS
  • 300GB up to 39,500 IOPS
  • 600GB up to 39,500 IOPS

Random 4KB Writes

  • 40GB up to 3,700 IOPS
  • 80GB up to 10,000 IOPS
  • 120GB up to 14,000 IOPS
  • 160GB up to 21,000 IOPS
  • 300GB up to 23,000 IOPS
  • 600GB up to 23,000 IOPS
