Dwight's blog
Why MongoDB is (somewhat) feature-heavy

MongoDB (which I work on) is at the feature-rich end of the spectrum of NoSQL products.  Not compared to Oracle, but compared to say, Amazon Dynamo.

This was intentional. The goal was to create something general purpose that can be used, with ease, to handle a reasonably broad swath of use cases. And sometimes you need features to do that.  Secondary indexes, sorting, unique key constraints, things like that…things traditional databases have.  That is to say, the goal with the project wasn’t to make something scalable that handles a particular use case, but something scalable that works with a reasonably broad set of use cases.  That philosophy has driven a lot of what the product has ended up being.

For example, suppose we are writing a content management system and we are going to store documents and allow voting on the documents.  No one should be allowed to vote more than once. In the example below we remember who has voted in an array. What we are doing is “for document 700, increment votes by 1 and add joe to the voters list; if joe has already voted do nothing.”  Note that the statement executes atomically: there is no way that joe could vote twice.

db.docs.update( { _id: 700, voters : { $ne : 'joe' } },
                { $inc : { votes : 1 },
                  $push : { voters : 'joe' } }
              )

A small point but I like the example. And for me personally I find it clean enough I would be tempted to use mongo for the system even if I only needed one server.

btw the example above is in mongo shell syntax but can be done the same way from any programming language.
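
To make the semantics of that update concrete, here is a rough single-process model of it in C++. This is not the MongoDB driver API: `Doc` and `vote` are hypothetical names made up for illustration, and the mutex merely stands in for the per-document atomicity the server provides.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one document in db.docs.
struct Doc {
    int id = 0;
    int votes = 0;
    std::vector<std::string> voters;
    std::mutex lock;  // models the atomicity of a single-document update
};

// Models: update({_id, voters: {$ne: user}},
//                {$inc: {votes: 1}, $push: {voters: user}})
// Returns true if the document matched the query and was modified.
bool vote(Doc& doc, const std::string& user) {
    std::lock_guard<std::mutex> guard(doc.lock);
    // Query part: the document matches only if user is not yet in voters.
    if (std::find(doc.voters.begin(), doc.voters.end(), user) != doc.voters.end()) {
        return false;  // no match, so nothing is changed
    }
    // Update part: both changes are applied together, under the same lock.
    doc.votes += 1;
    doc.voters.push_back(user);
    // Invariant of this model: one vote per recorded voter.
    assert(doc.votes == static_cast<int>(doc.voters.size()));
    return true;
}
```

Calling `vote(doc, "joe")` twice on the same `Doc` changes it only the first time. The check and the modification are one step, which is the whole point of putting the `$ne` test in the query part of the update.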

I’m sorry, I can’t answer your call right now, my phone is garbage collecting.
Dear Linksys,

Please do this.

Imagine if by default all wifi access points publish a “Public <rand#>” SSID in addition to doing their normal stuff.  

- This public SSID is given < 10% of total available bandwidth to the access point.

- It’s outside my real network’s NAT zone and (if there were any) firewall so no security implication for me.

If all access points do this, the world is a nicer place. 10% is de minimis; I don’t mind that being used for free on my connection as a public service.

Technically there might be a terms-of-use issue with, say, my cable modem provider.  But if the % is low enough surely everyone can just say it’s all good.

There are several things in the new gmail UI that I like but several that leave me scratching my head; see image.  Mainly, the switch from text buttons to icons.  Can you tell effortlessly which icon is archive, which is spam, and which is move?  I can’t.  I suppose text buttons mean some extra translation work, but that was already in place.

Also note how the back button and reply button are basically the same.

I wouldn’t blame a programmer for these — I’d likely get them wrong myself, but a company of Google’s size can have some human interface gurus.

One thing they have always gotten right is keeping the UIs clean and without much adornment.

data centers are too reliable (sort of)

Long ago in the early days of DoubleClick (really early) we built a data center out of our mail room.  We filled it to the brim.  It would get very warm and we used up all the power in the building.

Obviously, this is not ideal.  We then bought some hosting from a class 1 data center provider : raised floors, redundant cooling, power generators, UPS room, etc.  We had hardware running in both locations.

Something strange then happened.  The “real” data center had far more outages over the next year than the mail room.

Not much has changed.  Data centers are not all that reliable.  4 nines at best. Yet failures are rare enough that most software systems don’t cut over well. They are rare enough that there isn’t enough testing of failovers, and rare enough that one doesn’t always get around to all the extra work this would require anyway.

4 nines, yet we want more.  5 nines would be nice, but can you really get there?  And at what cost?  There is another option : 3 nines!  That is to say, either make your data center really, really reliable, or don’t bother at all.

Let’s call this approach a “Redundant Array of Inexpensive Data Centers” (RAIDC). 

  • + Cheaper. Forget the generators, forget the UPS room, forget multiple egress, forget everything fancy. Just let it go down. We have more data centers, and we know how to fail over.
  • + We can then actually fail over successfully.  It then happens enough that we really have to make the failover work.
  • - We do though then have to build a system that actually can fail over to a new data center.  That requires work and cost.  But didn’t you want that anyway?
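
For a feel of what the nines mean in practice, here is a back-of-envelope sketch in C++. The function names are made up for illustration, and the second function assumes sites fail independently and that failover is instant and flawless, which is of course the hard part.

```cpp
#include <cassert>
#include <cmath>

// Expected downtime in minutes per 365-day year for an availability of
// `nines` nines, e.g. nines = 3 means 99.9% available.
double downtime_minutes_per_year(int nines) {
    assert(nines >= 1);
    const double minutes_per_year = 365.0 * 24.0 * 60.0;  // 525,600
    const double unavailable = std::pow(10.0, -nines);    // 3 nines -> 0.001
    return minutes_per_year * unavailable;
}

// Availability of a service that stays up as long as at least one of `n`
// sites is up. Assumes independent failures and perfect failover; both
// are idealizations for this illustration.
double combined_availability(double per_site, int n) {
    assert(n >= 1);
    return 1.0 - std::pow(1.0 - per_site, n);
}
```

Under those assumptions, 3 nines is about 525.6 minutes of downtime a year (a bit under 9 hours), 4 nines is about 52.6 minutes, and 5 nines about 5.3 minutes. Two independent 3-nines sites would combine to roughly 0.999999 on paper, which is the arithmetic behind preferring several cheap data centers over one fancy one.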
I am old

“Did you get my email?”

This question leads to madness.

c++ : don’t use pair<>

I’m sure I’ll get some argument on this, but don’t use pair<>.  Except when you are using something already requiring it, such as stl classes and stl algorithms.

Instead of pair<A,B>, why not

struct X {
  A a;
  B b;
};

The above might not look so different but how about pair<int,int> versus

struct Point {
 int x, y;
};

More descriptive.  Generic template code gets harder to write with named structs, which is why pair<> exists in the first place, but often one doesn’t need such a level of generics in an app.  So take the better readability.
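
To see the difference where the values are actually used, here is a small sketch. The two function names are made up for illustration; they compute the same thing, once over pair<int,int> and once over the Point struct from above.

```cpp
#include <cstdlib>
#include <utility>

struct Point {
    int x, y;
};

// pair version: .first and .second say nothing about what they hold.
int manhattan_pair(const std::pair<int, int>& a, const std::pair<int, int>& b) {
    return std::abs(a.first - b.first) + std::abs(a.second - b.second);
}

// struct version: same logic, but the field names document themselves.
int manhattan_point(const Point& a, const Point& b) {
    return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}
```

Both return the same number for the same coordinates. Only the second one tells a reader, six months later, which member is the x and which is the y.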

Where are the SSDs in the Cloud?

SSD storage technology is massively useful.  However, it seems to be unavailable in cloud computing offerings (at least the ones I know about).

This seems an obvious gap in those solutions.  SSDs are easily deployed in one’s own data center, but not in the cloud.  That’s not good — cloud computing makes sense.

There is an opportunity here for someone in this space to be first mover and thereby gain some market share.

Google 2 step authentication - way too unwieldy

I have tried the Google two step authentication, and it is…a pain.  With several browsers and a couple operating systems on my laptop, it’s asking me to verify on SMS for every single one of them.  Not to mention on my other computers.

There’s a real need for this: if one’s account is compromised, your email holds lots of info on other sites you use, such as ecommerce sites.  They likely have the same password, so they are compromised too.  You might even have some credit card telltales in your mailbox if you aren’t careful.

And compromises are easy.  With a botnet millions of automated dictionary attack attempts can be done and are hard to stop.

So this is very much needed, but this implementation has way too much friction.  There are better options and/or tweaks that are needed:

  • For example, show a photo and I click on a specific spot on the photo as part of authentication.  That’s easy to remember.
  • Or ask me a question to which I know the answer such as “what was your first pet”.
  • Don’t make me reverify over and over when I’m on the same IP address (it seems to do this and there is an argument for that, for example on a large company’s network everything may look like one IP, but in reality most threats in that situation are external anyway).

Some of these variants aren’t quite as strong but it is key that a large percentage of users have the feature on, and they simply will not as-is.

All storage except multimedia will be SSDs soon

It seems the cost today is about $2 / GB and dropping at a good clip.

The Intel 320 drives are interesting as historically on low cost style SSDs write iops have been lower than reads (albeit still very fast compared to spinning disks).  The numbers below are the Intel specs.

Random 4KB Reads

  • 40GB up to 30,000 IOPS
  • 80GB up to 38,000 IOPS
  • 120GB up to 38,000 IOPS
  • 160GB up to 39,000 IOPS
  • 300GB up to 39,500 IOPS
  • 600GB up to 39,500 IOPS

Random 4KB Writes

  • 40GB up to 3,700 IOPS
  • 80GB up to 10,000 IOPS
  • 120GB up to 14,000 IOPS
  • 160GB up to 21,000 IOPS
  • 300GB up to 23,000 IOPS
  • 600GB up to 23,000 IOPS
google search - invincible

Google’s strategy with search has worked out perfectly : basically, continual incremental improvement.  No one can catch up.  A good example is this search:

http://www.google.com/search?q=united+5336

Providing very useful information quickly.

Never underestimate the bandwidth of a stationwagon full of tapes
