Dwight's blog
There are several things in the new Gmail UI that I like, but several that leave me scratching my head; see image.  Mainly, the switch from text buttons to icons.  Can you tell effortlessly which icon is archive, which is spam, and which is move?  I can’t.  I suppose text labels mean some extra translation work, but that was already in place.
Also note how the back button and reply button are basically the same.
I wouldn’t blame a programmer for these (I’d likely get them wrong myself), but a company of Google’s size can afford some human interface gurus.
One thing they have always gotten right is keeping the UIs clean and without much adornment.

data centers are too reliable (sort of)

Long ago in the early days of DoubleClick (really early) we built a data center out of our mail room.  We filled it to the brim.  It would get very warm and we used up all the power in the building.

Obviously, this is not ideal.  We then bought some hosting from a class 1 data center provider : raised floors, redundant cooling, power generators, UPS room, etc.  We had hardware running in both locations.

Something strange then happened.  The “real” data center had far more outages over the next year than the mail room.

Not much has changed.  Data centers are not all that reliable.  4 nines at best. Yet failures are rare enough that most software systems don’t cut over well. They are rare enough that there isn’t enough testing of failovers, and rare enough that one doesn’t always get around to all the extra work this would require anyway.

4 nines, yet we want more.  5 nines would be nice, but can you really get there?  And at what cost?  There is another option : 3 nines!  That is to say, either make your data center really, really reliable, or don’t bother at all.

Let’s call this approach a “Redundant Array of Inexpensive Data Centers” (RAIDC). 

  • + Cheaper. Forget the generators, forget the UPS room, forget multiple egress, forget everything fancy. Just let it go down. We have more data centers, and we know how to fail over.
  • + We can then actually fail over successfully.  Failures happen often enough that we really have to make the failover work.
  • - We do, though, have to build a system that can actually fail over to another data center.  That takes work and money.  But didn’t you want that anyway?
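
To put rough numbers on the RAIDC idea, here is a minimal C++ sketch.  It assumes data center failures are independent and that failover is instant and always succeeds; neither is true in practice, so treat the result as an upper bound.  The function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Availability of one site with the given number of "nines", e.g. 3 -> 0.999.
double availability_from_nines(int nines) {
    assert(nines >= 0);
    return 1.0 - std::pow(10.0, -nines);
}

// Availability of a service that is up whenever at least one of `sites`
// data centers is up.
// ASSUMPTION: failures are independent and failover is instant and perfect.
double combined_availability(double per_site, int sites) {
    assert(sites >= 1);
    assert(per_site >= 0.0 && per_site <= 1.0);
    return 1.0 - std::pow(1.0 - per_site, sites);
}

// Expected downtime in minutes over a 365-day year.
double downtime_minutes_per_year(double availability) {
    return (1.0 - availability) * 365.0 * 24.0 * 60.0;
}
```

Under those assumptions, a single 4 nines site is down about 53 minutes a year, while two 3 nines sites together come out around half a minute.  Correlated outages and slow failovers eat into that quickly, which is exactly why the failover has to be exercised often.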

I am old

“Did you get my email?”

This question leads to madness.

c++ : don’t use pair<>

I’m sure I’ll get some argument on this, but don’t use pair<>, except when something you are using already requires it, such as STL classes and STL algorithms.

Instead of pair<A,B>, why not

struct X {
  A a;
  B b;
};

The above might not look so different but how about pair<int,int> versus

struct Point {
 int x, y;
};

More descriptive.  Named structs do make generic template code harder to write, which is why pair<> exists in the first place, but often an app doesn’t need that level of genericity.  So take the better readability.
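
A slightly longer sketch of the same point, with illustrative function names of my own.  The two midpoint functions do the same thing; the last function shows the kind of case where pair<> is unavoidable, because std::map hands its entries back as pairs.

```cpp
#include <map>
#include <string>
#include <utility>

struct Point {
    int x;
    int y;
};

// With pair<>, the reader has to remember that first is x and second is y.
std::pair<int, int> midpoint_pair(std::pair<int, int> a, std::pair<int, int> b) {
    return {(a.first + b.first) / 2, (a.second + b.second) / 2};
}

// Same logic with named fields.
Point midpoint_struct(Point a, Point b) {
    return {(a.x + b.x) / 2, (a.y + b.y) / 2};
}

// The exception: std::map's value_type is a pair, so first/second show up here
// whether we like it or not.
int total_count(const std::map<std::string, int>& counts) {
    int total = 0;
    for (const std::pair<const std::string, int>& entry : counts) {
        total += entry.second;
    }
    return total;
}
```

At the call site, `midpoint_struct(a, b).x` says what it is; `midpoint_pair(a, b).first` makes you go look.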

Where are the SSDs in the Cloud?

SSD storage technology is massively useful.  However, it seems to be unavailable in cloud computing offerings (at least the ones I know about).

This seems an obvious gap in those solutions.  SSDs are easily deployed in one’s own data center, but not in the cloud.  That’s not good — cloud computing makes sense.

There is an opportunity here for someone in this space to be first mover and thereby gain some market share.

Google 2 step authentication - way too unwieldy

I have tried the Google two step authentication, and it is…a pain.  With several browsers and a couple operating systems on my laptop, it’s asking me to verify on SMS for every single one of them.  Not to mention on my other computers.

There’s a real need for this: if your account is compromised, your email holds lots of information about other sites you use, such as ecommerce sites.  They likely have the same password, so they are compromised too.  You might even have some credit card telltales in your mailbox if you aren’t careful.

And compromises are easy.  With a botnet, millions of automated dictionary attack attempts can be made, and they are hard to stop.

So this is very much needed, but this implementation has way too much friction.  Better options and/or tweaks are needed:

  • For example, show a photo and I click on a specific spot on the photo as part of authentication.  That’s easy to remember.
  • Or ask me a question to which I know the answer such as “what was your first pet”.
  • Don’t make me reverify over and over when I’m on the same IP address (it seems to do this and there is an argument for that, for example on a large company’s network everything may look like one IP, but in reality most threats in that situation are external anyway).

Some of these variants aren’t quite as strong but it is key that a large percentage of users have the feature on, and they simply will not as-is.

All storage except multimedia will be SSDs soon

It seems the cost today is about $2 / GB and dropping at a good clip.

The Intel 320 drives are interesting because, historically, write IOPS on low-cost SSDs have been lower than read IOPS (albeit still very fast compared to spinning disks).  The numbers below are the Intel specs.

Random 4KB Reads

  • 40GB up to 30,000 IOPS
  • 80GB up to 38,000 IOPS
  • 120GB up to 38,000 IOPS
  • 160GB up to 39,000 IOPS
  • 300GB up to 39,500 IOPS
  • 600GB up to 39,500 IOPS

Random 4KB Writes

  • 40GB up to 3,700 IOPS
  • 80GB up to 10,000 IOPS
  • 120GB up to 14,000 IOPS
  • 160GB up to 21,000 IOPS
  • 300GB up to 23,000 IOPS
  • 600GB up to 23,000 IOPS
google search - invincible

Google’s strategy with search has worked out perfectly : basically, continual incremental improvement.  No one can catch up.  A good example is this search:

http://www.google.com/search?q=united+5336

It provides very useful information quickly.

Never underestimate the bandwidth of a station wagon full of tapes
“Schemaless”

In the NoSQL world it is common to talk about schemaless databases or data models.

It would be more precise to say “dynamic schema”.  In MongoDB, there are databases; a system catalog of collections; documents within collections; explicitly declared indexes for a collection.  The big difference is that “columns”, or rather fields in the document data model, are not predeclared.  Each field/value in the document is dynamic and can be present or missing.  Each value has a datatype too, so it isn’t typeless but rather dynamic or what some might call duck typing.

Here’s an example in the mongo shell.  We may have a couple docs:

> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "ben", "age" : 30 }

We could then add a new person with an extra attribute:

> db.persons.insert({name:'julie',age:28,likes:'baseball'})
> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "ben", "age" : 30 }
{ "name" : "julie", "age" : 28, "likes" : "baseball" }

No “alter table” necessary.  This is very helpful with agile development methodologies. 

We can take it a step further however.  The value of a field need not be consistent from document to document.  Now, in practice, it is very very common for the contents of a collection to be homogeneous.  But we have the option.  For example suppose we want to add “likes” for ben, but ben likes a couple things.  What to do?

> db.persons.update({name:'ben'},{$set:{likes:['math','baseball']}})
> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "julie", "age" : 28, "likes" : "baseball" }
{ "name" : "ben", "age" : 30, "likes" : [ "math", "baseball" ] }

In this example things work out particularly elegantly: even though one likes value is an array and the other a string, we can still run some interesting queries across them.  This is because when querying for a value, if the stored value is an array, MongoDB looks into the array:

> db.persons.find({likes:'baseball'})
{ "name" : "julie", "age" : 28, "likes" : "baseball" }
{ "name" : "ben", "age" : 30, "likes" : [ "math", "baseball" ] }
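
For readers who think in statically typed terms, the matching rule can be modeled in a few lines of C++ (C++17, for std::variant).  This is a toy model of the behavior described above, not how MongoDB is implemented; the `Value` type and `matches` function are invented for illustration and only cover strings and arrays of strings.

```cpp
#include <algorithm>
#include <string>
#include <variant>
#include <vector>

// A field value that is either a single string or an array of strings.
using Value = std::variant<std::string, std::vector<std::string>>;

// True if the value equals the target, or, when the value is an array,
// if any element of the array equals the target.
bool matches(const Value& value, const std::string& target) {
    if (const auto* single = std::get_if<std::string>(&value)) {
        return *single == target;
    }
    const auto& items = std::get<std::vector<std::string>>(value);
    return std::find(items.begin(), items.end(), target) != items.end();
}
```

In this model, julie's "baseball" and ben's [ "math", "baseball" ] both match a query for "baseball", which mirrors the find() result above.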

Likewise we can index the field:

> db.persons.ensureIndex( { likes : 1 } )

All very handy and useful.  But you might ask “won’t my data get rather dirty with no schema constraints?”  I had this concern when we started; I assumed we would just add some constraint rules later when needed.  Oddly, there hasn’t been a lot of demand for the feature, so far.  Empirically, it seems the data doesn’t get too noisy.

One other very important note: the dynamic schema is not just for developer friendliness!  There is another good reason for it.  Imagine changing the schema in a database cluster involving 2,000 servers.  It might be tricky to change that global state globally in a consistent manner.  One goal here is to store very big data sets.  Alter table is probably not going to fly with billions or trillions of documents.

P.S. For compactness, the examples above do not show the _id field that MongoDB or its driver automatically adds to all documents.

P.P.S. Dynamic schema is not unique to MongoDB; some other products in the space do it too.  Of course I’m biased: this is my favorite.

genomic sites worth a look

This is a space I know very little about, but these I found interesting:

  • SNPedia.  While it has a lot less information than things like Entrez Gene, SNPedia is far more digestible for the layman.