Cool that these guys have built a tool/stack to implement a complete Hadoop/PostgreSQL layer (if I understood the article correctly).
But it brings up the question...
Why are data and data processing outstripping hardware capabilities at such an alarming rate? Is this whole push for non-relational database performance the right direction, or should we be focusing on new hardware solutions?
It's a poor man's Vertica. Mostly good for analytics workloads.
It's quite strange that they didn't reference the Bigtable paper at all, while saying "to the best of our knowledge, there exists no published deployment of a parallel database with nodes numbering into the thousands". More than 3 years ago, Google already had a dozen Bigtable clusters with more than 500 nodes and at least one cluster with a few thousand nodes (for the main crawl db).