Honestly, I think the more important takeaway isn’t the decline of NoSQL. As the article says, it’s the conclusion that what’s good for FAANG isn’t necessarily what’s good for your average project. I know this sounds obvious, but it’s been a constant source of frustration for me with the hype train.
You could apply the same logic to a whole bunch of tech like Kubernetes, service mesh etc etc and arrive at the same result.
Every tech has a trade-off, and understanding it is critical. Don’t pick tech SOLELY based on FAANG use.
From inside FAANG, I have seen many projects over engineered to death. I think the problem is lack of experience combined with the selfish need to make the job more interesting.
I'll admit I have been tempted to stray down this path in the past. I think for me it was the fact that the company refused to provide any time for training or improvement, leading to the desire to choose tech in projects that would build skills rather than being the most expedient.
Otherwise known as Promotion Based Architecture. You’ve got to make the problem difficult enough and the solution complex enough to demonstrate you can “operate at the next level.”
Yeah... sometimes you get a junior engineer who's drunk the koolaid a bit too much and thinks they need a bunch of fancy tech for their web dashboard handling 5 req/s.
But I think it's pessimistic to reduce it to people trying to make the job more interesting, and at least with more senior people it's often about reusing technologies internally.
For instance, an application may have a use case that requires very high messaging rates, so they build up a team to operate Kafka clusters. Then later you end up with a bunch of teams using Kafka when far simpler things would do the trick, because it's already available and there's a whole team supporting it, with expertise to debug it if things go wrong. It doesn't look good if you take the system on its own, but in context it's a pretty reasonable decision.
IMHO sometimes this reasoning goes too far (I've seen people suggest we rewrite super relational apps to NoSQL to avoid operating SQL DBs!), but it usually comes from good intentions.
The common pattern one sees in such over-engineered projects is that they usually have a cool name, even though they are internal-use only.
I guess this is because of the hope of it getting open sourced and the dev becoming famous. Also, the less experienced people are, the more they think they're breaking new ground.
Sometimes engineers have perverse incentives. If you want a FAANG job, it probably helps to have experience with FAANG tech, even if your current company doesn't need it.
It is a very frustrating industry to spend a long time in. A hamster wheel of constantly shifting goal posts for mastery. I wouldn't mind if they were fundamental advances in software development, but it's mostly the same stuff with small advances but massive learning curves of arbitrary, non-transferrable minutiae.
The side effect of this ever-shifting tool set is that almost no one masters anything. All software is varying levels of crap written by newbies, because when the tools change every project, you are always a newbie in that tool set.
> I wouldn't mind if they were fundamental advances in software development, but it's mostly the same stuff with small advances but massive learning curves of arbitrary, non-transferrable minutiae.
I think this pinpoints the key exploitability in the market. If you have seen enough tech come and go, you can figure out that 98% of this year's ideas are the same as last year's, and as those from 40 years ago.
At that point you can start cutting through the bullshit and design things using brand new tech as if you had used it for 20 years already. At that point you're way ahead of the rest of the pack.
(And you can choose not to use the brand new thing, and argue convincingly for why the almost-exactly-the-same 30 year old, more mature and stable, tech is better.)
Firstly software development is not one monolithic job. There are many different jobs within software development.
You can be a systems engineer – understand the computers at a fundamental level – with focus on systems software – building low-level high-performance components/kernels – for storage systems, databases, in-memory systems etc (each of these have a lot of technical domain knowledge specialisation within them). You can spend 10 years in this field and continuously develop/enhance your expertise incrementally – it is quite stable and rewarding experience. A branch of this specialization is distributed systems engineering – where emphasis is on software operating in a distributed cluster over network – which has additional challenges unique to the aspect of being distributed.
You can be an applications engineer – understand application software engineering methodology – with focus on user/business facing application feature development in a scalable software team setting. The key focus here is not so much about low-level computer systems but more about software engineering discipline. It is about the inter/intra-team sport that is developing and continuously evolving a very large application software code base that is alive with changing/evolving user/business features. It is about modeling the functional domain of the user/business/real world. It is about the modern consumer Internet software development techniques – experimentation, live incremental safe feature releases etc. A branch of this specialization is developing application frameworks and tooling (rather than user/business functional features) to make the life of a feature engineer more productive. You can spend 10 years in this field and become an expert at software engineering discipline of churning out quality features on time and on budget.
The experience and learning in the above job families transcend any particular technology stack. The learnings are transferable from one tech-stack/functional-domain to another with relatively minimal effort. This effort is part of the work itself and doesn't turn you into a newbie for having to do it.
p.s: A generalist full stack engineer is usually an applications engineer who is somewhat good at systems engineering and is able to glue together systems to achieve the application features well. These engineers can take a startup from start and through initial growth phase and up to start of hyper growth phase. But there's a scale/performance threshold – where the scale of application deployment grows, performance starts to hurt your users, you need strong systems engineering specialists to fix those deeper systems/distributed-systems problems. Public cloud systems have been continuously raising that threshold since the beginning.
As someone at a FAANG company right now, your second paragraph neatly identifies a core frustration I have with my job that I haven't been able to articulate before now. Might be time for a change...
I just learned last week that we're bringing Pega on for some projects. Never even heard of Pega before then lol. Time to sit down with a free account and learn something else that we'll throw away in five years.
I was in a big JavaScript project where any article over 3 months old was useless or wrong. The internals of the app were constantly being rewritten to the flavor of the month.
I really like mastering things, so I've tried to stick with a technology stack (that I have kind of mastered), but I don't think it's been good for my career.
It's not just engineers either. I've seen new EMs copy and paste ill-fitting FAANG development practices into organizations and then bounce off to whichever company championed them not long after.
A major reason why senior engineers/architects go for whatever technology is hottest in the market (i.e. used at FAANG) is that it's the safe choice. For every new project, or for a fresh rewrite, one can either go for the incremental improvement over whatever was in use, or for the revolutionary approach that worked for FAANG and was open sourced by them. If one were to go with the former, their solution will always be compared against a hypothetical, much better one built on the latest shiny frameworks/components used by FAANG. It doesn't matter if you made the right choice, because to prove you did, you'd have to rebuild the application again using the FAANG framework and compare the profiling/scalability numbers. Much easier to just go with Kubernetes microservices and a service mesh on NoSQL.
I can't help but wonder if one of the big differences between real engineers and silicon valley software "engineers" is that real engineers don't make decisions this way.
- Real engineering interviews test skills that are pertinent to the day to day work of the engineers; FAANG (and copycat) interviews tend not to
- Real engineering isn't plagued by fad-following or driven by personalities: it's backed by research and established empirical practice
That said, the profession of "real engineering" has its share of problems: advancement is as political as it is in any other industry, and pay is relatively low compared to the true value of the output, to name two of the biggest.
I’d go even a step further and say FAANG companies, despite what their tech blogs might argue, aren’t at all immune from making very dumb decisions or adopting very silly practices. To follow those same ideas blindly could cripple your company.
In fact - FAANG companies have the resources to make horribly inefficient processes work (in human time or computer time) - I suspect many of the AI powered things are examples of this - and a smaller company will die trying to get it to work.
I don't think ML-powered systems are inefficient[0], so much as they are incremental gains over the pre-existing system that are only justified by huge sales volumes. If you can get a 4% lift in sales by using a complicated ML model with 100k features instead of a handful of basic heuristics, then whether it's worth spending engineering effort on depends on what your sales are.
[0] Inefficient in the sense that they're wasting money which could be easily reclaimed - they're probably going to be improved over time as the state of the art improves, but that relies on the whole field moving forward.
One problem is that technology is sold with stories about wildly successful companies. This storytelling doesn't concern itself with real world use cases.
I can't count the number of times I've heard Docker pitched with "it's what Google uses!", while the truth was that they didn't.
Before that it was MongoDB which was "just like Spanner which was key to Google's success", while in reality it wasn't even similar.
And you should organize in Spotify-like tribes, even though that idea is one person's notion of how they wished things worked, filtered through an echo chamber of conference talks.
As an engineer it is easy to dismiss these ideas when you have enough behind the scenes knowledge. But the point of storytelling isn't to build technology, but to pitch it. It does a good job at that. So use it, but wisely.
> Because an SQL database uses a schema or structure, this means changes are difficult. Say you’re running a production database full of a million records.
Articles like this one perpetuate the myths in the minds of young developers. First off, “millions of records” is nothing these days. More importantly, the schema ends up living somewhere. If it’s not in your database, you’re likely managing it in the app. There’s no free lunch when it comes to schemas for a typical SaaS.
> And while SQL statements are fun, it’s easy to drop all tables while futzing with a key or corrupting an entire repository with a malformed query.
This seems to indicate that the writer doesn't really grasp very basic modern SQL database features such as permissions, constraints, and transactions.
Without such understanding, how can one form an accurate comparison of NoSQL benefits vs SQL? Or, maybe worse, the author has a good understanding but prefers to make bold and false statements to push his point.
It isn’t even a permission thing. If you are just trying to query some data, you don’t accidentally replace “select” with “delete”. The author is trying too hard to put SQL in a bad light. Any syntax I have seen for querying NoSQL databases looks clumsy to me as well. SQL I am familiar with, so I am probably biased, but they both have their pros and cons.
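To make the earlier point about transactions and constraints concrete, here's a minimal sketch (Python's sqlite3 standing in for a real RDBMS; the `users` table is invented for illustration): a botched statement inside a transaction trips a constraint and gets rolled back, leaving committed data intact.

```python
import sqlite3

# Hypothetical table, sketched with sqlite3 in autocommit mode so we can
# manage the transaction explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

try:
    conn.execute("BEGIN")
    # A botched bulk update trips the NOT NULL constraint...
    conn.execute("UPDATE users SET name = NULL")
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    # ...so we roll back and the committed data survives untouched.
    conn.execute("ROLLBACK")

print(conn.execute("SELECT name FROM users").fetchone()[0])  # prints "alice"
```

The same mechanism is why "futzing with a key" in a real RDBMS doesn't corrupt anything: the constraint rejects the statement before it lands.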
It reads like SEO spam: a plausible collection of statements scraped from other articles and then cobbled together with no understanding. It's a shame that stackoverflow has sunk to this.
> If it’s not in your database, you’re likely managing it in the app
“Managed in your app” sounds like a benign state of affairs.
Anything that doesn’t have a home ends up smeared across your entire codebase. It isn’t that it’s in the app, it’s that it’s everywhere in the app, meaning changing it becomes a huge investment of energy that people will try to avoid or put off.
You can have a SQL database and still end up with assumptions about the data smeared across your codebase. I've worked on multi-million line codebases on top of a SQL database where nobody dared change the schema of some (very non-optimal) tables because too much code directly depended on the structure of those tables. Having a clean and DRY data access layer is necessary regardless of the underlying database.
As soon as multiple independent codebases share the same database, I would argue you need to put an API on top of that database and turn it into a microservice that owns its database. Otherwise the internal details of how the tables are structured will creep into the codebases and make it very hard to evolve the database's schema.
> Having a clean and DRY data access layer is necessary regardless of the underlying database.
SQL databases (via views and even sprocs) allow you to abstract particular client’s view of the data from the base storage layer inside the database.
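For instance, something like this (sqlite3 as a stand-in, with an invented `orders` schema) lets a reporting client see only the shape of data it needs, decoupled from the base table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT,
                         amount_cents INTEGER, internal_notes TEXT);
    INSERT INTO orders VALUES (1, 'acme', 1999, 'flagged for review'),
                              (2, 'acme', 500, NULL);

    -- Reporting clients see dollars and no internal notes; the base
    -- table can later be restructured without breaking this view.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(amount_cents) / 100.0 AS total_dollars
        FROM orders GROUP BY customer;
""")
row = conn.execute("SELECT * FROM customer_totals").fetchone()
print(row)  # ('acme', 24.99)
```

If the base table is ever split or renamed, only the view definition has to change, not the clients.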
> As soon as multiple independent codebases share the same database I would argue you need to put an API on top of that database and turn it into a microservice that owns its database.
An RDBMS is an integrated service that owns its own datastore, with a very well-defined, extremely battle-tested API designed to support multiple clients with completely different views of and access to the data, all as logically isolated as necessary from the design of the base storage layer.
If you aren't using an RDBMS, sure, you may need to wrap something around the datastore that provides a tricky-to-get-right subset of what an RDBMS provides fairly trivial-to-use facilities for out of the box, just like not using an RDBMS often forces you to do for another subset if you are concerned about integrity.
But if you 'have it in the database' it will still be smeared across your app too.
'Putting stuff in one place', regardless of that place, is hard. Necessary, but hard. And it requires tradeoffs.
If you need a 'sorry, this username is taken' friendly error, your app needs to handle constraint errors from your DB, even if only in the translation layer. At which point you'll have it duplicated on multiple layers, add tight coupling between layers, or need to forego that message and settle for, e.g., a generic exception instead.
The difference is that the actual constraint lives in one place and the rest of the locations are UX benefits to help the user. The system doesn't get into a bad state just because you forgot to add the constraint in the 100th location.
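A rough sketch of that division of labour (sqlite3, with a hypothetical `users` table): the uniqueness rule lives only in the database, and the app merely translates the violation into a friendly message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT UNIQUE NOT NULL)")

def register(name):
    # The constraint is enforced in exactly one place (the DB);
    # this mapping is purely a UX nicety.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (name,))
        return "welcome, " + name
    except sqlite3.IntegrityError:
        return "sorry, this username is taken"

print(register("sam"))  # welcome, sam
print(register("sam"))  # sorry, this username is taken
```

Even if the mapping is forgotten somewhere, the worst case is an ugly error, not a duplicate username in the data.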
> rest of the locations are UX benefits to help the user.
In my experience this is a pipe dream.
Maybe in my simple example, one could parse the constraint exception and map that to a field and a user-friendly error. Maybe. No framework or ORM that I've ever seen does this, though.
But even when it does: it still requires you to do the parsing and mapping in the application: introducing a tight coupling (e.g. you cannot add a constraint without releasing new locales).
In practice, in my experience, you'll most likely have some constraints in the DB and some validations in your ORM, some of which overlap and some of which are unique to one or the other.
Which is arguably worse than having each app that uses the database repeat that. It all depends on the use-case, obviously.
Definitely. You won't have the rich exception messages all over. However, if the rule is a _business rule_ then it must either live _in_ the DB, or (very frequently) in a dedicated repository layer that all application access goes through. Otherwise it's not a rule and you _will_ forget to enforce it at some point.
This is the distinction we make too: business validations go in the database, ux-validations go in the application.
In practice, however, this means the business validations are duplicated all over the place (but always enforced, as last line of defence, in the database).
It also means customers get more frequent 500 errors (exceptions): when a business rule is implemented in DB but not (yet) in an application.
> It isn’t that it’s in the app, it’s that it’s everywhere in the app
That's only if you don't know how to properly code a data access layer in your application. And if you have many apps using the DB, perhaps the data layer should be in a library.
It's also a lot easier to mess up evolving the logical schema and end up with unexpected, incoherent database state if your store doesn't enforce the logical schema. Sure, the more the logical schema is enforced, the more you are forced to do up front when it changes, but that work prevents you from:
(1) applying a data migration that fails to result in a state that complies with the logical schema, or
(2) producing a state inconsistent with the logical schema because your application code doesn't correctly observe the schema, as defective code will fail for violating constraints instead.
The book Designing Data-Intensive Applications talks about “schema on write” vs “schema on read”. In order to interpret your data, you must apply a schema, so your choice is whether you do that explicitly when the data is written, or implicitly when it’s read.
Or as Yoda would say, Schema read or schema write, there is no “no schema”.
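A toy illustration of the two options (sqlite3, with invented tables): with schema-on-write the structure is enforced as data enters; with schema-on-read every consumer has to re-apply the implicit schema itself.

```python
import sqlite3, json

conn = sqlite3.connect(":memory:")

# Schema on write: structure is enforced when data enters the store.
conn.execute("CREATE TABLE users_sw (age INTEGER NOT NULL CHECK (age >= 0))")

# Schema on read: anything goes in; interpretation is deferred.
conn.execute("CREATE TABLE users_sr (doc TEXT)")
conn.execute("INSERT INTO users_sr VALUES (?)",
             (json.dumps({"age": "unknown"}),))

def read_age(doc):
    # Every reader has to carry this logic -- the implicit schema.
    age = json.loads(doc).get("age")
    return age if isinstance(age, int) else None

doc = conn.execute("SELECT doc FROM users_sr").fetchone()[0]
print(read_age(doc))  # None: the "schema" only surfaced at read time
```

The same bad value would have been rejected at insert time by the `users_sw` table's CHECK and type constraints.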
If you don’t mind me asking, how does Postgres fix this? Do they have a more sophisticated locking mechanism, or maybe a copy offline until it’s ready kind of a system?
> This might cause the whole table to be locked and copied in SQL
That was indeed the case in the past, but not so much anymore except for certain situations. MySQL, for example, has had support for in-place table alterations for a while [1]. I've used it in production and it works very well IME.
“might cause”, yes. But at least with Postgres that only happens if you add a default value to the new column. You can add the default value in your application instead, just like you would with your average NoSQL DB.
Then, when you have low load on your system, you can migrate the rows in batches to have a default value, and eventually remove the application default.
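Sketched with sqlite3 (where column adds are also metadata-only; table and column names are made up), the sequence might look like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [("e%d" % i,) for i in range(10)])

# Step 1: add the column with no default -- a cheap metadata-only change.
conn.execute("ALTER TABLE events ADD COLUMN status TEXT")

# Step 2: backfill in small batches (during low load), instead of one
# big locking UPDATE over the whole table.
while True:
    with conn:
        cur = conn.execute(
            "UPDATE events SET status = 'pending' "
            "WHERE id IN (SELECT id FROM events WHERE status IS NULL LIMIT 3)")
    if cur.rowcount == 0:
        break

print(conn.execute(
    "SELECT COUNT(*) FROM events WHERE status = 'pending'").fetchone()[0])  # 10
```

Once the backfill completes, the application-side default can be dropped and, if desired, a real DB default and NOT NULL constraint added.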
But if you don't control the client, you will now have to deal with client side migration and server database version management. If you create an additional v3, you will need to decide to either keep v1->v2->v3 code or v1->v2, v1->v3. Also, reporting.
Can’t you do this concurrently these days? Also, managing the schema in the database was the least of my issues when deleting/adding fields. You still need to make sure your clients are resilient against null or missing fields in responses, and that your scheduled jobs don’t query data which isn’t there anymore. Point is, not having to alter the DB schema doesn’t make such a big difference. You still need to make sure your system overall is built in a way that allows smooth data migrations.
This blog post is calling at least two different things SQL, and it's kind of infuriating.
SQL as a query language is never going away. Virtually every database has found it necessary to offer a SQL-like query language: Cassandra's CQL, HiveQL, Couchbase query language, and so on. SQL is a human-readable, composable formalism for describing data.
What's gone away is the practice of writing complex, highly linked, normalized database schemas with layers of constraints and foreign key references. That was banished to the land of stagnant enterprise 10 years ago and is not coming back.
The last 10-15 years have been an evolution from mostly static, deeply linked, highly structured data to shallow schemas, append-only updates, denormalized data, and stream processing. If your data is a stream of updates, there's not as much pressure to roll back. If your data is mostly defined by a series of processing pipelines that live entirely outside your data warehouse, there's not a lot of upside in enforcing constraints in the DB. If anything, we have learned that it's very useful to offer different denormalized views of the data in the DB to different consumers.
MS SQL Server is not roaring back. Businesses have just learned to unbundle data processing from data warehouses. Data warehouses now have fewer tasks to focus on, such as scalability. And if your DB is just a dumb replica with a flat schema, whether it's an RDBMS or not is pretty unimportant.
>> What's gone away is the practice of writing complex, highly linked, normalized database schemas with layers of constraints and foreign key references
That hasn't gone away, that is the default for most corporate CRUD apps, for good reasons. Apps that aren't corporate CRUD apps are a rounding error away from not existing. You use the term data warehouse 3 times, are you sure you are describing a general case rather than one you are familiar with?
If you want to offer different views of data in a normalized database to different consumers, one way to do that would be to use views.
> This blog post is calling at least two different things SQL, and it's kind of infuriating.
For better or worse, that's not what "NoSQL" means. I understand that the name is a bit infuriating, and I personally map it to "non-relational" as I read.
> What's gone away is the practice of writing complex, highly linked, normalized database schemas with layers of constraints and foreign key references.
This may be true for some use cases, but in general that's wishful thinking.
> This may be true for some use cases, but in general that's wishful thinking.
Why would avoiding implementation of database level constraints be "wishful thinking"?
Au contraire: I strongly believe that data level constraints should be tied to the database.
Sure, there's a trade off between indexing, constraints and performance. But I rather have an additional unique index on a column than relying on the fact that all developers in an organization always do the right thing.
If the database is my responsibility then I'll make damn sure that it's not possible to fuck it up with shlocky code.
There seems to be less and less appreciation for the relational model and the power of SQL databases to maintain data integrity. I'm not quite sure why, because relational databases are a uniquely powerful tool in software engineering.
I think SQL is a poor language in terms of composability, in contrast to a functional query language (Frankel and Buneman, 1979) or a relational algebra (Codd, 1970).
SQL should go away, though. It is an astonishingly poor method of communicating your query to the database server. Generating and parsing it is problematic and expensive. It really has nothing going for it, except that a lot of people already know it.
Interesting. I've never seen SQL as a limiting factor, since in 99% of the cases I just need to get something by its id or run a simple query (select, join, where, order).
And it's really easy to learn too. I've seen many people without coding background pick it up and this is definitely a bonus where I work. Otherwise that 1 CS/Data Science guy is pre-occupied with hammering out queries for everyone.
What kind of limitations do you run into? And what alternative do you propose (besides GraphQL)?
Not the parent, but I quite like the Kusto query language. It’s Microsoft specific, but the overall concept is nice and could be implemented more broadly. The operations are described as a pipeline, which to me is much more readable (and writable!) than SQL, where it feels like I’m always bouncing my mental cursor around to figure out what a query is doing. I’m sure that’d reduce with more SQL exposure, but understanding Kusto came pretty much instantly for me.
Also, each operation being its own line in the pipeline makes modification an absolute breeze, simply comment out, reorder, etc lines and the result will usually also be a valid query.
SQL is pseudo-readable by people like PMs, non-technical analysts, marketing, customer success, etc.
As a test: I just showed my non-technical wife the following snippet of SQL, and she was able to tell me what it retrieved, as well as modify it to find a different player or statistic.
  SELECT
    player_name,
    COUNT(*) AS num_hits
  FROM baseball_players
  WHERE result = 'base_hit'
  GROUP BY 1;
The good part is that errors in the application layer (like in the example) are less likely to destroy the database, because the database has a schema.
How dare you underestimate me! I would have screwed up the query in any language ;)
This is also quite readable, no arguments there! I personally think SQL is a great language for broad appeal. I understand why people don't like it - there are many funky aspects - but I also understand why it's become dominant. Because it's just so damn useful.
WRT the MS specific language, my issue there is portability of knowledge. Someone with experience in say Oracle, 20 years ago, can reuse those skills with SQL. IMO, we need more common-language, independent-implementation tools like SQL in order to enable more people to code.
The biggest benefit to anything approaching pipelining is making data-interactive developers think about intermediate and transient state.
In my enterprise coding experience, most app developers don't think of DBs as anything other than "one, current state." Which makes debugging a nightmare.
I’d love to hear more of your thoughts on this - I hadn’t considered DBs as a form of state. Obviously they are, it just hasn’t occurred to me.
If I may borrow a related concept - what other mental models should I CASCADE the implications of this new perspective to? Ie, how does this change how you approach problems? Are there limits where you would call out?
Over the course of SQL's existence, how many entire programming languages have built up entire ecosystems? It doesn't make sense that people would upend their language stack every ten years, but then be afraid to learn a new query language.
Most developers need to know their programming language deeply; but not necessarily the query language, which might be hidden behind an ORM anyway.
In theory, at least, changing the query language would disrupt fewer people than, say, moving from C++ to .NET or Perl to Ruby or Python to Go.
I would be very willing to change query languages. Problem is, there's nothing that competes with SQL that offers even close to equivalent functionality.
I mean, having a sane protocol designed for use by applications rather than humans would be nice.
It’s not that SQL is bad, it’s that mechanically generating it is fraught with peril and gotchas, and there’s a huge mismatch between the code we write and the data we retrieve from these sources.
Hell, s-expr’s would be a much better format and would require little implementation work.
For an application level interface they make sense - they’re trivial to generate and parse in any language, don’t require expensive bespoke parsers and lexers to be written, and the basic constructs are more appropriate for a query interface than something like JSON or other common serialization formats while still being human readable if needed.
Again, SQL isn’t a terrible language - it’s just not designed to be mechanically generated in a sane manner. Libraries like jOOQ are useful because they handle a lot of pitfalls with runtime-generated queries, when we could avoid them altogether by having a better way of applications (rather than users) to query and manipulate data.
I’ve seen a -lot- of engineering time spent trying to adapt to transient quirks of sql query planners, where the programmer expected a filtered range scan and the database elected for a materialized temp table, or similar disasters. Each and every one of those people would have been better served if they could have just specified the op tree that they wanted to be executed.
The right way to execute the query depends on the data in the database, and parameters in your query. I think overall a lot of database time and development time has been saved by depending on query planners.
Don't forget index statistics, which can turn any query into a full-fledged catastrophe if they're not up to date.
In my experience it's extremely rare for an optimizer to go awry on a well maintained database.
And for the rare case where your data - and access patterns are so weird that it does happen you can always employ hints. Which is a bit nasty, granted. But it's not that you would use them excessively on a well designed and maintained database.
It's probably obvious that I very much agree with your take.
Hints are required in an efficient, general purpose system.
Either you can have everything fully specified at all times (a waste of time and effort, as the parent noted), or you can limit general application (no overriding defaults), or you can allow a method for overriding defaults (when necessary).
Of those, it seems shortsighted for people to complain about the last, given it's strongly arguably the best of the three options.
Examine the execution plan, use "hints". This is pretty common DBA stuff. If by "engineer" you mean a non-dba programmer, then that's the problem: lack of domain expertise.
Use EXPLAIN ANALYZE and save yourself some time. It's not that difficult to figure out what the planner is doing and tweak it. I think SQL is the best we got but I'd love to hear what you think is a better alternative.
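For illustration, sqlite's rough analogue, EXPLAIN QUERY PLAN, shows the same kind of before/after when an index appears (toy table, invented names; the detail strings vary slightly by sqlite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)",
                 [(str(i),) for i in range(100)])

# Without an index on v, the planner has no choice but to scan.
plan_scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = '42'").fetchone()[3]
print(plan_scan)   # a full-table SCAN of t

conn.execute("CREATE INDEX idx_v ON t (v)")
plan_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = '42'").fetchone()[3]
print(plan_index)  # now a SEARCH using idx_v
```

Checking the plan like this before and after a schema change takes minutes and avoids guessing at what the planner chose.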
I don't think the parent was saying they haven't used EXPLAIN ANALYZE or its equivalent.
Explain tools are quite good at showing what the db chose to do. They generally suck at explaining why that particular plan was chosen, or why a particular strategy was not used (i.e. why the table scan when there's a perfectly good index).
Isn't that inherent in the abstraction represented by SQL?
In that, it's optional that the DBS explains why it chooses what it chose (although it can happily tell you what).
Were it to do so, that'd be a pretty big crack in the abstraction layer to peer through, and would likely cause more footgunning by (1) developers peeking at the why, (2) developers assuming stability of the why, & (3) ossification of the underlying DBS engines, as now program functioning depends on their internals behaving a certain way.
(Although I guess it's a general win for SQL that we're discussing performance differences, rather than correctness differences)
As it is, people already have to look into "why" for practical considerations. A database failing to perform will lead to a timeout somewhere else and a 500 error served to an end user.
With the lack of tooling support, information is extracted by modifying the query by trial and error (seeing what needs to be changed to flip it over to desired behavior), from dark corners of the internet, and by reading the source code of the storage engine.
My latest expedition into these matters was a case where a table scan was technically faster (and favored by the query planner) than a low-cardinality index, but used a lot more CPU. So when the DB was hit with several cases of that particular query simultaneously, the server would run out of CPU and everything slowed to a crawl.
I find the faith on display in this thread in database query planners to be charming, in the same way that toddlers who believe in the tooth fairy are charming. I guess I'm the only person who has ever needed to debug why MySQL creates on-disk temporary ISAM files for UNION statements producing 2 rows? There are infinite edge cases in these DBMS engines.
SQL is very expressive compared to rolling your own map reduce. And SQL databases have a lot of optimizations. That said, accessing the data array directly is often easier and faster.
The DB-as-cathedral didn't scale as our volume of data did.
I don't think NoSQL was as much a conscious choice, as much as the only option when even a medium sized business is unable to afford / scale the number of human DBAs they'd need to keep pace.
Everything has a trade-off, and I think we accepted more (accessible, scalable) data >> more (pristine) data.
(And obviously, the tooling around newer technologies has gotten way better, while SQL was already very mature)
Why did NoSQL become popular? Was it the huge datasets required by the internet?
Before SQL, there was already no SQL. I don't mean that as a semantic joke, but that databases existed before relational databases that were faster. Relational DBs were too slow to even be usable, until B-trees made them barely feasible in performance (and still much slower than the previous DBs).
The advantage was flexibility: you could change the database organization without having to rewrite the application. Similarly, if your application needed data in a different form, you could make it seem that the database was already in that form.
So SQL was like a glue between systems that could transform the structure of the data - much like high school algebra can put an equation into a different form, that is equivalent but more convenient.
I can imagine that back in the 1970s, computing power was growing much faster year by year than typical database sizes were. So, although "slow", relational databases became "fast enough" for more and more use cases.
But in the 2010s, internet datasets were growing much faster - and computing power wasn't. So relational DBs weren't "fast enough" for these cases... hence "NoSQL".
> But in the 2010s, internet datasets were growing much faster - and computing power wasn't. So relational DBs weren't "fast enough" for these cases... hence "NoSQL".
I joke that the only thing a NoSQL database can do faster than an RDBMS is give you the wrong answer to a query.
There are two different features that people typically think of when they think of NoSQL as compared to a traditional RDBMS:
1. Document database (i.e. unstructured, or at least weakly structured data)
2. Eventual consistency
You could have an ACID document database, or an eventually consistent relational database, so the two are actually orthogonal. It's definitely easier to get something up and running quickly if you aren't going to implement all of SQL though.
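To make that orthogonality concrete, here's a toy sketch of the "ACID document database" quadrant using only Python's stdlib: schemaless JSON documents, but fully transactional writes (sqlite3 as the storage engine; an illustration, not a real document database):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")

# A weakly structured document: no fixed schema for the body
doc = {"user": "alice", "tags": ["admin"], "logins": 0}
with con:  # transaction: commits on success, rolls back on exception
    con.execute("INSERT INTO docs VALUES (?, ?)", ("u1", json.dumps(doc)))

# Atomic read-modify-write of the document
with con:
    body = json.loads(con.execute(
        "SELECT body FROM docs WHERE id = ?", ("u1",)).fetchone()[0])
    body["logins"] += 1
    con.execute("UPDATE docs SET body = ? WHERE id = ?",
                (json.dumps(body), "u1"))

logins = json.loads(con.execute(
    "SELECT body FROM docs WHERE id = 'u1'").fetchone()[0])["logins"]
print(logins)  # 1
```

The point is just that "document-shaped" and "eventually consistent" are independent choices: nothing about the JSON blob above prevented a fully ACID update.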
If you make money serving up ads on lists of documents, then getting any answer quickly is going to be better than getting the "right" answer slowly, so for Google at least, this made sense, as Google's databases actually were that big.
IMO the rest of it was people cargo-culting, not understanding the hard-won knowledge of the 70s, and also not realizing that, if your working set was 100GB, you could just buy a server with 256GB of ram and have zero performance issues. Prototypes went up fast, benchmarks were great, and then somewhere down the line, someone wanted to run a query on the data, and discovered that fast writes come at a significant cost.
If your database keeps growing, you'll eventually exceed the write capacity of a single server, and have to write to more than one place.
You're correct that most of the NoSQL hype was cargo-culting, but I don't think it came solely from Google trying to serve ads quicker, I think it also came from companies having too much data to scale vertically.
NoSQL rocketed in popularity, because it required zero working knowledge of how databases work. You could get up and running on a project, without having to worry about what tables, columns, and their relationships to one another meant. You could throw ANY data in, and generally get it back out.
Exactly! Developers did not want to learn messy databases. I think a lot of folks without experience entered the industry (mostly from boot camps and such) and ruined everything.
I can almost guarantee that for most of the SaaS startups which still go with the React, Node, Mongo stack the data is structured. They have users, orders and whatnot. It just takes some experience to foresee the upcoming product changes. But as someone said here, the nosql stack has been incredibly popular among recent bootcamp graduates.
I think that's not the reason. NoSQL rocketed in popularity on the back of adoption by a few large companies with scale problems that had to abandon relational databases due to scale issues. If the requirement is to serve very high low-latency throughput to back something like shopping on Amazon, then relational databases and SQL in particular aren't very helpful. You know your data access patterns up front, and can optimize your database to support exactly your API's access patterns. Ad hoc queries on the production database are prohibited, data analysis work gets done with some kind of ETL pipeline, and the choice to trade off any part of ACID for more throughput and lower latency is a no-brainer.
A few large companies helped to get it onto the radar of a lot of inexperienced developers, who found how easy it was just to plug away on it. All of that performance nonsense was second fiddle by a long shot for the vast majority of users.
NoSQL didn't support joins (it's much easier to get predictable performance from key lookups), was trivial to shard (because no joins), and supported the common B2C scaling pattern of small amounts of data on millions of users.
With SQL it's easy to write a badly performing query which does lots of inefficient joins in the DB. NoSQL doesn't give you as many tools to offload your computation so it lives in your app instead, where it's easier to scale out (i.e. throw money at the problem).
I'm not getting into schemas vs schemaless (e.g. the cost of migrations when you have billions of rows) or denormalization (e.g. stuffing joined entities into both ends of an association, ugh), etc.; there are other pros and cons. But IMO the lack of compute offload is a positive feature of most NoSQL.
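The "computation moves into the app" point is easy to see in a sketch: with a key-value store, the join the database engine would have done becomes a loop in application code (plain dicts standing in for the store here):

```python
# Two "tables" in a key-value store: no joins available
users = {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}}
orders = [
    {"id": "o1", "user_id": "u1", "total": 30},
    {"id": "o2", "user_id": "u1", "total": 12},
    {"id": "o3", "user_id": "u2", "total": 7},
]

# The join the SQL engine would have done now lives in the app,
# where it's easy to scale out but also easy to get wrong (N+1 lookups, etc.)
report = [
    {"order": o["id"], "name": users[o["user_id"]]["name"], "total": o["total"]}
    for o in orders
]
print(report[0])  # {'order': 'o1', 'name': 'Ada', 'total': 30}
```

Against a real store, each `users[...]` lookup is a network round trip, which is exactly the cost you're choosing to pay in the app tier instead of the DB.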
A while ago I'd say! Last time I heard anyone seriously excited about NoSQL was several years ago. It still has its place, but it seems like Postgres is the hype these days.
It is now, but before it had that privilege, MySQL did. The hype he is referring to may have taken the crown off MySQL and given it to Postgres. But make no mistake... Postgres is not water. How do I know? Because it, too, will one day be unseated. And water does not lose its crown.
I think Postgres is popular because of its emphasis on being explicit, strict, and correct.
We're seeing the same thing in programming language adoption, where TypeScript is exploding in popularity and seemingly every language is getting static types if it didn't already have them.
To me, the rise of Postgres (and its spiritual siblings, languages with expressive, static type systems) are about maturing of the industry rather than hype.
I think you have to look at it less as Postgres and more as SQL vs NoSQL.
SQL has been the default for eons (although I still use a 70s-era hierarchy-based database on a daily basis, but that is another story). My company has dozens of large production-scale databases and I think only one or two NoSQL products. We don't use Postgres, unfortunately, though.
NoSQL went through a hype cycle in some circles (it stayed non-existent in mine for the most part), but in my eyes SQL has always been the workhorse and was never not the default in the greater industry. The Postgres implementation has become pretty popular recently, but so have Oracle and others in the past.
No haha, a real database as part of a major product. It's cool in a way, but very frustrating compared to SQL. I guess the closest thing people probably could compare it to is the MUMPS running in hospitals.
Why? Very few products need sharding, let alone multi-master. Sure a popular social media platform would, but most development in the world is for small to medium scale line-of-business apps. Postgres is _fantastic_ for these.
I manage a large sized Postgres farm with 100s of instances, and there has been one case where we need multi-master, and I went with Galera cluster for MariaDB. You can shard using the citus extension for Postgres.
Depending on requirements, there are an increasing number of options for "active-active" Postgres deployments. A colleague wrote this on a federated active-active configuration on Kubernetes: https://info.crunchydata.com/blog/active-active-postgres-fed...
Postgres supports table partitioning and foreign data wrappers (used for accessing remote SQL databases) which can be used to set up sharding as described in the postgres docs.
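Whatever the mechanism (citus, partitioning, FDWs, or application code), the core of sharding is just a deterministic key-to-shard mapping. A minimal hash-based router might look like this (shard count and key format are made up for the illustration):

```python
import hashlib

N_SHARDS = 4  # hypothetical shard count

def shard_for(key: str, n_shards: int = N_SHARDS) -> int:
    """Map a key to a shard deterministically via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards

# Same key always routes to the same shard; different keys spread out.
print(shard_for("user:42") == shard_for("user:42"))  # True
print({shard_for(f"user:{i}") for i in range(1000)})  # all shards get traffic
```

Note that plain modulo hashing reshuffles almost every key when `n_shards` changes, which is why real systems tend toward consistent hashing or fixed logical-partition schemes instead.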
It is fractally amazing to see the exact same false dichotomy within data stores, DBMSes, and query engines themselves playing out in the "market view" of those same products.
That is:
The tradeoffs between all these systems have always been the effort required to create, modify, and maintain well-groomed (albeit rigid) schemas and data models versus the speed, scale and agility of a schema-on-read / "unstructured" data storage mechanism.
Which is then counterbalanced by the tradeoff between getting quick, accurate (albeit rigid) answers of a well-managed data warehouse vs. having to string together fragile, complex ad-hoc wrangling and querying code.
So pick your poison: a junk drawer full of Legos or a beautiful sculpture with the head and an arm missing.
And the obvious answer is that for most organizations you need both! Agility for bottom-up discovery and exploration, and rigidity for top-down hard facts and shared objectives. (Maybe it's a lakehouse, maybe it's not, TBD.)
And then there's this meta thing where NoSQL was pitched as a disruptor, agile, low barriers to entry, and RDBMSes and data warehouse vendors got this reputation as slow, rigid, too in love with their creations to change ...
And now there's this reverse pushback - oh, actually these NoSQL vendors need to grow up and mature their products, that agility was just a lot of hype and chaos, these data warehouse vendors had the right ideas, they've learned to play the NoSQL vendors' game better than they have and their go-to-market strategies have stood the test of time ..
When (again!) the answer is you need both: disruptors bringing different paradigms to market, letting organizations pick and choose capabilities based on their needs, making legacy vendors adapt and evolve.
So glad I migrated to SQL recently. I thought I had unstructured data and I had no real need for relational data. But oh boy I was wrong. Want a customer list, billing, emails, linked accounts with those users, etc.? All of this was such a pain in Mongo, and remnants of the messy schema still lurk in our codebase. Reminds me a lot of coding in typed languages like C vs. Python or JS. But in the case of Mongo I think I was getting the worst of everything ;)
I’ve been thinking lately that maybe the most reasonable path would be using SQL early on so you have a very clear picture of your schema, and you can do migrations easily.
Once you scale up and your schema and access patterns have solidified then you can make the switch to NoSQL where it makes sense.
I suppose it depends on the specific project or feature.
Usually I go for the approach you describe. But more than once I got bitten by doing this for a feature or project where things were still very much in flux, and/or in a prototyping phase. In those cases, starting with 'NoSQL' (JSONB columns in Postgres though) would've saved me a lot of trouble, and it would have been much easier, relatively speaking, to migrate my data into proper tables once things solidified.
Still, I do find that going for 'SQL' by default has usually been the better choice.
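The prototype-then-solidify path can be sketched end to end: start with JSON blobs while things are in flux, then lift the fields that stabilized into a proper table (sqlite3 standing in here for Postgres with a JSONB column; the table and field names are invented for the example):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")

# Prototype phase: everything is a schemaless blob
con.execute("CREATE TABLE events_raw (body TEXT)")
con.executemany("INSERT INTO events_raw VALUES (?)", [
    (json.dumps({"kind": "signup", "email": "a@example.com"}),),
    (json.dumps({"kind": "login", "email": "a@example.com", "ok": True}),),
])

# Things solidified: promote the stable fields into real columns
con.execute("CREATE TABLE events (kind TEXT NOT NULL, email TEXT NOT NULL)")
rows = [(d["kind"], d["email"])
        for (body,) in con.execute("SELECT body FROM events_raw")
        for d in [json.loads(body)]]
con.executemany("INSERT INTO events VALUES (?, ?)", rows)

logins = con.execute(
    "SELECT count(*) FROM events WHERE kind = 'login'").fetchone()[0]
print(logins)  # 1
```

The nice part of doing this inside one database (as with Postgres JSONB) is that the blob table and the typed table coexist, so the migration can happen incrementally.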
Nah, the pendulum has just swung back in the opposite direction. Give it 10 or 20 years, and just like strongly typed programming languages, NoSQL will be all the rage again, and you'll be "an idiot" for not using it, again.
Fashion-oriented posts like this are always a shitfest, and this one is no different. It is setting up a variety of different architectural styles as if they were in competition for the "top spot", which is to say the only option that should be applied in all use cases. This isn't just garbage, it's actively encouraging a whole breed of shitty engineers who never learn how to approach solving a problem.
> The goal of a NoSQL database, on the other hand, is to ensure ultimate scalability by making sure that the data is stored in a format that can be shared—or sharded—across multiple servers
From here, it then proceeds to list architectural specialisms that have absolutely nothing to do with scalability:
- "Document stores" excel at managing compound representations of data, they do an amazing job of minimizing IO when many small sets of (usually hierarchically structured) data can be stored as a single unit. Document stores map particularly well to the "REST" service architectures in the original Sam Ruby meaning of the word
- Graph databases are (usually but not always) document stores that excel at indexing and executing transitive queries. Their innovation is not in storage, but in querying particular kinds of data sets with complex (and possibly undefined upfront) relationships using queries that are also likely complex and possibly undefined. This has more to do with expressiveness than scalability
- Column stores excel at managing timeseries. Like document stores, their entire point is IO and processing optimizations that become possible when data is in a particular shape -- varying with a particular profile, and with high redundancy when viewed along a single (usually time) axis. Column stores absolutely rock when applied to the right kind of data -- they can provide 20x storage size improvements and similar query execution time improvements. Finally we can say that column stores have something to do with scalability. A 20x improvement in hardware utilization could very much be make or break for many kinds of common project
- Time series databases are column stores.
> Because companies like Google and Amazon created these databases for their own massive data stores, the goal was to reduce the time needed to grab a piece of data
Every. Single. One of these architectural styles long predates the FAANG-industrial complex.
> NoSQL databases don’t offer much in the way of transaction management or real coding
Real coding?
> NoSQL databases like MongoDB just take data and store it
I think a good proportion of posts like this are just for resume padding, to be able to point to a bunch of "think pieces" you've written that make you not just a programmer, but a "thought leader".
This is filled with incomplete information. MongoDB has had transactions since 4.0 and a strong consistency model by default from the start. That's not to say they didn't have some bad defaults early on... This article makes gross generalizations and just doesn't really add all that much value.
Document databases offer a full-featured general-purpose alternative and really shouldn't be compared to Key/Value stores at all. They're only being lumped together since both are "nosql", a fairly tired term at this point
https://jepsen.io/analyses/mongodb-4.2.6 reports the default read and write concerns were extremely aggressive, and even the safest available values had issues.
It's a really bad article, just filled with gibberish starting with its description of SQL and NoSQL. Fortunately the comments here will provide better material.
I have a very antagonistic relationship with NoSQL databases because the vast majority of people get nothing out of using one, and yet every resource a newcomer to programming (on the Node.js ecosystem at least) recommends using MongoDB with Mongoose (an ORM for a NoSQL database? Why?), leading them down a path they really have no business walking because they could have instead learned the widely used, time-tested traditional SQL databases.
> and yet every resource a newcomer to programming (on the Node.js ecosystem at least) recommends using MongoDB with Mongoose (an ORM for a NoSQL database? Why?)
It's because it's easy. Who cares about thinking? Just throw your data into Mongo and it'll work (For your toy project where nothing matters).
I've been the exact person you're talking about. Do you think I should switch to Postgres instead?
I'm trying to build an Instagram bot that collects all sorts of user metadata and their interactions with other users, and maybe even someday make a Twitter version of the same bot and try to mine some more data.
For small projects like this either one would probably be fine. Hell, you can try implementing with both just to see what you like and don't like from each.
I suppose switching depends in part on how much work that would be.
That said, I tend to pick Postgres as a default because using JSONB columns I can get the benefits of 'NoSQL' and switch over to 'SQL' while staying within the same database.
I don't think you'll benefit from switching either way, but in my opinion you'll benefit from learning Postgres when you have the time / start a new project.
Three years ago I started a new project and went with MariaDB. I was coming from a project that was using MongoDB. Because the new project seemed to have very structured data, mostly coming from third-party systems, I opted for a structured solution.
Three years later and my structured database has tons of tables and requires lots of brain twisting joins. It slowly evolved this way, while our UI basically evolved to use a single React "state" to represent an "order".
It's tempting to consider what it might be like to store an order as a single Mongo document and forget all this structure.
This never made sense to me. How can you forget the structure? Either you code the structure in the schema or in random places in your code as dictionary keys, which seems far more unwieldy.
I used to agree, but this project feels different. I don't think it would require much structure in my system. I just deal with a single order. Then, I take pieces of that and send it out to a couple 3rd party API's. Those API calls are structured, sure, but so is a document. I only load orders by their order number and then deal in the order as a whole.
My joins are primarily to pull in all the pieces I need for an order.
Maybe this is just my current, "the grass is greener" view, but I wonder.
MariaDB was created by the same guy who created MySQL after MySQL was sold to Oracle. They have diverged now but MariaDB has great compatibility (it works out of the box with many tools). It's just my preference right now.
Yeah I was aware of its history and compatibility. Just never got into what made it worth to make the switch. Although ditching anything related with Oracle is always a good reason
Interesting blog post. Thanks for sharing. This part, in particular, resonated with me:
> Querying data is a little harder. Apache’s Cassandra uses Cassandra Query Language or CQL which, interestingly, does not allow for joins. MongoDB just sends JSON objects in reaction to requests. Need all users in Ohio? MongoDB sends a big chunk of data.
I fondly recall the late night debates with fellow colleagues in the industry, several years ago, when we were pitching a database design to a startup bank in South Africa.
Back then during those fights some even suggested that the NoSQL vs YesSQL debate was a religious war - much like vi vs emacs - but in the case of data storage it quickly became obvious that each philosophy had its respective strengths and weaknesses - which were in turn easy to understand, to sell, and to add value with.
But nowadays I must confess I do not know of many shops using NoSQL, and I suspect it is for the reasons quoted from the blog post that I shared above.
I would love to read your insight if you've been part of a big NoSQL deployment. We struggled to sell it, so I suspect we must have missed out on some interesting opportunities.
One thing that I think is going away is eventual consistency at the application layer. It is too much of a technical debt and error prone for most applications. It is much easier to reason about a consistent database.
And systems like Google Spanner, and CockroachDB show that you can have a consistent database with good scaling and good performance.
NoSQL was the last time I ranted about a stupid technology that became popular, nice to see its finally being put to death publicly. I just ignore them now and wait for their inevitable death (node.js I'm looking at you)
Edit: was just thinking the defining characteristic of these sorts of technologies is they are advocated as replacements for things that already exist, and the people advocating them are not experts in the things they are trying to replace. So they don't understand the reasons behind why things are done - NoSQL was obvious for any database person - no transactions, no normalisation. Node.js - tries to replace server coding with something vastly inferior, and it sort of works until you need a proper server.
I think the difference is Node.js really does bring significant advantages that the majority of developers can benefit from (or at the very least consider). Its hype is/was deserved in my opinion.
I think the only reason you'd use server side JS is you don't know C or C#. I'd be hard pressed to imagine someone who knows a number of server side environments and languages choosing JS as the solution - strong typing and performance are the obvious possible problems, then multi threading performance etc.
The only reason anyone uses JS is the browser constraint, remove that and there are a lot of better solutions.
I'd recommend using JS (or TypeScript) for a while to understand how Node.js performs before making comparisons like this, because it does surprisingly well under the kind of load most web or API servers see. The standard library and the package ecosystem use asynchronous IO for nearly everything, which makes multithreading almost irrelevant, and the V8 JavaScript engine is extremely fast.
.NET makes it harder to do asynchronous IO, but easier to do traditional multithreading. I'm hesitant to say whether I think the average server running on .NET performs better than the average server running on Node.js, but I'm confident the performance difference isn't as wide as you might believe, and Node.js might even have the advantage.
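The "async IO makes multithreading almost irrelevant" claim is easy to demonstrate. The same single-threaded event-loop model Node uses can be sketched with Python's asyncio: 100 simulated IO-bound requests complete in roughly the time of one, because the loop interleaves them while each is waiting.

```python
import asyncio
import time

async def handle_request(i: int) -> int:
    await asyncio.sleep(0.05)  # simulated non-blocking IO (DB call, upstream API)
    return i

async def main() -> float:
    start = time.monotonic()
    await asyncio.gather(*(handle_request(i) for i in range(100)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
# 100 concurrent "requests" finish in roughly 0.05s on one thread,
# not 100 * 0.05s = 5s as they would if handled sequentially.
print(f"{elapsed:.2f}s")
```

The caveat, in Node as here, is that this only holds for IO-bound work; one CPU-heavy handler still blocks the whole loop.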
I'm not going to stand here and say node is the end all be all of server side languages... but you really can't think of a reason that one might choose node over C for writing the back end of a web app?
I'll reverse it and ask: Why would you want to write an app to serve some CRUD API in C? This isn't and has never been a very popular choice.
I keep going back to JS (and TS) because I can build more in less time. I have little to no mental switching costs between front end and back end, I can share code directly between the two layers, and JSON is native.
I can always add type checking later when and where I need it. Sure, it isn’t high performance, but it is acceptable performance for most anything that the majority of people are doing.
Before switching to node 3 years ago I was using C# since 2004.
Main reason to switch was npm vs nuget. Second reason was performance!
Yes node is multithreaded and faster than C#. I was not able to go above 25k rps with C#, check where I am with Node here: https://vms2.terasp.net/
Visited your GitHub page. Still not sure what makes yours faster than vanilla Node.js. Do you use the standard Node.js API or bindings to code written in other languages? Enlighten me :-) Can I use it for a standard JSON REST API?
I'm using the uWebSockets C++ library in Node.js to make it a lot faster than vanilla Node.js. Of course you can use it to create a REST API, but also websockets, and even serve static files, all in the same process.
I would personally never build anything mission-critical using NoSQL; sure, it's fast and easy to use, but it might prove unreliable in some situations when it's most important.
However 99% of stuff SWE use to build are nowhere close to that level of importance.
As long as you know your tool and requirements feel free to use whatever you want, even JSON file on hard drive.
I don't have much experience but you can easily mess up both SQL and NoSQL. I recently picked up Datomic and honestly I'm not looking forward using SQL again for a while.
NoSQL, I always understand it as Not Only SQL.
At our place we use MongoDB (Main store), SQL, Big Query, Redis, ElasticSearch, then we also store data in S3 that we don't want to query or have the cost of storing in the DB.
Pick the right DB for your requirements. Management of them isn't that hard as they're hosted solutions, we've only got to deal with the cost of when a version is EOL, so upgrades or when certain queries will no longer work.
1. fast hardware, i.e. SSDs and a lot of RAM, allows classic SQL engines to run nowadays without much understanding of how to make them perform. All these "millions" of records fit in RAM these days.
2. availability of SQL[-like, to varying degrees] frontends to NoSQL engines.
> First, we have to remember that NoSQL databases are probably great for Amazon and Google but not so great for your side hustle
Hold up, what? I hooked up a sign up form with MongoDB Atlas, and it's working brilliantly and it's pretty much free for my side hustle... and it was almost effortless to learn how to implement it... so I don't really agree with this article.
Some background: I'm a UX designer, not a dev, and don't have any experience with MySQL (or any free-ish, easy cloud hosted MySQL services, so MDB Atlas was a no-brainer)
(recently I've also been experimenting with Fauna and Supabase)
Atlas is a fully managed instance. What's being referenced here is running your own Cassandra cluster, which is, by all accounts, a heck of a lot harder than running a Postgres instance.
I stopped reading at "Traditional SQL uses related tables connected by IDs"... that's the thinking that ends up with databases designed around pointless auto-incremented proxy primary keys.
Any experience with something like distributed SQL from CockroachDB? The project sounds amazing to be quite frank, but I'd love to hear from someone with first hand experience.
I just wish that Cloud SQL on Google Cloud would make it easy to do multi region replication seems impossible with Postgres.
I have been using a NoSQL solution lately, and each time you need to do a join it's a pain, as it needs to be done manually; the lack of full-text search isn't fun either, which required using a paid full-text search solution. Datastore is painful.
The article says that NoSQL has no relations. Is that the case? I would have assumed that, say, you'd make a blog system by making user entries with a list of blog post IDs, and then each blog post gets its own entry with its data. If not, are you querying and processing a user's entire blog post history every time you make an update?
> The article says that NoSQL has no relations. Is that the case?
No, not as stated in the article. It is absolutely false that “no matter what format they store data in, these databases don’t support relations between data.” (that's particularly laughable for graph databases.)
This sounds like the author knows that “nonrelational” is another term for “NoSQL”, but, as is distressingly common in the field, doesn't know what “relational” means. (“Nonrelational” means it doesn't follow the relational model, which is centrally about (though the model has other elements) storing and operating on data in the form of “relations”, a particular logical abstraction; tables, views, etc. are all concrete realizations of this abstraction.)
The distinction between 'SQL' and 'NoSQL' has made less and less sense over time. I can add a JSONB column to my Postgres database and use SQL to query that data, index it, etc. So is that NoSQL or SQL?
And while I'm not familiar with MongoDB, I used RethinkDB for a while and while I suppose it would be considered 'NoSQL' it had quite a number of features that I'd associate with a relational DB.
It would be like having a "discussion" inside a column in Excel, since as a Q&A site they have no support for threading, and even the comments cap out at about 6 before the site starts recommending one switch to their separate chat site
I wholeheartedly agree with their prohibition on opinions in a Q&A site when HN and Reddit already exist for discussing things
I find document stores shine when you have known access patterns and can line your data up to meet those patterns. I find relational stores shine when you have unknown access patterns.
Ultimately comes down to the right tool for the job. Larger organizations tend to short list a selection of databases to choose from as new applications create new data requirements. These tend to include a NoSQL option, some combination of legacy databases, a data caching or message broker tool, and an open source relational database. Postgres has a lot of momentum as the "new" relational option.
This same article could have been published in 2011 with different headings. In fact, it almost seems like they intended to publish an informational article and some editor came in to write some headings likely to inspire some hot takes in response.
When was the last time you heard someone seriously say that "NoSQL"—that's right, don't even name a database or name any characteristic of its operation, how it scales, how it's queried, its consistency guarantees, its maintenance overhead—is easily comparable to a vertically-scaling SQL engine? The whole rhetoric of tables turning implies that you'd choose a database for cultural reasons outside of hiring ability.... who thinks like that?