Honestly, I think the more important takeaway isn’t the decline of NoSQL. As the article says, it’s the conclusion that what’s good for FAANG isn’t necessarily what’s good for your average project. I know this sounds obvious, but it’s been a constant source of frustration for me with the hype train.
You could apply the same logic to a whole bunch of tech like Kubernetes, service mesh etc etc and arrive at the same result.
Every tech has a trade-off, and understanding it is critical. Don’t pick tech SOLELY based on FAANG use.
From inside FAANG, I have seen many projects over engineered to death. I think the problem is lack of experience combined with the selfish need to make the job more interesting.
I'll admit I have been tempted to stray down this path in the past. I think for me it was the fact that the company refused to provide any time for training or improvement, leading to the desire to choose tech in projects that would build skills rather than being the most expedient.
Otherwise known as Promotion Based Architecture. You’ve got to make the problem difficult enough and the solution complex enough to demonstrate you can “operate at the next level.”
Yeah... sometimes you get a junior engineer who's drunk the koolaid a bit too much and thinks they need a bunch of fancy tech for their web dashboard handling 5 req/s.
But I think it's pessimistic to reduce it to people trying to make the job more interesting, and at least with more senior people it's often about reusing technologies internally.
For instance, an application may have a use case that requires very high messaging rates, so they build up a team to operate Kafka clusters. Then later you end up with a bunch of teams using Kafka when far simpler things would do the trick, because it's already available and there's a whole team supporting it, with expertise to debug it if things go wrong. It doesn't look good if you take the system on its own, but in context it's a pretty reasonable decision.
IMHO sometimes this reasoning goes too far (I've seen people suggest we rewrite super relational apps to NoSQL to avoid operating SQL DBs!), but it usually comes from good intentions.
The common pattern one sees in such over-engineered projects is that they usually have a cool name, even though they are internal-use only.
I guess this is because of the hope of it getting open sourced and the dev becoming famous. Also, the less experienced people are, the more they think they're breaking new ground.
Sometimes engineers have perverse incentives. If you want a FAANG job, it probably helps to have experience with FAANG tech, even if your current company doesn't need it.
It is a very frustrating industry to spend a long time in. A hamster wheel of constantly shifting goal posts for mastery. I wouldn't mind if they were fundamental advances in software development, but it's mostly the same stuff with small advances but massive learning curves of arbitrary, non-transferrable minutiae.
The side effect of this ever-shifting tool set is that almost no one masters anything. All software is varying levels of crap written by newbies, because when the tools change every project, you are always a newbie in that tool set.
> I wouldn't mind if they were fundamental advances in software development, but it's mostly the same stuff with small advances but massive learning curves of arbitrary, non-transferrable minutiae.
I think this pinpoints the key exploitability in the market. If you have seen enough tech come and go, you can figure out that 98% of this year's ideas are the same as last year's, and as those from 40 years ago.
At that point you can start cutting through the bullshit and design things using brand new tech as if you had used it for 20 years already. At that point you're way ahead of the rest of the pack.
(And you can choose not to use the brand new thing, and argue convincingly for why the almost-exactly-the-same 30 year old, more mature and stable, tech is better.)
Firstly software development is not one monolithic job. There are many different jobs within software development.
You can be a systems engineer – understand the computers at a fundamental level – with focus on systems software – building low-level high-performance components/kernels – for storage systems, databases, in-memory systems etc (each of these have a lot of technical domain knowledge specialisation within them). You can spend 10 years in this field and continuously develop/enhance your expertise incrementally – it is quite stable and rewarding experience. A branch of this specialization is distributed systems engineering – where emphasis is on software operating in a distributed cluster over network – which has additional challenges unique to the aspect of being distributed.
You can be an applications engineer – understand application software engineering methodology – with focus on user/business facing application feature development in a scalable software team setting. The key focus here is not so much about low-level computer systems but more about software engineering discipline. It is about the inter/intra-team sport that is developing and continuously evolving a very large application software code base that is alive with changing/evolving user/business features. It is about modeling the functional domain of the user/business/real world. It is about the modern consumer Internet software development techniques – experimentation, live incremental safe feature releases etc. A branch of this specialization is developing application frameworks and tooling (rather than user/business functional features) to make the life of a feature engineer more productive. You can spend 10 years in this field and become an expert at software engineering discipline of churning out quality features on time and on budget.
The experience and learning in the above job families transcend any particular technology stack. The learnings are transferable from one tech-stack/functional-domain to another with relatively minimal effort. This effort is part of the work itself and doesn't turn you into a newbie for having to do it.
p.s: A generalist full stack engineer is usually an applications engineer who is somewhat good at systems engineering and is able to glue together systems to achieve the application features well. These engineers can take a startup from start and through initial growth phase and up to start of hyper growth phase. But there's a scale/performance threshold – where the scale of application deployment grows, performance starts to hurt your users, you need strong systems engineering specialists to fix those deeper systems/distributed-systems problems. Public cloud systems have been continuously raising that threshold since the beginning.
As someone at a FAANG company right now, your second paragraph neatly identifies a core frustration I have with my job that I haven't been able to articulate before now. Might be time for a change...
I just learned last week that we're bringing Pega on for some projects. Never even heard of Pega before then lol. Time to sit down with a free account and learn something else that we'll throw away in five years.
I was in a big JavaScript project where any article over 3 months old was useless or wrong. The internals of the app were constantly being rewritten to the flavor of the month.
I really like mastering things, so I've tried to stick with a technology stack (that I have kind of mastered), but I don't think it's been good for my career.
It's not just engineers either. I've seen new EMs copy and paste ill-fitting FAANG development practices into organizations and then bounce off to whichever company championed them not long after.
A major reason why senior engineers/architects go for whatever technology is hottest in the market (i.e. used at FAANG) is that it's the safe choice. For every new project, or for a fresh rewrite, one can either go for the incremental improvement over whatever was in use, or for the revolutionary approach that worked for FAANG and was open sourced by them. If one were to go with the former, their solution will always be compared against a hypothetical, much better one built on the latest shiny frameworks/components used by FAANG. It doesn't matter if you made the right choice, because to prove you did, you'd have to rebuild the application again using the FAANG framework and compare the profiling/scalability numbers. Much easier to just go with Kubernetes microservices and a service mesh on NoSQL.
I can't help but wonder if one of the big differences between real engineers and silicon valley software "engineers" is that real engineers don't make decisions this way.
- Real engineering interviews test skills that are pertinent to the day to day work of the engineers; FAANG (and copycat) interviews tend not to
- Real engineering isn't plagued by fad-following or driven by personalities: it's backed by research and established empirical practice
That said, the profession of "real engineering" has its share of problems: advancement is as political as it is in any other industry, and pay is relatively low compared to the true value of the output, to name two of the biggest.
I’d go even a step further and say FAANG companies, despite what their tech blogs might argue, aren’t at all immune from making very dumb decisions or adopting very silly practices. To follow those same ideas blindly could cripple your company.
In fact - FAANG companies have the resources to make horribly inefficient processes work (in human time or computer time) - I suspect many of the AI powered things are examples of this - and a smaller company will die trying to get it to work.
I don't think ML-powered systems are inefficient[0], so much as they are incremental gains over the pre-existing system that are only justified by huge sales volumes. If you can get a 4% lift in sales by using a complicated ML model with 100k features instead of a handful of basic heuristics, then whether it's worth spending engineering effort on depends on what your sales are.
[0] Inefficient in the sense that they're wasting money which could be easily reclaimed - they're probably going to be improved over time as the state of the art improves, but that relies on the whole field moving forward.
One problem is that technology is sold with stories about wildly successful companies. This storytelling doesn't concern itself with real world use cases.
I can't count the number of times I've heard Docker pitched with "it's what Google uses!", while the truth was that they didn't.
Before that it was MongoDB which was "just like Spanner which was key to Google's success", while in reality it wasn't even similar.
And you should organize in Spotify-like tribes, even though that idea is one person's notion of how they wished things worked, filtered through an echo chamber of conference talks.
As an engineer it is easy to dismiss these ideas when you have enough behind the scenes knowledge. But the point of storytelling isn't to build technology, but to pitch it. It does a good job at that. So use it, but wisely.
> Because an SQL database uses a schema or structure, this means changes are difficult. Say you’re running a production database full of a million records.
Articles like this one perpetuate the myths in the minds of young developers. First off, “millions of records” is nothing these days. More importantly, the schema ends up living somewhere. If it’s not in your database, you’re likely managing it in the app. There’s no free lunch when it comes to schemas for a typical SaaS.
> And while SQL statements are fun, it’s easy to drop all tables while futzing with a key or corrupting an entire repository with a malformed query.
This seems to indicate that the writer doesn't really grasp very basic modern SQL database features such as permissions, constraints, and transactions.
Without such understanding, how can one form an accurate comparison of NoSQL benefits vs SQL? Or, maybe worse, the author has a good understanding but prefers to make bold and false statements to push his point.
It isn’t even a permission thing. If you are just trying to query some data, you don’t accidentally replace “select” with “delete”. The author is trying too hard to put SQL in a bad light. Any syntax I have seen for querying NoSQL databases looks clumsy to me as well. SQL I am familiar with, so I am probably biased, but they both have their pros and cons.
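To make the earlier point about transactions and constraints concrete, here's a minimal sketch (Python's sqlite3 standing in for a real RDBMS; the `users` table is invented for illustration): a botched statement inside a transaction trips a constraint and gets rolled back, leaving committed data intact.

```python
import sqlite3

# Hypothetical table, sketched with sqlite3 in autocommit mode so we can
# manage the transaction explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

try:
    conn.execute("BEGIN")
    # A botched bulk update trips the NOT NULL constraint...
    conn.execute("UPDATE users SET name = NULL")
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    # ...so we roll back and the committed data survives untouched.
    conn.execute("ROLLBACK")

print(conn.execute("SELECT name FROM users").fetchone()[0])  # prints "alice"
```

The same mechanism is why "futzing with a key" in a real RDBMS doesn't corrupt anything: the constraint rejects the statement before it lands.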
It reads like SEO spam: a plausible collection of statements scraped from other articles and then cobbled together with no understanding. It's a shame that stackoverflow has sunk to this.
> If it’s not in your database, you’re likely managing it in the app
“Managed in your app” sounds like a benign state of affairs.
Anything that doesn’t have a home ends up smeared across your entire codebase. It isn’t that it’s in the app, it’s that it’s everywhere in the app, meaning changing it becomes a huge investment of energy that people will try to avoid or put off.
You can have a SQL database and still end up with assumptions about the data smeared across your codebase. I've worked on multi-million line codebases on top of a SQL database where nobody dared change the schema of some (very non-optimal) tables because too much code directly depended on the structure of those tables. Having a clean and DRY data access layer is necessary regardless of the underlying database.
As soon as multiple independent codebases share the same database, I would argue you need to put an API on top of that database and turn it into a microservice that owns its database. Otherwise the internal details of how the tables are structured will creep into the codebases and make it very hard to evolve the database's schema.
> Having a clean and DRY data access layer is necessary regardless of the underlying database.
SQL databases (via views and even sprocs) allow you to abstract particular client’s view of the data from the base storage layer inside the database.
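For instance, something like this (sqlite3 as a stand-in, with an invented `orders` schema) lets a reporting client see only the shape of data it needs, decoupled from the base table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT,
                         amount_cents INTEGER, internal_notes TEXT);
    INSERT INTO orders VALUES (1, 'acme', 1999, 'flagged for review'),
                              (2, 'acme', 500, NULL);

    -- Reporting clients see dollars and no internal notes; the base
    -- table can later be restructured without breaking this view.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(amount_cents) / 100.0 AS total_dollars
        FROM orders GROUP BY customer;
""")
row = conn.execute("SELECT * FROM customer_totals").fetchone()
print(row)  # ('acme', 24.99)
```

If the base table is ever split or renamed, only the view definition has to change, not the clients.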
> As soon as multiple independent codebases share the same database I would argue you need to put an API on top of that database and turn it into a microservice that owns its database.
An RDBMS is an integrated service that owns its own datastore, with a very well-defined, extremely battle-tested API designed to support multiple clients with completely different views of and access to the data, all as logically isolated as necessary from the design of the base storage layer.
If you aren't using an RDBMS, sure, you may need to wrap something around the datastore that provides a tricky-to-get-right subset of what an RDBMS provides fairly trivial-to-use facilities for out of the box, just like not using an RDBMS often forces you to do for another subset if you are concerned about integrity.
But if you 'have it in the database' it will still be smeared across your app too.
'Putting stuff in one place', regardless of that place, is hard. Necessary, but hard. And it requires tradeoffs.
If you need a 'sorry, this username is taken' friendly error, your app needs to handle constraint errors from your DB, even if only in the translation layer. At which point you'll have it duplicated on multiple layers, add tight coupling between layers, or need to forego that message and settle for, e.g., a generic exception instead.
The difference is that the actual constraint lives in one place and the rest of the locations are UX benefits to help the user. The system doesn't get into a bad state just because you forgot to add the constraint in the 100th location.
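A rough sketch of that division of labour (sqlite3, with a hypothetical `users` table): the uniqueness rule lives only in the database, and the app merely translates the violation into a friendly message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT UNIQUE NOT NULL)")

def register(name):
    # The constraint is enforced in exactly one place (the DB);
    # this mapping is purely a UX nicety.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (name,))
        return "welcome, " + name
    except sqlite3.IntegrityError:
        return "sorry, this username is taken"

print(register("sam"))  # welcome, sam
print(register("sam"))  # sorry, this username is taken
```

Even if the mapping is forgotten somewhere, the worst case is an ugly error, not a duplicate username in the data.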
> rest of the locations are UX benefits to help the user.
In my experience this is a pipe dream.
Maybe in my simple example, one could parse the constraint exception and map that to a field and a user-friendly error. Maybe. No framework or ORM that I've ever seen does this, though.
But even when it does: it still requires you to do the parsing and mapping in the application: introducing a tight coupling (e.g. you cannot add a constraint without releasing new locales).
In practice, in my experience, you'll most likely have some constraints in the DB and some validations in your ORM, some of which overlap and some of which are unique to one or the other.
Which is arguably worse than having each app that uses the database repeat that. It all depends on the use-case, obviously.
Definitely. You won't have the rich exception messages all over. However, if the rule is a _business rule_ then it must either live _in_ the DB, or (very frequently) in a dedicated repository layer that all application access goes through. Otherwise it's not a rule and you _will_ forget to enforce it at some point.
This is the distinction we make too: business validations go in the database, ux-validations go in the application.
In practice, however, this means the business validations are duplicated all over the place (but always enforced, as last line of defence, in the database).
It also means customers get more frequent 500 errors (exceptions): when a business rule is implemented in DB but not (yet) in an application.
> It isn’t that it’s in the app, it’s that it’s everywhere in the app
That's only if you don't know how to properly code a data access layer in your application. And if you have many apps using the DB, perhaps the data layer should be in a library.
It's also a lot easier to mess up evolving the logical schema and end up with unexpected, incoherent database state if your store doesn't enforce the logical schema. Sure, the more the logical schema is enforced, the more you are forced to do up front when it changes, but that work prevents you from:
(1) applying a data migration that fails to result in a state that complies with the logical schema, or
(2) producing a state inconsistent with the logical schema because your application code doesn't correctly observe the schema, as defective code will fail for violating constraints instead.
The book Designing Data-Intensive Applications talks about “schema on write” vs “schema on read”. In order to interpret your data, you must apply a schema, so your choice is whether you do that explicitly when the data is written, or implicitly when it’s read.
Or as Yoda would say, Schema read or schema write, there is no “no schema”.
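A toy illustration of the two options (sqlite3, with invented tables): with schema-on-write the structure is enforced as data enters; with schema-on-read every consumer has to re-apply the implicit schema itself.

```python
import sqlite3, json

conn = sqlite3.connect(":memory:")

# Schema on write: structure is enforced when data enters the store.
conn.execute("CREATE TABLE users_sw (age INTEGER NOT NULL CHECK (age >= 0))")

# Schema on read: anything goes in; interpretation is deferred.
conn.execute("CREATE TABLE users_sr (doc TEXT)")
conn.execute("INSERT INTO users_sr VALUES (?)",
             (json.dumps({"age": "unknown"}),))

def read_age(doc):
    # Every reader has to carry this logic -- the implicit schema.
    age = json.loads(doc).get("age")
    return age if isinstance(age, int) else None

doc = conn.execute("SELECT doc FROM users_sr").fetchone()[0]
print(read_age(doc))  # None: the "schema" only surfaced at read time
```

The same bad value would have been rejected at insert time by the `users_sw` table's CHECK and type constraints.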
If you don’t mind me asking, how does Postgres fix this? Do they have a more sophisticated locking mechanism, or maybe a copy offline until it’s ready kind of a system?
> This might cause the whole table to be locked and copied in SQL
That was indeed the case in the past, but not so much anymore except for certain situations. MySQL, for example, has had support for in-place table alterations for a while [1]. I've used it in production and it works very well IME.
“might cause”, yes. But at least with Postgres that only happens if you add a default value to the new column. You can add the default value in your application instead, just like you would with your average NoSQL DB.
Then, when you have low load on your system, you can migrate the rows in batches to have a default value, and eventually remove the application default.
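Sketched with sqlite3 (where column adds are also metadata-only; table and column names are made up), the sequence might look like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [("e%d" % i,) for i in range(10)])

# Step 1: add the column with no default -- a cheap metadata-only change.
conn.execute("ALTER TABLE events ADD COLUMN status TEXT")

# Step 2: backfill in small batches (during low load), instead of one
# big locking UPDATE over the whole table.
while True:
    with conn:
        cur = conn.execute(
            "UPDATE events SET status = 'pending' "
            "WHERE id IN (SELECT id FROM events WHERE status IS NULL LIMIT 3)")
    if cur.rowcount == 0:
        break

print(conn.execute(
    "SELECT COUNT(*) FROM events WHERE status = 'pending'").fetchone()[0])  # 10
```

Once the backfill completes, the application-side default can be dropped and, if desired, a real DB default and NOT NULL constraint added.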
But if you don't control the client, you will now have to deal with client side migration and server database version management. If you create an additional v3, you will need to decide to either keep v1->v2->v3 code or v1->v2, v1->v3. Also, reporting.
Can’t you do this concurrently these days? Also, managing the schema in the database was the least of my issues when deleting/adding fields. You still need to make sure your clients are resilient against null or missing fields in responses, and that your scheduled jobs don’t query data which isn’t there anymore. Point is, not having to alter the DB schema doesn’t make such a big difference. You still need to make sure your system overall is built in a way that allows smooth data migrations.
This blog post is calling at least two different things SQL, and it's kind of infuriating.
SQL as a query language is never going away. Virtually every database has found it necessary to offer a SQL-like query language: Cassandra's CQL, HiveQL, Couchbase query language, and so on. SQL is a human-readable, composable formalism for describing data.
What's gone away is the practice of writing complex, highly linked, normalized database schemas with layers of constraints and foreign key references. That was banished to the land of stagnant enterprise 10 years ago and is not coming back.
The last 10-15 years have been an evolution from mostly static, deeply linked, highly structured data to shallow schemas, append-only updates, denormalized data, and stream processing. If your data is a stream of updates, there's not as much pressure to roll back. If your data is mostly defined by a series of processing pipelines that live entirely outside your data warehouse, there's not a lot of upside in enforcing constraints in the DB. If anything, we have learned that it's very useful to offer different denormalized views of the data in the DB to different consumers.
MS SQL Server is not roaring back. Businesses have just learned to unbundle data processing from data warehouses. Data warehouses now have fewer tasks to focus on, such as scalability. And if your DB is just a dumb replica with a flat schema, whether it's an RDBMS or not is pretty unimportant.
>> What's gone away is the practice of writing complex, highly linked, normalized database schemas with layers of constraints and foreign key references
That hasn't gone away, that is the default for most corporate CRUD apps, for good reasons. Apps that aren't corporate CRUD apps are a rounding error away from not existing. You use the term data warehouse 3 times, are you sure you are describing a general case rather than one you are familiar with?
If you want to offer different views of data in a normalized database to different consumers, one way to do that would be to use views.
> This blog post is calling at least two different things SQL, and it's kind of infuriating.
For better or worse, that's not what "NoSQL" means. I understand that the name is a bit infuriating, and I personally map it to "non-relational" as I read.
> What's gone away is the practice of writing complex, highly linked, normalized database schemas with layers of constraints and foreign key references.
This may be true for some use cases, but in general that's wishful thinking.
> This may be true for some use cases, but in general that's wishful thinking.
Why would avoiding implementation of database level constraints be "wishful thinking"?
Au contraire: I strongly believe that data level constraints should be tied to the database.
Sure, there's a trade off between indexing, constraints and performance. But I rather have an additional unique index on a column than relying on the fact that all developers in an organization always do the right thing.
If the database is my responsibility then I'll make damn sure that it's not possible to fuck it up with shlocky code.
There seems to be less and less appreciation for the relational model and the power of SQL databases to maintain data integrity. I'm not quite sure why, because relational databases are a uniquely powerful tool in software engineering.
I think SQL is a poor language in terms of composability, in contrast to a functional query language (Frankel and Buneman, 1979) or a relational algebra (Codd, 1970).
SQL should go away, though. It is an astonishingly poor method of communicating your query to the database server. Generating and parsing it is problematic and expensive. It really has nothing going for it, except that a lot of people already know it.
Interesting. I've never seen SQL as a limiting factor, since in 99% of the cases I just need to get something by its id or run a simple query (select, join, where, order).
And it's really easy to learn too. I've seen many people without coding background pick it up and this is definitely a bonus where I work. Otherwise that 1 CS/Data Science guy is pre-occupied with hammering out queries for everyone.
What kind of limitations do you run into? And what alternative do you propose (besides GraphQL)?
Not the parent, but I quite like the Kusto query language. It’s Microsoft specific, but the overall concept is nice and could be implemented more broadly. The operations are described as a pipeline, which to me is much more readable (and writable!) than SQL, where it feels like I’m always bouncing my mental cursor around to figure out what a query is doing. I’m sure that’d reduce with more SQL exposure, but understanding Kusto came pretty much instantly for me.
Also, each operation being its own line in the pipeline makes modification an absolute breeze, simply comment out, reorder, etc lines and the result will usually also be a valid query.
SQL is pseudo-readable by people like PMs, non-technical analysts, marketing, customer success, etc.
As a test: I just showed my non-technical wife the following snippet of SQL, and she was able to tell me what it retrieved, as well as modify it to find a different player or statistic.
  SELECT
    player_name,
    COUNT(*) AS num_hits
  FROM baseball_players
  WHERE result = 'base_hit'
  GROUP BY 1;
The good part is that errors in the application layer (like in the example) are less likely to destroy the database, because the database has a schema.
How dare you underestimate me! I would have screwed up the query in any language ;)
This is also quite readable, no arguments there! I personally think SQL is a great language for broad appeal. I understand why people don't like it - there are many funky aspects - but I also understand why it's become dominant. Because it's just so damn useful.
WRT the MS specific language, my issue there is portability of knowledge. Someone with experience in say Oracle, 20 years ago, can reuse those skills with SQL. IMO, we need more common-language, independent-implementation tools like SQL in order to enable more people to code.
The biggest benefit to anything approaching pipelining is making data-interactive developers think about intermediate and transient state.
In my enterprise coding experience, most app developers don't think of DBs as anything other than "one, current state." Which makes debugging a nightmare.
I’d love to hear more of your thoughts on this - I hadn’t considered DBs as a form of state. Obviously they are, it just hasn’t occurred to me.
If I may borrow a related concept - what other mental models should I CASCADE the implications of this new perspective to? Ie, how does this change how you approach problems? Are there limits where you would call out?
Over the course of SQL's existence, how many entire programming languages have built up entire ecosystems? It doesn't make sense that people would upend their language stack every ten years, but then be afraid to learn a new query language.
Most developers need to know their programming language deeply; but not necessarily the query language, which might be hidden behind an ORM anyway.
In theory, at least, changing the query language would disrupt fewer people than, say, moving from C++ to .NET or Perl to Ruby or Python to Go.
I would be very willing to change query languages. Problem is, there's nothing that competes with SQL that offers even close to equivalent functionality.
I mean, having a sane protocol designed for use by applications rather than humans would be nice.
It’s not that SQL is bad, it’s that mechanically generating it is fraught with peril and gotchas, and there’s a huge mismatch between the code we write and the data we retrieve from these sources.
Hell, s-expr’s would be a much better format and would require little implementation work.
For an application level interface they make sense - they’re trivial to generate and parse in any language, don’t require expensive bespoke parsers and lexers to be written, and the basic constructs are more appropriate for a query interface than something like JSON or other common serialization formats while still being human readable if needed.
Again, SQL isn’t a terrible language - it’s just not designed to be mechanically generated in a sane manner. Libraries like jOOQ are useful because they handle a lot of pitfalls with runtime-generated queries, when we could avoid them altogether by having a better way of applications (rather than users) to query and manipulate data.
I’ve seen a -lot- of engineering time spent trying to adapt to transient quirks of sql query planners, where the programmer expected a filtered range scan and the database elected for a materialized temp table, or similar disasters. Each and every one of those people would have been better served if they could have just specified the op tree that they wanted to be executed.
The right way to execute the query depends on the data in the database, and parameters in your query. I think overall a lot of database time and development time has been saved by depending on query planners.
Don't forget index statistics, which can turn any query into a full-fledged catastrophe if they're not up to date.
In my experience it's extremely rare for an optimizer to go awry on a well maintained database.
And for the rare case where your data - and access patterns are so weird that it does happen you can always employ hints. Which is a bit nasty, granted. But it's not that you would use them excessively on a well designed and maintained database.
It's probably obvious that I very much agree with your take.
Hints are required in an efficient, general purpose system.
Either you can have everything fully specified at all times (a waste of time and effort, as the parent noted), or you can limit general application (no overriding defaults), or you can allow a method for overriding defaults (when necessary).
Of those, it seems shortsighted for people to complain about the last, given it's strongly arguably the best of the three options.
Examine the execution plan, use "hints". This is pretty common DBA stuff. If by "engineer" you mean a non-dba programmer, then that's the problem: lack of domain expertise.
Use EXPLAIN ANALYZE and save yourself some time. It's not that difficult to figure out what the planner is doing and tweak it. I think SQL is the best we got but I'd love to hear what you think is a better alternative.
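For illustration, sqlite's rough analogue, EXPLAIN QUERY PLAN, shows the same kind of before/after when an index appears (toy table, invented names; the detail strings vary slightly by sqlite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)",
                 [(str(i),) for i in range(100)])

# Without an index on v, the planner has no choice but to scan.
plan_scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = '42'").fetchone()[3]
print(plan_scan)   # a full-table SCAN of t

conn.execute("CREATE INDEX idx_v ON t (v)")
plan_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = '42'").fetchone()[3]
print(plan_index)  # now a SEARCH using idx_v
```

Checking the plan like this before and after a schema change takes minutes and avoids guessing at what the planner chose.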
I don't think the parent was saying they haven't used EXPLAIN ANALYZE or its equivalent.
Explain tools are quite good at showing what the db chose to do. They generally suck at explaining why that particular plan was chosen, or why a particular strategy was not used (i.e. why the table scan when there's a perfectly good index).
Isn't that inherent in the abstraction represented by SQL?
In that, it's optional that the DBS explains why it chooses what it chose (although it can happily tell you what).
Were it to do so, that'd be a pretty big crack in the abstraction layer to peer through, and would likely cause more footgunning by (1) developers peeking at the why, (2) developers assuming stability of the why, & (3) ossification of the underlying DBS engines, as now program functioning depends on their internals behaving a certain way.
(Although I guess it's a general win for SQL that we're discussing performance differences, rather than correctness differences)
As it is, people already have to look into "why" for practical considerations. A database failing to perform will lead to a timeout somewhere else and a 500 error served to an end user.
With the lack of tooling support, information is extracted by modifying the query by trial and error (seeing what needs to be changed to flip it over to desired behavior), from dark corners of the internet, and by reading the source code of the storage engine.
My latest expedition into these matters was a case where a table scan was technically faster (and favored by the query planner) than a low-cardinality index, but used a lot more CPU. So when the DB was hit with several cases of that particular query simultaneously, the server would run out of CPU and everything slowed to a crawl.
I find the faith on display in this thread in database query planners to be charming, in the same way that toddlers who believe in the tooth fairy are charming. I guess I'm the only person who has ever needed to debug why MySQL creates on-disk temporary ISAM files for UNION statements producing 2 rows? There are infinite edge cases in these DBMS engines.
SQL is very expressive compared to rolling your own map reduce. And SQL databases have a lot of optimizations. That said, accessing the data array directly is often easier and faster.
The DB-as-cathedral didn't scale as our volume of data did.
I don't think NoSQL was as much a conscious choice, as much as the only option when even a medium sized business is unable to afford / scale the number of human DBAs they'd need to keep pace.
Everything has a trade-off, and I think we accepted more (accessible, scalable) data >> more (pristine) data.
(And obviously, the tooling around newer technologies has gotten way better, while SQL was already very mature)
Why did NoSQL become popular? Was it the huge datasets required by the internet?
Before SQL, there was already no SQL. I don't mean that as a semantic joke, but that databases existed before relational databases that were faster. Relational DBs were too slow to even be usable, until B-trees made them barely feasible in performance (and still much slower than the previous DBs).
The advantage was flexibility: you could change the database organization without having to rewrite the application. Similarly, if your application needed data in a different form, you could make it seem that the database was already in that form.
So SQL was like a glue between systems that could transform the structure of the data - much like high school algebra can put an equation into a different form, that is equivalent but more convenient.
I can imagine that back in the 1970s, computing power was growing much faster year by year than typical database sizes were. So, although "slow", relational databases became "fast enough" for more and more use cases.
But in the 2010s, internet datasets were growing much faster - and computing power wasn't. So relational DBs weren't "fast enough" for these cases... hence "NoSQL".
> But in the 2010s, internet datasets were growing much faster - and computing power wasn't. So relational DBs weren't "fast enough" for these cases... hence "NoSQL".
I joke that the only thing a NoSQL database can do faster than an RDBMS is give you the wrong answer to a query.
There are two different features that people typically think of when they think of NoSQL as compared to a traditional RDBMS:
1. Document database (i.e. unstructured, or at least weakly structured data)
2. Eventual consistency
You could have an ACID document database, or an eventually consistent relational database, so the two are actually orthogonal. It's definitely easier to get something up and running quickly if you aren't going to implement all of SQL though.
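To make that orthogonality concrete, here's a toy sketch of the "ACID document database" quadrant using only Python's stdlib: schemaless JSON documents, but fully transactional writes (sqlite3 as the storage engine; an illustration, not a real document database):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")

# A weakly structured document: no fixed schema for the body
doc = {"user": "alice", "tags": ["admin"], "logins": 0}
with con:  # transaction: commits on success, rolls back on exception
    con.execute("INSERT INTO docs VALUES (?, ?)", ("u1", json.dumps(doc)))

# Atomic read-modify-write of the document
with con:
    body = json.loads(con.execute(
        "SELECT body FROM docs WHERE id = ?", ("u1",)).fetchone()[0])
    body["logins"] += 1
    con.execute("UPDATE docs SET body = ? WHERE id = ?",
                (json.dumps(body), "u1"))

logins = json.loads(con.execute(
    "SELECT body FROM docs WHERE id = 'u1'").fetchone()[0])["logins"]
print(logins)  # 1
```

The point is just that "document-shaped" and "eventually consistent" are independent choices: nothing about the JSON blob above prevented a fully ACID update.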
If you make money serving up ads on lists of documents, then getting any answer quickly is going to be better than getting the "right" answer slowly, so for Google at least, this made sense, as Google's databases actually were that big.
IMO the rest of it was people cargo-culting, not understanding the hard-won knowledge of the 70s, and also not realizing that, if your working set was 100GB, you could just buy a server with 256GB of ram and have zero performance issues. Prototypes went up fast, benchmarks were great, and then somewhere down the line, someone wanted to run a query on the data, and discovered that fast writes come at a significant cost.
If your database keeps growing, you'll eventually exceed the write capacity of a single server, and have to write to more than one place.
You're correct that most of the NoSQL hype was cargo-culting, but I don't think it came solely from Google trying to serve ads quicker, I think it also came from companies having too much data to scale vertically.
NoSQL rocketed in popularity, because it required zero working knowledge of how databases work. You could get up and running on a project, without having to worry about what tables, columns, and their relationships to one another meant. You could throw ANY data in, and generally get it back out.
Exactly! Developers did not want to learn messy databases. I think a lot of folks without experience entered the industry (mostly from boot camps and such) and ruined everything.
I can almost guarantee that for most of the SaaS startups which still go with the React, Node, Mongo stack the data is structured. They have users, orders and whatnot. It just takes some experience to foresee the upcoming product changes. But as someone said here, the nosql stack has been incredibly popular among recent bootcamp graduates.
I think that's not the reason. NoSQL rocketed in popularity on the back of adoption by a few large companies with scale problems that had to abandon relational databases due to scale issues. If the requirement is to serve very high low-latency throughput to back something like shopping on Amazon, then relational databases and SQL in particular aren't very helpful. You know your data access patterns up front, and can optimize your database to support exactly your API's access patterns. Ad hoc queries on the production database are prohibited, data analysis work gets done with some kind of ETL pipeline, and the choice to trade off any part of ACID for more throughput and lower latency is a no-brainer.
A few large companies helped to get it onto the radar of a lot of inexperienced developers, who found how easy it was just to plug away on it. All of that performance nonsense was second fiddle by a long shot for the vast majority of users.
NoSQL didn't support joins (it's much easier to get predictable performance from key lookups), was trivial to shard (because no joins), and supported the common B2C scaling pattern of small amounts of data on millions of users.
With SQL it's easy to write a badly performing query which does lots of inefficient joins in the DB. NoSQL doesn't give you as many tools to offload your computation so it lives in your app instead, where it's easier to scale out (i.e. throw money at the problem).
I'm not getting into schemas vs schemaless (e.g. the cost of migrations when you have billions of rows) or denormalization (e.g. stuffing joined entities into both ends of an association, ugh), etc.; there are other pros and cons. But IMO the lack of compute offload is a positive feature of most NoSQL.
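The "computation moves into the app" point is easy to see in a sketch: with a key-value store, the join the database engine would have done becomes a loop in application code (plain dicts standing in for the store here):

```python
# Two "tables" in a key-value store: no joins available
users = {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}}
orders = [
    {"id": "o1", "user_id": "u1", "total": 30},
    {"id": "o2", "user_id": "u1", "total": 12},
    {"id": "o3", "user_id": "u2", "total": 7},
]

# The join the SQL engine would have done now lives in the app,
# where it's easy to scale out but also easy to get wrong (N+1 lookups, etc.)
report = [
    {"order": o["id"], "name": users[o["user_id"]]["name"], "total": o["total"]}
    for o in orders
]
print(report[0])  # {'order': 'o1', 'name': 'Ada', 'total': 30}
```

Against a real store, each `users[...]` lookup is a network round trip, which is exactly the cost you're choosing to pay in the app tier instead of the DB.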
A while ago I'd say! Last time I heard anyone seriously excited about NoSQL was several years ago. It still has its place, but it seems like Postgres is the hype these days.
It is now, but before it had that privilege, MySQL did. The hype he is referring to may have taken the crown off MySQL and given it to Postgres. But make no mistake... Postgres is not water. How do I know? Because it, too, will one day be unseated. And water does not lose its crown.
I think Postgres is popular because of its emphasis on being explicit, strict, and correct.
We're seeing the same thing in programming language adoption, where TypeScript is exploding in popularity and seemingly every language is getting static types if it didn't already have them.
To me, the rise of Postgres (and its spiritual siblings, languages with expressive, static type systems) are about maturing of the industry rather than hype.
I think you have to look at it less as Postgres and more as SQL vs NoSQL.
SQL has been the default for eons (although I still use a 70s-era hierarchy-based database on a daily basis, but that is another story). My company has dozens of large production-scale databases and I think only one or two NoSQL products. We don't use Postgres, unfortunately, though.
NoSQL went through a hype cycle in some circles (it stayed non-existent in mine for the most part), but in my eyes SQL has always been the workhorse and was never not the default in the greater industry. The Postgres implementation has become pretty popular recently, but so have Oracle and others in the past.
No haha, a real database as part of a major product. It's cool in a way, but very frustrating compared to SQL. I guess the closest thing people probably could compare it to is the MUMPS running in hospitals.
Why? Very few products need sharding, let alone multi-master. Sure a popular social media platform would, but most development in the world is for small to medium scale line-of-business apps. Postgres is _fantastic_ for these.
I manage a large sized Postgres farm with 100s of instances, and there has been one case where we need multi-master, and I went with Galera cluster for MariaDB. You can shard using the citus extension for Postgres.
Depending on requirements, there are an increasing number of options for "active-active" Postgres deployments. A colleague wrote this on a federated active-active configuration on Kubernetes: https://info.crunchydata.com/blog/active-active-postgres-fed...
Postgres supports table partitioning and foreign data wrappers (used for accessing remote SQL databases) which can be used to set up sharding as described in the postgres docs.
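Whatever the mechanism (citus, partitioning, FDWs, or application code), the core of sharding is just a deterministic key-to-shard mapping. A minimal hash-based router might look like this (shard count and key format are made up for the illustration):

```python
import hashlib

N_SHARDS = 4  # hypothetical shard count

def shard_for(key: str, n_shards: int = N_SHARDS) -> int:
    """Map a key to a shard deterministically via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards

# Same key always routes to the same shard; different keys spread out.
print(shard_for("user:42") == shard_for("user:42"))  # True
print({shard_for(f"user:{i}") for i in range(1000)})  # all shards get traffic
```

Note that plain modulo hashing reshuffles almost every key when `n_shards` changes, which is why real systems tend toward consistent hashing or fixed logical-partition schemes instead.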
It is fractally amazing to see the exact same false dichotomy within data stores, DBMSes, and query engines themselves playing out in the "market view" of those same products.
That is:
The tradeoffs between all these systems have always been the effort required to create, modify, and maintain well-groomed (albeit rigid) schemas and data models versus the speed, scale and agility of a schema-on-read / "unstructured" data storage mechanism.
Which is then counterbalanced by the tradeoff between getting quick, accurate (albeit rigid) answers of a well-managed data warehouse vs. having to string together fragile, complex ad-hoc wrangling and querying code.
So pick your poison: a junk drawer full of Legos or a beautiful sculpture with the head and an arm missing.
And the obvious answer is that for most organizations you need both! Agility for bottom-up discovery and exploration, and rigidity for top-down hard facts and shared objectives. (Maybe it's a lakehouse, maybe it's not, TBD.)
And then there's this meta thing where NoSQL was pitched as a disruptor, agile, low barriers to entry, and RDBMSes and data warehouse vendors got this reputation as slow, rigid, too in love with their creations to change ...
And now there's this reverse pushback - oh, actually these NoSQL vendors need to grow up and mature their products, that agility was just a lot of hype and chaos, these data warehouse vendors had the right ideas, they've learned to play the NoSQL vendors' game better than they have and their go-to-market strategies have stood the test of time ..
When (again!) the answer is you need both: disruptors bringing different paradigms to market, letting organizations pick and choose capabilities based on their needs, making legacy vendors adapt and evolve.
So glad I migrated to SQL recently. I thought I had unstructured data and I had no real need for relational data. But oh boy I was wrong. Want a customer list, billing, emails, linked accounts with those users, etc.? All of this was such a pain in Mongo, and remnants of the messy schema still lurk in our codebase. Reminds me a lot of coding in typed languages like C vs. Python or JS. But in the case of Mongo I think I was getting the worst of everything ;)
I’ve been thinking lately that maybe the most reasonable path would be using SQL early on so you have a very clear picture of your schema, and you can do migrations easily.
Once you scale up and your schema and access patterns have solidified then you can make the switch to NoSQL where it makes sense.
I suppose it depends on the specific project or feature.
Usually I go for the approach you describe. But more than once I got bitten by doing this for a feature or project where things were still very much in flux, and/or in a prototyping phase. In those cases, starting with 'NoSQL' (JSONB columns in Postgres though) would've saved me a lot of trouble, and it would have been much easier, relatively speaking, to migrate my data into proper tables once things solidified.
Still, I do find that going for 'SQL' by default has usually been the better choice.
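The prototype-then-solidify path can be sketched end to end: start with JSON blobs while things are in flux, then lift the fields that stabilized into a proper table (sqlite3 standing in here for Postgres with a JSONB column; the table and field names are invented for the example):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")

# Prototype phase: everything is a schemaless blob
con.execute("CREATE TABLE events_raw (body TEXT)")
con.executemany("INSERT INTO events_raw VALUES (?)", [
    (json.dumps({"kind": "signup", "email": "a@example.com"}),),
    (json.dumps({"kind": "login", "email": "a@example.com", "ok": True}),),
])

# Things solidified: promote the stable fields into real columns
con.execute("CREATE TABLE events (kind TEXT NOT NULL, email TEXT NOT NULL)")
rows = [(d["kind"], d["email"])
        for (body,) in con.execute("SELECT body FROM events_raw")
        for d in [json.loads(body)]]
con.executemany("INSERT INTO events VALUES (?, ?)", rows)

logins = con.execute(
    "SELECT count(*) FROM events WHERE kind = 'login'").fetchone()[0]
print(logins)  # 1
```

The nice part of doing this inside one database (as with Postgres JSONB) is that the blob table and the typed table coexist, so the migration can happen incrementally.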
Nah, the pendulum has just swung back in the opposite direction. Give it 10 or 20 years, and just like strongly typed programming languages, NoSQL will be all the rage again, and you'll be "an idiot" for not using it, again.
Fashion-oriented posts like this are always a shitfest, and this one is no different. It is setting up a variety of different architectural styles as if they were in competition for the "top spot", which is to say the only option that should be applied in all use cases. This isn't just garbage, it's actively encouraging a whole breed of shitty engineers who never learn how to approach solving a problem.
> The goal of a NoSQL database, on the other hand, is to ensure ultimate scalability by making sure that the data is stored in a format that can be shared—or sharded—across multiple servers
From here, it then proceeds to list architectural specialisms that have absolutely nothing to do with scalability:
- "Document stores" excel at managing compound representations of data, they do an amazing job of minimizing IO when many small sets of (usually hierarchically structured) data can be stored as a single unit. Document stores map particularly well to the "REST" service architectures in the original Sam Ruby meaning of the word
- Graph databases are (usually but not always) document stores that excel at indexing and executing transitive queries. Their innovation is not in storage, but in querying particular kinds of data sets with complex (and possibly undefined upfront) relationships using queries that are also likely complex and possibly undefined. This has more to do with expressiveness than scalability
- Column stores excel at managing timeseries. Like document stores, their entire point is IO and processing optimizations that become possible when data is in a particular shape -- varying with a particular profile, and with high redundancy when viewed along a single (usually time) axis. Column stores absolutely rock when applied to the right kind of data -- they can provide 20x storage size improvements and similar query execution time improvements. Finally we can say that column stores have something to do with scalability. A 20x improvement in hardware utilization could very much be make or break for many kinds of common project
- Time series databases are column stores.
> Because companies like Google and Amazon created these databases for their own massive data stores, the goal was to reduce the time needed to grab a piece of data
Every. Single. One of these architectural styles long predates the FAANG-industrial complex.
> NoSQL databases don’t offer much in the way of transaction management or real coding
Real coding?
> NoSQL databases like MongoDB just take data and store it
I think a good proportion of posts like this are just for resume padding, to be able to point to a bunch of "think pieces" you've written that make you not just a programmer, but a "thought leader".
This is filled with incomplete information. MongoDB has had transactions since 4.0 and a strong consistency model by default from the start. That's not to say they didn't have some bad defaults early on... This article makes gross generalizations and just doesn't really add all that much value.
Document databases offer a full-featured general-purpose alternative and really shouldn't be compared to Key/Value stores at all. They're only being lumped together since both are "nosql", a fairly tired term at this point
https://jepsen.io/analyses/mongodb-4.2.6 reports the default read and write concerns were extremely aggressive, and even the safest available values had issues.
It's a really bad article, just filled with gibberish starting with its description of SQL and NoSQL. Fortunately the comments here will provide better material.
I have a very antagonistic relationship with NoSQL databases because the vast majority of people get nothing out of using one, and yet every resource a newcomer to programming (on the Node.js ecosystem at least) recommends using MongoDB with Mongoose (an ORM for a NoSQL database? Why?), leading them down a path they really have no business walking because they could have instead learned the widely used, time-tested traditional SQL databases.
> and yet every resource a newcomer to programming (on the Node.js ecosystem at least) recommends using MongoDB with Mongoose (an ORM for a NoSQL database? Why?)
It's because it's easy. Who cares about thinking? Just throw your data into Mongo and it'll work (For your toy project where nothing matters).
I've been the exact person you're talking about. Do you think I should switch to Postgres instead?
I'm trying to build an Instagram bot that collects all sorts of user metadata and their interactions with other users, and maybe even someday make a Twitter version of the same bot and try to mine some more data.
For small projects like this either one would probably be fine. Hell, you can try implementing with both just to see what you like and don't like from each.
I suppose switching depends in part on how much work that would be.
That said, I tend to pick Postgres as a default because using JSONB columns I can get the benefits of 'NoSQL' and switch over to 'SQL' while staying within the same database.
I don't think you'll benefit from switching either way, but in my opinion you'll benefit from learning Postgres when you have the time / start a new project.
Three years ago I started a new project and went with MariaDB. I was coming from a project that was using MongoDB. Because the new project seemed to have very structured data, mostly coming from third-party systems, I opted for a structured solution.
Three years later and my structured database has tons of tables and requires lots of brain twisting joins. It slowly evolved this way, while our UI basically evolved to use a single React "state" to represent an "order".
It's tempting to consider what it might be like to store an order as a single Mongo document and forget all this structure.
This never made sense to me. How can you forget the structure? Either you code the structure in the schema or in random places in your code as dictionary keys, which seems far more unwieldy.
I used to agree, but this project feels different. I don't think it would require much structure in my system. I just deal with a single order. Then, I take pieces of that and send it out to a couple 3rd party API's. Those API calls are structured, sure, but so is a document. I only load orders by their order number and then deal in the order as a whole.
My joins are primarily to pull in all the pieces I need for an order.
Maybe this is just my current, "the grass is greener" view, but I wonder.
MariaDB was created by the same guy who created MySQL after MySQL was sold to Oracle. They have diverged now but MariaDB has great compatibility (it works out of the box with many tools). It's just my preference right now.
Yeah I was aware of its history and compatibility. Just never got into what made it worth to make the switch. Although ditching anything related with Oracle is always a good reason
Interesting blog post. Thanks for sharing. This part, in particular, resonated with me:
> Querying data is a little harder. Apache’s Cassandra uses Cassandra Query Language or CQL which, interestingly, does not allow for joins. MongoDB just sends JSON objects in reaction to requests. Need all users in Ohio? MongoDB sends a big chunk of data.
I fondly recall the late night debates with fellow colleagues in the industry, several years ago, when we were pitching a database design to a startup bank in South Africa.
Back then during those fights some even suggested that the NoSQL vs YesSQL debate was a religious war - much like vi vs emacs - but in the case of data storage it quickly became obvious that each philosophy had its respective strengths and weaknesses - which were in turn easy to understand, to sell, and to add value with.
But nowadays I must confess I do not know of many shops using NoSQL, and I suspect it is for the reasons quoted from the blog post that I shared above.
I would love to read your insight if you've been part of a big NoSQL deployment. We struggled to sell it, so I suspect we must have missed out on some interesting opportunities.
One thing that I think is going away is eventual consistency at the application layer. It is too much of a technical debt and error prone for most applications. It is much easier to reason about a consistent database.
And systems like Google Spanner, and CockroachDB show that you can have a consistent database with good scaling and good performance.
NoSQL was the last time I ranted about a stupid technology that became popular, nice to see its finally being put to death publicly. I just ignore them now and wait for their inevitable death (node.js I'm looking at you)
Edit: was just thinking the defining characteristic of these sorts of technologies is they are advocated as replacements for things that already exist, and the people advocating them are not experts in the things they are trying to replace. So they don't understand the reasons behind why things are done - NoSQL was obvious for any database person - no transactions, no normalisation. Node.js - tries to replace server coding with something vastly inferior, and it sort of works until you need a proper server.
I think the difference is Node.js really does bring significant advantages that the majority of developers can benefit from (or at the very least consider). Its hype is/was deserved in my opinion.
I think the only reason you'd use server side JS is you don't know C or C#. I'd be hard pressed to imagine someone who knows a number of server side environments and languages choosing JS as the solution - strong typing and performance are the obvious possible problems, then multi threading performance etc.
The only reason anyone uses JS is the browser constraint, remove that and there are a lot of better solutions.
I'd recommend using JS (or TypeScript) for a while to understand how Node.js performs before making comparisons like this, because it does surprisingly well under the kind of load most web or API servers see. The standard library and the package ecosystem use asynchronous IO for nearly everything, which makes multithreading almost irrelevant, and the V8 JavaScript engine is extremely fast.
.NET makes it harder to do asynchronous IO, but easier to do traditional multithreading. I'm hesitant to say whether I think the average server running on .NET performs better than the average server running on Node.js, but I'm confident the performance difference isn't as wide as you might believe, and Node.js might even have the advantage.
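The "async IO makes multithreading almost irrelevant" claim is easy to demonstrate. The same single-threaded event-loop model Node uses can be sketched with Python's asyncio: 100 simulated IO-bound requests complete in roughly the time of one, because the loop interleaves them while each is waiting.

```python
import asyncio
import time

async def handle_request(i: int) -> int:
    await asyncio.sleep(0.05)  # simulated non-blocking IO (DB call, upstream API)
    return i

async def main() -> float:
    start = time.monotonic()
    await asyncio.gather(*(handle_request(i) for i in range(100)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
# 100 concurrent "requests" finish in roughly 0.05s on one thread,
# not 100 * 0.05s = 5s as they would if handled sequentially.
print(f"{elapsed:.2f}s")
```

The caveat, in Node as here, is that this only holds for IO-bound work; one CPU-heavy handler still blocks the whole loop.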
I'm not going to stand here and say node is the end all be all of server side languages... but you really can't think of a reason that one might choose node over C for writing the back end of a web app?
I'll reverse it and ask: Why would you want to write an app to serve some CRUD API in C? This isn't and has never been a very popular choice.
I keep going back to JS (and TS) because I can build more in less time. I have little to no mental switching costs between front end and back end, I can share code directly between the two layers, and JSON is native.
I can always add type checking later when and where I need it. Sure, it isn’t high performance, but it is acceptable performance for most anything that the majority of people are doing.
Before switching to node 3 years ago I was using C# since 2004.
Main reason to switch was npm vs nuget. Second reason was performance!
Yes node is multithreaded and faster than C#. I was not able to go above 25k rps with C#, check where I am with Node here: https://vms2.terasp.net/
Visited your GitHub page. Still not sure what makes yours faster than vanilla Node.js. Do you use the standard Node.js API or bindings to code written in other languages? Enlighten me :-) Can I use it for a standard JSON REST API?
I'm using the uWebSockets C++ library in Node.js to make it a lot faster than vanilla Node.js. Of course you can use it to create a REST API, but also websockets, and even serve static files, all in the same process.
I would personally never build anything mission-critical using NoSQL; sure, it's fast and easy to use, but it might prove unreliable in some situations when it's most important.
However 99% of stuff SWE use to build are nowhere close to that level of importance.
As long as you know your tool and requirements feel free to use whatever you want, even JSON file on hard drive.
I don't have much experience but you can easily mess up both SQL and NoSQL. I recently picked up Datomic and honestly I'm not looking forward using SQL again for a while.
NoSQL, I always understand it as Not Only SQL.
At our place we use MongoDB (Main store), SQL, Big Query, Redis, ElasticSearch, then we also store data in S3 that we don't want to query or have the cost of storing in the DB.
Pick the right DB for your requirements. Management of them isn't that hard as they're hosted solutions, we've only got to deal with the cost of when a version is EOL, so upgrades or when certain queries will no longer work.
1. fast hardware, i.e. SSDs and a lot of RAM, allows classic SQL engines to run nowadays without much understanding of how to make them perform. All these "millions" of records fit in RAM these days.
2. availability of SQL[-like, to varying degrees] frontends to NoSQL engines.
> First, we have to remember that NoSQL databases are probably great for Amazon and Google but not so great for your side hustle
Hold up, what? I hooked up a sign up form with MongoDB Atlas, and it's working brilliantly and it's pretty much free for my side hustle... and it was almost effortless to learn how to implement it... so I don't really agree with this article.
Some background: I'm a UX designer, not a dev, and don't have any experience with MySQL (or any free-ish, easy cloud hosted MySQL services, so MDB Atlas was a no-brainer)
(recently I've also been experimenting with Fauna and Supabase)
Atlas is a fully managed instance. What's being referenced here is running your own Cassandra cluster, which is, by all accounts, a heck of a lot harder than running a Postgres instance.
I stopped reading at "Traditional SQL uses related tables connected by IDs"... that's the thinking that ends up with databases designed around pointless auto-incremented proxy primary keys.
Any experience with something like distributed SQL from CockroachDB? The project sounds amazing to be quite frank, but I'd love to hear from someone with first hand experience.
I just wish that Cloud SQL on Google Cloud would make it easy to do multi region replication seems impossible with Postgres.
I have been using a NoSQL solution lately, and each time you need to do a join it's a pain, as it needs to be done manually; the lack of full-text search isn't fun either, which required using a paid full-text search solution. Datastore is painful.
The article says that NoSQL has no relations. Is that the case? I would have assumed that, say, you'd make a blog system by making user entries with a list of blog post IDs, and then each blog post gets its own entry with its data. If not, are you querying and processing a user's entire blog post history every time you make an update?
> The article says that NoSQL has no relations. Is that the case?
No, not as stated in the article. It is absolutely false that “no matter what format they store data in, these databases don’t support relations between data.” (that's particularly laughable for graph databases.)
This sounds like the author knows that “nonrelational” is another term for “NoSQL”, but, as is distressingly common in the field, doesn't know what “relational” means. (“Nonrelational” means it doesn't follow the relational model, which is centrally about (though the model has other elements) storing and operating on data in the form of “relations”, a particular logical abstraction; tables, views, etc. are all concrete realizations of this abstraction.)
The distinction between 'SQL' and 'NoSQL' has made less and less sense over time. I can add a JSONB column to my Postgres database and use SQL to query that data, index it, etc. So is that NoSQL or SQL?
And while I'm not familiar with MongoDB, I used RethinkDB for a while and while I suppose it would be considered 'NoSQL' it had quite a number of features that I'd associate with a relational DB.
It would be like having a "discussion" inside a column in Excel, since as a Q&A site they have no support for threading, and even the comments cap out at about 6 before the site starts recommending one switch to their separate chat site
I wholeheartedly agree with their prohibition on opinions in a Q&A site when HN and Reddit already exist for discussing things
I find document stores shine when you have known access patterns and can line your data up to meet those patterns. I find relational stores shine when you have unknown access patterns.
Ultimately comes down to the right tool for the job. Larger organizations tend to short list a selection of databases to choose from as new applications create new data requirements. These tend to include a NoSQL option, some combination of legacy databases, a data caching or message broker tool, and an open source relational database. Postgres has a lot of momentum as the "new" relational option.
This same article could have been published in 2011 with different headings. In fact, it almost seems like they intended to publish an informational article and some editor came in to write some headings likely to inspire some hot takes in response.
When was the last time you heard someone seriously say that "NoSQL"—that's right, don't even name a database or name any characteristic of its operation, how it scales, how it's queried, its consistency guarantees, its maintenance overhead—is easily comparable to a vertically-scaling SQL engine? The whole rhetoric of tables turning implies that you'd choose a database for cultural reasons outside of hiring ability.... who thinks like that?