I've always wondered why Postgres is so insanely popular. I mean it has some nic...

tpmoney · on Oct 18, 2024

> I've always wondered why Postgres is so insanely popular.

In no particular order, my preference for postgres is driven by:

  * Date / time functions that don't suck
  * UTF-8 is really UTF-8
  * 99% of a backup can be done live with nothing more than rsyncing the data directory and the WAL files
  * Really comprehensive documentation
  * LTREE and fuzzy string match extensions
  * Familiarity from using it for years

MySQL/Maria I'm sure is fine, but it's one of those things where it's just different enough and I haven't encountered a compelling use case for changing my preference.

fhdsgbbcaA · on Oct 18, 2024

UTF-8 is what made me switch. It’s insane MySQL has something called UTF-8 that isn't really UTF-8, but does have a character set called utf8mb4 that actually is correct. This means if you use UTF-8 in MySQL, you can’t use emoji for example.
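To make the emoji point concrete: MySQL's legacy `utf8` character set is an alias for `utf8mb3`, which stores at most 3 bytes per character, while emoji need 4 bytes in UTF-8. A minimal sketch of a check an application could run before writing to such a column (the function name `fits_in_utf8mb3` is made up for this example):

```python
def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character encodes to at most 3 bytes in UTF-8.

    Characters outside the Basic Multilingual Plane (emoji, for example)
    need 4 bytes and therefore do not fit in a utf8mb3 column.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_in_utf8mb3("naïve café"))        # True: accented Latin letters take 2 bytes
print(fits_in_utf8mb3("great product 👍"))  # False: the emoji takes 4 bytes
```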

bastawhiz · on Oct 18, 2024

And the fact that adding real utf-8 support limited (limits?) the length of strings that can be indexed

evanelias · on Oct 18, 2024

Postgres limits btree keys to 2704 bytes, which is actually slightly smaller than MySQL's limit of 3072 bytes, assuming the default InnoDB storage engine.

That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.

For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).
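The worst-case arithmetic above, as a rough sketch. The 3072-byte figure is the InnoDB limit quoted in this comment; the per-character byte counts are the maximum widths of each character set, and the helper name is invented for illustration:

```python
INNODB_MAX_KEY_BYTES = 3072  # limit quoted in the comment above

# Worst-case bytes per character that the index has to reserve.
MAX_BYTES_PER_CHAR = {
    "binary": 1,   # VARBINARY: no character set, one byte per "character"
    "utf8mb3": 3,  # MySQL's legacy "utf8"
    "utf8mb4": 4,  # full UTF-8
}


def max_indexable_chars(charset: str, key_bytes: int = INNODB_MAX_KEY_BYTES) -> int:
    """Longest fully indexable string, in characters, for a given charset."""
    return key_bytes // MAX_BYTES_PER_CHAR[charset]


print(max_indexable_chars("utf8mb4"))  # 768
print(max_indexable_chars("utf8mb3"))  # 1024
print(max_indexable_chars("binary"))   # 3072
```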

bastawhiz · on Oct 18, 2024

> So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.

This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.

This was literally the reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.

sgarland · on Oct 18, 2024

You should not be indexing 768 characters in any circumstance I can imagine. Go ahead and try it. Spin up two tables, fill them with a few million rows, and slap an index on them. Give one a reasonable prefix limit, and let the other go wild. Make sure you ANALYZE each, then run queries in a loop and check the times.

Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated from page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.

evanelias · on Oct 18, 2024

Why did you need to index fairly long strings in their entirety in a way that preserves collation behaviors?

And why is a 768 character limit woefully bad, but a 2704 character limit is totally fine?

bastawhiz · on Oct 18, 2024

A URL, for instance, can't be safely stored in 768 characters, but it can be stored safely in 2704. If you then wanted to sort those URLs so that all URLs for each domain and path within that domain are adjacent, you need an index. Especially if you want to paginate over them with a cursor. Doing that without an index on the raw value is a royal pain in the ass.

Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.
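For readers unfamiliar with the cursor pagination being described, here is a minimal sketch. It uses SQLite only because it ships with Python's standard library; the table name `pages`, the index name, and the helper functions are assumptions for illustration, and the query shape (`WHERE url > cursor ORDER BY url LIMIT n`) is what would rely on an index over the raw value in Postgres or MySQL.

```python
import sqlite3


def make_db(urls):
    """Create an in-memory table of URLs with an index on the full value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT NOT NULL)")
    # Unique index: a cursor on url alone would skip duplicates at page edges.
    conn.execute("CREATE UNIQUE INDEX pages_url_idx ON pages (url)")
    conn.executemany("INSERT INTO pages (url) VALUES (?)", [(u,) for u in urls])
    return conn


def fetch_page(conn, after_url, page_size):
    """Return the next page of URLs in sorted order, strictly after the cursor."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE url > ? ORDER BY url LIMIT ?",
        (after_url, page_size),
    ).fetchall()
    return [row[0] for row in rows]


def all_pages(conn, page_size):
    """Walk the whole table page by page, using the last URL seen as the cursor."""
    pages, cursor = [], ""  # the empty string sorts before every non-empty URL
    while True:
        page = fetch_page(conn, cursor, page_size)
        if not page:
            return pages
        pages.append(page)
        cursor = page[-1]


conn = make_db([
    "https://b.example/x",
    "https://a.example/2",
    "https://a.example/1",
    "https://c.example/",
    "https://b.example/a",
])
for page in all_pages(conn, page_size=2):
    print(page)
```

Because the sort is on the raw string, URLs for the same domain come out adjacent, which is the property the comment is after.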

The_Colonel · on Oct 18, 2024

> it can be stored safely in 2704

No, it can't. URL doesn't have any length limit, regardless of the fact that different software will impose different limits.

bastawhiz · on Oct 18, 2024

Browser address bars have a limit of 2048, so if that's your use case, yes it's safe.

The_Colonel · on Oct 19, 2024

Safari has 80 000, Firefox 65K.

There are plenty of needs to store URLs which will never go through a browser.

You can only claim that "some URL use cases" can be stored in 2048 characters.

evanelias · on Oct 18, 2024

> A URL, for instance

VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.

> even just being able to sort user-submitted strings up to a kilobyte

As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!

crazygringo · on Oct 18, 2024

To be honest, indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product ID's. Not long URL's. It's not performant.

If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.

Creating short numerical primary keys for long strings is a common database technique.

bastawhiz · on Oct 18, 2024

> indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product ID's. Not long URL's. It's not performant.

This is objectively false. If this was true, indexes wouldn't serve range queries. You couldn't index on dates. You couldn't sort numbers.

> But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.

This fails when you need to insert new values into the table. Then you not only need to figure out the new integer value (how, if you can't efficiently compare sorted string values???), you need to update all the integers to make room.

crazygringo · on Oct 18, 2024

Sorry, I was considering short things like dates and numbers as identifiers. I realize that's not quite right -- what I should have said was that indexes are designed for short things period (short identifiers being one of those things). Thanks.

> This fails when you need to insert new values into the table.

Yes, that's part of the extra code you need to keep the values accurately sorted. There are a lot of different particular code solutions that might work -- whether allowing for collisions and re-ordering every night with a cron job, or putting large gaps between numbers, or using floats.

But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box. Indexes aren't meant for that and they won't work, and this is by design. You're going to have to roll your own code for that.

Fortunately I've never come across a case in the wild where maintaining a globally sorted list of long items was required (though I'm not saying they never exist). E.g. if you're building a spider that needs to match against URL's, you'd index a short hash of the URL as a non-unique index. Or if you wanted to display sorted URL's for a site, you'd index by domain name only, and then sort the remainder of the URL at query time.
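The "large gaps between numbers" idea mentioned above could look roughly like the sketch below. Everything here (the class name, the gap of 1000, the renumber-on-collision policy) is an assumption made for illustration, not something any commenter specified. Note that finding the neighbours still means comparing the full strings, which this sketch does in memory with `bisect`; that is the cost raised in the objection.

```python
import bisect

GAP = 1000  # spacing between freshly assigned sort keys (arbitrary choice)


class GappedOrder:
    """Keeps strings sorted and gives each one an integer sort key with gaps."""

    def __init__(self, values=()):
        self.values = sorted(values)
        self.keys = [(i + 1) * GAP for i in range(len(self.values))]

    def insert(self, value):
        """Insert a value and return its sort key, renumbering if no gap is left."""
        i = bisect.bisect_left(self.values, value)
        lo = self.keys[i - 1] if i > 0 else 0
        hi = self.keys[i] if i < len(self.keys) else lo + 2 * GAP
        self.values.insert(i, value)
        if hi - lo < 2:
            # No integer fits between the neighbours: renumber everything.
            self.keys = [(j + 1) * GAP for j in range(len(self.values))]
        else:
            self.keys.insert(i, (lo + hi) // 2)
        return self.keys[i]


order = GappedOrder(["https://b.example/", "https://d.example/"])
print(order.insert("https://c.example/"))  # 1500, halfway between 1000 and 2000
print(order.insert("https://a.example/"))  # 500, halfway between 0 and 1000
```

The renumbering branch is the part that would turn into a bulk `UPDATE` in a real table, which is the trade-off being argued about in this subthread.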

bastawhiz · on Oct 18, 2024

> But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box.

You keep saying that, but Postgres does a great job with no issues without any extra work. MySQL is alone in being suboptimal. "It's not designed for that" isn't a good answer, if it works great. Show me how the underlying data structures fail or perform poorly if it's really not something you should do.

evanelias · on Oct 18, 2024

> MySQL is alone in being suboptimal.

It's only suboptimal if you choose the wrong column type for the task at hand. For storing URLs, you almost certainly don't want collation behaviors, such as accent insensitivity or case insensitivity. So VARBINARY is a better choice here anyway.

And as several other commenters have mentioned, at large scale, indexing a bunch of long URLs in b-trees is indeed a bad practice performance-wise in any relational database. You won't be able to fit many entries per page, so read performance will be slow, especially for range scans.

In that situation it's almost always better to use a non-unique index over a prefix (if you need sorting and range scans) or a hash (if you don't), and disambiguate collisions by having the full value in an unindexed column. And/or split the URL up between the domain name and path in separate columns. If needed, normalize the domain names into a separate table so that the URL table can refer to them by numeric ID. etc. All depends on the specific use-case.
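A rough sketch of how an application might derive those columns before inserting a row. The column names, the 64-character prefix, and the 8-byte truncated SHA-256 are all assumptions chosen for illustration; the full URL stays in an unindexed column so collisions on the prefix or hash can be disambiguated.

```python
import hashlib
from urllib.parse import urlsplit

PREFIX_CHARS = 64  # hypothetical prefix length for the sortable index
HASH_BYTES = 8     # hypothetical hash length for the lookup index


def index_columns(url: str) -> dict:
    """Derive short, indexable columns from a URL of arbitrary length."""
    parts = urlsplit(url)
    return {
        # Could be normalized into a separate domains table and stored by ID.
        "domain": parts.hostname or "",
        "path": parts.path or "/",
        # Non-unique index target when sorting or range scans are needed.
        "url_prefix": url[:PREFIX_CHARS],
        # Non-unique index target for exact-match lookups.
        "url_hash": hashlib.sha256(url.encode("utf-8")).digest()[:HASH_BYTES],
        # Full value, unindexed, used to resolve collisions.
        "url": url,
    }


row = index_columns("https://Example.com/a/b?q=1")
print(row["domain"], row["path"], len(row["url_hash"]))  # example.com /a/b 8
```

A lookup would then filter on `url_hash` (or `url_prefix`) through the index and compare the full `url` column on the handful of rows that match.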

crazygringo · on Oct 18, 2024

No, Postgres doesn't. 2704 bytes is not long enough to hold all URL's encountered in the wild. But also, your performance will suffer if you use that whole length. You generally don't want to be doing that.

The difference between MySQL and Postgres here is negligible. It doesn't matter exactly where you define the limit of a short field, except it should probably be able to hold a maximum length filename which is 255 characters, plus some room to spare. Both MySQL and Postgres do this fine.

fweimer · on Oct 18, 2024

You might just load someone else's data, and the index is desirable in general for speeding up analytic queries. It's possible to work around that, of course. But depending on what you do, it can make writing efficient queries against the data more difficult. That's just a distraction because most of the time, those long columns won't matter anyway.

homebrewer · on Oct 18, 2024

I won't defend that utf8 brain damage, but the defaults are sane since 2018 — you don't need to set the encoding, it's set to proper utf8 out of the box. MySQL 8 cleaned up a lot of this legacy stuff.

fhdsgbbcaA · on Oct 18, 2024

Good to hear they saw the light but after I switched to Postgres I never had a single regret.

In a competitive market where people make very long term engineering decisions based on stability and reliability you can’t fuck up this badly and survive.

sgarland · on Oct 18, 2024

> This means if you use UTF-8 in MySQL, you can’t use emoji for example.

I for one have always viewed this as a perk.

fhdsgbbcaA · on Oct 18, 2024

A database that doesn’t give you back what you put into it is never a perk. It literally can’t handle storing and retrieving the data.

sgarland · on Oct 18, 2024

I don’t want to see emoji in my database. The customer is only right in matters of taste, not engineering.

fhdsgbbcaA · on Oct 18, 2024

Ok so if you are doing sentiment analysis of user product reviews you want to silently truncate emoji because you don’t like them? That’s a good idea how?

ttfkam · on Oct 18, 2024

Uhh… not wanting to see emojis is a matter of taste, not engineering.

sgarland · on Oct 18, 2024

MySQL does have ON UPDATE for its DATETIME, though; something that Postgres inexplicably still lacks.

fanf2 · on Oct 18, 2024

Isn’t ON UPDATE related to foreign keys and independent of the data type? https://www.postgresql.org/docs/current/ddl-constraints.html...

paulryanrogers · on Oct 18, 2024

Maybe they're thinking of TIMESTAMP in MySQL, which IIRC would auto update its value on any update to the row. Which was useful for updated_at-like columns. Though I think they later limited it to only the first TIMESTAMP column in a table.

sgarland · on Oct 18, 2024

No, it works for both [0] types. The first TIMESTAMP thing you’re referring to is that if a specific variable isn’t set, the first TIMESTAMP column automatically gets auto updates applied on creation and update, unless you explicitly defined it to not. This was the default behavior in 5.7, but has since been changed.

[0]: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initializa...

ttfkam · on Oct 18, 2024

Transactional DDL!

fzeindl · on Oct 18, 2024

* transactional DDL

* comprehensive transaction model using different modes

* PostGIS and lots of other great extensions

* supports most of the current SQL standard and is clear on interpretation of edge-cases in the documentation

* support for writing stored procedures in any major programming language

* many useful functions regarding dates, sets, ranges, json, xml, ...

* custom datatypes

* extremely thought-out and standardized approach to development: if a feature is included it generally works well in interaction with everything else

* syntax, semantics and performance are all very predictable

* great documentation

Regarding MySQL / MariaDB: MySQL optimized for performance first. Until 2010 the default storage engine was MyISAM, which didn't even support transactions.

PostgreSQL always focused on correctness and stability and then made sure everything performed.

arkh · on Oct 18, 2024

> * custom datatypes

Good in theory. But last time I checked the main libs to connect to pgsql, everything you get back from the database is a string. So you need something in your app to convert those strings to the equivalent data structures.

ttfkam · on Oct 18, 2024

You're thinking only in terms of application. Types in the db save storage space, allow for better validation than plain strings, can be correlated cleanly with other columns with the same type, etc.

Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.

stickfigure · on Oct 18, 2024

What's the alternative? MySQL? No transactional DDL, immediate fail.

cosmotic · on Oct 18, 2024

It's not just DDL that isn't transactional, there's a whole bunch of other things that aren't. And they break the transactionality silently. It's like an obstacle course where bumping into something might be fatal.

evanelias · on Oct 18, 2024

What specific non-DDL things are you referring to here?

Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.

This is all documented very clearly on https://dev.mysql.com/doc/refman/8.4/en/implicit-commit.html. Hardly an "obstacle course".

stickfigure · on Oct 18, 2024

"Aside from missing his head, the patient appears to be in fine shape."

evanelias · on Oct 18, 2024

That hardly seems equivalent. Why do you need to e.g. reconfigure replication inside of a transaction in the first place?

The lack of transactional DDL is a totally valid complaint, but the non-DDL stuff is just a total head-scratcher to me. Aside from DDL, implicit commits have literally never impacted me in my 21 years of using MySQL.

stickfigure · on Oct 19, 2024

Sorry - I was trying to make light of the discussion. DDL is so important that it's silly to talk about the other stuff.

jes5199 · on Oct 18, 2024

I worked for a company that migrated from mysql to postgres, but then got big enough they wanted to hire fulltime database experts and ended up migrating back to mysql because it was easier to find talent

bastawhiz · on Oct 18, 2024

Dunno if that says much about Postgres, but it says a lot about the company

icedchai · on Oct 18, 2024

Ugh. I worked with MySQL earlier in my career (until about 10 years ago.) All the companies since have been Postgres. All my personal projects are Postgres. I can't imagine going back.

cvalka · on Oct 18, 2024

justin_oaks · on Oct 18, 2024

> It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.

How software "feels" is subjective. Can you be more specific?

dalyons · on Oct 18, 2024

It requires a ton of somewhat arcane maintenance at scale. Vacuum shenanigans, Index fragmentation requiring manual reindexing, Txid wraparounds. I like Postgres but it’s definitely way more work to maintain a large instance than mysql. MySQL just kinda works

arkh · on Oct 18, 2024

Having to tinker with pg_hba.conf files on the server to manage how users can connect.

paulryanrogers · on Oct 18, 2024

I'd agree that is annoying yet usually just a one off task, unless you really want different IP allowlists per user.

Tostino · on Oct 18, 2024

In complex environments it is not just a one off task. I dealt with it by automating my infrastructure with ansible, but without some tooling it sucks.

threeseed · on Oct 18, 2024

The command-line experience is old-school, e.g. to show tables:

  \c database
  \dt

Versus:

  use database
  show tables

georgyo · on Oct 18, 2024

I started with MySQL in 2006 for my personal projects, but what first won me over to psql was those commands.

Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.

At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.

stephenr · on Oct 18, 2024

> At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.

In MySQL either `?` or `help` or `\?` will show you the help...

rootusrootus · on Oct 18, 2024

I assume this is really what it comes down to. If psql added those verbose-but-descriptive commands a whole bunch of people comfortable with mysql would be a lot happier using postgres.

dventimi · on Oct 18, 2024

That's psql.

fhdsgbbcaA · on Oct 18, 2024

It’s also faster to type.

eYrKEC2 · on Oct 18, 2024

Not after you have to google, "What's the equivalent of `show tables` in postgres?", because the psql command names are completely arbitrary.

ahoka · on Oct 18, 2024

They kinda make sense if you consider that Postgres was not an SQL database in the beginning. Quirky though.

Symbiote · on Oct 18, 2024

They are clearly abbreviations.

\c is for connect.

\dt is for describe tables.

mxey · on Oct 18, 2024

\? shows the help

fhdsgbbcaA · on Oct 18, 2024

Which you need to do exactly once.

kalleboo · on Oct 18, 2024

I need to manually admin my database server maybe once every 2 years or so. Definitely not remembering them 2 years later.

dventimi · on Oct 19, 2024

Sounds like a YP

Scramblejams · on Oct 18, 2024

> I've always wondered why Postgres is so insanely popular.

Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.

I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.

I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.

paulryanrogers · on Oct 18, 2024

I've never lost data with PostgreSQL. MySQL had enough data loss bugs and foot guns that I ran into a few of them.

moogly · on Oct 18, 2024

> I've always wondered why Postgres is so insanely popular

Real answer: no licensing cost

vbezhenar · on Oct 18, 2024

For me Postgres is 100% predictable and reliable. It's neither clunky nor arcane in my experience. I don't need to think about it, I just SQL it and that's about it. It quietly works in the background. At some scale there might be some issues, but there is always a known path to solve things.

DonHopkins · on Oct 18, 2024

Because it's not tainted and cursed by Oracle, like MySQL (and Oracle).

immibis · on Oct 18, 2024

That's what MariaDB is for, right? I'm surprised to hear people recommend the Oracle fork of MySQL (still called MySQL because they own the trademark) rather than the original project (now called MariaDB)