Hacker News | new | past | comments | ask | show | jobs | submit | reedlaw's comments | login

Are you saying Chinese is more concise than English? Chinese poetry is concise, but that can be true in any language. For LLMs, it depends on the tokenizer. Chinese models are of course more Chinese-friendly and so would encode the same sentence with fewer tokens than Western models.

> Are you saying Chinese is more concise than English?

Yeah, definitely. It lacks case marking and verb conjugation, plus whole classes of filler words, and words themselves are on average substantially shorter. If you listen to or read a hyper-literal translation of Chinese speech into English (you can find fun videos of this on Chinese social media), it even resembles "caveman speech" for those reasons.

If you look at translated texts and compare the English versions to the Chinese ones, the Chinese versions are substantially shorter. Same if you compare localization strings in your favorite open-source project.

It's also part of why Chinese apps are so information-dense, and why localizing to other languages often requires reorganizing the layout itself — languages like English just aren't as information-dense, pixel for pixel.

The difference is especially pronounced for vernacular Chinese, which is why Chinese people often note that text which "has a machine translation flavor" is over-specified and gratuitously prolix.

Maybe some of this washes out in LLMs due to tokenization differences. But Chinese texts are typically shorter than English texts, and this extends to prose as well as poetry.
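To make the tokenization point concrete, here's a toy greedy longest-match tokenizer (my own illustration; real BPE tokenizers work on learned byte merges, but the vocabulary dependence is the same): the same sentence comes out as 3 tokens or 7 tokens depending entirely on what the vocabulary contains.

```python
def tokenize(text, vocab):
    """Toy greedy longest-match tokenizer: prefer the longest
    substring present in the vocabulary, else fall back to one
    character per token."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # no merge available: single char
            i += 1
    return tokens

sentence = "我喜欢机器学习"  # "I like machine learning"

zh_vocab = {"我", "喜欢", "机器学习"}  # Chinese-friendly merges
print(len(tokenize(sentence, zh_vocab)))  # 3 tokens

no_merges = set()  # vocabulary with no multi-char entries
print(len(tokenize(sentence, no_merges)))  # 7 tokens
```

Same text, very different token counts, so cross-tokenizer comparisons of "conciseness" need care.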

But yeah, this is standard stuff: Chinese is more concise and more contextual/ambiguous. More of the semantic work is allocated to interpretation, and less to the writing/speaking, than in English.

Do you speak Chinese and experience the differences between Chinese and English differently? I'm a native English speaker and only a beginner in Chinese but I've formed these views in discussion with Chinese people who know some English as well.


Chinese omits articles, doesn't conjugate verbs, and packs more meaning into individual characters than English does into letters, but beyond those differences I don't have the impression that Chinese communication is inherently more concise. Some forms of official speech are wordy. Writing is denser, but the amount of information conveyed through speech is about the same. There are jokes about ambiguous words or phrases in both Chinese and English. So I was surprised at your take, though I have no objection to your points above. Ancient Chinese, on the other hand, is extremely concise, but so are other ancient languages like Hebrew, although in a different way. So it seems that ancient languages are compressed but challenging, while modern languages have unpacked that compression for ease of understanding.

I'd guess Chinese and English will come out about the same, once someone invents the right metric to compare them. I recall reading about a study that compared speech in multiple languages with respect to the amount of information communicated per second, and the reported result was that they were all about the same, because speakers of more verbose languages (longer words, simpler grammar) unknowingly compensate by speaking faster than baseline.

That's a really interesting point about Ancient Chinese and other ancient scripts. I'd love to learn more about that.

I'm also more curious about tokenizers for LLMs than I've ever been, both for Chinese and English. I feel like I'll need to look at some concrete examples to understand, since tokenization can be per word, per character, or chunks somewhere in between.
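One concrete starting point that needs no tokenizer library: compare character and UTF-8 byte counts for a rough English/Chinese sentence pair (my own made-up pair, not from any corpus). Byte-level BPE tokenizers see each Chinese character as 3 bytes, so how far Chinese text gets compressed into tokens depends heavily on which multi-byte merges the tokenizer learned.

```python
en = "I am going to the store to buy some food."
zh = "我去商店买食物。"  # rough Chinese rendering of the same sentence

# Far fewer characters in Chinese...
print(len(en))  # 41
print(len(zh))  # 8

# ...but more bytes per character once UTF-8 encoded.
print(len(en.encode("utf-8")))  # 41 (ASCII: 1 byte per char)
print(len(zh.encode("utf-8")))  # 24 (CJK: 3 bytes per char)
```

So a byte-level tokenizer with few CJK merges could easily spend more tokens on the visually shorter Chinese sentence.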


I've come to the conclusion that Hacker News is the best aggregator out there. Substack knows my interests yet gives terrible recommendations. YouTube constantly recommends the same videos or exaggerates my interest in a topic based on a few views, spamming me with related content until I watch something unrelated. The only downside of Hacker News is that its focus is narrower than that of other sites. But perhaps because the focus is "Anything that good hackers would find interesting," there is a bias toward things I find interesting, with less noise than more commercial offerings.

Claude has trained me on the use of the word 'invariant'. I never used it before, but it makes sense as a term for a rule the system guarantees. I would have used 'validation' for application-side rules or 'constraint' for db rules, but 'invariant' is a nice generic substitute.
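The three terms fit together in one small sketch (table, names, and amounts all hypothetical): the same invariant, "balance never goes negative," enforced both as an app-side validation and as a db-side constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Db-side 'constraint' guarding the invariant.
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")

def withdraw(account_id: int, amount: int) -> None:
    # App-side 'validation' of the same invariant.
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    if balance - amount < 0:
        raise ValueError("insufficient funds")
    conn.execute(
        "UPDATE accounts SET balance = balance - ? WHERE id = ?",
        (amount, account_id),
    )

conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
withdraw(1, 30)
print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone())  # (70,)
```

Either layer alone can be bypassed or forgotten; "invariant" names the rule itself, independent of which layer enforces it.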


Why is Haskell irrelevant to the argument that LLMs can't reliably transfer programming knowledge from one language to another? In fact, the purity of the language and the dearth of training data seem like the perfect test case for whether concepts found in more mainstream languages are actually understood.


Because human programmers routinely fail to do that too. Haskell is an obscure language that came out of academic research. Several of its core semantics (like laziness by default) never got adopted anywhere else and are found only in Haskell.


Then I would say this is another proof that LLMs lack intellect or ability to reason about universals. See https://michaelmangialardi.substack.com/i/186405810/test-4-p...


This is the second endorsement I've seen today. I gave OpenSpec a shot and was dismayed by the Explore prompt. [1] It's over 1,000 words of verbose, repetitive instructions that will lead to context drift. The examples refer to specific tools like SQLite and OAuth, which won't help if your project isn't related to those.

I do like the basic concept and directory structure, but those are easy enough to adopt without all the cruft.

1. https://github.com/Fission-AI/OpenSpec/blob/main/src/core/te...


Do you have examples of the task maturation cycle? I'm not sure how it would work for tasks like extracting structured data from images. It seems it could only work for tasks that can be scripted and wouldn't work well for tasks that need individual reasoning in every instance.


No practical code example, sorry. The post is based on my own experience using agents, and I haven't reached a reusable generalization yet.

That said, two cases where I noticed the pattern:

Meal planning: I had a weekly ChatGPT task that suggested dinner options based on nutritional constraints and generated a shopping list (e.g. two dinners with 100g of chicken -> buy 200g). After a few iterations, it became clear that with a fixed set of recipes and their ingredients, a simple script generating combinations was enough. The agent's reasoning had already done its job — it helped me understand the problem well enough to replace itself.
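For what it's worth, the shopping-list half of that really does collapse to a few lines once the recipe set is fixed. A sketch with made-up recipes and amounts:

```python
from itertools import combinations
from collections import Counter

# Hypothetical fixed recipe set: ingredient -> amount.
recipes = {
    "chicken stir-fry": {"chicken_g": 100, "rice_g": 80},
    "chicken curry":    {"chicken_g": 100, "rice_g": 60},
    "veggie pasta":     {"pasta_g": 120, "tomatoes": 2},
}

def shopping_list(chosen):
    """Sum ingredient amounts across the chosen dinners."""
    total = Counter()
    for name in chosen:
        total.update(recipes[name])
    return dict(total)

# All possible 2-dinner plans for the week.
plans = list(combinations(recipes, 2))

# e.g. two chicken dinners with 100g each -> buy 200g.
print(shopping_list(("chicken stir-fry", "chicken curry")))
# {'chicken_g': 200, 'rice_g': 140}
```

No reasoning needed per run; the agent's job was discovering that the recipes and ingredients were stable enough for this to work.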

QA exploration: I was using an agent to explore a web app as a QA tester. It took several minutes per run. After some iterations, the more practical path was having it log its explorations to a file, then derive automated tests from that log. The agent still runs occasionally, but the tests run frequently and cheaply.

Regarding your point about tasks that need individual reasoning every time — I think you're right, and that's actually the core of the idea. Not every task matures into a script. Extracting structured data from images probably stays deliberative if the images vary significantly. The cycle only applies to tasks that, after enough repetitions, reveal a stable pattern. The agent itself is what helps you discover whether that pattern exists.


How do you even begin to define objective measurements of software engineering productivity? You could use DORA metrics [1] which are about how effectively software is delivered. Or you could use the SPACE Framework [2] which is more about the developer experience.

1. https://cloud.google.com/blog/products/devops-sre/using-the-...

2. https://space-framework.com/


I don't have time for that mysticism. I just know.


Plasma Bigscreen has been around for 6 years: https://itsfoss.com/news/plasma-bigscreen-comeback/


I was referring to https://news.ycombinator.com/item?id=47283124

Anyway, I will try it in its current state. I basically need a launcher for the desktop Jellyfin app and not much more.



Isn't this the premise behind Dilbert?

