yorwba's comments on Hacker News

Yes, metamath uses a large collection of specialized but reusable building blocks, so it doesn't blow up exponentially. However, if you want to "just do gradient descent" on general trees built from a single universal primitive, you now have to rediscover all those building blocks on the fly. And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.

Or you could accept that there's already a large collection of known useful special functions, and work with shallower trees of those instead, e.g. https://arxiv.org/abs/1905.11481


Yeah, of those 6 tasks, only "halluc-doc-http-handler" isn't within 1% of the previous result. 86.6% is 13/15 rounded down, so if they sampled 15 attempts for that task, the probability of getting 100% when the true success rate was 13/15 would be (13/15)^15 > 0.11, which is not all that unlikely.
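Back-of-the-envelope, that claim checks out in a couple of lines (the 15-attempt sample size is this comment's assumption, not something the benchmark states):

```python
# If the true per-attempt success rate were 13/15 and the benchmark
# sampled 15 independent attempts, the chance of a perfect 15/15 run:
p_success = 13 / 15          # 86.66...%, reported as 86.6% (rounded down)
n_attempts = 15
p_all_pass = p_success ** n_attempts
print(round(p_all_pass, 3))  # ≈ 0.117, comfortably above 0.11
```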


I think you underestimate the amount of knowledge needed to deal with the complexities of language in general as opposed to specific applications. We had algorithms to do complex mathematical reasoning before we had LLMs, the drawback being that they require input in restricted formal languages. Removing that restriction is what LLMs brought to the table.

Once the difficult problem of figuring out what the input is supposed to mean was somewhat solved, bolting on reasoning was easy in comparison. It basically fell out with just a bit of prompting, "let's think step by step."

If you want to remove that knowledge to shrink the model, we're back to contorting our input into a restricted language to get the output we want, i.e. programming.


There are US-based companies offering inference for MiniMax models charging slightly less than what MiniMax charges. MiniMax themselves claim to be using data centers in the US. US companies training their own closed-weight models charge so much more because they can. They're monopoly providers for their own models, so they can ask for whatever amount people are willing to pay.

Interesting, what companies?

OpenRouter has a list of providers: https://openrouter.ai/minimax/minimax-m2.5

When people criticize Aisle's methodology, they aren't "defending Mythos," they're bashing Aisle for their disingenuous claims.

The first time was under the name Adriana Shayota: https://www.justice.gov/usao-sdca/pr/federal-jury-convicts-s...

We don't even need to hypothesize that much on the irrelevant nonsense, since they helpfully provide data with the detected vulnerability patched: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag... and half of the small models they touted as finding the vulnerability still found it in the patched code in 3/3 runs. A model that finds a vulnerability 100% of the time even when there is none is just as informative as a model that finds a vulnerability 0% of the time even when there is one. You could replace it with a rock that has "There's a vulnerability somewhere." engraved on it.

They're a company selling a system for detecting vulnerabilities that relies on models trained by others, so they're strongly incentivized to claim that the moat is in the system, not the model, and this post really puts a thumb on the scale. They set up a test that can hardly distinguish between models (just three runs, really??) unless some are completely broken or work perfectly, the test indeed suggests that some are completely broken, and then they try to spin it as a win anyway!
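To put numbers on how little three runs can tell you (illustrative detection rates, not Aisle's data):

```python
# With only n=3 runs per model, a 3/3 score is compatible with a wide
# range of true detection rates, so the test mostly separates
# "always fires" from "never fires" and little in between.
def p_perfect(true_rate, runs=3):
    """Probability of detecting the vulnerability in every run."""
    return true_rate ** runs

for rate in (0.5, 0.7, 0.9):
    print(f"true rate {rate:.0%} -> P(3/3) = {p_perfect(rate):.3f}")
```

A model that only finds the bug half the time still scores 3/3 one run in eight, and distinguishing a 70% model from a 90% model on three runs is essentially a coin flip.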

A high false-positive rate isn't necessarily an issue if you can produce a working PoC to demonstrate the true positives, though they kinda-sorta admit that you might need a stronger model for that (a.k.a. what they can't provide to their customers).

Overall I rate Aisle intellectually dishonest hypemongers talking their own book.


Previous discussion: https://news.ycombinator.com/item?id=47716043 (764 points 5 hours ago, 384 comments)

I guess it's based on Wikidata's "coordinate location" property. You can find the Wikidata item corresponding to a Wikipedia article using the "Tools" dropdown in the desktop view. (Or just search on Wikidata directly.)

Mozilla Corporation may have enough money, but they don't develop Thunderbird. If you used the donation form on this page, you didn't donate to Mozilla Corporation, but to the company developing Thunderbird. So all is fine.

Mozilla Corporation (the for-profit Firefox management org) doesn't take donations, and is mostly funded by selling search placement to Google.

The Mozilla Foundation (the non-profit parent org) does take donations. It could presumably funnel some of that money down to Thunderbird development, but it chose not to, and now this other for-profit management org fundraises for Thunderbird separately...

