
When humans, or dogs or cats for that matter, react to novel situations they encounter, when they appear to generalize or synthesize prior diverse experience into a novel reaction, that new experience and new reaction feed directly back into their mental model and alter it on the fly. It doesn't just tack on a new memory. New experience and new information back-propagate, constantly adjusting the weights and meanings of prior memories. This is a more multi-dimensional alteration than simply re-training a model to come up with a new right answer... it also exposes to the human mental model all the potential flaws in all the previous answers, which may have been sufficiently correct before.
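
To make the contrast concrete, here's a minimal sketch in Python (invented for illustration, not a claim about how brains actually work): appending to a memory list leaves every old answer untouched, but a single gradient step on the new experience shifts the model's answers on all prior experiences, because they share the same weights.

    # Toy contrast: "tack on a memory" vs. back-propagate it.
    # All shapes and values here are made up for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)                     # shared "mental model" weights

    old_experiences = rng.normal(size=(5, 3))  # past situations
    new_x = rng.normal(size=3)                 # the novel situation
    new_y = 1.0                                # its outcome

    before = old_experiences @ w               # answers recalled before

    # One gradient-descent step on the new experience alone
    # (squared-error loss), i.e. back-propagating it into the model.
    lr = 0.5
    err = new_x @ w - new_y
    w -= lr * err * new_x

    after = old_experiences @ w                # every old answer has shifted
    print("change in old answers:", after - before)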

This is why, for example, a 30-year-old can lose control of a car on an icy road and then suddenly, in the span of half a second before crashing, remember a time they intentionally drifted a car on the street when they were 16 and reflect on how stupid they were. In the human or animal mental model, every event is recalled by way of other events, and all of them are constantly adapting, even the past ones.

The tokens we take in and process are not words, nor spatial artifacts. We read a whole model as a token, and our output is a vector of weighted models that we somewhat trust and somewhat discard. Meeting a new person, you compare all their apparent models to the ones you know: facial models, audio models, language models, political models. You ingest their vector of models as tokens and attempt to compare them to your own existing ones, while updating yours at the same time. Only once our thoughts have arranged those competing models into some kind of hierarchy do we poll them for which ones are appropriate to synthesize words or actions from.
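
As a loose sketch of that "vector of weighted models" idea (the model names, features, and numbers below are all invented): each candidate model scores the incoming evidence, the trust weights get a standard multiplicative-weights update, and the resulting ranking is the hierarchy that gets polled for a response.

    # Toy multiplicative-weights update over competing mental models.
    def likelihood(model, observation):
        # How well this model explains the observation, in (0, 1].
        return model.get(observation, 0.05)

    models = {
        "friendly_stranger": {"smile": 0.9, "raised_voice": 0.1},
        "hostile_stranger":  {"smile": 0.2, "raised_voice": 0.8},
    }
    trust = {name: 1.0 for name in models}     # initial weights

    for observation in ["smile", "smile", "raised_voice"]:
        for name, model in models.items():
            trust[name] *= likelihood(model, observation)

    # Normalize and poll: the hierarchy described above.
    total = sum(trust.values())
    for name, weight in sorted(trust.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {weight / total:.2f}")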

In a word, JEPA?

No. Not at all like that. I said:

>> nor spatial artifacts

I meant visual patterns, too. You're thinking about what I said at too granular a level. JEPA is visual, based ultimately on pixels. Its tokens may be aggregated from pixels until they're as large as whole recognizable objects, but they are not whole mental models themselves.

Here's an example of humans evaluating competing mental models as tokens: You see a car, it's white, it's got some blood stains on the door, and it's traveling towards a red light at 90 miles an hour in a 30 mph residential zone, while you're about to make a left turn. A human foot is dangling from the trunk.

You refer to several mental models you have about high-speed chases, drug cartels in the area, murders, etc. You compare these models to determine the next action the car might take.

What were the tokens in this scenario? The color of the car, the pixels of blood, the speed, the traffic pattern? Or whole models of behavior, where you had to choose between a normal driver's model and that of someone fleeing a crime scene with a dead body?
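
One way to read that question in code, as a toy (every feature and probability below is made up): the low-level observations matter only insofar as they score whole candidate models of the driver against each other, and it's the winning model, not the pixels, that predicts the next action.

    # Naive-Bayes-style comparison of two whole behavior models.
    observations = {
        "speed_over_limit": True,
        "blood_on_door": True,
        "foot_in_trunk": True,
        "running_red_light": True,
    }

    # P(feature | model), hypothetical numbers
    candidate_models = {
        "normal_driver": {
            "speed_over_limit": 0.10, "blood_on_door": 0.001,
            "foot_in_trunk": 0.0001, "running_red_light": 0.05,
        },
        "fleeing_crime_scene": {
            "speed_over_limit": 0.90, "blood_on_door": 0.30,
            "foot_in_trunk": 0.20, "running_red_light": 0.70,
        },
    }

    scores = {}
    for name, model in candidate_models.items():
        p = 1.0
        for feature, present in observations.items():
            p *= model[feature] if present else 1 - model[feature]
        scores[name] = p

    print(max(scores, key=scores.get), scores)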



