Hacker News
Matrix Orthogonalization Improves Memory in Recurrent Models
imurray
https://github.com/adrianjav/pogo — POGO: A Proximal One-step Geometric Orthoptimizer
https://arxiv.org/abs/2602.14656 — An Embarrassingly Simple Way to Optimize Orthogonal Matrices at Scale; Adrián Javaloy, Antonio Vergari
BirbSingularity
dapperdrake
Trigonometric polynomials are also polynomials. And linear spaces are all "the same". That is what the definition is for. Even the transpose-mapping is linear.
chimpanzee2
Anyone else feel this?
digdugdirk
I'm a mechanical engineer by training, and have similar vibes with the similarities I see between llm training and metallurgy. I could probably put together a formal concept for these vibes at this point, but is there actually a "there" there? I have no idea. And it would take me years to actually dive in and learn everything to gain the deep understanding that would be required to know if I'm just experiencing my own brand of AI psychosis or not.
It's a brave new world, that's for sure.
duped
In the 1920s we had legions of very smart, highly trained people (arguably better trained in mathematics) basically chucking relays and vacuum tubes together with reckless abandon to build the most valuable and complicated systems mankind had ever come up with (telephony, radio, radar, etc). They had no idea how they worked and only ad-hoc rules of thumb to construct them.
It took the insight of a handful of these people both in and outside of industry to formalize the theory of operation of most of what people were already building and then use that theory to establish formal design practices.
The people before these theories were realized were exceptionally smart and good at what they did; they just didn't have better design tools to reason about the things they were building.
And once they had those tools they didn't 10x or 100x overnight.
cyanydeez
I think to feel what you're feeling, you've bought into "all we need is more context". I think evolution demonstrates that's not really true.
geysersam
chimpanzee2
reminds me of the famous anecdote of a 19th century physics professor who said "there is nothing left to be discovered in physics, only minor corrections"
then came Einstein...
seanhunter
cyanydeez
This means we can just jump over to mars, then explore other planets, etc, etc.
We know tons of regimes where there is non-continuous progress. Finding a smart dude with an anecdote does not invalidate the breadth and width of all human experience with non-continuous systems.
Some dude thought all fluid was Newtonian, and then we discovered non-Newtonian fluid. It does exactly what you don't expect. Which basically demos physics is complex but that still doesn't mean progress is fluid.
seanhunter
cyanydeez
Another good line to look at is how people believe in ghosts: people with established religions without "ghosts" are less likely to believe in ghosts than atheists, even though atheists would supposedly be skeptics of superstitious claims.
Having functional paradigms is important, and being confident that there isn't a magical extrapolation into AGI is healthier than there being some magical exponential increase that you have to ride the dragon.
Sorry man, we're not solipsistic here. There are reasonable beliefs that are justifiable, instructive and then there are ones that require cherry picking technology indistinguishable from magic without reference to reality and physics.
wwweston
Who are the religions without ghosts?
And is the overall point “just because we’ve made big leaps of progress doesn’t mean every challenge is tractable, let alone a moonshot sprint away, especially those we have solid theoretical limits on”? That’s a point I’m certainly amenable to; I just want to make sure I’m not missing something more subtle or sophisticated.
cyanydeez
I don't need to bet anything. I'm not a sociopath who thinks the AI god needs to be built, appeased, etc. That's the torment nexus.
So, it's pretty easy to see realistically if you are satisfied with local models and how they affect what you actually do.
I can see the POV of a software engineer that isn't specialized to any specific topic being replaced by various models.
But again, I see the sigmoid, not the "AGI" or the "this baby has grown very big in 1 year, surely it'll become a giant in 5."
hgoel
Linear algebra is used everywhere; orthogonalization, SVD, eigenvalues, etc. are valuable because the resulting properties are very useful in many places.
BirbSingularity
hasley
I wonder what the result would be if they used the orthogonal matrix that is closest to the source matrix. Usually one measures closeness with the Frobenius norm (root of the sum of all squared matrix entries). Maybe one could even try another norm that gives a sparser matrix.
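For the Frobenius-norm case, a minimal sketch of what "closest orthogonal matrix" could look like, assuming a square real matrix: take the SVD and drop the singular values (the orthogonal Procrustes / polar-factor solution). This only illustrates the idea in the comment above, not the method from the linked paper; the function name `nearest_orthogonal` and the random test matrix are made up for the example.

```python
import numpy as np

def nearest_orthogonal(a):
    """Orthogonal matrix closest to the square real matrix `a` in Frobenius norm.

    With the SVD a = U @ diag(s) @ Vt, the minimizer of ||a - Q||_F over
    orthogonal Q is Q = U @ Vt (the orthogonal polar factor of `a`).
    """
    u, _, vt = np.linalg.svd(a)
    return u @ vt

# Hypothetical example input: a random 4x4 matrix.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
q = nearest_orthogonal(a)

# For comparison: the Q factor from a QR decomposition is also orthogonal,
# but it is generally not the closest one to `a`.
q_qr, _ = np.linalg.qr(a)
dist_polar = np.linalg.norm(a - q, "fro")
dist_qr = np.linalg.norm(a - q_qr, "fro")
```

Here `dist_polar` should never exceed `dist_qr`, since `q` minimizes the Frobenius distance over all orthogonal matrices. A norm that favors sparsity would need a different (iterative) solver; the SVD shortcut is specific to the Frobenius case.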
aesthesia
CamperBob2
I wonder if there's a similar shortcut representation that we will eventually realize we should be using for ML. I suppose if there is one, it won't have native GPU support, so no one will bother looking for it.
phkahler
big-chungus4
bee_rider
phkahler
bee_rider
impossiblefork
If we have a singular value decomposition, M=USV^*, the columns of U are linearly independent, so they are a basis for the space M maps things into, and the columns of V are linearly independent, so they are a basis for the space it maps things from, and in those two bases [M]_{BB'} = S.
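A quick numerical check of that change of basis, as a sketch under the assumption of a square complex matrix (the 3x3 random `m` is made up for illustration): writing inputs in the columns of V and outputs in the columns of U turns M into the diagonal matrix of singular values.

```python
import numpy as np

# Hypothetical example input: a random 3x3 complex matrix.
rng = np.random.default_rng(0)
m = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# np.linalg.svd returns U, the singular values s, and V^* (called vh here),
# so that m == u @ np.diag(s) @ vh.
u, s, vh = np.linalg.svd(m)
v = vh.conj().T

# Express M with inputs in the V basis and outputs in the U basis: U^* M V.
m_in_bases = u.conj().T @ m @ v
```

Up to floating-point error, `m_in_bases` equals `np.diag(s)`, and both `u` and `v` are unitary, which is what makes their columns bases in the first place.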
harveyrook
bee_rider
The concept of nonlinear eigenvalues exists, but it is a bit more exotic.
dapperdrake
Someone found a way to get "something like" a tri-diagonal matrix that was equivalent to the LLM they were studying in 2022.
Apologies for being informal and hand-wavy. It's been a long time and I probably forgot a few important points.