AI Research Today
AI Research Today unpacks the latest advancements in artificial intelligence, one paper at a time. We go beyond abstracts and headlines, walking through architectures, experiments, training details, ablations, failure modes, and the implications for future work. Each episode covers one to three new, impactful research papers in depth, discussed at the level of an industry practitioner or AI researcher. If you want to understand the newest topics in AI research but don't have the time to dig through the papers yourself, this is your solution.
Language Models are Injective and Hence Invertible
In this episode, we break down a fascinating new result from recent research: that modern Transformer language models are almost surely injective—meaning different prompts map to unique internal representations, with no information loss.
We dig into the paper:
Read the paper on arXiv
At the core of the proof is a surprisingly deep mathematical idea: Transformers are real analytic functions of their parameters, which allows researchers to rigorously reason about when “collisions” (two prompts producing the same representation) can occur. The result? Collisions only happen on a measure zero set—mathematically possible, but practically never observed.
We unpack:
- What it means for a function to be real analytic
- Why this implies near-perfect uniqueness of representations
- How gradient descent preserves this property during training
- And what this says about interpretability, privacy, and reversibility of LLMs
We also explore the practical implications—if models are truly invertible, could we reconstruct inputs from activations? What does that mean for safety and data leakage?
About the Host
This episode is brought to you by Arkitekt AI — an automated enterprise software development platform that builds full analytics, ML, and data systems from natural language.
Learn more: https://arkitekt-ai.com
Hello, and welcome to another episode of AI Research Today. Thanks for tuning in. I'm your host, Aaron, from Arkitekt AI. The paper we'll be covering today is called Language Models Are Injective and Hence Invertible. I'll give a little background on why I chose the paper and what it's about. One thing I'll mention first: in the other episodes I haven't really talked about the company I'm with. Arkitekt AI is an automated enterprise software development platform, able to construct pretty complex, enterprise-grade software solutions from the ground up in an automated way. It's being deployed across a wide variety of industries, which is why I choose a lot of agent-type papers to focus on, because that's a lot of the work we're doing. I haven't released an episode in a couple of weeks. I usually try to keep a bi-weekly cadence, but honestly I just hadn't found a paper that really caught my eye. Reading a bunch of papers in the AI space, a lot of them start to look rinse-and-repeat: take RL variant A, apply it to situation B, run the benchmark, report results. I've recently been interested in non-transformer-based architectures, particularly things like JEPA, the joint embedding predictive architecture from Yann LeCun, and the new startup he's been promoting. I'm blanking on the name of it, but they're promoting that world-model-first approach to generation, so I'll probably try to cover some of those papers in the near future. The one that caught my eye today is Language Models Are Injective and Hence Invertible. I have a background in math and physics, and obviously in AI as well, and I can always appreciate when papers take a more mathematical approach to AI. I think it's something the field lacks a bit of.
A lot of it was derived pretty experimentally. The transformer architecture, and really all the deep learning architectures, were derived based on experimental intuition. There has been some theoretical backbone starting to be established from the math side of things, or I guess more the applied math side, and there have been some interesting papers explaining the experimental best practices behind certain things. One of my favorites I'll maybe cover in a future episode. But yeah, I always like it when some mathematical underpinnings are given to the AI world. The paper today treats language models as invertible functions, left-invertible, mind you. It's out of the University of Athens and a university in Rome. Let me just read a quick segment from the abstract; I think it does a pretty good job of explaining the paper's topic. "Transformer components such as nonlinear activations and normalizations are inherently non-injective." A quick note for the less mathy listeners: non-injective just means that different inputs can map to the same output. Something like the ReLU function is non-injective, because all negative inputs get mapped to zero, so multiple inputs share the same output. If you look at the output alone, you can't necessarily tell what the input was, because no unique backwards mapping exists. So, inherently non-injective, preventing exact recovery of the input from the model's representations. That's what I was just describing: if a function doesn't have distinct input-to-output mappings and I hand you an output, you don't know what my original input was. The authors then prove mathematically that transformers map discrete input sequences into continuous representations in a way that is injective, and therefore lossless.
Lossless means we don't lose information going from inputs to outputs, because we can recover inputs from outputs. They mention a couple of times in the paper that lossy compression is typically something that's assumed of transformers, and they argue that's not the case. They then perform billions of collision tests. A collision test means: take an input, vary it slightly, and see whether two different inputs can produce the same hidden-state output representation. If no collisions occur, that supports their claim. They also provide an algorithm that, given the hidden-state representation from a decoder-only transformer model, recovers the input prompt, which is pretty cool, and they show it working across a variety of models and sizes. So we'll first walk through a broader explanation of their central claim, and then I'll briefly talk about the recovery algorithm. It's fairly straightforward, so I might focus more on the performance results from the collision tests. The core question they seek to answer is whether internal representations faithfully preserve input information. Let's think again about the ReLU function: every negative input x gets mapped to zero. Take a softball example, x equals negative one and x equals negative two. Both map to zero under ReLU. So if I tell you my output was zero and ask what my input was, you don't know; it could have been negative one or negative two. The question is whether the same thing happens with hidden states and input prompts. If I give you a hidden-state representation, the output of one decoder block in my transformer architecture, can you tell me what the original input prompt was?
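To make that collision idea concrete, here's a tiny Python sketch of my own (not the paper's code) showing ReLU's non-injectivity:

```python
import numpy as np

def relu(x):
    # ReLU: zero for negative inputs, identity for non-negative inputs.
    return np.maximum(x, 0.0)

# Two distinct negative inputs collapse to the same output...
assert relu(-1.0) == relu(-2.0) == 0.0

# ...so from the output alone the input is unrecoverable: ReLU is non-injective.
# Non-negative inputs, by contrast, pass through unchanged.
assert relu(1.0) == 1.0 and relu(2.0) == 2.0
```

The question the paper asks is whether the full transformer, hidden state to prompt, behaves like the second case rather than the first.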
There's been some debate about whether the answer is yes or no, and that's what this paper attempts to resolve. They say that it is in fact almost surely injective. Not completely injective, almost surely injective, and this, as you might guess, comes out of a probabilistic argument that we'll cover coming up. Basically, they show that the region of parameter space where injectivity fails is effectively zero, and that even after initialization and subsequent gradient descent steps toward optimization, the condition still holds. I will say the paper is pretty long. The core read is eight pages plus a couple of pages of references, but the whole paper is 58 pages, so a pretty beefy paper to pore through. The appendices are all proofs. The paper itself gives more of a hand-wavy sketch of each proof, and I don't mean that in a derogatory way; it describes what the proof does, and if you want the details you go to the appendix. Okay, so the key idea is that the components of the transformer, things like embeddings, layer norms, MLPs, and the residual wiring, are smooth enough that the model as a whole behaves predictably with respect to its parameters. They show that the parameter set on which collisions could occur has measure zero; in other words, the volume in weight space of the parameter set that would permit a collision is zero. They do this by relying on the real analyticity of the individual components. A real analytic function, in real analysis, is one that can be represented exactly near any point by a convergent power series, not merely approximated within some error.
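As a quick illustration of that definition, here's a small Python sketch (mine, with exp standing in for any real analytic component) checking that the Taylor series converges to the exact function value at a point:

```python
import math

def taylor_exp(x, n_terms):
    # Partial sum of the Taylor series of exp around 0: sum over k of x^k / k!
    return sum(x**k / math.factorial(k) for k in range(n_terms))

x = 0.5
errors = [abs(taylor_exp(x, n) - math.exp(x)) for n in (2, 5, 10)]

# The error shrinks as terms are added: the series converges to the exact
# value of exp(x), not just an approximation of it.
assert errors[0] > errors[1] > errors[2]
assert errors[2] < 1e-8
```

That "converges to the exact function" property is what separates real analytic functions from merely smooth ones, and it's the hook the paper's measure-zero arguments hang on.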
Properties of these functions are what allow them to prove that the volumes in parameter space in which collisions could occur are effectively zero. So again, a function f is real analytic if, at any point x naught, we can write f as a sum over a power series, and since it's a power series, the function is infinitely differentiable and the Taylor series at a point converges to the exact function itself, not just an approximation of it. Okay. I'll read a couple of lines from one of their blurbs: our central finding is that the causal decoder-only transformer language model is almost surely injective. So we take some model with width d, at least one attention head per block, and real analytic components, meaning the embeddings, layer norms, attention, and MLP activations are all real analytic; think tanh or GELU here. One note: plain ReLU is not real analytic, since it isn't differentiable at zero, so the theorem is about smooth activations. We have some finite vocabulary and finite context length. We initialize parameters theta at random from any distribution we want, the key being that it's continuous, like a Gaussian. Then we train for t gradient descent steps, and with probability one, different inputs will have different outputs. That's the statement the paper wishes to prove. So we'll turn now to quick summaries of the few proofs they walk through. Theorem 2.1: transformers are real analytic. Again, the same setup: embedding dimension d, context length K, and the MLP activation is real analytic, like tanh or GELU. We have some sequence s of tokens from the vocabulary, with length at most the context length K.
Maybe I said that in a confusing way: we just have a sequence that fits within the context window, basically. The statement is that the transformer as a whole is real analytic, jointly in the parameters and inputs. This is just a function composition argument: compositions and sums of real analytic functions are themselves real analytic, and since the transformer is a finite composition of such blocks, the entire map is also real analytic. One interesting point they call out is that this assumption no longer holds for quantized models or models with weight tying, where you're artificially constraining your parameter space; there, collisions become a real possibility. Okay, the next theorem: transformers have almost-sure injectivity at initialization. If we draw theta from some continuous distribution, say a Gaussian, so we've just initialized the weight matrices and other parameters of our transformer, and we take two distinct prompts s and s prime, the statement is that the probability that their outputs are equal, given that the inputs are different, is zero. So how do we get there? Take these two different input prompts, s and s prime, and consider the difference between their outputs. This is denoted h in the text: r of s minus r of s prime, where r is our transformer, each evaluated with the same parameter set theta, and then we look at the squared norm, basically the distance between the two outputs. So we have two different inputs, and we have a function h that's just the difference in the outputs of the transformer.
This function h is itself real analytic, because it's composed of real analytic functions. A statement from real analysis says that a real analytic function is either identically zero or its zero set has Lebesgue measure zero, where the zero set here is the set of parameters for which the two outputs coincide. If h were identically zero, the outputs would be equal for every parameter choice, which would break injectivity. So they rule that out: they exhibit a single parameter setting for which h is not zero, using an argument built around the differing last tokens of the two input sequences. You freeze the last token's state and do some operations upstream of that to show the outputs are in fact different, and that eliminates the identically-zero case. So h is not identically zero, the collision set has Lebesgue measure zero, and the probability of sampling from a set of measure zero is zero. That's how they arrive at the statement that transformers are almost surely injective. They also prove a follow-up, theorem 2.3: injectivity is preserved under training. This one's important. If we initialize some set of parameters theta and then train our model for t steps of gradient descent, does the injectivity condition still hold? They make a similar kind of argument: take the gradient step function itself, phi of theta equals theta minus eta times grad L of theta, where L is your training loss.
This map is itself real analytic, and they make another real analytic composition argument: the t-step training map is real analytic, so starting from a continuously distributed initialization, the trained parameters at step t still avoid the measure-zero collision set with probability one. Those are the core theorems derived in the paper. They also come up with a recovery algorithm, SipIt, which, given some hidden-state representation, recovers the input prompt. I won't go through the details of that algorithm, but I will discuss the experiments a bit. They run about five billion pairwise prompt comparisons and observe no collisions across all models and layers. The framework just maps hidden states back to prompts, so this can be measured at any layer. They try this for a few different models: Phi-4, Mistral 7B, Llama 8B, and TinyStories-33M, which I wasn't even familiar with, and they also look at some of the original GPT-2 models and Gemma 3 up to 12B. Again, in the non-quantized versions of the models they observe no collisions. They also call out that they don't observe collisions in the quantized models either, but that mathematically collisions should still be possible there; I just noticed that as I was scrolling through the paper, probably worth calling out. They then test their SipIt algorithm's ability to recover input prompts, on Mistral 7B and Llama 8B, and report 100% exact recovery in linear time while exploring only about 1% of the vocabulary. The vocabulary of these models is absolutely massive, so it would be a difficult experiment to explore a much more complete percentage of it. So yeah, that takes us to the end of the paper.
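To get a feel for both the collision tests and the SipIt-style recovery, here's a toy Python sketch of my own. The "model" below is a stand-in built from analytic pieces (embeddings, causal mean-pooling, a tanh MLP), not the paper's code, and the recovery loop is only SipIt-flavored: it scans the vocabulary position by position for the token whose hidden state matches.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D = 50, 16

# Gaussian initialization (a continuous distribution, as the theorem requires).
E = rng.standard_normal((VOCAB, D))      # token embeddings
W1 = rng.standard_normal((D, 4 * D))     # MLP in-projection
W2 = rng.standard_normal((4 * D, D))     # MLP out-projection

def hidden_states(tokens):
    # Toy analytic "decoder": embed, mix causally via prefix means,
    # then apply a tanh MLP. Returns one hidden state per position.
    h = E[tokens]
    h = np.cumsum(h, axis=0) / np.arange(1, len(tokens) + 1)[:, None]
    return np.tanh(h @ W1) @ W2

# Collision test: distinct prompts should give distinct last-token states.
s, s2 = [3, 7, 11], [3, 7, 12]           # differ only in the last token
gap = np.linalg.norm(hidden_states(s)[-1] - hidden_states(s2)[-1])
assert gap > 0.0                         # no collision observed

def recover(states):
    # Sequential recovery: at each position, scan the vocabulary for the
    # token whose hidden state (given the already-recovered prefix) matches.
    prompt = []
    for t, target in enumerate(states):
        for tok in range(VOCAB):
            if np.allclose(hidden_states(prompt + [tok])[t], target):
                prompt.append(tok)
                break
    return prompt

original = [4, 9, 23, 17]
assert recover(hidden_states(original)) == original  # exact round trip
```

The brute-force vocabulary scan is what makes the cost linear in sequence length times vocabulary size, which matches the flavor of the linear-time result the authors report, though the real algorithm and models are of course far more involved.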
They provide a reproducibility setup: the hardware, software, and datasets, and they have the prompt benchmark. It doesn't say they released the code to run this, but they have some things released. Like I said, the back half of the paper is a lot of proofs involving real analytic functions, Taylor series, and those sorts of things, establishing the theorems we hand-waved our way through above. One thing I will say I found a little annoying, and I feel hypocritical here, is that I definitely think AI was used to write some of the paper, just because the first two or three pages repeat the same three points multiple times, which ends up reading like padding. Not a huge deal; the results themselves are pretty interesting. Another complaint I had is the initial figure, figure 1, which shows a map between a saddle surface and a latent space annotated with an epsilon and a delta. It makes you think of the epsilon-delta definition of limits and continuity from real analysis, or calculus, but really they're just trying to show that different inputs map to different outputs. It's not a limit or continuity argument, so it felt a little misleading. Beyond that, I like the approach. I think it's pretty cool that you can recover the inputs from the outputs. One interesting downstream effect to consider from something like this is IP theft, and they do mention this in the paper as well: if we can reconstruct input prompts from hidden states, does this open some way to design something that lets you recover weights given outputs of a model?
There's been a lot of controversy in the news around some Chinese firms that have been accused of mass-harvesting outputs from models like Claude and OpenAI's, and then distilling that knowledge into their own transformers. So IP theft is already a concern in the industry; if you spend a billion dollars to train one of these models and someone comes along and swipes the weights, that's not a place you want to be. So there are some potentially nefarious ways things like this could be put into effect. Nonetheless, I like the approach of defining the transformer more mathematically and trying to understand its function in a more abstract space. It frees you from the constraints of having to worry about, you know, minor variant of attention number 4003, and moves you toward the overarching picture of what these models are doing and how to relate them to each other. That's all I've got today. I'll post the link to the paper in the episode description. If you have questions for me, feel free to email me at support at arkitekt-ai.com; that's a-r-k-i-t-e-k-t, dash, a-i, dot com. And if you have any papers you'd like me to discuss, or you've written a paper and would like to come on as a guest on the podcast, I'd be open to that discussion; just send me an email or leave a comment on a thread and I'll pick it up. Other than that, I appreciate everyone listening.