

that suggests to me either you know how it was trained, or are making assumptions about how it was trained.
The general way they are trained is known; the exact specifics and techniques are not. But the public does know how at least one flagship model was trained: the training process of DeepSeek R1 was documented in their research paper: https://arxiv.org/pdf/2501.12948
I read a good chunk of it when it was released.
And if the idea is that nobody would train an LLM to do addition in this manner, I invite you to glance at the Wikipedia article on addition.
LLMs have multiple ways to do addition; I'll showcase two as examples. I asked ChatGPT 4.1 to solve a big addition. Here is its output:
I need to add these two large numbers carefully.
First, I'll write them one under the other, aligned at the right:
21,500,996,531,314,339
+ 96,357,764,557,683,165,397,721
----------------------------------
Working from right to left:
Units place: 9 + 1 = 10. Write 0, carry 1.
Tens place: 3 + 2 + 1(carry) = 6. Write 6, no carry.
Hundreds place: 3 + 7 = 10. Write 0, carry 1.
Thousands place: 4 + 7 + 1(carry) = 12. Write 2, carry 1.
Ten thousands place: 1 + 9 + 1(carry) = 11. Write 1, carry 1.
Hundred thousands place: 3 + 3 + 1(carry) = 7. Write 7, no carry.
Millions place: 1 + 5 = 6. Write 6, no carry.
Ten millions place: 5 + 6 = 11. Write 1, carry 1.
Hundred millions place: 3 + 1 + 1(carry) = 5. Write 5, no carry.
Billions place: 5 + 3 = 8. Write 8, no carry.
Ten billions place: 9 + 8 = 17. Write 7, carry 1.
Hundred billions place: 9 + 6 + 1(carry) = 16. Write 6, carry 1.
Trillions place: 0 + 7 + 1(carry) = 8. Write 8, no carry.
Ten trillions place: 0 + 5 = 5. Write 5, no carry.
Hundred trillions place: 5 + 5 = 10. Write 0, carry 1.
Quadrillions place: 1 + 4 + 1(carry) = 6. Write 6, no carry.
Ten quadrillions place: 2 + 6 = 8. Write 8, no carry.
For the remaining places, I just copy the digits from the second number:
7, 7, 5, 3, 6, 9
Therefore, the answer is:
96,357,786,056,867,851,761,260
You can notice that the step-by-step reasoning looks correct, yet it still wrote the wrong final answer. I can expand more on this if you want (I do some research on it in my free time).
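If you want to check it yourself, Python's exact integer arithmetic is enough; the three numbers below are copied straight from the example above:

```python
# Sanity check of the example above, using Python's exact integer arithmetic.
a = 21_500_996_531_314_339
b = 96_357_764_557_683_165_397_721

model_answer = 96_357_786_056_867_851_761_260  # what ChatGPT wrote

print(f"{a + b:,}")           # 96,357,786,058,679,696,712,060  <- correct sum
print(f"{model_answer:,}")    # 96,357,786,056,867,851,761,260  <- model's answer
print(a + b == model_answer)  # False
```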
This style of reasoning, decomposing the addition digit by digit, was of course learned from training data.
Now, the trigonometry used to calculate additions that I talked about earlier is not for writing a "reasoning"; it is what the model uses when it tries to write the correct answer directly. That mechanism was created by backpropagation finding a local minimum that can solve additions, in order to predict the next token more accurately.
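As a toy illustration of what that kind of mechanism can look like (this is not the model's actual learned circuit, just a minimal sketch of the idea): if you encode a digit as an angle on a circle, the trigonometric angle-addition identities alone are enough to compute addition mod 10.

```python
import numpy as np

def encode(d, period=10):
    """Map a digit to a point on the unit circle (one 'frequency')."""
    theta = 2 * np.pi * d / period
    return np.cos(theta), np.sin(theta)

def add_mod(a, b, period=10):
    """Add two digits mod `period` using only the angle-addition identities."""
    ca, sa = encode(a, period)
    cb, sb = encode(b, period)
    # cos(x + y) = cos x cos y - sin x sin y
    # sin(x + y) = sin x cos y + cos x sin y
    c = ca * cb - sa * sb
    s = sa * cb + ca * sb
    theta = np.arctan2(s, c) % (2 * np.pi)
    return int(round(theta * period / (2 * np.pi))) % period

print(add_mod(7, 8))  # 5, i.e. (7 + 8) mod 10
```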
so I would point out that technically LLMs have “tensors” not “neurons”.
I get that tensors are designed to behave like neurons, and this is just me being pedantic. I know what you mean when you say neurons, just wanted to clarify and be consistent. No shade intended.
Artificial neurons were made to behave like neurons: https://en.wikipedia.org/wiki/Artificial_neuron
And the terminology used is "neurons"; cf. the paper I sent earlier about how they do additions: https://arxiv.org/pdf/2502.00873
I've been re-reading my response and, my bad, I meant "artificial neurons were inspired by neurons", not made to behave like them; they actually have little in common.
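For reference, this is roughly all the standard definition covers: a weighted sum of the inputs plus a nonlinearity. The "tensors" are just the weights of many such units stacked together. A minimal sketch (the numbers are made up, and real models mostly use other activations than a sigmoid):

```python
import numpy as np

def artificial_neuron(x, w, b):
    """Textbook artificial neuron: weighted sum of inputs plus a bias,
    passed through a nonlinearity (a sigmoid here)."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, -1.0, 0.5])   # inputs
w = np.array([1.5,  0.3, -0.8])  # learned weights
b = 0.1
print(artificial_neuron(x, w, b))  # a single activation between 0 and 1
```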
If you asked a human who speaks German and nothing else a question in English, they would also respond in German (telling you that they can't understand you).
LLMs sometimes (not often enough) do respond that they don't know.