How Large Language Models Think
A.I. is rapidly evolving. Visual models and large language models can create images, essays,
computer code, and more, and can even correct their own mistakes. So how do they
“think”?
According to U.C. Santa Barbara’s Fabian Offert, people have been claiming that large
language models have a “world model” of certain things, including computation. Offert, an
assistant professor of digital humanities, says the claim is not just that the models know
superficially that coding words often appear together, but that they have a more
comprehensive understanding of computation itself.
Large language models have abilities you wouldn’t expect if they were merely
predicting the next word in a sequence: they can write a novel or computer code, and they
seem to have contextual memory in a way that simple Markov chains and other predictive
algorithms don’t.
Offert asked ChatGPT to carry out a few tasks, including coding a Markov chain that would
generate text based on the novel “Eugene Onegin,” by Alexander Pushkin. After a couple
of false starts, the A.I. produced working Python code for a word-level Markov chain
approximation of the book. Then the task was simply to simulate the output of a Markov
chain. Offert found that the A.I. could simulate a Markov chain at the level of words and phrases
but could not estimate the output of a Markov chain letter by letter. If it genuinely possessed a
concept of computation, predicting a letter-level Markov chain should be just as straightforward.
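To make the experiment concrete, here is a minimal sketch of the kind of word-level Markov chain generator the task calls for. This is not Offert’s or ChatGPT’s actual code, and the file name onegin.txt (a local plain-text copy of the novel) is a hypothetical stand-in:

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        # Map each tuple of `order` consecutive words to the words seen right after it.
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return dict(chain)

    def generate(chain, order=2, length=60):
        # Start in a random state and repeatedly sample an observed successor word.
        state = random.choice(list(chain))
        output = list(state)
        while len(output) < length:
            if state not in chain:  # dead end (e.g. the book's final words): restart
                state = random.choice(list(chain))
                output.extend(state)
                continue
            output.append(random.choice(chain[state]))
            state = tuple(output[-order:])
        return " ".join(output)

    # Hypothetical local plain-text copy of the novel.
    with open("onegin.txt", encoding="utf-8") as f:
        chain = build_chain(f.read())
    print(generate(chain))

A letter-level chain is the same algorithm with the text split into characters rather than words, which is why a model with a genuine concept of computation should handle both equally well.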
The results showed, however, that ChatGPT does not have a world model of computation.
Offert wants to better understand the new entities that have come into being over the last
few years. “What can we know with these things? And what can we know about these things?”
“More and more, the questions that technical researchers ask about A.I. are, at their
core, humanities questions,” Offert said. “They’re about fundamental philosophical insights,
like what it means to have knowledge about the world and how we represent knowledge
about the world.”
Offert believes that the humanities and social sciences have a more active part to play in
the development of A.I. He is trying to understand how the models represent the world
and how they make decisions. And they do represent the world, he assures us, through
connections gleaned from their training data. Beyond its epistemological interest, the
topic is also of practical importance for aligning the motivations of A.I. with those of its
human users.