Why Encyclopedias Are Still Important

Here is a little argument for the enduring necessity of encyclopedias, despite the rise of LLMs. This will have two parts: the first more philosophical, developing principles about the “organic” nature of intelligence and judgment, and the second an application of these principles to the purposes of encyclopedias.

LLMs (large language models: basically, the AI engines we can interact with) have, of course, been developing amazingly. But since ChatGPT was first introduced, it has become increasingly clear that these tools are asymptotically approaching the quality of their best training material. They can also make immediate and obvious inferences from it. But my view—which I will not argue for here, or not much—is that they will not surpass it.

The limitations of LLMs are (and were) predictable, given the nature of the underlying pattern-matching technology. The technology just predicts what an intelligent person would say, based on its inputs. Some people, however, were expecting this very technology to be little different from human intelligence, or even an improvement over it. That, I think, is a mistake. To be sure, the technology is amazing. It will certainly get smarter than it is now. But, apart from speed and sheer breadth of knowledge, “godlike” or “superhuman” intelligence is not in the offing, at least not with current AI technology. The best we can hope for is ever faster, ever higher-quality reflections and applications of training data, which is all, ultimately, human. (Of course, it is possible to train a model on non-human inputs, such as raw numbers and video, and this is extremely important as well; but what results is not even intelligence, but an increasingly sophisticated attempt to predict similar numbers and video.)

It is worth dwelling on what LLMs are missing. What they lack, they lack utterly and totally, and always will. Articulating precisely what is lacking is difficult. We might try to put it this way: they lack an independent means of judgment beyond inputs. But this formulation is ultimately uninformative, because we are still only saying that the outputs of LLMs are determined by their inputs. We already knew that.

This question—what LLMs are missing—is philosophical. We are hard put to explain what it is about the products of human intelligence that reflects something in addition to our own inputs. Some reductionists, of a behaviorist stripe, might placidly assure us that ultimately, we too are just complex functions of our sensory inputs. Given enough time and sophistication, the LLMs too will develop human-like intelligence, they might say.

But, it seems to me, this misses the point, because we can already see that LLMs are missing something, and will go right on missing it.

Let me get a proposal on the table, then. Never mind human beings for a moment: biological organisms have rich and direct inputs from an analog world, and, more importantly, they are decidedly not reducible to their inputs. Being organic, they are individuals with a history, whose brains are constantly running, processing, often without any specific inputs or observable outputs. What makes living intelligence organic is the degree to which cognitive processing is integrated into the life of the organism itself. Things are going on “in the depths”—and that is because the organism itself is an inherently complex entity. Organisms do act as functional input-output machines sometimes; even the most intelligent human beings do. But very often they do not. And the point here does not concern unpredictability, nor that we are analog and not digital. Some organic beings are quite predictable indeed, in some ways more predictable than LLMs. But the biological processes at work when we state a judgment or report something imagined, for example, are not mere functions. Again, they are expressions—almost appendages—of living beings.

Clarifying analogies may be seen in the animal world. A cat meows, a bird chirps, a fish swims, all without being taught. These things are, we say, “hard-wired” into the cognitive equipment of the animal. Because they are not learned, they are not the results of “inputs.”

The same, I suggest, is true of human mental equipment. As a cat meows—organically—so a man thinks, a woman judges, a child imagines. These, too, are not mere functions. They require material to work on, of course; but they evidently are not reducible to that material. So, whenever a thoughtful college student “reinvents the wheel” on some philosophical problem, for example, without ever having read anything about it before, he is doing something that LLMs are simply not designed to do. He is not spitting back reconstructions of ideas heard before; he is finding out for himself things that, it just so happens, other human beings have found out before him. His invention, however unoriginal, is created by him, not by some complex recombination of content he was exposed to.

The difference between LLMs and human intelligence, then, is explained most fundamentally in this formula: Biological organisms have self-contained processes that originate action, while LLMs are merely functions that operate on their data. We may apply this to human intelligence as follows: Human intelligence organically creates new judgments, while an LLM merely reflects its training data.

So human beings have a capacity of judgment of a sort that LLMs do not. LLMs can only compare newly-presented inputs to old. Human beings are doing that in part, but they are not doing only that; they are also doing something fundamentally different, a kind of creativity that is part of what it means to be alive. One consequence of this is that only human beings (or beings of similar intelligence) could create the original training data on which conversational LLMs (as opposed to image-generating ones, etc.) are trained.


This “organic creativity,” as we may call it, has been essential to invention. Without it, knowledge and culture would not have advanced. The history of human knowledge is obviously not just an extrapolation of previous inputs. To be sure, we might perfect variations on current inputs, and that is certainly a kind of progress. But it requires organic judgment and creativity to extend knowledge in previously unforeseen ways.

Now, encyclopedia articles—to get back to the original topic—rarely “extend knowledge in previously unforeseen ways.” So do they really require human intelligence?

The thing is, encyclopedias have two functions. One is something at which LLMs excel: they provide quick, easy answers, systematically organized. But this is not enough: “easy answers”? What if they are wrong? So the other function of encyclopedias is deeper and more difficult, but no less essential: they make judgments as to what the facts are, even if they are facts about differing opinions. LLMs cannot do this. It might look like they do, simply because they make assertions; but again, those assertions are merely reflections of their training data. They are originating no judgments. They are not, properly speaking, “entities” or “individuals” at all, but only functions. The entities behind them are the corporations running the LLMs—or perhaps their programmers or their boards of directors, or whoever chooses the training data and makes the key development decisions. (There’s another philosophical question, but never mind.)

You might say that a neutral encyclopedia doesn’t make judgments. Doesn’t Wikipedia, for example, take a “neutral point of view”? In that case, maybe an LLM could do what it does. Actually, I agree that an LLM could appear to do what Wikipedia does; but that is not really the point. By the way, Wikipedia is no longer devoted to neutrality, whatever it says. It would be more honest to say that it expresses the Establishment point of view, or whatever its “reliable sources” say, which, on political subjects, are mostly left-wing. The point is this: even if Wikipedia were still beautifully executing a robust neutrality policy, it would be making judgments about what is true. The truly neutral statements would presumably, precisely because of their neutrality, be easily endorsed by most of us; but, without any training data, an LLM could not join in the endorsement. This is a basic but extremely important point.

Now, someone—who was not quite getting the point—might reply as follows. Aren’t LLMs capable of summing things up in generalities, without broad, encyclopedic statements as inputs? Can’t they “draw conclusions,” of a sort, from specialized research? This is worth addressing, as it will strengthen the argument.

It is true that LLMs are in principle capable of synthesizing specialized research into general, encyclopedic articles; but as to their present state, I sometimes have my doubts. I mean that when I discuss specialized questions with an LLM, questions I know something about that extend beyond what can be found in textbooks and encyclopedia articles, I frequently discover that the LLM is helpless to synthesize insights beyond the simplest applications. It chirps appreciatively in response to my insights, which are nowhere to be found in its training data, and can even offer useful reactions, but that is very different from giving me the thoughtful, frequently creative and additive, judgment of a specialist. For that, I must still consult a specialist.

The point may be made another way: I not infrequently ask an LLM a general question and get a rather biased conventional answer. Then I dive into the details, extracting concessions that contradict the biased answer. I strongly suspect that its general statements reflect equally broad statements in the training data, while more specialized statements reflect narrower statements in the training data. It seems the process of machine learning does not typically work out the contradictions. The point is that, for now, at least, LLMs do seem to depend on summaries of knowledge at all levels of generality.

But maybe LLMs will improve in this regard. Maybe they will be able to make reliably sophisticated “judgments” on specialized encyclopedic questions based on specialized journal articles, say, without having seen any specialized encyclopedias. Maybe.

Let me come to the more essential point, however: drawing conclusions about what is generally believed requires human judgment. This applies just as well to synthesizing conclusions from fine-grained research; as in a literature review, such conclusions are often just as “organic” and “creative” as the original research itself. And this is not because LLMs are incapable of mimicking such intellectual work; perhaps indeed they soon will be capable of it. It is because human judgment is the very standard by which LLMs themselves would be judged. LLMs that merely asymptotically approach the quality of their best training data are not, and never will be, the yardsticks of truth; only true intelligence can be.

Consequently, we cannot trust LLMs unless we find that their statements mirror broad statements by authoritative sources—nor, in fact, should we. LLMs act as black boxes, which their designers, especially if they are closed source, can fiddle with in all sorts of ways. We are not distrusting the LLMs themselves, which, again, are not entities but mere functions. We are distrusting their owners and designers; if I find bias in an LLM, I blame its managers, because the LLM is a literally soulless process, incapable of anything like judgment. It merely reflects its programming, as the simplest logic gate does.

By the way, this is why the Knowledge Standards Foundation (the 501(c)(3) I’m working on now) continues its commitment to developing an AI front end to the Encyclosphere. We have done research and testing on this but are far from a product. Our vision is that you ought to be able to ask an LLM a question and get not just a black-box generated answer but, by default, direct quotations from—not merely vague references to—authoritative, encyclopedic sources. The KSF has gathered quite a few of those authoritative sources, and we (continue to) invite developers to use our software to gather even more.
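To make the general shape of that vision concrete, here is a minimal sketch in Python of an encyclopedia-backed answer flow. It is purely illustrative: the toy corpus, the naive keyword scoring, and every name in it are assumptions of mine, not the KSF's actual software or API; a real front end would substitute a proper search index over Encyclosphere sources and an LLM for the toy pieces shown here.

```python
# Hypothetical sketch of an "encyclopedia-backed" answer. The data, names,
# and scoring are illustrative assumptions, not the KSF's actual design.
from dataclasses import dataclass


@dataclass
class Article:
    source: str  # which encyclopedia the passage comes from
    title: str   # article title
    text: str    # a passage that can be quoted verbatim


# Toy stand-in for an index of encyclopedic sources.
CORPUS = [
    Article("Example Encyclopedia A", "Photosynthesis",
            "Photosynthesis is the process by which green plants convert "
            "light energy into chemical energy stored in glucose."),
    Article("Example Encyclopedia B", "Chlorophyll",
            "Chlorophyll is the green pigment that absorbs the light used "
            "in photosynthesis."),
]


def retrieve(question: str, corpus: list[Article], k: int = 2) -> list[Article]:
    """Rank passages by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(corpus,
                    key=lambda a: len(q_words & set(a.text.lower().split())),
                    reverse=True)
    return scored[:k]


def answer_with_quotations(question: str) -> str:
    """Return direct quotations with attribution, rather than only a
    black-box generated answer. (An LLM could summarize on top of these
    quotations, but the quotations themselves stay verbatim.)"""
    hits = retrieve(question, CORPUS)
    lines = [f"Question: {question}", "Supporting quotations:"]
    for a in hits:
        lines.append(f'  "{a.text}" ({a.source}, "{a.title}")')
    return "\n".join(lines)


if __name__ == "__main__":
    print(answer_with_quotations("How do plants convert light into energy?"))
```

The design point the sketch is meant to capture is simply that the quotations stay verbatim and attributed, so the reader can check the human-authored sources directly instead of taking a black-box answer on trust.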

We will always need encyclopedias because we will always need human judgment, summarized, on questions at every level of generality. Even if, in the future, our main way of accessing those summaries is via LLMs, the summaries themselves must be created by organic entities expressing organic judgments. We should not stop depending on human specialists to act as the ultimately reliable sources on questions in their areas of specialty. And encyclopedias (especially specialized encyclopedias) will continue to be valuable tools for summing up what they have learned.


Please do dive in (politely). I want your reactions!
