Why Encyclopedias Are Still Important

Here is a little argument for the enduring necessity of encyclopedias, despite the rise of LLMs. This will have two parts: the first more philosophical, developing principles about the “organic” nature of intelligence and judgment, and the second an application of these principles to the purposes of encyclopedias.

LLMs (large language models—basically, the AI engines that we can interact with) have, of course, been developing amazingly. But since ChatGPT was first introduced, it has become increasingly clear that these tools are asymptotically approaching the quality of their best training material. They can also make immediate and obvious inferences from it. But my view—which I will not argue for here, or not much—is that they will not surpass it.

The limitations of LLMs are (and were) predictable, given the nature of the fundamental pattern-matching technology. It’s just predicting what an intelligent person would say, based on its inputs. Some people, however, were expecting this very technology to be little different from human intelligence, or even an improvement over it. That, I think, is a mistake. To be sure, the technology is amazing. It will certainly get smarter than it is now. But, apart from speed and sheer breadth of knowledge, “godlike” or “superhuman” intelligence is not in the offing, at least not with current AI technology. The best we can hope for is ever faster and higher-quality reflections and applications of training data, which is all, ultimately, human. (Of course, it is possible to train a model on non-human inputs, such as raw numbers and video, and this is extremely important as well; but what results is not even intelligence, but an increasingly sophisticated attempt to predict similar numbers and video.)
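
To make this concrete, here is a toy sketch in Python of what “predicting what comes next, based on inputs” amounts to. Everything in it is illustrative: the corpus is invented, and real LLMs use neural networks trained on billions of documents rather than simple word counts. But the relation of output to input is the same in kind.

```python
# A toy illustration of next-word prediction (NOT a real LLM): the "model"
# below can only recombine patterns found in its training text. The corpus
# is made up for the example.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat saw the bird".split()

# Record which word follows which in the training text; this is the
# "pattern matching" in miniature.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Sample the next word in proportion to how often it followed `prev`."""
    counts = follows[prev]
    if not counts:  # dead end: this word was never seen mid-text
        return corpus[0]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation. The model can never emit a word that is
# absent from its training data, however long it runs.
word, output = "the", ["the"]
for _ in range(6):
    word = next_word(word)
    output.append(word)
print(" ".join(output))
```

However long such a model runs, fluency is the most it gains; nothing in its output exceeds the patterns latent in its inputs.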

It is worth dwelling on what LLMs are missing. What they lack, they lack utterly and totally, and always will. Articulating precisely what is lacking is difficult. We might try to put it this way: they lack an independent means of judgment beyond inputs. But this formulation is ultimately uninformative, because we are still only saying that the outputs of LLMs are determined by their inputs. We already knew that.

This question—what LLMs are missing—is philosophical. We are hard put to explain what it is about the products of human intelligence that reflects something in addition to our own inputs. Some reductionists, of a behaviorist stripe, might placidly assure us that ultimately, we too are just complex functions of our sensory inputs. Given enough time and sophistication, the LLMs too will develop human-like intelligence, they might say.

But, it seems to me, this misses the point, because we can already see that LLMs are missing something, and will go right on missing it.

Let me get a proposal on the table, then. Never mind human beings for the moment: biological organisms have rich and direct inputs from an analog world, and, more importantly, they are decidedly not reducible to their inputs. Being organic, they are individuals with a history, whose brains are constantly running and processing, often without any specific inputs or observable outputs. What makes living intelligence organic is the degree to which cognitive processing is integrated into the life of the organism itself. Things are going on “in the depths”—and that is because the organism itself is an inherently complex entity. Organisms do act as functional input-output machines sometimes; even the most intelligent human beings do. But so often they do not. And the point here does not concern unpredictability, nor that we are analog and not digital; some organic beings are quite predictable indeed, in some ways more predictable than LLMs. Rather, the biological processes at work when we state a judgment or report something imagined, for example, are not mere functions. Again, they are expressions—almost appendages—of living beings.

Clarifying analogies may be seen in the animal world. A cat meows, a bird chirps, a fish swims, all without being taught. These things are, we say, “hard-wired” into the cognitive equipment of the animal. Because they are not learned, they are not the results of “inputs.”

The same, I suggest, is true of human mental equipment. As a cat meows—organically—so a man thinks, a woman judges, a child imagines. These, too, are not mere functions. They require material to work on, of course; but they evidently are not reducible to that material. So, whenever a thoughtful college student “reinvents the wheel” about some philosophical problem, for example, without ever having read anything about it before, he is doing something that LLMs are simply not designed to do. He is not spitting back reconstructions of ideas heard before; he is finding out for himself things that, it just so happens, other human beings have found out before him. His invention, however unoriginal, is created by him, not by some complex recombination of content he was exposed to.

The difference between LLMs and human intelligence, then, is explained most fundamentally in this formula: Biological organisms have self-contained processes that originate action, while LLMs are merely functions that operate on their data. We may apply this to human intelligence as follows: Human intelligence organically creates new judgments, while an LLM merely reflects its training data.

So human beings have a capacity of judgment of a sort that LLMs do not. LLMs can only compare newly-presented inputs to old. Human beings are doing that in part, but they are not doing only that; they are also doing something fundamentally different, a kind of creativity that is part of what it means to be alive. One consequence of this is that only human beings (or beings of similar intelligence) could create the original training data on which conversational LLMs (as opposed to image-generating ones, etc.) are trained.


This “organic creativity,” as we may call it, has been essential to invention. Without it, knowledge and culture would not have advanced. The history of human knowledge is obviously not just an extrapolation of previous inputs. To be sure, we might perfect variations on current inputs, and that is certainly a kind of progress. But it requires organic judgment and creativity to extend knowledge in previously unforeseen ways.

Now, encyclopedia articles—to get back to the original topic—rarely “extend knowledge in previously unforeseen ways.” So do they really require human intelligence?

The thing is, encyclopedias have two functions. One is a task at which LLMs excel: providing quick, easy answers, systematically organized. But this is not enough: “easy answers”? What if they are wrong? So the other function of encyclopedias is deeper and more difficult, but no less essential: they make judgments as to what the facts are, even if they are facts about differing opinions. LLMs cannot do this. It might look like they do, simply because they make assertions; but, again, their assertions are merely reflections of their training data. They are originating no judgments. They are not, properly speaking, “entities” or “individuals” at all, but only functions. The entities behind them are the corporations running the LLMs—or perhaps their programmers or their boards of directors, or whoever decides on the training data and key development decisions. (There’s another philosophical question, but never mind.)

You might say that a neutral encyclopedia doesn’t make judgments. Doesn’t Wikipedia, for example, take a “neutral point of view”? In that case, maybe an LLM could do what it does. Actually, I agree that an LLM could appear to do what Wikipedia does; that’s not really the point. By the way, Wikipedia is no longer devoted to neutrality, whatever it says. It would be more honest to say that it is expressing the Establishment point of view, or whatever its “reliable sources” say, which, on political subjects, are mostly left-wing. The point is this: even if Wikipedia were still beautifully executing a robust neutrality policy, it would be making judgments about what is true. The truly neutral statements would presumably, by virtue of their neutrality, be easily endorsed by most of us; but, without any training data, an LLM could not join in the endorsement. This is a basic but extremely important point.

Now, someone—who was not quite getting the point—might reply as follows. Aren’t LLMs capable of summing up things in generalities, without the inputs of broad, encyclopedic statements? Can’t they “draw conclusions,” of a sort, from specialized research? This is worth addressing, as it will strengthen the argument.

It is true that LLMs are in principle capable of synthesizing specialized research into general, encyclopedic articles; but as to their present state, I sometimes have my doubts. I mean that when I discuss specialized questions with an LLM, questions I know something about that extend beyond what can be found in textbooks and encyclopedia articles, I frequently discover that the LLM is helpless to synthesize insights beyond the simplest applications. It chirps appreciatively in response to my insights, which are nowhere to be found in its training data, and can even offer useful reactions; but that is very different from giving me the thoughtful, frequently creative and additive, judgment of a specialist. For that, I must still consult a specialist.

The point may be made another way: I not infrequently ask an LLM a general question and get a rather biased conventional answer. Then I dive into the details, extracting concessions that contradict the biased answer. I strongly suspect that its general statements reflect equally broad statements in the training data, while more specialized statements reflect narrower statements in the training data. It seems the process of machine learning does not typically work out the contradictions. The point is that, for now, at least, LLMs do seem to depend on summaries of knowledge at all levels of generality.

But maybe LLMs will improve in this regard. Maybe they will be able to make reliably sophisticated “judgments” on specialized encyclopedic questions based on specialized journal articles, say, without having seen any specialized encyclopedias. Maybe.

Let me come to the more essential point, however: Drawing conclusions about what is generally believed requires human judgment. This would apply just as well to synthesizing conclusions from fine-grained research; as in a literature review article, such conclusions are often just as “organic” and “creative” as the original research in the first place. And this is not because LLMs are incapable of mimicking such intellectual work; perhaps indeed they soon will be. It is because human judgment is the very standard by which LLMs themselves would be judged. LLMs that merely asymptotically approach the best training data are not and never will be the yardsticks of truth; only true intelligence can be.

Consequently, we cannot trust LLMs unless we find that their statements mirror broad statements by diverse, trustworthy sources—nor should we. LLMs act as black boxes, which their designers, especially if they are closed source, can fiddle with in all sorts of ways. We are not distrusting the LLMs, which, again, are not entities but mere functions. We are distrusting their owners and designers; if I find bias in an LLM, I blame its managers, because the LLM is a literally soulless process, incapable of anything like judgment. It merely reflects its programming, as the simplest logic gate does.

By the way, this is why the Knowledge Standards Foundation (the 501(c)(3) I’m working on now) continues its commitment to developing an AI front end to the Encyclosphere. We have done research and testing on this but are far from a product. Our vision is that you ought to be able to ask an LLM a question and get not just a black-box generated answer, but by default get direct quotations from—not merely vague references to—authoritative, encyclopedic sources. The KSF has gathered quite a few of those authoritative sources, and we (continue to) invite developers to use our software to gather even more.
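
To illustrate that vision concretely (this is a sketch of the general approach only, not the KSF’s actual code; the entries, names, and scoring below are all hypothetical), such a front end would match a question against encyclopedic entries and return verbatim, attributed quotations rather than a paraphrase:

```python
# A sketch of "quote your sources" retrieval. All entries and names are
# hypothetical stand-ins for Encyclosphere articles; a real system would
# use proper search, not naive word overlap.
ENTRIES = [
    {"source": "Example Encyclopedia A", "title": "Photosynthesis",
     "text": "Photosynthesis is the process by which plants convert light into chemical energy."},
    {"source": "Example Encyclopedia B", "title": "Chlorophyll",
     "text": "Chlorophyll is the green pigment that absorbs light for photosynthesis."},
]

def retrieve_quotes(question: str, k: int = 2) -> list[dict]:
    """Rank entries by word overlap with the question; return the top k."""
    q_words = set(question.lower().split())
    def overlap(entry: dict) -> int:
        return len(q_words & set(entry["text"].lower().split()))
    return sorted(ENTRIES, key=overlap, reverse=True)[:k]

# Answer with direct, attributed quotations, not a black-box paraphrase.
for entry in retrieve_quotes("How do plants turn light into energy?"):
    print(f'"{entry["text"]}" ({entry["source"]}: {entry["title"]})')
```

A production system would use real search rather than naive word overlap; the design point is simply that the quoted, attributed human judgment, not a generated summary, is the default unit of answer.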

We will always need encyclopedias, because we will need human judgment to be summarized on questions at every level of generality. Even if, in the future, our main way of accessing those statements is via LLMs, the statements themselves must be created by organic entities that express organic judgments. We should not stop depending on human specialists to act as the ultimately reliable sources on questions in their areas of specialty. And encyclopedias (especially specialized encyclopedias) will continue to be valuable tools for summing up what they have learned.


Comments

Please do dive in (politely). I want your reactions!

5 responses to “Why Encyclopedias Are Still Important”

  1. Gail Hughes

    I had a six hour drive today to think about Big Bang profitability. It’s really, really hard to see the ambient culture without reacting to it unconsciously. Imagine questioning the court of an absolute monarch on factual proof of divine right. The nervous system thunders. For Profit science is a taboo.

    • The Newtonian Clockwork Universe had already been accepted, and with Big Bang, an origin myth brought it full circle. Absolutely everything could be known with scientific certainty.
    Full Scientific Myth Allows:
    1) Mortals can play the part of ancient heroes and demigods. People have always idolized leaders. Tesla is a readily evident case. We’re all to be associated with the man who gets to Mars first! Mythic beings inspire blind followers. PC ideology is an unconscious call to blind following. More and more power will be granted. More money will be needed.

    2) Betterment Marketing: science is intellectually superior—you believe, they know because they have the power to enforce. Traditional culture was easy to cast as inhibited, chauvinistic, and cruel. Problems that barely existed exploded. The Ten Commandments are more of a tried-and-true formula for happy living than trouble-making religious dogma. The swap for the Seven Deadly Sins as a code of ethics has returned more misery than the “absolutely everything” science can ever possibly solve.

    3) No Judgment, so why not live dangerously? Living for the moment without expectations of an accounting encourages irresponsibility. Someone addicted to drugs or sex has little fidelity to principle. Fidelity to principles is the key to happiness. Knowing you’ve betrayed, wronged and slouched is depressing.

    4) No afterlife, so the fear of death has no ideological attenuation. The young can be indebted by the vote of the old.

    • The Federal Reserve, not to put too fine a point on it, has more or less merged the banks with the government. Blind following is necessarily required. Ninety percent of US wealth has levitated to ten percent of the population over the long term, and the myth that we’ll all get rich lives on. The idea that scientific myth grew up around the wealth grab is far outside the Overton Window. The thought of questioning a government court on proof of intent makes the nervous system thunder.

  2. Gail Hughes

    If AI is “helpless to synthesize insights beyond the simplest applications,” could it still be useful in moderating cultural bias? Because it is almost impossible to gauge how much ambient culture impairs judgment, and public schooling has amplified core biases, the problem has accelerated into the major source of dysfunction.

    For instance: in the public mind Big Bang Theory has been proven beyond all doubt. It has actually been broadly discredited.

    In “Smart Until It’s Dumb,” Emmanuel Maggiori writes, “Quantum theory is at odds with general relativity.” For general relativity to hold up, 95% of the “stuff” in the universe would have to consist of dark matter and energy. “Either general relativity is wrong or current physics only understands 5% of the stuff out there. Neither option is flattering for physics.” (p. 160)

    There are countless examples of well-funded science bias defeating objectivity. An explanation of how physical stuff rises from nothing is still missing. It would seem that AI holds potential to referee, but maybe that’s wishful thinking?

    1. Hi, Gail, thanks for the response.

      Yes, AI is useful for all sorts of purposes, including various ways to make texts more neutral or less biased. I’m presently working on an AI-enabled tool to do just that, in fact.

      Every culture, ever, imparts various “biases.” The biases of a culture are arguably the very core of the identity of the culture.

      The Big Bang Theory remains the dominant scientific theory of the origin (or first moments) of the universe. There are alternate models, but that’s always been true.

      There is, for profound philosophical reasons that science simply cannot speak to, no scientific explanation in the offing of “how physical stuff rises from nothing.” I say the sovereign Lord God, who exists independently of the spacetime continuum, created it and continues to sustain it in existence from moment to moment. But I get the sense that, given your own bias, you hold that if science does not speak of a thing, it doesn’t exist at all.

      I do agree that a particularly well-designed AI system could, in some respects, referee debates and papers rather better than most human beings.

      1. Gail Hughes

        I apologize; I have miscommunicated my meaning. From the dawn of time, the fact that life as we know it has no explanation within our grasp has been accepted as proof of a higher intelligence. The proposition that science has solved the origin question is now accepted as fact, when this has been disproven by physics, anthropology, etc. The bias question is then mistaken. The range of individual choice has narrowed on the false supposition that “for profit” science doesn’t survive peer scrutiny when it is obvious that it does. My opinion that big bang science is the apex of “for profit” science doesn’t give me relief from much of its dictates and consequences. Objectivity is in question and an unbiased referee is badly needed. Does this make sense to you?

        1. It makes sense, as far as it goes. But the Big Bang theory has been around since the late 1920s, which I imagine is before science was quite so much “for profit.” Besides, what is the profit proposition here? How does anybody make more money because people believe the Big Bang theory?
