The Return of the Coders

Summary:

Internet centralization and corporate dominance have increasingly stifled digital freedom. The Encyclosphere project stands solidly against this trend. In this blog post, I invite developers to help make this vision a reality and offer many ideas on how to pitch in.

Spearheaded by the Knowledge Standards Foundation, we aim to unify the world’s encyclopedic knowledge under an open standard—the ZWI file format. Unlike the fragmented landscape of social media, chat apps, and other digital platforms plagued by competing standards, the Encyclosphere endeavors to consolidate free encyclopedias into an accessible, decentralized network. By doing so, it challenges the status quo, offering a robust alternative to the monopolistic tendencies of Big Tech and the censorship it often wields.

The Encyclosphere’s mission is ambitious yet clear: to collect all the encyclopedias, with shared copies constantly updated and freely available to all in archive format, supported by free search engines and readers, and more.

For developers, the Encyclosphere project offers concrete opportunities to shape open access to knowledge. Expanding the Encyclosphere’s content by integrating more encyclopedias directly addresses our central mission of collecting all the encyclopedias. Developers can engage in projects such as hosting aggregators, building AI front-ends, or creating a decentralized rating system for articles. There are plenty of other ideas for ways to get involved: see below.

The empire likes all your competing standards…

This blog post is mostly about how developers can contribute to the Encyclosphere project. But we begin with the more basic question: why?

Friends, geeks, lend me your ears. There are problems common to social media, videos, chat, search engines, file sharing, encyclopedias, and now LLMs. It does not matter what type of software or content is involved. Lovers of digital freedom are by now keenly aware of the problems. It’s a one-two punch: on the one hand, there is the centralized, abusive power of platforms; on the other, there are the billions spent by corporate giants, which make it hard for free-and-open (libre, gratis, and autonome) software to compete.

We know what the solutions are, too. We need to write software that follows free-and-open standards and protocols. We need to share free-and-open content and data. There should be multiple good clients to view the content, and multiple fair and fast aggregators to collect and serve up the content. Also, content must be digitally signed, so we can be sure where it came from, and identities must be self-owned, so that no giant corporation or government becomes our digital owner.

So if we know the solutions, why isn’t there awesome software in each vertical? Why are we still being censored, manipulated, and spied on by all the giant Big Tech platforms?

Under-funding does explain a lot of it: as with most open source software,[1] the big guys for the most part simply out-compete the little guys. It’s always been that way, for most categories of software. But in the last ten years, this problem has started to really matter. Big Tech has been turning the screws.

But that’s not the only reason for Big Tech dominance. You, as a lover of digital freedom, might be ignoring another key part of the explanation: the proliferation of open standards.

What do I mean? For social media and almost all other verticals, the key solvable problem that the rebel cause faces is the proliferation of competing standards. We are dividing our labor. By doing so, we open source geeks are undermining the cause of freedom; we too often forget that there is strength in unity.

Isn’t this obvious? An excellent method for a dominant corporation to ensure that no open source project has a chance in a vertical is by funding a whole bunch of small competitors, which develop many competing standards. The decentralized social media movement, for example, could have built atop RSS, but nooooo. They had to come up with a new standard. Then another. Then another. So now there are a bunch of decentralized social media standards, and not one of them has a prayer of gaining the critical mass needed to compete with Facebook, Twitter, and Instagram.

There is an exception: encyclopedias.

There is only one common standard for encyclopedia articles on offer, as far as I know: the ZWI file format. And there is only one organization, to my knowledge, that is attempting to join all the free encyclopedias together into an open network, by making all their content available via a common standard: the Knowledge Standards Foundation. Since the project started in 2019, we’ve made a lot of progress. We call the network “the Encyclosphere”;[2] the best way to view it is through search engines like EncycloReader and EncycloSearch. These are run by different developers running different software (PHP and Java; the latter is migrating to JavaScript). They serve as aggregators of encyclopedia articles.

Some people read that and think, “I mean, an open encyclopedia network sounds nice. But does it really help much? You’re up against Wikipedia. How does collecting all the free encyclopedias together help fight back against Wikipedia?”

Hey, that’s a good question. Geeks are better able to understand the answer than most—and to do something to help.

What is the goal?

The Encyclosphere project does not aim to collect just a few good encyclopedias. It has several aims:

  • Collect all of the encyclopedias. The whole lower-case encyclosphere, i.e., the set of all encyclopedias and encyclopedic writing, is our aim. This means both free and proprietary reference works, from a wide variety of points of view and languages; but, obviously, not all encyclopedias are licensed so that their content can be displayed publicly. At the very least, we will have their metadata.
  • Keep collections updated constantly. Eventually, have all versions (i.e., the whole version history) of all articles.
  • The whole archive is free to all in many copies. Imagine many independent (and diverse, but interoperable) collections, everywhere. Some aim to be quite exhaustive, while others specialize. This ensures the world’s knowledge can’t be censored by blocking just one domain.
  • Make it easy to add articles to the network. We’re committed to letting bloggers, small wiki owners, and others add new articles and whole collections to the Encyclosphere directly themselves, giving them the opportunity to see their work instantly pop to the top of a search.
  • Create an aggregator network. It should be genuinely decentralized (centerless) and leaderless (like the Blogosphere…and the Internet, come to think of it). Articles and whole collections will, in time, be shared across a network of easily-installed aggregators quickly and automatically, without central gatekeepers.

Imagine all that. The Knowledge Standards Foundation has made a credible start on this, but it’s an enormous task, and we’ve got a lot left to do. Can you help?

What would a complete Encyclosphere make possible?

Imagine the database we’re building were complete, and the tools we’re developing were 100% finished and a delight to use. What could developers, and the rest of the world, do with this content and these tools? What cool new stuff would it make possible?

A massive, unbiased, uncensored database of knowledge. Knowledge would be much harder to censor. A distributed digital repository would represent the full set of views of humanity from across the political and religious landscape, from countries around the world. The mere fact that the collection would exist in multiple independent copies around the world, a permanent digital Reference Library of Alexandria, would give hope to a humanity increasingly staring down the barrel of censorship.

The search engine/readers would be awesome. EncycloReader and EncycloSearch are already useful. But imagine not just a few dozen encyclopedias. Imagine thousands. It’s all here. If you’re looking for info on some obscure topic in molecular biology, ancient philosophy, or AI, and if there’s any general (free) written explanation of it, the search engine will find it. And the data and the search engine software would all be open, so you would not be limited to just one search platform. What a fantastic alternative to traditional search engines this would be.

Imagine an AI front end for the whole encyclosphere. What if you made an LLM app which took questions, looked for specific answers through the entire possible encyclosphere corpus (all the free general explanations of things to be found online), and returned human-written answers, together with links to the places in the text where the answers can be found (for context)? The Encyclosphere project will make that possible. This could fix a big problem with AI systems like ChatGPT, which have proven to be biased. We can’t avoid the fact that people will inevitably gravitate to chatbots because the specific and targeted answers they provide will be so convenient. This is how Wikipedia became so dominant: it offered easy answers. Chatbots will make getting answers even easier, so they can be expected to replace Wikipedia, as I have argued. We need chatbots to serve as “information concierges” or “majordomos” for texts written by actual human beings. I’m pretty sure this is going to be a major type of AI application going forward, and when such an application is applied to the Encyclosphere, you’ll see the enormous usefulness of the database. Help us build it!

High-quality articles will beat Wikipedia articles. With Encyclosphere search engines, it is easy to get a good article to appear in search results above the lame or biased Wikipedia dreck. We just need the articles! Adding new articles to the open network could be one-click easy. Different search engines could then offer different ways to rank results. Some would cause an objectively better article to pop to the top of the search on the same day it was published. Presumably, as Encyclosphere search engines grow in popularity, experts will see a point in contributing to the world’s knowledge by simply putting their work online. That’s how the Internet should work. I imagine a certain scenario—wouldn’t this be great? A publisher approaches an SEO expert, asking, “How do I get to the top of Encyclosphere search?” We want the SEO experts to throw up their hands and say, in despair, “The Encyclosphere? Well, if you want to get to the top of those search engines, just write a better article. There’s no other way.”

By the way, if you want to make Wikipedia articles more neutral, that is the mission of Justapedia, which I can recommend. Another project worthy of your time is the Citizendium, which is still taking articles 18 years after I started it.

An article rating system would become possible. Once all the content is in place and the network is shown to be robust and truly decentralized, why not leverage the network to invite public reviews of articles? If the reviews were provably owned by the reviewer and not controlled (and thus de facto owned) by some platform, this idea could have legs.[3]

How can developers get started?

Here are some of the more interesting ways for developers to get started, from simpler to more complex.

1. Get some background

If you’re a developer, some good first stops would be to look at our projects, documentation, and GitLab repos. Our videos might also help, especially this technical introduction (although it is long, and a year old).

To get oriented with the code, here are some specific ways to “kick the tires.”

2. Parse search results

Here’s a sample command for downloading some JSON-formatted search results from EncycloSearch:

> curl "https://encyclosearch.org/encyclosphere/search?q=knowledge"

It’s an API, and so of course you can build stuff on it; please let us know if you do. I grabbed the JSON output of the above command, then I had GPT-4 write a Ruby script, which displays the JSON nicely in an HTML page, just to say I could do it. Worked well.[4]
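
If you would rather see the idea in code than have an LLM write it for you, here is a minimal sketch (in Python rather than Ruby) that fetches search results and writes them into a bare-bones HTML page. The JSON field names used here (Results, Title, URL, Description) are assumptions for illustration; inspect the actual API output and adjust.

# Minimal sketch: fetch EncycloSearch results and write a simple HTML page.
# NOTE: the JSON field names below ("Results", "Title", "URL", "Description")
# are guesses for illustration; check the real API output and adjust.
import html
import json
import urllib.request

query = "knowledge"
url = f"https://encyclosearch.org/encyclosphere/search?q={query}"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)
# The response may be a dict wrapping a results array, or a bare list.
results = data.get("Results", []) if isinstance(data, dict) else data

rows = []
for item in results:
    title = html.escape(str(item.get("Title", "(untitled)")))
    link = html.escape(str(item.get("URL", "#")))
    desc = html.escape(str(item.get("Description", "")))
    rows.append(f'<li><a href="{link}">{title}</a>: {desc}</li>')

page = "<html><body><h1>Results</h1><ul>" + "\n".join(rows) + "</ul></body></html>"
with open("results.html", "w", encoding="utf-8") as f:
    f.write(page)
print("Wrote results.html with", len(rows), "entries")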

3. Download a ZWI file

If you do build anything on our APIs, you might have reason to download individual ZWI files from time to time. Here’s how you might do that with EncycloSearch:

> wget https://encyclosearch.org/encyclosphere/database/en/citizendium/citizendium.org/wiki%23Knowledge.zwi

Note that the bit to add after database/ in the URL can be predicted from the original source URL. The source URL itself can be found in the ZWI file, in metadata.json, in the SourceURL field.
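
Since ZWI files are just zipped article archives, you can poke at one with standard tooling. Here is a hedged Python sketch that downloads the file above and reads its metadata.json; the SourceURL field is mentioned above, while other field names (such as Title) are assumptions to verify against the ZWI spec.

# Sketch: download a ZWI file and inspect its metadata.json.
# SourceURL is documented above; other field names are assumptions.
import json
import urllib.request
import zipfile

zwi_url = ("https://encyclosearch.org/encyclosphere/database/en/citizendium/"
           "citizendium.org/wiki%23Knowledge.zwi")
local_path = "Knowledge.zwi"
urllib.request.urlretrieve(zwi_url, local_path)

with zipfile.ZipFile(local_path) as zwi:
    print("Files inside the archive:", zwi.namelist())
    with zwi.open("metadata.json") as f:
        meta = json.load(f)

print("SourceURL:", meta.get("SourceURL"))
print("Title:", meta.get("Title", "(field name is a guess)"))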

4. Install your very own aggregator

If you have Linux installed somewhere, and if you want to fetch some or all of the Encyclosphere network data, you can do this right on your home computer (I did). You’d install the aggregator software, ZWINetwork, and then use the EncycloReader aggregator’s API to get all articles from, for example, Conservapedia. This is just an example. There are many others, and if you use another, such as Citizendium or Encyclopedia Mythica, substitute the appropriate publisher code, such as citizendium or encyclopediamythica. Here is a JSON file with publisher info. If you need to show and index the files, you would need ZWINode.

Here is an example for Ubuntu showing how to get all such articles. First, grab the ZWINetwork repo:

> git clone https://gitlab.com/ks_found/ZWINetwork.git

cd into the directory you just made:

> cd ZWINetwork/

Next, source (i.e., use the source command to run the shell script) zwi_network.sh:

> source ./zwi_network.sh

Finally, use the newly-installed zwi_get command to download Conservapedia (it’ll take a while depending on your connection speed):

> zwi_get -q -o ZWI -p conservapedia -i https://encycloreader.org/db/ZWI/

To view the files themselves, go here (depends on where precisely you installed the root directory):

> cd ZWI/en/conservapedia/www.conservapedia.com

However, this aggregator software does not have an article viewer built in. For that, see the next item.
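
In the meantime, if you just want to confirm what you fetched, a few lines of Python will do for a rough inventory. This is only a sketch: it assumes metadata.json sits at the root of each ZWI archive (as described above) and guesses at a Title field.

# Rough inventory of locally fetched ZWI files (no viewer required).
# Assumes metadata.json is at the archive root; "Title" is a guessed field name.
import json
import pathlib
import zipfile

root = pathlib.Path("ZWI")  # adjust to wherever zwi_get put its output

for zwi_path in sorted(root.rglob("*.zwi"))[:20]:  # peek at the first 20
    try:
        with zipfile.ZipFile(zwi_path) as zwi, zwi.open("metadata.json") as f:
            meta = json.load(f)
        print(zwi_path.name, "->", meta.get("Title") or meta.get("SourceURL"))
    except (zipfile.BadZipFile, KeyError) as err:
        print(zwi_path.name, "-> could not read metadata:", err)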

5. Install another aggregator, with search engine and reader built in

The Knowledge Standards Foundation offers two different search engine/readers for the encyclosphere. Installing either of these is a more advanced project, involving not only installing the software (and its dependencies), but also fetching data and setting up and running a server.

ZWINode aims to make this as easy as possible. It is written by Dr. Sergei Chekanov, CERN physicist and programmer, who also developed EncycloReader. Several people have installed it and report that it is easy to install. I myself had little trouble installing it, until it came time to install PHP and dependencies. This is a process familiar to many who use the command line to set up development environments, but it can be tricky.

EncycloEngine is another option, and while it is very powerful, it is (at present) even harder to install. Not only do you have to install Java and dependencies, you’ll need to edit some configuration files and then change the appearance so it does not look like a clone of EncycloSearch. Developer Henry Sanger plans to make it easier in the not-too-distant future.

In terms of functionality, both of these aggregators have fast search engines and article readers. It is not too difficult to add new collections of encyclopedias (downloaded from EncycloReader and/or EncycloSearch, or another aggregator). Sergei is (as of this writing) working on article editing functionality for ZWINode, so that it can be used to host an encyclopedia, thus going head-to-head with wiki software like MediaWiki. Henry has plans to do something similar. Unlike MediaWiki, such article editors will offer the ability to digitally sign articles out of the box, and to share the articles automatically with the broader encyclosphere.

How developers can help

The Encyclosphere project has many moving parts. There is much opportunity for incremental improvement in many areas. For more ambitious ideas, see the section above titled “What would a complete Encyclosphere make possible?” Some lower-hanging fruit, as of December 2023, is the following.

6. Add a bunch more encyclopedias

Think of what we are doing as being similar to Project Gutenberg, but for free encyclopedias (and metadata about proprietary encyclopedias). Once upon a time, there were a lot of free, public domain books, but they weren’t online. In the same way, right now, there are a lot of free encyclopedias, and they’re online, but there is no easy way to search them all at the same time. Nor is there any way to archive them all in one conveniently usable place, so they are not lost in the mists of time. And in this day and age, in which the usual suspects are crying “Disinformation!” and seeking new ways to censor and silence unapproved information, such search and archiving of our knowledge is increasingly important. The advent of AI makes this even more important.

If you’re a developer and you are reasonably comfy with CSS and the command line, we could use your help. You’d be making our Encyclosphere collection of encyclopedias more complete, so that this shared, distributed database grows to include the entire lowercase-e encyclosphere in all its glory.

You’ll be collecting all the encyclopedic knowledge in a digitally standardized format for the entire world to use. This is hugely ambitious, which must be why nobody has done it before. But it’s so important!

By one count, our developers have indexed at least 2.5 million articles, mostly in English but also in a few other languages (though the non-English content so far comes only from non-English Wikipedias). EncycloSearch covers some 66 encyclopedias, although not all are completely represented yet. This represents a small fraction of all the encyclopedias we would like to collect. We want hundreds! Thousands, actually! They’re out there, but it takes some serious work to do a good job of crawling them (and putting them in the ZWI file format). It’s something that has to be done more or less one encyclopedia at a time.

If you want to get serious about helping, there are two ways.

(1) You can use EncycloCrawler. Installation requires having a Java development environment set up. After installing EncycloCrawler, your job would basically be to set up some configuration files (one for each encyclopedia); set up a DID:PSQR identity (we can help with this; the author of that standard is on our Board), so that the ZWI (zipped encyclopedia article) files can be properly signed; then run the crawler, which is appropriately throttled, so you needn’t worry about slamming people’s servers; and finally upload the files to EncycloSearch or EncycloReader (see the docs). Alternatively, Henry (who wrote EncycloCrawler) can take the config files you prepare and do the actual crawling himself (that’s the easy part).

(2) You can make a better crawler—maybe one that runs on a reasonably beefy internet server (like ours), but which can be controlled via a browser extension. That would be great. My dream is to be able to stumble upon a great article in my web exploration, or a great collection, and then just press a button or two, fill out a short form perhaps, and voila—the article or collection appears in a queue (somewhere) ready to be confirmed for inclusion in an aggregator. Then the aggregator operator can press a few more buttons, maybe edit such things as the name and license of the collection, and then the articles are added to the aggregator. From there they will make their way to other aggregators. At this point EncycloReader and EncycloSearch have only partly overlapping collections. Other aggregators are under development and less complete.

We would be happy to work with you on this in various ways. Get in touch via our Mattermost instance.

7. Help with user testing

EncycloReader and EncycloSearch could use some serious user testing. There are occasional (sometimes obvious, sometimes subtle) problems with both the data and the UX. To fix these problems, we need them documented carefully.

If you are a tester, or a junior developer who is interested in getting experience with testing, we could use your help. Just go to different pages and note every problem you see, systematically testing everything and writing all issues down. When I do user testing, I generally try to separate design and UX issues from more substantive issues. One general type of issue that needs special attention is poor data, in terms of article titles and descriptions and the inclusion of various kinds of non-articles in the database. Another important category of feedback concerns search engine design; articles are sometimes poorly ranked, and specific examples of obviously bad ranking are very helpful.

8. Make an active aggregator

An Encyclosphere aggregator is simply a database-driven server that collects encyclopedias, with articles zipped up in ZWI format. An aggregator can add new ZWI files (see above, “Add a bunch more encyclopedias”) or simply copy what is available from other aggregators; as of this writing, originators of ZWI collections include EncycloReader, EncycloSearch, Oldpedia, and most recently, DARA’s excellent and ongoing collection of Wikipedia’s deleted articles. There are instructions above (see 4 and 5) on how to do the installation. What I am adding here is that there is an ongoing need not just for people to install the software, but to keep it and the article collections up to date—in other words, to be a full participant in the Encyclosphere P2P network of aggregators. We would be happy to point people to an independent aggregator, search engine, and reader website in addition to EncycloReader and EncycloSearch, but at present those are head and shoulders over other Encyclosphere resources.

9. Build an AI front-end

An AI front-end for the Encyclosphere? What would that mean? It could mean a couple of different things. Basically, it means you’d be able to ask a chatbot a question, and the answers would come from the Encyclosphere. Here are a couple of implementations of this concept. I strongly prefer the second.

First, you could train the chatbot on Encyclosphere data. It would then be an ordinary chatbot, which simply answers your questions as GPT does (for example). The problem is that the user still won’t know where an answer comes from or whether the LLM is hallucinating.

Second, the preferable way happens in two steps. (a) The user asks a question, and the LLM decides which encyclopedia article titles are most likely to contain answers. It sends those titles off as queries to the EncycloReader or EncycloSearch API (both are plenty fast). (b) Armed with the API’s response, the LLM decides which articles to fetch. It opens up the ZWI files, looking for answers. If it finds any, it caches versions of the articles with anchors to the places where the answers can be found. Then it quotes the answers, together with links to the original sources. (A rough sketch of this flow follows below.)

Two of us at the KSF (a volunteer and yours truly) have independently demonstrated that this is in principle possible.
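
To make the two-step flow concrete, here is a heavily hedged Python sketch. The guess_titles function stands in for an LLM call, and the search parameters and JSON field names (Results, ZWIURL, SourceURL) are assumptions rather than the documented API; the point is only to show the shape of the retrieval step.

# Sketch of the two-step "information concierge" flow described above.
# The LLM call is a stub, and all JSON field names are assumptions.
import io
import json
import urllib.parse
import urllib.request
import zipfile

SEARCH_API = "https://encyclosearch.org/encyclosphere/search?q="

def guess_titles(question):
    # Step (a): a real system would ask an LLM for likely article titles.
    return [question]

def search(term):
    with urllib.request.urlopen(SEARCH_API + urllib.parse.quote(term)) as resp:
        data = json.load(resp)
    return data.get("Results", []) if isinstance(data, dict) else data

def answer(question):
    for title in guess_titles(question):
        for hit in search(title)[:3]:        # step (b): fetch top candidates
            zwi_url = hit.get("ZWIURL")      # field name is a guess
            source = hit.get("SourceURL", "(unknown source)")
            if not zwi_url:
                continue
            with urllib.request.urlopen(zwi_url) as resp:
                archive = zipfile.ZipFile(io.BytesIO(resp.read()))
            for name in archive.namelist():  # article file naming is also a guess
                if name.endswith((".html", ".txt")):
                    text = archive.read(name).decode("utf-8", errors="replace")
                    if question.lower() in text.lower():
                        print("Possible answer in", source, "(" + name + ")")

answer("What is knowledge?")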

10. Create a decentralized rating system

This is perhaps the biggest challenge.

When I tell people about what we’re doing with the Encyclosphere project, one problem that people often raise is this: if there are multiple articles per topic, how is the user supposed to decide which one to trust? This isn’t an easy question to answer. Since we’re talking about general knowledge, the question is tantamount to asking, “Given a bunch of competing, incompatible answers to questions, how is an ignorant person supposed to decide which is the truth?” Putting on my philosopher’s hat, I would say the short answer is, “Get an education, preferably a solid liberal arts education.”[5]

One technical solution I propose is a decentralized system for rating encyclopedia articles. This would require a rating standard (an RSS extension; I can point you to a draft I developed); easy-to-use software that people could install to submit digitally signed ratings to a network (the KSF can support it); and software for aggregating ratings from multiple sources. One notion is that we allow people to choose which raters, or collections of raters, to trust; Encyclosphere search engines could then rank and re-rank articles based on the ratings data.
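
To illustrate just the signing piece, here is a minimal sketch. It is not the KSF draft standard; the record fields, the DID placeholder, and the use of an Ed25519 key as a stand-in for a self-owned identity are all assumptions for illustration.

# Sketch: a digitally signed rating record (not the KSF draft standard).
# Requires: pip install cryptography
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()

rating = {
    "article": "https://example.org/some-article",  # hypothetical target
    "rater": "did:psqr:example-rater",              # placeholder identity
    "score": 4,                                     # e.g. on a 1-5 scale
    "comment": "Accurate and well sourced.",
}

payload = json.dumps(rating, sort_keys=True).encode("utf-8")
signature = signing_key.sign(payload)

# Any aggregator holding the rater's public key can verify the rating
# without trusting whichever platform relayed it.
verify_key.verify(signature, payload)  # raises InvalidSignature if tampered with
print("Rating verified; signature:", signature.hex()[:16], "...")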

Simply getting people to use this software to submit ratings could prove to be the biggest challenge. But if it could be used as a general content rating system, permitting ratings and open review of not just encyclopedia articles, but also books, movies, and academic papers, and if the data were truly decentralized, and digitally signed, that might be enough for a lot of us to use it.

Conclusion

The Encyclosphere is ambitious; it’s not a little project. But it is a labor of love on which the KSF Board entirely refuses ever to sell out. As a good old-fashioned open source software project, it’s not likely to make anybody a lot of money anytime soon. But what it will do is archive a wide variety of knowledge sources, as free of censorship as the Internet as a whole is, making it easier to use a wider variety of encyclopedias than the big search engines permit. It will also make it easier for everyone—not just those who play in Wikipedia’s dysfunctional system—to contribute to the world’s encyclopedic knowledge on every topic under the sun, not just the topics Wikipedia deems “notable” enough.

Developers, this is definitely worth your time!

To learn more:

  1. But not operating systems, browsers, and web server software. Even if it still hasn’t been installed that much, Linux now legitimately shines and is simply better than Windows or Mac, period. (One reason Linux isn’t used more is that there are some residual situations in which comfort with the command line is still an absolute requirement.) And Brave and Firefox are both very usable.
  2. Named after the blogosphere, i.e., the collection of all blogs. Note that the blogging standard, RSS, is also open. Its openness made it impossible for any one platform to become the platform for blogging. Many tried and none succeeded.
  3. I developed the idea in this 2018 article; you’ll have to ignore the stuff about Everipedia, which got out of the encyclopedia business. I have also proposed standards for such reviews in the KSF Mattermost group.
  4. Here’s the prompt: “Write me a Ruby script that will take that API output (the JSON file I just uploaded to you) and display it in HTML format. Output an HTML file I can copy.”
  5. Starting from a position of ignorance, deciding whom to believe requires a great deal of educated judgment. It is a liberal arts education that aims to improve that judgment. “Logic” or “critical thinking” helps, sure, but without the content and practice conferred by a good old-fashioned education, you’re up the creek without a paddle.


Comments

Please do dive in (politely). I want your reactions!

2 responses to “The Return of the Coders”

  1. This is really good stuff and I encourage you to soldier on, despite how horribly thankless these things are.
    I am working on stuff similarly poised to distribute things and will eventually end up linking into your network. I very much hope that people can keep the faith until this blows up.

    1. Thanks, Bob! Yes, there are aspects of it that are a bit of a slog, and indeed, not many people understand the value and importance of what we’re doing. Developers get it faster, anyway.
