Hey Alexa, What Is Prose?
Many of our densest, most reliable troves of knowledge, from Wikipedia to (ahem) the pages of WIRED, are encoded in an ancient technology largely opaque to machines—prose. That’s not a problem when you Google a question. Search engines don’t need to read; they find the most relevant web pages using patterns of links. But when you ask Google Assistant or one of its sistren for a celebrity’s date of birth or the location of a famous battle, it has to go find the answer. Yet no machine can easily or quickly skim meaning from the internet’s tangle of predicates, complements, sentences, and paragraphs. It requires a guide.
Wikidata, an obscure sister project to Wikipedia, aims to (eventually) represent everything in the universe in a way computers can understand. Maintained by an army of volunteers, the database has come to serve an essential yet mostly unheralded purpose as AI and voice recognition expand to every corner of digital life. “Language depends on knowing a lot of common sense, which computers don’t have access to,” says Denny Vrandečić, who founded Wikidata in 2012. A programmer and regular Wikipedia editor, Vrandečić saw the need for a place where humans and bots could share knowledge on more equal terms.
Virtual assistants do their jobs better because of Wikidata. Their corporate creators scrape the data and combine it with other sources—though exactly how they use the information, or to what extent, hasn’t been made public. Siri sometimes cites the database as a source, but Apple declined to discuss its use of Wikidata. So did Amazon, but the company did publish a paper last year on how Wikidata taught Alexa to recognize the pronunciation of song titles in different languages.
That the voice-enabled avatars of the world’s most sophisticated tech companies rely on a collective of unpaid enthusiasts is a reminder that AI is more limited than we are often led to believe. Wikidata is incomplete and messy. A quarter of items lack references. There are many errors, one of which caused Siri to spookily foretell, by four months, the death of 95-year-old comic book legend Stan Lee last year. Apple and others use Wikidata anyway, because our dumb algorithms so desperately need help comprehending the world.
The backbone of voice assistants, a multibillion-dollar industry, is “an army of volunteers.”