There are emoji everywhere these days—but creating them doesn’t come cheap, and now, says the Unicode Consortium, it’s time to pay up.
Of course, the consortium puts it a little more nicely than that: It’s inviting people to sponsor a symbol for a year to help fund its work encoding languages that don’t yet have digital representations.
The Unicode standard, developed and promoted by the consortium, is composed of thousands of code points, each expressing a relation between a number and a symbol.
Those relations allow app developers, font developers and keyboard designers to agree that a given number stored in memory should appear as a given symbol (a particular emoji, say) on the screen, and that the symbol will be the same regardless of which device the message is displayed on. Without such agreement, there would be a risk that when an Android user sent 💓, say, an iPhone user might receive 💩.
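As a rough illustration of that number-to-symbol agreement, the short Python sketch below looks up the code points of the two emoji mentioned above, using only the standard library. The variable names are illustrative; the values and character names come from the Unicode character database bundled with Python.

```python
import unicodedata

# Each symbol is identified by one agreed-upon number, its code point.
for symbol in ("💓", "💩"):
    code_point = ord(symbol)
    print(f"U+{code_point:04X}", unicodedata.name(symbol))

# The same number maps back to the same symbol, whatever the device.
heart = chr(0x1F493)
assert heart == "💓"

# When stored or sent, the number is serialized as bytes, here with UTF-8.
utf8_bytes = heart.encode("utf-8")
print(utf8_bytes.hex(" "))
```

How the symbol is drawn is still up to each platform's font; only the number behind it is shared.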
There’s more to Unicode than emoji. Its original purpose was to allow the encoding of the different scripts used to write the world’s major languages: the Latin alphabet used for English and the accented variations used to write Western European languages; the Arabic, Cyrillic, Greek and Hebrew alphabets; Japanese syllabaries; and the ideograms and other symbols used to write Chinese, Japanese and Korean.
“We currently support about 130 scripts or writing systems in Unicode, but there are something like another 150 that have yet to be encoded,” said Mark Davis, co-founder and president of the Unicode Consortium.
“With languages we fully support about 70, and partially support another 50, but there are a very large number of spoken languages on the earth, so we have a long way to go,” he said Wednesday.
Up to now, the scripts for those languages have largely been encoded by volunteers sitting at their desks. “Everything we do is supported by contributions of resources from volunteers, many of them from our member companies,” he said.
Other desk work has involved adding new emoji to Unicode to represent people of different ethnicities. Next on the list is addressing gender diversity.
“We are looking for some mechanisms to customize emoji, to address some of the remaining gender disparities,” said Davis. “For example, we have an emoji for a runner. The emoji is gender-neutral but when people put a colorful image to it, then they use one or other gender. We’re looking at a mechanism that would allow people to specify, this is a runner, and use a female person for it, or use a male person.”
The consortium recently added full coverage for the Belarusian language. “That’s an example of a language that has millions of speakers, but it’s on the cusp of being disadvantaged, it’s not well supported,” Davis said.
“There are many languages, especially in Asia and Africa that have very little support. There are some languages of South America that are spoken by large populations, millions or tens of millions, that are not well supported,” he said.
To address some of those languages that haven’t yet received its attention, the consortium will need to adopt a different funding model.
“These languages tend to be more isolated, so it’s harder to get information about them,” Davis said. People will need to travel to conduct research. “We need to pay people expenses to do that,” he said.
There’s more to it than just copying down the characters used to write a language and giving each a number. Two years might elapse from a research proposal to a final report, as a volunteer learns how a language is structured, how it represents numbers, dates and times, and how it distinguishes singular from plural or one gender from another. That’s because the consortium doesn’t just describe the symbols used to write a language, but also provides resources for programmers to internationalize their applications, helping them use the correct decimal separators, for example, or put day and month in the right order in dates.
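To make the decimal-separator and date-order point concrete, here is a minimal Python sketch. The CONVENTIONS table and the two helper functions are hypothetical and hand-written for illustration only; a real application would draw on maintained locale data, such as the resources the consortium provides, rather than a table like this.

```python
from datetime import date

# Hypothetical, hand-written conventions for two locales (illustration only).
CONVENTIONS = {
    "en-US": {"decimal": ".", "group": ",", "date": "{m}/{d}/{y}"},
    "de-DE": {"decimal": ",", "group": ".", "date": "{d}.{m}.{y}"},
}

def format_number(value: float, locale_id: str) -> str:
    """Format a number with the locale's decimal and grouping separators."""
    conv = CONVENTIONS[locale_id]
    text = f"{value:,.2f}"  # US-style baseline, e.g. "1,234.50"
    # Swap separators through a placeholder so the two replacements don't collide.
    text = text.replace(",", "\0").replace(".", conv["decimal"])
    return text.replace("\0", conv["group"])

def format_date(day: date, locale_id: str) -> str:
    """Put day, month and year in the order the locale expects."""
    pattern = CONVENTIONS[locale_id]["date"]
    return pattern.format(d=day.day, m=day.month, y=day.year)

for locale_id in CONVENTIONS:
    print(locale_id, format_number(1234.5, locale_id),
          format_date(date(2015, 12, 16), locale_id))
```

Even this toy version shows why the research takes time: every locale needs its own verified entries, and separators and date order are only two of many conventions.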
“A lot of the time it’s people doing graduate research in linguistics that are interested in this. We help them prepare proposals for the research,” Davis said.
To fund such research, the Unicode Consortium is inviting people to sponsor one of the 120,000 or so Unicode symbols. For $100, Bronze-level sponsors will receive a certificate and have their name listed next to the symbol on the consortium’s website. For $1,000, up to five Silver sponsors per symbol will receive an engraved thank-you gift and have their name listed on the website. Gold-level sponsors, just one per symbol, get all that and a hyperlink on the website.
There’ll be no platinum award level allowing businesses to add their corporate logo to Unicode, though. “That’s something we’ll definitely stay away from. We have a policy in place that we won’t do emoji for consumer brands,” said Davis.
According to the consortium’s website, IBM is one of the first gold-level sponsors. It’s listed as the sponsor of an emoji representing a cloud. Older readers may remember—and indeed may still be maintaining systems that have to deal with—the confusion that ensues when moving documents from systems using the ASCII character set, in which the numeric code for @ is 64, to those using EBCDIC, in which @ is stored as the number 124. IBM created EBCDIC at a time when much of the rest of the computer industry was agreeing on ASCII.
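The ASCII and EBCDIC mismatch can be reproduced in a few lines of Python. This sketch assumes cp037, one of the EBCDIC code pages included in Python's standard codecs, as a representative; other EBCDIC variants place some characters differently.

```python
# Encode the same character under ASCII and under an EBCDIC code page.
ascii_code = "@".encode("ascii")[0]
ebcdic_code = "@".encode("cp037")[0]  # cp037: an EBCDIC code page (assumed representative)
print(ascii_code, ebcdic_code)

# Reading the EBCDIC byte as if it were ASCII yields a different character.
misread = bytes([ebcdic_code]).decode("ascii")
print(repr(misread))

# Decoding with the right code page recovers the original character.
recovered = bytes([ebcdic_code]).decode("cp037")
print(repr(recovered))
```

The byte only means "@" once sender and receiver agree on the table used to interpret it, which is the kind of agreement Unicode is meant to provide.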
Another name on the list is Internet luminary Vint Cerf, as sponsor of an emoji representing the Vulcan hand gesture that accompanies the blessing “Live long and prosper” in the fictional Star Trek universe. One of Cerf’s pet projects is the creation of an interplanetary Internet to aid space exploration.
Davis himself sponsored the plain old ASCII comma.