2016年10月26日 星期三

為更多漢字編碼 China’s Digital Soft Power Play

為更多漢字編碼,中國的數字化軟實力

本月,中國政府計劃推出大約3000個中文字符的編碼,此舉屬於「中華字庫工程」的一部分。這個宏大的工程將把以前沒有電子形式的50萬個字符進行數字化。到目前為止,國際計算標準Unicode已經對80388個漢字進行了編碼。
該項目包含全國56個民族的10萬個字符,以及來自中國書面語料庫的另外10萬個生僻字和古文字。該項目動員了近30家公司、機構和大學,是有史以來規模最大的政府資助數字化項目。
這些字符長期囿居於蒙塵的古舊手稿上,它們將在數字媒體中獲得新生。擴展到網上之後,中國和世界各地的人可以更加方便地接觸這些文稿,這將有助於中國語言和文化的傳播。
全球信息架構以使用西方字母表為主,給中國造成了一些困難。現代通信領域的重大創新——莫爾斯電碼、打字機和ASCII(美國信息交換標準代碼)編碼標準——無一考慮到了中文字的使用。
幾十年來,中國科學家一直在努力打破字母媒介的壁壘。1974年,中國政府指示工程師和數學家尋找一種方式,來使用美國的字母鍵盤。最終他們配置了數千個擊鍵組合,以便在計算機的標準鍵盤上鍵入數以萬計的字符。
長期以來,中國人一直覺得他們在書面語言上具有優越性。北京政府認為,當前Unicode中編碼字符的數量不足以代表中國古代文化的豐富性。通過字庫工程,中國人將解鎖他們的文稿寶庫,從古代的甲骨文到少數民族語言文字,都將進行數字化。
通過孔子學院等方式在世界各地傳播中國語言和文化,是北京過去十年提升軟實力戰略的組成部分。字庫工程將把這個使命帶入數字領域。
從學術論文到Twitter消息的任何內容,只要能被人看到,就會有助於擴大中文的覆蓋面。隨著越來越多的中文進入網路空間,就會有更多的人開始使用它,其地位也將隨著可見度的增加而上升。
這個數字化項目也可以為很多中國人解決一個大難題,他們對中文數字化的不完善感到不滿意。
去年,中國一家媒體報導了一個10歲男孩的故事。他有一個寓意吉祥的名字,使用了一個由「龍」和「天」組成的生僻字。校方在計算機系統中找不到這個字符,當他通過了一個重要考試後,他的姓名在證書上卻只剩下一個普通而平淡的字——「皓」,意思是「白色」。他不能充分證明自己通過了考試,這讓他的父親很不滿。
還有很多其他影響更嚴重的例子:一些人因為身份證件上無法顯示正確的姓名而無法使用醫保或取錢。過去,人們可以通過手動填寫生僻字來解決這個問題。如今,如果姓名沒有正確的電子形式,這個名字可能也就不存在了。
像這樣的案例實在太多,以至於中國在本世紀初開始指定哪些字可用於起名。當局規定,超出指定的那1605個字的姓名必須改名。新增加的這些文字將在不限制家長的起名權的情況下解決這些令人頭痛的問題。
儘管擴大中文在數字世界的版圖好處很多,但依然有理由保持警惕。從項目發言人的表述看,負責該項目的機構,同時也在負責審查與信息交流的控制,其目的是重塑互聯網上以西方為主導的數字內容。為避免政府審查而使用生僻字表達隱秘或玩笑意思的網民,可能會發現可用的詞越來越少。
近年來,隨著官方的網路監視機構規模擴大,網民們找到了通過雙關語、使用變體或古文字以及台灣等地區研發的非標準化電子字體攻擊政府的途徑。字庫工程將實現語言的標準化,並且隨著用於保密的文字進入官方數據庫,顛覆性語言將更容易檢測。新近被數字化的文字將幫助中國更好地追蹤民眾的動向、財務狀況以及在公開場合和私底下的言論。
但該項目的作用遠不僅限於此。把最大的詞彙表放到網上被稱作「借船出海」,這是一項利用他國的網路、基礎設施和資源讓中國的議程走向全球的戰略。增加50萬個文字或許不是耶穌會會士所祈禱的,但它標誌著一個仍處於崛起之勢的國家有了一種新的「巧實力」形式。
本文作者為耶魯大學教授石靜遠。她正在寫一本書,介紹中國如何把中文變成一項全球技術。
翻譯:紐約時報中文網

China’s Digital Soft Power Play

NEW HAVEN — Looking at Chinese script, you might empathize with the words of an 18th-century Jesuit missionary: “One can only endure the pain of learning it for the love of God.” The piety may be gone, but the Chinese have heard this kind of complaint for over four centuries and are finally doing something about it.
This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. Until now, only 80,388 characters have been encoded in the international computing standard, Unicode.
The project includes 100,000 characters from the country’s 56 ethnic groups, and another 100,000 rare and ancient characters from China’s written corpus. Enlisting almost 30 companies, institutions and universities, it’s the largest state-funded digitization project ever undertaken.
Characters that have long resided in the dusty pages of old manuscripts will come to life in the digital medium. The online expansion will give people in China and around the world more access to the script, thereby helping spread the Chinese language and culture.
China has struggled with a global information architecture that favors the Western alphabet. None of the significant innovations in modern communications, from Morse code and the typewriter to the ASCII (American Standard Code for Information Interchange) encoding standard, was built with the Chinese script in mind.
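For readers who want to see the ASCII limitation concretely, here is a minimal Python sketch. It uses only the standard library; the helper names `fits_in_ascii` and `utf8_length` and the sample character 龍 ("dragon") are illustrative choices made for this example, not part of any system described in this article. It shows that a Chinese character cannot be represented in ASCII at all, while a Unicode encoding such as UTF-8 stores it as a multi-byte sequence.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in `text` can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def utf8_length(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    sample = "龍"  # illustrative character, chosen for this sketch
    print(fits_in_ascii("long"))   # Latin letters fit in ASCII
    print(fits_in_ascii(sample))   # a Chinese character does not
    print(utf8_length(sample))     # UTF-8 needs more than one byte for it
```

The sketch only covers characters that already have a Unicode code point. A character with no code point cannot be entered as text at all, which is the situation this article describes for rare characters in personal names.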
Chinese scientists toiled for decades to break into the alphabetic media. In 1974, the government directed Chinese engineers and mathematicians to develop a way to piggyback onto the American alphabetic keyboard. Eventually, hundreds of keystrokes were reconfigured to allow tens of thousands of characters to be typed into a computer on the standard keyboard.
The Chinese have long believed in the superiority of their written language. Beijing thinks the current number of encoded characters in Unicode inadequately represents the richness of China’s cultural past. Through the Font Bank, the Chinese will unlock their written treasures, from oracle bone scripts to ancient writings in minority languages.
The spread of Chinese language and culture through Confucius Institutes and other efforts around the world has been part of Beijing’s soft-power strategy for the past decade. The Font Bank takes this mission into the digital realm.
Anything from scholarly papers to tweets will help extend the reach of Chinese through its sheer availability. As more of the language enters cyberspace, more people will use it, and its status will rise with its visibility.
The digitization project will also hit close to home for many Chinese people, who have been ill-served by the incomplete digitization of their language.
Last year a local Chinese media outlet reported the story of a 10-year-old boy whose auspicious name contained a rare character made up of “dragon” and “sky.” School authorities could not find the character in the computer system, and after he passed an important exam, the rare character was replaced with a common, less colorful one — meaning “white” — on his certificate. He was left with inadequate proof of his achievement, upsetting his father.
There are many other personal examples with graver consequences: Some people can’t access health insurance or their money because the correct character for their name cannot be displayed on identification papers. In the old days, one could get away with filling in a rarely used character by hand. Today, if your proper name doesn’t have an electronic form, it might as well not exist.
There were enough cases like this that in the early 2000s China began to designate the characters people could use in their names. Authorities mandated that any name outside of the 1,605 specified characters had to be changed. The newly available characters will solve these headaches without restricting parents’ naming rights.
For all the benefits of a richer digital presence for Chinese, there is reason to be wary. The same state agency that controls censorship and communications is overseeing the effort, whose aim, according to a spokesman for the project, is to reshape the digital content of the Western-dominated internet. Netizens who have been using obscure characters for secret or playful language to avoid government scrutiny can expect to have fewer words to hide behind.
As the state’s online monitoring apparatus has grown in recent years, netizens have found ways to take jabs at the government through wordplay, the use of mutated or ancient characters, and nonstandardized electronic scripts developed in places like Taiwan. The Font Bank project will standardize the language, and as the scripts for secret usage enter into an official database, subversive language will be more easily detected. The newly digitized characters will help China to better track people’s movements, finances, and public and private speech.
But the project will do so much more. Putting the largest vocabulary online has been described as “sailing out on a borrowed ship” — a strategy that makes use of other countries’ networks, infrastructure and resources to take China’s agenda global. Adding a half million more characters may not be what the Jesuits prayed for, but it marks a new form of smart power for a nation still on the rise.
Jing Tsu, a professor at Yale, is writing a book on how China has transformed the Chinese language into a global technology.
