Language: English
The Internet and Languages

Publisher: Project Gutenberg

evolving and only just being incorporated into the latest software, this new coding system translates each character into 16 [bits]. Whereas 8-[bit] extended ASCII could only handle a maximum of 256 characters, Unicode can handle over 65,000 unique characters and therefore potentially accommodate all of the world's writing systems on the computer. So now the tools are more or less in place. They are still not perfect, but at last we can at least surf the web in Chinese, Japanese, Korean, and numerous other languages that don't use the Western alphabet. As the internet spreads to parts of the world where English is rarely used - such as China, for example - it is natural that Chinese, and not English, will be the preferred choice for interacting with it. For the majority of the users in China, their mother tongue will be the only choice."

= Encoding in Project Gutenberg

Used since the beginning of computing, ASCII (American Standard Code for Information Interchange) is a 7-bit coded character set for information interchange in English. It was published in 1968 by ANSI (American National Standards Institute), with updates in 1977 and 1986. The 7-bit plain ASCII, also called Plain Vanilla ASCII, is a set of 128 characters with 95 printable unaccented characters (A-Z, a-z, numbers, punctuation and basic symbols), i.e. the ones that are available on the English/American keyboard. As computing spread to other European languages, extensions of ASCII (also called ISO-8859 or ISO-Latin) were created as sets of 256 characters to add the accented characters found in French, Spanish and German, for example ISO 8859-1 (ISO-Latin-1) for French.
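The character counts above can be checked with a short sketch. It is only an illustration in Python, not something taken from the book; the sample word "été" is an arbitrary choice, and "iso-8859-1" is Python's codec name for ISO-Latin-1.

```python
# All 128 code values of 7-bit ASCII (0 through 127).
ascii_chars = [chr(i) for i in range(128)]

# The printable ASCII characters run from space (32) to tilde (126).
printable = [c for c in ascii_chars if 32 <= ord(c) <= 126]

print(len(ascii_chars))  # 128
print(len(printable))    # 95

# An accented letter fits in one byte of ISO 8859-1 but not in 7-bit ASCII.
word = "été"  # arbitrary sample word
latin1_bytes = word.encode("iso-8859-1")
print(list(latin1_bytes))  # [233, 116, 233]

try:
    word.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)  # False
```

The byte value 233 lies above 127, which is why the accented letter needs the eighth bit that the ISO-8859 extensions use.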

Created by Michael Hart in July 1971, Project Gutenberg was the first information provider on the internet. Michael's purpose was to digitize as many literary texts as possible, and to offer them for free in a digital library open to anyone. Michael explained in August 1998: "We consider etext to be a new medium, with no real relationship to paper, other than presenting the same material, but I don't see how paper can possibly compete once people each find their own comfortable way to etexts, especially in schools."

Whether digitized years ago or now, all Project Gutenberg books are created in 7-bit plain ASCII, called Plain Vanilla ASCII. When 8-bit ASCII is used for books with accented characters, as in French or German, Project Gutenberg also produces a 7-bit ASCII version with the accents stripped. (This does not apply to languages that are not "convertible" to ASCII, like Chinese, encoded in Big-5.)
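One possible way to strip accents is sketched below in Python. The book does not say how Project Gutenberg produces its 7-bit versions, so this is an assumption for illustration only: the helper names strip_accents and to_plain_ascii are hypothetical, and the approach (Unicode decomposition followed by removal of combining marks) is just one common technique.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_plain_ascii(text: str) -> str:
    """Strip accents, then replace anything still outside 7-bit ASCII with '?'."""
    return strip_accents(text).encode("ascii", "replace").decode("ascii")


print(strip_accents("Les Misérables, Götterdämmerung"))
# Les Miserables, Gotterdammerung

# Chinese characters have no ASCII base letter, so nothing readable survives.
print(to_plain_ascii("中文"))  # ??
```

The second call shows why some languages are not "convertible": there is no unaccented ASCII letter underneath the character to fall back on.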

Project Gutenberg sees Plain Vanilla ASCII as the best format by far, and calls it "the lowest common denominator". It can be read, written, copied and printed by any simple text editor or word processor on any electronic device. It is the only format compatible with 99% of hardware and software. It can be used as it is or to create versions in many other formats. It will still be in use when other formats have become obsolete, or are already obsolete, like the formats of a few short-lived reading devices launched since 1999. It is the assurance that collections will never become obsolete and will survive future technological changes. The goal is to preserve the texts not only over decades but over centuries.

Project Gutenberg also publishes ebooks in well-known formats like HTML, XML or RTF. There are Unicode files too. Any other format provided by volunteers (PDF, LIT, TeX and many others) is usually accepted, as long as they also supply an ASCII version where possible.

Initially, the books were mostly in English. As the original Project Gutenberg is based in the United States, its first focus was the English-speaking community in the country and worldwide. In October 1997, Michael Hart expressed his intention to digitize ebooks in other languages. In early 1998, the catalog had a few titles in French (10 titles), German, Italian, Spanish and Latin. In July 1999, Michael wrote: "I am publishing in one new language per month right now, and will continue as long as possible."

In the 2000s, multilingualism became a priority for Project Gutenberg,
as did internationalization, with Project Gutenberg Australia (created in
August 2001), Project Gutenberg Europe (created in January 2004),
Project Gutenberg Canada (created in July 2007), and others to come.

The launching of Project Gutenberg Europe and Distributed Proofreaders Europe (DP Europe) by Project Rastko was an important step. Founded in 1997, Project Rastko is a non-governmental cultural and educational project. One of its goals is the online publishing of Serbian culture. It is part of the Balkans Cultural Network Initiative, a regional cultural network for the Balkan peninsula in south-eastern Europe.

DP Europe has used the software of the original Distributed Proofreaders, launched in 2000 to share proofreading among a number of volunteers. Since the beginning, DP Europe has been a multilingual website, with its main pages translated into several European languages by volunteer translators. In April 2004, DP Europe was available in 12 languages. The long-term goal was 60 languages and 60 linguistic teams in the main European languages. DP Europe supports Unicode instead of ASCII, to be able to proofread ebooks in numerous languages.

First published in January 1991, Unicode "provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language" (excerpt from the website). This double-byte platform-independent encoding provides a basis for the processing, storage and interchange of text data in any language, and in any modern software and information technology protocols. Unicode is maintained by the Unicode Consortium, and is a component of the W3C (World Wide Web Consortium) specifications. In 2008, 50% of available documents on the internet were encoded in Unicode, with the other 50% encoded in ASCII.
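The idea of "a unique number for every character" can be seen in a small Python sketch. The sample characters and the helper name code_point_info are illustrative assumptions, and the two encodings shown (UTF-16 and UTF-8) are not named in the text; they are simply two common ways of storing Unicode numbers as bytes.

```python
def code_point_info(ch: str) -> tuple:
    """Return the character's Unicode number, its U+ notation, and byte lengths."""
    number = ord(ch)
    return (
        number,
        f"U+{number:04X}",
        len(ch.encode("utf-16-be")),  # two bytes for each of these samples
        len(ch.encode("utf-8")),      # one to three bytes for these samples
    )


for ch in ["A", "é", "中"]:
    print(ch, code_point_info(ch))
# A (65, 'U+0041', 2, 1)
# é (233, 'U+00E9', 2, 2)
# 中 (20013, 'U+4E2D', 2, 3)
```

The number assigned to each character is the same on every platform; only the byte representation depends on the encoding chosen.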

In the original Project Gutenberg in the U.S., there were ebooks in 25
languages in February 2004, in 42 languages in July 2005, including
Sanskrit and the Mayan languages, and in 50 languages in December 2006.
The ten top languages were English, French, German, Finnish, Dutch,
Spanish, Chinese, Italian, Portuguese and Tagalog.

[Many thanks to Russon Wooldridge and Mike Cook for revising previous versions of this section.]

FIRST MULTILINGUAL PROJECTS

= [Quote]

Tyler Chambers, who created the Human-Languages Page and the Internet Dictionary Project, wrote in September 1998: "Online, my work has been with making language information available to more people through a couple of my web-based projects. While I'm not multilingual, nor even bilingual, myself, I see an importance to language and multilingualism that I see in very few other areas. The internet has allowed me to reach millions of people and help them find what they're looking for, something I'm glad to do. (…) Overall, I think that the web has been great for language awareness and cultural issues — where else can you randomly browse for 20 minutes and run across three or more different languages with information you might potentially want to know?"

= Travlang

Travlang is a website dedicated to both travel and languages, created in 1994 by Michael C. Martin on his university's website when he was a student in physics. Travlang included one section called Foreign Languages for Travelers, with links to online tools to learn 60 languages. Another section, Translating Dictionaries, gave access to free dictionaries in a number of languages (Afrikaans, Czech, Danish, Dutch, Esperanto, Finnish, French, Frisian, German, Hungarian, Italian, Latin,
