Once you begin processing text, computer-based spell checking is the kind of application that is so irresistible that there are likely to have been many “first” implementations and many “first” implementers. It was easy enough to develop a routine that would flag exceptions to a stored dictionary, and many people tried it. But truly effective spell checking requires a sophistication about natural language that was, in the early days, not so common in the computer science community.
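To make that concrete, here is a minimal sketch of such a naive exception-flagging routine in Python. This is my illustration, not anyone’s historical implementation; the word-list file name and the function names are hypothetical.

```python
# A naive "flag exceptions to a stored dictionary" checker: any token
# not found in the word list is reported as a possible misspelling.
# "wordlist.txt" is a hypothetical one-word-per-line lexicon file.
import re

def load_lexicon(path):
    """Read a one-word-per-line word list into a set for fast lookup."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def flag_exceptions(text, lexicon):
    """Yield (position, token) for every token absent from the lexicon."""
    for match in re.finditer(r"[A-Za-z']+", text):
        if match.group().lower() not in lexicon:
            yield match.start(), match.group()

if __name__ == "__main__":
    lexicon = load_lexicon("wordlist.txt")
    sample = "Teh quick brown fox jumps over the lazy dog."
    for pos, word in flag_exceptions(sample, lexicon):
        print(f"possible misspelling at offset {pos}: {word!r}")
```

A routine like this knows nothing about inflection, proper names, or context, which is exactly the gap in sophistication noted above.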
Houghton Mifflin, as publisher of one of the great American dictionaries, The American Heritage Dictionary, the print composition of which was driven by a lexical database, began to realize in the early nineteen eighties, as it prepared new editions of this standard reference book, that natural language processing could assist it in ensuring that the lexicon was as accurate as possible and reflective of the new and not entirely welcome standard set by editor Philip Babcock Gove in Merriam-Webster’s Third New International Dictionary, Unabridged: that dictionaries should reflect what people were actually saying and not an abstract, inherited standard. That latter objective required much more exacting measurement than had been done before. Since time immemorial, that is, since James Murray and his staff put together the Oxford English Dictionary (the famous O.E.D.) in the latter part of the nineteenth and early part of the twentieth centuries, dictionaries had been compiled on the basis of a process known as “reading and marking.” Dictionary publishers would employ retired clergy, for example, or schoolteachers on a summer break to notice new words or new senses of old words and to submit slips recording the words in context to the dictionary editors as documentation. (Of course, for Murray the challenge was even greater because the Oxford wanted to be a “historical” dictionary, representing obsolete words and words still in use, with the dates of first appearance for all entries and with citations.)

When I came to Houghton Mifflin in the early nineteen eighties, the commitment had been made to use the computer actually to document American English as it was written at the time. How could that be done? An alliance was formed with Dr. Henry Kučera of Brown University, who as early as 1962 was already teaching a course in computational linguistics there, and who had taken part in the Office of Education’s Standard Corpus of American English Project, which resulted in the creation of the well-regulated million-word “Brown Corpus,” on which was based his Computational Analysis of Present-Day American English (with W. Nelson Francis) and Frequency Analysis of English Usage (also with Francis). This work offered the opportunity of developing a lexicon on scientific principles rather than on the accident of personal recognition of new words, forms, and senses. From the perspective of dictionary-making, this resource allowed the lexicographers to determine what words were being used and how frequently and, as an equal benefit, what words had passed out of use. It was manifestly not possible to ask the readers-and-markers what words they were no longer using, yet culling the lexicon in this way was very important in order to produce an accurate contemporary dictionary. Such culling was also important in producing a lexicon of controlled size, both to accommodate the memory limitations of early computers and to avoid the rare words that most users probably didn’t mean. I should note that “collegiate” dictionaries, like the American Heritage or the Merriam-Webster Collegiate, contain only a limited subset of available words, perhaps 150,000 of them, whereas if you include scientific, industrial, technical, commercial, and dialect words as well as non-English borrowings, there may be as many as 4,000,000 words in the “American English” language (my guess, which, as any lexicographer will tell you, is a real shot in the dark).
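As an illustration of that kind of measurement, here is a sketch in the spirit of corpus frequency analysis. It is not Kučera and Francis’s actual methodology, and the corpus and word-list file names are hypothetical: it counts how often each word occurs and then lists lexicon entries that never appear in the corpus, the candidates for culling.

```python
# Corpus frequency analysis (an illustrative sketch, not the original
# Brown Corpus methodology): count occurrences of each word, then list
# lexicon entries unattested in the corpus as culling candidates.
import re
from collections import Counter

def word_frequencies(corpus_path):
    """Count occurrences of each alphabetic token in a plain-text corpus."""
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            counts.update(w.lower() for w in re.findall(r"[A-Za-z']+", line))
    return counts

def unattested(lexicon, counts):
    """Lexicon entries that never appear in the corpus."""
    return sorted(w for w in lexicon if counts[w] == 0)

if __name__ == "__main__":
    counts = word_frequencies("corpus.txt")    # hypothetical corpus file
    for word, n in counts.most_common(20):     # the 20 most frequent words
        print(f"{word}\t{n}")
    with open("wordlist.txt", encoding="utf-8") as f:   # hypothetical lexicon
        lexicon = {line.strip().lower() for line in f if line.strip()}
    print("never seen in corpus:", unattested(lexicon, counts)[:20])
```

Ranking by frequency answers what words are being used and how often; the unattested list approximates what words have passed out of use, subject, of course, to the coverage of the corpus.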