



Ministry of Education of the Republic of Belarus

Educational Institution

"F. Skorina Gomel State University"

Faculty of Philology

Course paper

Division of the sentence into phrases

Author:

Student of group К-42

Лапицкая Т.Е.

Gomel 2005


Contents

 

Introduction

Presentation

Algorithm for division of the sentence into phrases

Lists used by Algorithm No 2

Some examples of the performance of Algorithm No 2

Conclusion

References

 


Introduction

 

In Text Processing and Machine Translation there is often a need to divide the sentence into smaller units that can be processed more easily than the whole sentence, especially when the sentence happens to be a long one. For that purpose we have devised an efficient algorithm based on the assumptions presented in the next section.


Presentation

 

When we say that we are going to divide the sentence into phrases, we must first state how we define the phrase, that is, where it starts and where it ends. For the purposes of the present algorithm (and not for any other, especially theoretical, purposes) the phrase is delimited on its left and on its right by Punctuation Marks and Auxiliary words. The phrase usually starts with an Auxiliary word and ends with the appearance of a Punctuation Mark or an Auxiliary word.

The Auxiliary words, marking the boundaries of the phrases, are presented in tables (Lists). Each table lists Auxiliary words of a particular type. It was observed that some Auxiliary words (as well as some sequences of consecutively used Auxiliary words) usually start longer and more independent phrases than others. For example, in a sentence like "It is often difficult to seek solutions through the curtailment of consumption." the Auxiliary word "through" followed by the Article "the" (another Auxiliary word) starts a phrase that ends with the appearance of a Punctuation Mark, while the Auxiliary word "of" starts a sub-phrase which is part of a longer phrase.

In our algorithm (see Algorithm No 2 in Section 3) this subdivision of the sentence into longer phrases, and of the longer phrases into smaller constituent phrases, is expressed by leaving different lengths of space between one phrase and another. The longer the space left before the phrase, the more self-sufficient and independent the phrase is thought to be. In this study we have established five types of phrases, depending on their relative independence within the sentence. This independence is expressed by a particular Auxiliary word (or words) or by a Punctuation Mark. The longest, most self-sufficient and relatively independent phrase starts and ends with a Punctuation Mark. The second most independent phrase starts with a word from List No 1 and ends with a Punctuation Mark or with the appearance of another Auxiliary word from List No 1. For example:

(6 spaces left) One US government study estimated

(5 spaces left) that there are 68 large manufacturing complexes

(4 spaces left) in the region

(5 spaces left) that have significant idle capacity, (end)

The full stop that precedes the sentence is equivalent to six spaces. A smaller space following a larger space to its left means that the phrase starting after the smaller space is dependent on, and a constituent of, the larger phrase. The smaller space in the example above (4 spaces) shows that the phrase following it is dependent on the previous phrase, "that there are 68 large manufacturing complexes", and explains it (or brings additional information about it, here its location). The five spaces left after "region" signify that the next phrase is dependent on the previous large phrase (the one that has a longer space left in front), in this case "One US government study estimated that there are 68 large manufacturing complexes".

The space left between the phrases depends on the actual Preposition (or Punctuation Mark) used, or on the sequence of Punctuation Marks and/or Auxiliary words, as specified (for more details see the instructions for Algorithm No 2 below).
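The spacing convention can be illustrated with a minimal Python sketch. It takes the four phrases and space counts from the example above and applies one reading of the dependency rule: each phrase depends on the nearest earlier phrase that has a longer space in front of it. The function names `parent_indices` and `render` are hypothetical, and the rule as coded is an interpretation of the text, not the author's instructions.

```python
from typing import List, Optional, Tuple

# (spaces left before the phrase, phrase text), copied from the example above.
PHRASES: List[Tuple[int, str]] = [
    (6, "One US government study estimated"),
    (5, "that there are 68 large manufacturing complexes"),
    (4, "in the region"),
    (5, "that have significant idle capacity,"),
]


def parent_indices(phrases: List[Tuple[int, str]]) -> List[Optional[int]]:
    """For each phrase, find the nearest earlier phrase with a longer space.

    Assumption: "dependent on the larger phrase" is read as "dependent on
    the closest preceding phrase whose leading space is strictly longer".
    """
    parents: List[Optional[int]] = []
    for i, (spaces, _) in enumerate(phrases):
        parent = None
        for j in range(i - 1, -1, -1):  # scan leftwards
            if phrases[j][0] > spaces:
                parent = j
                break
        parents.append(parent)
    return parents


def render(phrases: List[Tuple[int, str]]) -> str:
    """Print each phrase on its own line, preceded by its spaces."""
    return "\n".join(" " * spaces + text for spaces, text in phrases)


if __name__ == "__main__":
    print(render(PHRASES))
    print(parent_indices(PHRASES))
```

Under this reading, "in the region" attaches to the phrase before it, while the second "that" phrase attaches to the opening six-space phrase, which agrees with the explanation given in the text.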


Algorithm for division of the sentence into phrases

The algorithm proceeds through the following stages:

(i) Input text.

(ii) Comparing each word entry with the Auxiliary words or Punctuation Marks (presented in Lists) and identifying the Auxiliary words or Punctuation Marks.

(iii) Searching left or right (up to two words) for other Auxiliary words or Punctuation Marks.

(iv) Output result: a phrase.

Note: The algorithm (27 digital instructions in all) is available for free download on the Internet (see Internet Downloads at the end of the book).
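The overall flow of this section (compare each word with the Lists, look at neighbouring words, output a phrase) can be sketched in Python as follows. This is not the author's 27-instruction algorithm, which is not reproduced in the text. The set `AUXILIARY` is a small hypothetical subset of the Lists, the tokenizer is an assumption, and the look-left test (an Auxiliary word does not open a new phrase if the phrase so far contains only Auxiliary words) is a simplification of the neighbour search.

```python
import re
from typing import List

# Hypothetical subset of the Auxiliary words; the full Lists are in the text.
AUXILIARY = {"that", "through", "in", "of", "the", "with", "which"}
# Assumed set of Punctuation Marks; the source does not enumerate them.
PUNCTUATION = {",", ".", ";", ":", "?", "!"}


def tokenize(sentence: str) -> List[str]:
    """Split into words and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def split_into_phrases(sentence: str) -> List[str]:
    """Cut the sentence before Auxiliary words and after Punctuation Marks."""
    phrases: List[str] = []
    current: List[str] = []
    for token in tokenize(sentence):
        if token in PUNCTUATION:
            # A Punctuation Mark closes the current phrase and stays with it.
            if current:
                phrases.append(" ".join(current) + token)
                current = []
            continue
        is_aux = token.lower() in AUXILIARY
        # Look left: a run of Auxiliary words ("in the") stays together,
        # so cut only when the phrase already holds a non-Auxiliary word.
        only_aux_so_far = all(w.lower() in AUXILIARY for w in current)
        if is_aux and current and not only_aux_so_far:
            phrases.append(" ".join(current))
            current = []
        current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    text = ("One US government study estimated that there are 68 large "
            "manufacturing complexes in the region that have significant "
            "idle capacity,")
    for phrase in split_into_phrases(text):
        print(phrase)
```

The sketch does not assign space lengths and cannot tell "to" as a Preposition from "to" before an infinitive, which is why "to" is left out of the subset.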

Lists used by Algorithm No 2

NB The words not registered in the Lists are recorded as they follow, in the same sequence, after those registered in the Lists.

(i)      List No 1: besides, therefore, however, whereas, thus, hence, though, despite, with, nevertheless, throughout, through, during, that, only, but, if, otherwise, again, which, although, thereby, already, against, unless, thereafter etc.

(ii)     List No 2: over, as, what, toward(s), for, into, about, by, so, from, at, above, under, beside, below, onto, since, behind, in front of, beyond, around, before, after, then, altogether, among(st), between, beneath etc.

(iii) List No 3: both, neither, none etc.

(iv)    List No 4: of, to (as Preposition)

(v)     List No 5: the, a, an

(vi) List No 6: so much as, so far as, so far, as long as, as soon as, so long as, in order that, in order to, lest, as well as, and, or, nor etc.

(vii) List No 7: such, than, onto, until, all, near, even, when, while, within, last, next, also, less, more, most, whether, much, once, one, any, many, some, where, another, other, each, then, whose, who, whoever, till, until, what, across, whence, according, due to, owing, whereby, prior, wherever, whenever, already, moreover, likewise, however etc.

(viii) List No 8: out, in, on, down etc.
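Because the Lists contain multi-word entries ("in front of", "as long as") that begin with words also listed on their own ("in", "as"), a lookup has to prefer the longest entry. The Python sketch below shows one way to do this. The dictionary `LISTS` holds abbreviated copies of the Lists above (entries hidden behind "etc." are not guessed), and `build_index` and `match_at` are hypothetical names. Some words appear in more than one List in the source, which does not say how to resolve that; the sketch arbitrarily keeps the lowest List number.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Abbreviated copies of the Lists; keys are the List numbers.
LISTS: Dict[int, List[str]] = {
    1: ["besides", "therefore", "however", "through", "that", "but", "if"],
    2: ["over", "as", "for", "by", "from", "at", "in front of", "between"],
    3: ["both", "neither", "none"],
    4: ["of", "to"],
    5: ["the", "a", "an"],
    6: ["so far as", "so far", "as long as", "as soon as", "in order to",
        "and", "or", "nor"],
    7: ["such", "than", "when", "while", "due to", "whose", "who"],
    8: ["out", "in", "on", "down"],
}


def build_index(lists: Dict[int, List[str]]) -> Dict[Tuple[str, ...], int]:
    """Map each entry (as a tuple of words) to its List number."""
    index: Dict[Tuple[str, ...], int] = {}
    for number in sorted(lists):
        for entry in lists[number]:
            # setdefault keeps the lowest List number for repeated entries.
            index.setdefault(tuple(entry.split()), number)
    return index


INDEX = build_index(LISTS)
MAX_LEN = max(len(key) for key in INDEX)


def match_at(words: Sequence[str], pos: int) -> Optional[Tuple[int, int]]:
    """Return (List number, words consumed) for the longest entry at pos."""
    lowered = [w.lower() for w in words]
    for length in range(min(MAX_LEN, len(words) - pos), 0, -1):
        key = tuple(lowered[pos:pos + length])
        if key in INDEX:
            return INDEX[key], length
    return None


if __name__ == "__main__":
    sample = "They waited in front of the gate as long as possible".split()
    for i in range(len(sample)):
        print(i, sample[i], match_at(sample, i))
```

Trying the longest candidate first means "in front of" is recognised as one List No 2 entry instead of "in" from List No 8 followed by two unrelated words.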

Some examples of the performance of Algorithm No 2

Below we will present a text divided into phrases according to the instructions for the algorithm:

(i) Many countries also have established or have under construction a free zone, where exporters have access to shipping facilities, a pool of labour and freedom from exchange controls.

(ii) The Caribbean Basin Initiative, a US package of aid and trade incentives to encourage manufacturing, has given an added boost to industrial development in this region.

The analysis of the sentence starts with checking the contents of the memory and sending to print any information stored up to this moment (this is done at the start of each new sentence), and also with ascertaining whether the sentence has ended and recording the analysed word in the memory if it is not yet recorded (a procedure carried out after each word). ...
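The bookkeeping described in this paragraph can be pictured with a short Python sketch. The class `PhraseMemory` and its method names are hypothetical, and the set of sentence-ending marks is an assumption. Only the three operations named in the text are modelled: printing what is stored when a new sentence starts, recording a word once, and checking whether the sentence has ended.

```python
from typing import List, Tuple


class PhraseMemory:
    """Hypothetical memory for the words of the sentence being analysed."""

    # Assumed end-of-sentence marks; the source does not list them.
    SENTENCE_END = {".", "?", "!"}

    def __init__(self) -> None:
        self.memory: List[Tuple[int, str]] = []  # (position, word) pairs
        self.printed: List[str] = []             # output "taken to print"

    def start_sentence(self) -> None:
        """At the start of each sentence, print and clear what is stored."""
        if self.memory:
            self.printed.append(" ".join(word for _, word in self.memory))
            self.memory = []

    def record(self, position: int, word: str) -> None:
        """Record the analysed word unless it is already recorded."""
        if (position, word) not in self.memory:
            self.memory.append((position, word))

    @classmethod
    def sentence_ended(cls, token: str) -> bool:
        """Ascertain whether the token closes the sentence."""
        return token in cls.SENTENCE_END


if __name__ == "__main__":
    mem = PhraseMemory()
    mem.start_sentence()
    for pos, word in enumerate(["Many", "countries", "also"]):
        mem.record(pos, word)
    mem.record(1, "countries")  # a repeated visit is ignored
    mem.start_sentence()        # the next sentence begins: flush to print
    print(mem.printed)
```

Keying each record on the word's position as well as its text lets a word be visited twice (for instance, during a neighbour search) without being stored twice, while genuine repetitions of the same word elsewhere in the sentence are still kept.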






