Details
Title | Automatic Term Extraction from Parallel Corpus in the Domain of Automatic Text Processing: master's graduation qualification work: field of study 45.04.04 "Intelligent Systems in the Humanities"; educational program 45.04.04_01 "Digital Linguistics (International Educational Program)" |
---|---|
Creators | Чистякова Светлана Владимировна |
Scientific adviser | Коган Марина Самуиловна |
Organization | Peter the Great St. Petersburg Polytechnic University. Institute of Humanities |
Imprint | Saint Petersburg, 2024 |
Collection | Graduation qualification works; General collection |
Subjects | automatic term extraction; parallel corpus; bilingual glossary; automatic text processing; computational linguistics; natural language processing (NLP) |
Document type | Master's graduation qualification work |
File type | |
Language | Russian |
Level of education | Master |
Speciality code (FGOS) | 45.04.04 |
Speciality group (FGOS) | 450000 - Linguistics and Literary Studies |
DOI | 10.18720/SPBPU/3/2024/vr/vr24-5804 |
Rights | Password-protected access from the Internet (read only) |
Additionally | New arrival |
Record key | ru\spstu\vkr\33252 |
Record create date | 8/29/2024 |
The object of the research is terms in the domain of automatic text processing in English and Russian. The subject of the investigation is the building of a bilingual glossary using automated tools. The objective of the study is the extraction and comparative analysis of terms automatically extracted from a parallel corpus of English and Russian texts in the domain of automatic text processing. The tasks of the study are to: (1) collect and analyse scientific literature and technical documentation on the topic of the master's thesis; (2) analyse the existing methods of term extraction and select the most suitable approach to fulfil the research objective; (3) review existing English-Russian bilingual resources in the automatic text processing domain; (4) build and preprocess the parallel corpus in the automatic text processing domain; (5) employ the selected algorithms to extract terms from the parallel corpus; (6) analyse and systematise the extracted terminology. The research methods include methods of corpus linguistics, quantitative data analysis, comparative analysis of terms, and computer data processing technologies. As a result of the study, the automatic term extraction algorithm was applied to a bilingual corpus of English and Russian texts in the domain of automatic text processing, leading to the creation of a bilingual glossary. The bilingual glossary in the domain of automatic text processing can be used to enhance machine translation systems, integrated into document summarisation tasks, or incorporated into educational materials and curriculum development for natural language processing courses.
Network | User group | Action |
---|---|---|
ILC SPbPU Local Network | All | |
Internet | Authorized users SPbPU | |
Internet | Anonymous | |