
Title Automatic Term Extraction from Parallel Corpus in the Domain of Automatic Text Processing: master's graduation qualification work: field of study 45.04.04 "Intelligent Systems in the Humanities"; educational program 45.04.04_01 "Digital Linguistics (International Educational Program)"
Creators Чистякова Светлана Владимировна
Scientific adviser Коган Марина Самуиловна
Organization Peter the Great St. Petersburg Polytechnic University. Institute of Humanities
Imprint Saint Petersburg, 2024
Collection Graduation qualification works; General collection
Subjects automatic term extraction; parallel corpus; bilingual glossary; automatic text processing; computational linguistics; natural language processing (NLP)
Document type Master's graduation qualification work
File type PDF
Language Russian
Level of education Master
Speciality code (FGOS) 45.04.04
Speciality group (FGOS) 450000 - Linguistics and literary studies
DOI 10.18720/SPBPU/3/2024/vr/vr24-5804
Rights Password-protected access from the Internet (read only)
Additionally New arrival
Record key ru\spstu\vkr\33252
Record create date 8/29/2024

The object of the research is terms in the domain of automatic text processing in English and Russian. The subject of the investigation is the building of a bilingual glossary using automated tools. The objective of the study is the extraction and comparative analysis of terms automatically extracted from a parallel corpus of English and Russian texts in the domain of automatic text processing. The tasks of the study are to: (1) collect and analyse scientific literature and technical documentation on the topic of the master's thesis; (2) analyse the existing methods of term extraction and select the approach most suitable for the research objective; (3) review existing English-Russian bilingual resources in the automatic text processing domain; (4) build and preprocess a parallel corpus in the automatic text processing domain; (5) employ the selected algorithms to extract terms from the parallel corpus; (6) analyse and systematise the extracted terminology. The research methods include methods of corpus linguistics, quantitative data analysis, comparative analysis of terms, and computer data processing technologies. As a result of the study, an automatic term extraction algorithm was implemented and applied to a bilingual corpus of English and Russian texts in the domain of automatic text processing, leading to the creation of a bilingual glossary. The glossary can be used to enhance machine translation systems, integrated into document summarisation tasks, or incorporated into educational materials and curriculum development for natural language processing courses.
