
Download Urdu Language and Literature Theses (PDF)

This page presents two theses in the field of Urdu language and literature. The PDF files of these theses can be downloaded from the "Files for download" section at the bottom of this page.

Persian title (translated):
Identifying formulaic sequences in the Urdu language and their pedagogical implications for SLA (ESL/USL)

English title:
Identification of Formulaic Sequences in Urdu Language and Their Pedagogical Implication for SLA (ESL/USL)

Abstract
This study explores formulaicity in the Urdu language and its pedagogical implications for second language acquisition, for learners of both English as a second language (ESL) and Urdu as a second language (USL). It is believed that formulaic sequences, or prefabs, make up more than fifty percent of a language. These formulaic sequences are of various kinds, encompassing idioms, proverbs, collocations and, sometimes, simple fillers. For the current study, data will be collected from two widely circulated Urdu newspapers. The data will consist of lexical chunks or formulas, which will be identified on the basis of eleven criteria proposed by Wray and Namba (2003). To maintain inter-rater reliability, the data will be shared with an Urdu language expert. After identification, the formulaic sequences will be classified into six classes. Results of the pilot study show that there is formulaicity in the Urdu language: like many other languages, Urdu is replete with almost all kinds of formulaic sequences.

Persian title (translated):
Urdu semantic tagging: lexicons, corpora, methods and tools

English title:
An Urdu Semantic Tagger - Lexicons, Corpora, Methods and Tools

Abstract
Extracting and analysing meaning-related information from natural language data has attracted the attention of researchers in various fields, such as Natural Language Processing (NLP), corpus linguistics and data science. An important aspect of such automatic information extraction and analysis is the semantic annotation of language data using a semantic annotation tool (also known as a semantic tagger). Different semantic annotation tools have been designed to carry out various levels of semantic annotation, for instance sentiment analysis, word sense disambiguation, content analysis and semantic role labelling. These tools identify or tag only part of the core semantic information in language data; moreover, they tend to be applicable only to English and other European languages. A semantic annotation tool that can annotate the semantic senses of all lexical units (words), based on the USAS (UCREL Semantic Analysis System) semantic taxonomy, is still desirable for the Urdu language in order to provide comprehensive semantic analysis of Urdu text. This research reports on the development of an Urdu semantic tagging tool and discusses the challenges faced in this Ph.D. work. Since standard NLP pipeline tools are not widely available for Urdu, a suite of new tools has been created alongside the Urdu semantic tagger: a sentence tokenizer, a word tokenizer and a part-of-speech (POS) tagger. The results for these tools are as follows: the word tokenizer achieves an F1 of 94.01% and an accuracy of 97.21%, the sentence tokenizer an F1 of 92.59% and an accuracy of 93.15%, and the POS tagger an accuracy of 95.14%. The Urdu semantic tagger incorporates semantic resources (lexicons and corpora) as well as semantic field disambiguation methods. In terms of novelty, the NLP pre-processing tools are developed using rule-based, statistical or hybrid techniques.
Furthermore, all semantic lexicons have been developed using a novel combination of automatic or semi-automatic approaches: mapping, crowdsourcing, statistical machine translation, GIZA++, word embeddings, and named entities. A large multi-target annotated corpus is also constructed using a semi-automatic approach to test the accuracy of the Urdu semantic tagger; the proposed corpus is also used to train and test supervised multi-target Machine Learning classifiers. The results show that the Random k-labEL Disjoint Pruned Sets and Classifier Chain multi-target classifiers outperform all other classifiers on the proposed corpus, with a Hamming Loss of 0.06% and an Accuracy of 0.94%. The best lexical coverages of 88.59%, 99.63%, 96.71% and 89.63% are obtained on several test corpora. On the proposed test corpus, the developed Urdu semantic tagger shows an encouraging precision of 79.47%.
Despite the good results of the proposed tools, methods, lexicons and corpora, the following limitations have been observed. The word tokenization method did not handle out-of-vocabulary words in the morpheme-matching process used for the space omission problem. Sentence tokenization is rule-based and cannot deal with non-sentence-boundary markers or with period markers used between abbreviations. The POS tagger did not completely handle unknown words. The multi-target classifiers did not explore feature extraction approaches and have only been tested on a small dataset. Finally, future work will need to focus on the creation of multi-word semantic lexicons.
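
The abstract reports Hamming Loss and lexical coverage without stating how they were computed. As a rough illustration only, the sketch below uses the standard textbook definitions: Hamming loss as the fraction of label slots predicted incorrectly, and lexical coverage as the share of tokens found in a lexicon. The function names, the fixed-length label vectors and the toy inputs are assumptions made for this example, not the thesis's code or data.

```python
from typing import Sequence, Set


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of label slots that differ between gold and predicted labels.

    Assumes (hypothetically) that every word carries a label vector of the
    same length, one slot per target.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row must have the same number of labels")
        total += len(true_row)
        wrong += sum(1 for t, p in zip(true_row, pred_row) if t != p)
    return wrong / total if total else 0.0


def lexical_coverage(tokens: Sequence[str], lexicon: Set[str]) -> float:
    """Share of tokens (counted with repeats) that appear in the lexicon."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok in lexicon) / len(tokens)


# Toy, made-up inputs for illustration only.
gold = [[1, 0, 0], [0, 1, 1]]
pred = [[1, 0, 1], [0, 1, 1]]
loss = hamming_loss(gold, pred)          # one wrong slot out of six
coverage = lexical_coverage(["a", "b", "c", "d"], {"a", "b", "c"})
```

Under these definitions a lower Hamming loss is better and a higher coverage is better; both are fractions between 0 and 1 and can be multiplied by 100 to express them as percentages.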

 
