  • Classification of Persian accents speech using[taliem.ir]

    Classification of Persian accents speech using histogram of pitch contour

    Speaker variability significantly affects the performance of speech recognition. One of the most important factors causing variance among speakers is accent. This paper describes an accent recognition system for Persian accents from 5 different dialects. A novel framework based on the histogram of the pitch contour of speech is proposed. To detect the accent reliably, a sufficiently long continuous speech segment is required, so this research analyzes different approaches to segmenting speech signals. The SAHAND accent speech dataset (SES) is used for evaluation. The experimental results confirm that accurate accent recognition is obtained when the speech segments are longer than 1.4 seconds or are composed of more than 15 voiced segments. To recognize the accent in speech, the normalized histogram of pitch frequencies is employed successfully with several classifiers, namely K-nearest-neighbor (KNN), Artificial Neural Network (NN) based classifiers, Naive Bayes, and linear discriminant analysis (LDA).
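
The idea of classifying an utterance from a normalized pitch histogram can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the bin count, the 50 to 400 Hz range, the toy pitch values, and the labels "A" and "B" are all assumptions, and pitch extraction itself is taken as given.

```python
import math
from collections import Counter

def pitch_histogram(pitches_hz, n_bins=10, f_min=50.0, f_max=400.0):
    """Normalized histogram of per-frame pitch values (bins sum to 1)."""
    counts = [0] * n_bins
    width = (f_max - f_min) / n_bins
    for f in pitches_hz:
        if f_min <= f < f_max:  # frames outside the range are ignored
            counts[min(int((f - f_min) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

def knn_predict(train, query, k=3):
    """train: list of (histogram, label); majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy data: two "accents" with different typical pitch ranges.
train = [
    (pitch_histogram([100, 110, 120]), "A"),
    (pitch_histogram([105, 115, 118]), "A"),
    (pitch_histogram([210, 220, 230]), "B"),
    (pitch_histogram([205, 215, 225]), "B"),
]
predicted = knn_predict(train, pitch_histogram([102, 112, 119]), k=3)
```

Normalizing the histogram makes utterances of different lengths comparable, which is one plausible reason the abstract stresses a minimum segment length: very short segments give too few pitch values for a stable histogram.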

  • Comparison of Farsi Vowel Intonation with[taliem.ir]

    Comparison of Farsi Vowel Intonation with Different Languages for Teaching and Preserving Original Accent

    Voice intonation is one of the criteria for the appropriate expression of phonemes and words. Especially in language teaching, a person's ability to apply appropriate intonation to phonemes and words, regardless of sentence type (interrogative, affirmative, etc.), depends on learning. Comparison and reconstruction of voice intonation are two important challenges in Computer Assisted Language Learning. This paper presents a method based on discrete signal processing. With this method, users can see a similarity value between their own voice intonation and that of a source voice. In addition, the durations of the voiced and silent intervals in the user's speech are reconstructed according to the source voice, so that the user can hear the reconstructed phoneme in their own voice. Applications of this method include teaching standard Farsi pronunciation to non-Farsi speakers, speech therapy, animation, and e-learning.

  • This paper presents a fast and simple method for Farsi/Arabic subwords recognition in a large lexicon. By omitting dots [taliem.ir]

    Farsi Machine-printed Subwords Recognition Using Contour-based Fourier Descriptors

    This paper presents a fast and simple method for Farsi/Arabic subword recognition in a large lexicon. By omitting the dots and complementary parts of machine-printed characters, a dataset of 9445 Farsi/Arabic subwords written in a single font and size was obtained. This not only reduces the number of subwords but also makes the dataset suitable for both Farsi and Arabic. After the boundary points of each subword are normalized, Fourier descriptor features are extracted. Experimental results on 30 plain texts show an accuracy of 82.1% at the subword level. Given the size and comprehensiveness of the dataset, the results are promising and can be improved in the future by using Farsi/Arabic grammar to connect subwords.
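
Contour-based Fourier descriptors can be illustrated with a small sketch. It assumes the boundary has already been traced as an ordered, counterclockwise list of points; the function name, the number of coefficients, and the normalization by the first harmonic are illustrative choices, not details taken from the paper.

```python
import cmath

def fourier_descriptors(boundary, n_coeffs=4):
    """boundary: (x, y) points ordered counterclockwise along a closed contour."""
    n = len(boundary)
    z = [complex(x, y) for x, y in boundary]  # treat each point as x + iy

    def dft(k):
        return sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))

    scale = abs(dft(1))  # assumed non-zero for a non-degenerate contour
    # Skipping k = 0 removes translation, taking magnitudes removes rotation
    # and start-point effects, and dividing by |Z(1)| removes scale.
    return [abs(dft(k)) / scale for k in range(2, 2 + n_coeffs)]

# Hypothetical example: a square and a shifted, enlarged copy of it.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
moved = [(3 * x + 5, 3 * y - 2) for x, y in square]
desc_a = fourier_descriptors(square)
desc_b = fourier_descriptors(moved)
```

The two descriptor vectors agree up to floating-point error, which is the property that makes such features usable for matching a subword shape against a lexicon regardless of its position and size on the page.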

  • Language Discrimination and Font[taliem.ir]

    Language Discrimination and Font Recognition in Machine Printed Documents Using a New Fractal Dimension

    This paper focuses on language separation and font recognition in multilingual, multi-font texts. The purpose of this task is to improve the performance of general OCR systems that deal with omni-font text and different languages. The proposed method is based on an innovative fractal dimension measurement. The features extracted with this method are independent of document content, and the method treats language and font recognition as a texture identification task. Experimental results on three languages, namely Farsi, Arabic, and English, with their most popular fonts show that the proposed method not only separates these languages but also recognizes their font types accurately.

  • Preparing an accurate Persian POS tagger suitable for[taliem.ir]

    Preparing an accurate Persian POS tagger suitable for MT

    This paper prepares an accurate Persian POS tagger suitable for MT. First, a new set of POS tags is defined that is general and more usable for MT than detailed tag sets. Then an accurately tagged corpus is prepared by modifying the Bijankhan corpus. The Stanford POS tagger is trained on the modified Bijankhan corpus, and the resulting tagger reaches 99.36% accuracy, a significant improvement over previous Persian taggers. The effect of using this tagger in statistical machine translation is then investigated. The outputs show better performance than simple SMT, whereas using a previous tagger in SMT lowers the BLEU score compared to simple SMT.

  • Semantically Clustering of Persian Words[taliem.ir]

    Semantically Clustering of Persian Words

    Clustering is a data mining task that divides a set of objects into groups so that similar objects fall into the same group and objects with different features are placed in separate groups. This paper presents a technique for semantic word clustering, one of the applications of data mining techniques in natural language processing. Word clustering is used in various areas of text mining, such as word disambiguation, information retrieval, language modelling, and text classification. This paper proposes a graph-based method for clustering Persian words. The proposed method is a type of pattern-based clustering and consists of two parts. In the first part, a word co-occurrence graph is built using statistical similarity measures such as Chi-square, pointwise mutual information (PMI), and cosine. In the second part, the graph is divided into appropriate clusters by Newman's graph clustering algorithm. Our research shows that Chi-square is the best measure for clustering Persian words.
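
The first stage described above, building a weighted co-occurrence graph, might look roughly like the sketch below for the PMI measure. It is a simplified illustration: the sentence-level co-occurrence window, the `min_pmi` threshold, the function name, and the English toy words are assumptions, and the subsequent graph clustering step is not shown.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_graph(sentences, min_pmi=0.0):
    """Return {(w1, w2): pmi} for word pairs that co-occur in a sentence."""
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences containing each word pair
    for sent in sentences:
        words = sorted(set(sent))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    edges = {}
    for (a, b), c in pair_df.items():
        # PMI = log2( p(a, b) / (p(a) * p(b)) ), estimated per sentence
        pmi = math.log2((c / n) / ((word_df[a] / n) * (word_df[b] / n)))
        if pmi > min_pmi:
            edges[(a, b)] = pmi
    return edges

# Hypothetical toy corpus.
corpus = [["cat", "dog"], ["cat", "dog"], ["car", "road"], ["car", "road"]]
edges = pmi_graph(corpus)
```

Here only the pairs ("cat", "dog") and ("car", "road") receive edges, so a graph clustering algorithm applied afterwards would have two clearly separated groups to find.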

  • Using Syntactic Reordering to Improve English-to-Persian Statistical Machine Translation[taliem.ir]

    Using Syntactic Reordering to Improve English-to-Persian Statistical Machine Translation

    Statistical machine translation is regarded as one of the best methods for translating from one language to another. For languages whose structures are very similar, the output of such a translator is very good. Structural differences between English and Persian, together with the lack of a large bilingual corpus, mean that this method does not produce satisfactory translations from English to Persian. In this paper we use a word reordering approach to increase the structural similarity between English and Persian sentences as far as possible, and we then examine the effect of this step on the output. To this end, a set of reordering rules is first extracted with the help of parse trees. These rules are then applied to the English sentences as a preprocessing step. The results show that applying this method improves translation quality as measured by BLEU.

  • Improving Persian Text Classification with the Weighted Nearest-Neighbor Method[taliem.ir]

    Improving Persian Text Classification with the Weighted Nearest-Neighbor Method

    With the ever-growing number of information resources and the volume of articles and content produced in different fields and in diverse forms, including various digital media, the need for easy access to information is also increasing. One of the basic requirements for speeding up access to and processing of this content, which is usually large in volume, is classifying it into categories. Text classification is the act of labeling a text or assigning it to one of a set of predefined categories. In this paper we examine the performance of the WKNN3 algorithm using the tf-idf weighting measure. In addition, to increase accuracy in choosing the correct class and to improve the efficiency of the algorithm, we use averaging over the data as the evaluation criterion. The results obtained from classifying Persian texts with these methods show an accuracy of 98 percent.
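
A weighted nearest-neighbor classifier over tf-idf vectors can be sketched as below. This shows one common form of weighting, in which each neighbor votes with its cosine similarity; it is not necessarily the variant evaluated in the paper, and the function names, the idf smoothing, and the English toy documents are assumptions. The averaging step mentioned in the abstract is not modeled.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """docs: list of non-empty token lists -> {term: idf}."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # The +1 keeps terms that occur in every document from vanishing
    # (a smoothing choice assumed here).
    return {t: math.log(n / df[t]) + 1.0 for t in df}

def vectorize(tokens, idf):
    """Sparse tf-idf vector; terms unseen during fitting are dropped."""
    return {t: c / len(tokens) * idf[t] for t, c in Counter(tokens).items() if t in idf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def wknn_predict(train_vecs, labels, query_vec, k=3):
    """Each of the k most similar documents votes with its similarity."""
    sims = sorted(((cosine(v, query_vec), y) for v, y in zip(train_vecs, labels)),
                  reverse=True)[:k]
    votes = defaultdict(float)
    for s, y in sims:
        votes[y] += s
    return max(votes, key=votes.get)

# Hypothetical toy data.
docs = [["goal", "match", "team"], ["team", "player", "goal"],
        ["market", "stock", "price"], ["price", "bank", "market"]]
labels = ["sport", "sport", "economy", "economy"]
idf = fit_idf(docs)
train_vecs = [vectorize(d, idf) for d in docs]
predicted = wknn_predict(train_vecs, labels, vectorize(["goal", "team", "price"], idf))
```

Compared with plain majority voting, similarity-weighted votes let a few very close neighbors outweigh several distant ones, which is the usual motivation for weighted KNN in text classification.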