7 results on '"Dobrišek, Simon"'
Search Results
2. Razpoznavanje slovenskega govora z metodami globokih nevronskih mrež. [Slovenian speech recognition using deep neural network methods]
- Author
- Ulčar, Matej, Dobrišek, Simon, and Robnik-Šikonja, Marko
- Abstract
Copyright of Uporabna Informatika is the property of Slovensko Drustvo Informatika.
- Published
- 2019
3. The Slovenian Dialog System for Air Flight Inquires
- Author
- Ipšić, Ivo, Mihelič, France, Pepelnjak, Karmen, Žganec Gros, Jerneja, Dobrišek, Simon, Pavešić, Nikola, and Nöth, Elmar
- Subjects
- dialogue system, multi-lingual systems, speech recognition, speech synthesis
- Abstract
The Slovenian Dialog System for Air Flight Inquires
- Published
- 1997
4. An Edit-Distance Model for the Approximate Matching of Timed Strings.
- Author
- Dobrišek, Simon, Žibert, Janez, Pavešić, Nikola, and Mihelič, France
- Subjects
- *PATTERN perception, *REPETITIVE patterns (Decorative arts), *PATTERN recognition systems, *CLASSIFIERS (Linguistics), *EDITING software, *AUDITORY perception
- Abstract
An edit-distance model that can be used for the approximate matching of contiguous and noncontiguous timed strings is presented. The model extends the concept of the weighted string-edit distance by introducing timed edit operations and by making the edit costs time dependent. Special attention is paid to the timed null symbols that are associated with the timed insertions and deletions. The usefulness of the presented model is demonstrated on the classification of phone-recognition errors using the TIMIT speech database. [ABSTRACT FROM AUTHOR]
- Published
- 2009
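The timed edit-distance idea described in the abstract of this record can be illustrated with a minimal Python sketch. It is not the authors' model: the (label, start, end) tuple representation, the midpoint-based substitution cost, the constant insertion/deletion cost, and the names timed_edit_distance, sub_cost, indel_cost, and time_weight are all assumptions chosen for illustration. In particular, the timed null symbols mentioned in the abstract are replaced here by a fixed cost.

```python
from typing import List, Tuple

# Hypothetical representation of a timed symbol: (label, start_time, end_time).
TimedSymbol = Tuple[str, float, float]


def midpoint(symbol: TimedSymbol) -> float:
    """Return the temporal midpoint of a timed symbol."""
    return (symbol[1] + symbol[2]) / 2.0


def sub_cost(a: TimedSymbol, b: TimedSymbol, time_weight: float = 1.0) -> float:
    """Assumed substitution cost: label mismatch plus a time-dependent penalty."""
    label_cost = 0.0 if a[0] == b[0] else 1.0
    return label_cost + time_weight * abs(midpoint(a) - midpoint(b))


def timed_edit_distance(
    ref: List[TimedSymbol],
    hyp: List[TimedSymbol],
    indel_cost: float = 1.0,
    time_weight: float = 1.0,
) -> float:
    """Weighted edit distance whose substitution cost depends on timing.

    Insertions and deletions use a constant cost (a simplification).
    """
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # delete ref[i-1]
                d[i][j - 1] + indel_cost,  # insert hyp[j-1]
                d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1], time_weight),
            )
    return d[n][m]
```

With time_weight set to 0 the function reduces to an ordinary unit-cost edit distance over the labels, which makes the effect of the timing term easy to isolate.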
5. Samodejno razpoznavanje glasov slovenskega govora z uporabo zbirke orodij Kaldi [Automatic phoneme recognition of Slovenian speech using the Kaldi toolkit]
- Author
- BOLKA, ALEŠ and Dobrišek, Simon
- Subjects
- hidden Markov models, speech recognition, Slovenian language, neural networks, Kaldi
- Abstract
The thesis addresses automatic recognition of Slovenian phonemes based on the Sofes database. It uses Kaldi, a toolkit for phoneme and speech recognition that had not previously been applied to the Slovenian language. Phoneme recognition was carried out with various acoustic and language models. The results, obtained with both neural networks and classical HMM approaches, are promising.
- Published
- 2016
6. Razvoj govornega vmesnika za vnos podatkov pri terenskem delu [Development of a speech interface for data entry during fieldwork]
- Author
- SEVER, VID and Dobrišek, Simon
- Subjects
- speech recognition, Google Speech API, speech interface
- Abstract
The main goal of the thesis was to develop a speech interface that solves problems with entering data into information systems during fieldwork. In the first part of the thesis we surveyed the field of speech recognition and reviewed the speech interfaces and tools that we could use in developing our own speech interface. In the second part we focused on implementing the speech interface in the Python programming language. We used some non-standard Python libraries for processing the speech recordings. Speech recognition was performed by the Google Speech API. We formatted the recognized text as HTML to achieve the desired structure of the output. We also developed a graphical user interface. We tested the speech interface in environments with different noise levels. We concluded that it performs satisfactorily even on recordings made in the natural environment where fieldwork is usually performed. Performance drops only in environments with very loud noise.
- Published
- 2016
7. Preizkus Googlovega govornega programskega vmesnika za slovenski govorjeni jezik [Testing the Google speech API for spoken Slovenian]
- Author
- ČEFARIN, DAVID and Dobrišek, Simon
- Subjects
- speech recognition, sending emails, Google
- Abstract
In this thesis, we test the accuracy and usefulness of the Google speech recognition application for the Slovenian language. We also want to find out what a concrete use of the application looks like and which problems and limitations it brings.

The first chapter gives a short overview of the history and the most important developments in the field of speech recognition. It shows how deep learning and multilayer neural networks have brought major progress to automatic speech recognition in recent years. Neural networks have become established as a tool for estimating the probability that audio signals match stored models of words or phonemes, and many commercial speech recognition systems are now based on this approach. Some of these commercial systems are described in the second chapter. They have been developed by large corporations such as Microsoft, Google, and Apple, and are used either for dictation or as a key part of intelligent personal assistants, which use a natural language user interface to answer questions, make recommendations, and perform actions.

Despite this progress, many limitations remain when speech recognition systems are used in everyday life. We examined some of them by testing the accuracy of the Google application with the tool HResults. The tests were made on three groups of samples: slow and clear reading, faster and less clear reading, and free speech. Average recognition accuracy was 88% in the first group, 79% in the second group, and 62% in the third group. The results for slow reading are very good and are similar to the results advertised for commercial recognition systems. The application has problems with faster reading, mainly with short words such as conjunctions and prepositions, which it either misses or recognizes incorrectly. For free speech the results are even worse because of the many unnecessary sounds and interjections, so that some sentences are completely unintelligible. We conclude that the Google application is very useful when we speak clearly and do not use it in situations that require perfect recognition accuracy. We also measured recognition accuracy on a larger set of samples at two different points in time. The test shows that recognition accuracy is improving over time.

In the last chapter, we describe a program written in Python for sending emails with the help of the Google speech recognition application. The Google application is easy to integrate into a computer program and is very reliable: there were no problems connecting to its server. Its limit on the duration of the speech is an obstacle when working with longer audio files. This problem can be solved by splitting audio files into shorter pieces; so that the split does not damage the speech, it has to be made at pauses in the speech. The program uses PyAudio for recording audio and Sound eXchange for conversion between audio formats. Once the whole transcription is available, the program tidies the text and replaces dictated punctuation marks, such as period and comma, with the actual punctuation symbols. Finally, the program sends the text to the chosen email address. Programs of this kind are useful mainly when the user's hands or eyes are occupied with other work and the user cannot or must not use a keyboard.
- Published
- 2016
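The accuracy percentages quoted in the abstract of this record are word-level scores of the kind HResults reports. A minimal Python sketch of such a score follows, assuming the usual definition Acc = (N - S - D - I) / N, where N is the number of reference words and S, D, and I are the substitutions, deletions, and insertions in a minimum-cost alignment. Unit edit costs are an assumption here and the alignment weights of the actual tool may differ; the function names and sample word lists are hypothetical.

```python
from typing import List


def edit_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions (S + D + I)
    needed to turn the reference word sequence into the hypothesis."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(1, m + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match_or_sub = d[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1)
            d[i][j] = min(match_or_sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[n][m]


def word_accuracy(ref: List[str], hyp: List[str]) -> float:
    """Word accuracy in percent: 100 * (N - S - D - I) / N."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (len(ref) - edit_errors(ref, hyp)) / len(ref)


if __name__ == "__main__":
    reference = "a b c d".split()
    recognized = "a x c".split()  # one substitution, one deletion
    print(word_accuracy(reference, recognized))
```

Because insertions are counted as errors, this score can become negative when the recognizer outputs many extra words.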