11 results on '"Robert W. P. Luk"'
Search Results
2. XML Document Clustering Using Common XPath
- Author
- Ho-Pong Leung, Fu-Lai Chung, Stephen C. F. Chan, and Robert W. P. Luk
- Subjects
Document Structure Description, Information retrieval, computer.internet_protocol, Computer science, XML validation, Well-formed document, computer.software_genre, XML database, Simple API for XML, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, computer, computer.programming_language, XPath
- Abstract
XML is becoming a common way of storing data. The elements and their arrangement in the document's hierarchy not only describe the document structure but also imply the data's semantic meaning, and hence provide valuable information for developing tools that manipulate XML documents. In this paper, we pursue a data mining approach to the problem of XML document clustering. We introduce a novel XML structural representation called common XPath (CXP), which encodes the frequently occurring elements together with their hierarchical information, and propose to use the mined CXPs to form the feature vectors for XML document clustering. In other words, data mining acts as a feature extractor in the clustering process. Based on this idea, we devise a path-based XML document clustering algorithm called PBClustering, which groups the documents according to their CXPs, i.e. their frequent structures. Encouraging simulation results are observed and reported.
- Published
- 2005
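The idea in this abstract of turning frequently occurring element paths into feature vectors can be sketched roughly as follows. This is a minimal illustration only, not the paper's CXP definition or the PBClustering algorithm: the function names `element_paths` and `frequent_path_features`, the `min_support` parameter, and the choice of binary features are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Set, Tuple


def element_paths(xml_text: str) -> Set[str]:
    """Collect every root-to-element tag path in one document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    paths: Set[str] = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        element, path = stack.pop()
        paths.add(path)
        for child in element:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_path_features(
    docs: List[str], min_support: float
) -> Tuple[List[str], List[List[int]]]:
    """Keep paths found in at least min_support of the documents and
    encode each document as a binary vector over those paths."""
    per_doc = [element_paths(d) for d in docs]
    counts = Counter(p for paths in per_doc for p in paths)
    threshold = min_support * len(docs)
    vocabulary = sorted(p for p, c in counts.items() if c >= threshold)
    vectors = [[1 if p in paths else 0 for p in vocabulary] for paths in per_doc]
    return vocabulary, vectors
```

The resulting vectors could then be passed to any standard clustering routine; which one the authors use is not stated in the abstract.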
3. Adaptive Data Delivery Framework for Financial Time Series Visualization
- Author
- Fu-Lai Chung, Tak-chung Fu, Chak-man Ng, Chun-fai Lam, and Robert W. P. Luk
- Subjects
Finance, Information management, business.industry, Computer science, Mobile computing, computer.software_genre, Visualization, Data point, Data visualization, The Internet, Time series, Web service, business, computer
- Abstract
Financial applications now run on a wide range of devices in both e-commerce and m-commerce. A major task in such applications is letting market players view the historical price movement of a stock before making a decision. However, because configurations diverge across devices and platforms, time series data delivery and visualization must be adapted. In this paper, an adaptive framework for financial time series delivery and visualization is proposed to achieve this goal. The core system adopts the concepts of data point importance and data point reordering for time series representation. The application of the proposed framework to stock price time series delivery and visualization is demonstrated on different devices using a Web service.
- Published
- 2005
4. SBT-Forest, an Indexing Approach for Specialized Binary Tree
- Author
- Robert W. P. Luk, Tak-chung Fu, Chak-man Ng, and Fu-Lai Chung
- Subjects
Information management, Theoretical computer science, Binary tree, Computer science, Trie, Search engine indexing, Binary number, Data mining, Time series, Data structure, computer.software_genre, computer, Time series database
- Abstract
In our previous work, a time series representation framework, the specialized binary tree (SB-tree), was proposed for representing stock time series data effectively and efficiently. Putting a set of SB-trees together forms a time series database, which we term a specialized binary tree forest (SBT-forest). By manipulating the SBT-forest, different time series query and mining processes can be facilitated. However, the major challenge is how to locate an SB-tree in the forest efficiently. Therefore, developing an indexing approach for the SB-trees is of fundamental importance for maintaining an acceptable query speed. In this paper, a time series indexing approach is proposed that first transforms the SB-trees into symbol strings and then indexes the symbol strings with a trie data structure. As demonstrated in the experiments, the proposed approach speeds up the time series query process. It can also handle the addition of new entries to the database without difficulty.
- Published
- 2005
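The second stage described in this abstract, indexing symbol strings with a trie, can be sketched as below. This is a generic trie under stated assumptions, not the authors' implementation: how an SB-tree is converted into a symbol string is not given in the abstract, so the strings are taken as already computed, and the names `SymbolTrie`, `insert`, `find_exact`, and `find_prefix` are hypothetical.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Identifiers of the series whose symbol string ends at this node.
        self.series_ids: List[str] = []


class SymbolTrie:
    """Index from symbol strings to series identifiers (illustrative only)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, symbols: str, series_id: str) -> None:
        # Adding an entry only touches the nodes along one path.
        node = self.root
        for symbol in symbols:
            node = node.children.setdefault(symbol, TrieNode())
        node.series_ids.append(series_id)

    def _walk(self, symbols: str) -> Optional[TrieNode]:
        node = self.root
        for symbol in symbols:
            next_node = node.children.get(symbol)
            if next_node is None:
                return None
            node = next_node
        return node

    def find_exact(self, symbols: str) -> List[str]:
        node = self._walk(symbols)
        return list(node.series_ids) if node else []

    def find_prefix(self, prefix: str) -> List[str]:
        # Gather every series stored at or below the prefix node.
        node = self._walk(prefix)
        if node is None:
            return []
        found: List[str] = []
        stack = [node]
        while stack:
            current = stack.pop()
            found.extend(current.series_ids)
            stack.extend(current.children.values())
        return sorted(found)
```

A lookup costs time proportional to the length of the query string rather than the number of stored series, which is the usual reason to choose a trie for this kind of index.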
5. Improving web server performance by a clustering-based dynamic load balancing algorithm
- Author
- Kei Shiu Ho, Lai Kuen Ho, Hau Yee Sit, Hong Va Leong, and Robert W. P. Luk
- Subjects
Web server, business.industry, Computer science, Distributed computing, Round-robin DNS, Load balancing (computing), computer.software_genre, Load management, Network Load Balancing Services, Server, Resource allocation, Algorithm design, The Internet, Cluster analysis, business, computer
- Abstract
A load balancing scheme is presented which allows HTTP requests to be dynamically migrated between clustered back-end Web servers based on the loading condition of the system. We adopt a nearest neighborhood clustering algorithm whereby an adaptive number of requests are migrated, as determined by the real-time distribution of load among the servers. Experimental results demonstrate that our proposed algorithm yields the best performance when compared with several other common approaches.
- Published
- 2004
6. The impact of speech recognition errors on the effectiveness of spoken Cantonese query retrieval
- Author
- T.K. Choi, W.C. Siu, X.M. Zhu, Robert W. P. Luk, Kin-Man Lam, Fu-Lai Chung, and Man-Wai Mak
- Subjects
Concept search, Computer science, business.industry, Bigram, Speech recognition, Search engine indexing, Pinyin, computer.software_genre, Speech processing, ComputingMethodologies_PATTERNRECOGNITION, Speech analytics, Visual Word, Artificial intelligence, Syllable, business, computer, Natural language processing
- Abstract
This paper examines the impact of recognition errors on spoken Cantonese query retrieval effectiveness. One of the largest test collections provided by NTCIR for evaluating Chinese information retrieval is used. The retrieval system uses one of the best models (2-Poisson) and the robust bigram indexing strategy. If there are no syllable recognition errors, then the errors in converting spelling (called pinyin) to characters degrade the performance by 3.9 percentage points, which is not statistically significant. Otherwise, the performance drops by 10.2 percentage points, which is statistically significant. We improved our system by merging the /n/ and /l/ phone labels and retraining the syllable-to-text conversion routines. With the improved retrieval system, the performance dropped by only 6.4 percentage points.
- Published
- 2004
7. An adaptive clustering approach to dynamic load balancing
- Author
- Robert W. P. Luk, Hau Yee Sit, Kei Shiu Ho, Hong Va Leong, and Lai Kuen Ho
- Subjects
Network Load Balancing Services, Distributed algorithm, business.industry, Computer science, Server, Distributed computing, Dynamic load balancing, Round-robin DNS, The Internet, Load balancing (computing), Cluster analysis, business
- Abstract
With the rapidly increasing reliance on distributed systems following the spread of low-cost networking and the Internet, the development of effective techniques for task distribution has become an important issue in distributed computing. During the past few years, most of the load balancing algorithms in practical use have employed a migration policy with a fixed number of tasks in each step. This paper proposes a task transfer scheme in which an adaptive number of tasks is transferred between the participating servers for load balancing. The adaptation is achieved by a data mining technique, namely clustering, using the distance-weighted nearest neighborhood algorithm. Experimental results show that our proposed algorithm yields the best performance when compared with several other common approaches.
- Published
- 2004
8. Evolutionary segmentation of financial time series into subsequences
- Author
- V. Ng, Tak-chung Fu, Fu-Lai Chung, and Robert W. P. Luk
- Subjects
Optimization problem, Human-based evolutionary computation, Computer science, Segmentation, Data mining, Time series, computer.software_genre, computer, Evolutionary computation
- Abstract
Time series data are difficult to manipulate. When they can be transformed into meaningful symbols, it becomes an easy task to query and understand them. While most recent work on time series query concentrates only on how to identify a given pattern in a time series, it does not consider the problem of identifying a suitable set of time points at which the time series can be segmented in accordance with a given set of pattern templates, e.g., a set of technical analysis patterns for stock analysis. Moreover, fixed-length segmentation is only a primitive approach to this kind of problem, and hence a dynamic approach is preferred so that the time series can be segmented flexibly and effectively. Because such a segmentation problem is actually an optimization problem, and evolutionary computation is an appropriate tool for solving it, we propose an evolutionary segmentation algorithm in this paper. Encouraging experimental results in segmenting the Hong Kong Hang Seng Index using 22 technical analysis patterns are reported.
- Published
- 2002
9. Improving the robustness of wavelet transform for epoch detection
- Author
- Robert W. P. Luk, Fu-Lai Chung, and Y.Y. Lam
- Subjects
Discrete wavelet transform, business.industry, Stationary wavelet transform, Spline wavelet, Second-generation wavelet transform, Wavelet transform, Pattern recognition, Wavelet packet decomposition, symbols.namesake, Wavelet, Gaussian noise, symbols, Artificial intelligence, business, Mathematics
- Abstract
This paper investigates (1) the robustness of epoch detection (i.e. identification of glottal closure) by the wavelet transform and (2) methods to improve its robustness. Under Gaussian noise degradation, we achieved an identification performance (2% error rate) similar to that of an earlier investigation using the spline wavelet transform. However, the performance under other types of noise degradation, such as periodic noise (e.g. traffic lights) and short noise (e.g. keyboard noise), is not as robust as before. The scale matching technique could not secure good performance because the spline wavelet has poor recall performance. We explored the use of the Gaussian wavelet transform. Instead of scale matching, a single level is used, and the recall of epochs associated with the nearest laryngograph differences by the Gaussian wavelet is about 30% higher than by the spline wavelet, across different types of noise degradation. However, the spline wavelet produces fewer false alarms (29% on average) in identification, and its peaks correspond well to epoch positions (with less [standard] deviation). We evaluated detection schemes using the scalograms of both Gaussian and spline wavelets and achieved an improvement in recall (26%), with a relative position consistency of 1.4 ms.
- Published
- 2002
10. Inference of letter-phoneme correspondences with pre-defined consonant and vowel patterns
- Author
- Robert W. P. Luk and Robert I. Damper
- Subjects
Consonant, business.industry, Computer science, Speech recognition, Pattern recognition, Speech synthesis, computer.software_genre, Vowel, Stress (linguistics), Artificial intelligence, business, Hidden Markov model, computer, Word (computer architecture)
- Abstract
The authors describe the automatic inference of letter-phoneme correspondences with predefined consonant and vowel patterns, which imply a segmentation of the word in one domain. The technique obtains the maximum likelihood (ML) alignment of the training word, and correspondences are found according to where the segmentation projects onto the ML alignment. Here, the phoneme strings were segmented depending on the number of consonant phonemes preceding or following the vowel phoneme. Sets of correspondences were evaluated according to the performance obtained when they were used for text-phonemic alignment and translation. The number of correspondences inferred was too large to evaluate using Markov statistics. Instead, hidden Markov statistics were used, where the storage demand is further reduced by a recording technique. Performance improves significantly as the number of consonants included in the pattern is increased. The performance of correspondences with predefined V.C* patterns was consistently better than with C*.V patterns.
- Published
- 1993
11. Inference of letter-phoneme correspondences by delimiting and dynamic time warping techniques
- Author
- Robert W. P. Luk and Robert I. Damper
- Subjects
Dynamic time warping, Computer science, business.industry, Inference, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Speech synthesis, Pattern recognition, Pronunciation, computer.software_genre, Translation (geometry), Spelling, Set (abstract data type), Euclidean distance, Computer Science::Sound, Artificial intelligence, business, computer, Word (computer architecture), Natural language processing
- Abstract
An algorithm for inferring correspondences between letters and phonemes from a large set of word spellings and their associated phonemic forms is described. The algorithm uses two techniques to infer correspondences: delimiting and dynamic time warping (DTW). The first technique delimits the part of the word spelling and pronunciation that cannot be aligned with the existing set of correspondences. The second technique derives correspondences from the delimited part of that word. The inferred correspondences are evaluated in terms of translation performance tested with unseen words, proper names and novel words. The translation performance is compared with that obtained using the manually derived correspondences as the benchmark. Nonparametric statistical tests are used to establish whether the performance of the inferred correspondences is significantly different from that of the manually derived correspondences.
- Published
- 1992
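For readers unfamiliar with dynamic time warping, the technique named in this abstract, the textbook recurrence can be sketched as follows. This is the generic numeric form, assuming two sequences of numbers and an absolute-difference local cost; it is not the letter-to-phoneme variant used by the authors, whose cost function the abstract does not describe. The function name `dtw_distance` is hypothetical.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Minimum cumulative cost of aligning a with b, where one element
    of either sequence may be matched to several elements of the other."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] shares its match with the previous element
                cost[i][j - 1],      # b[j-1] shares its match with the previous element
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]
```

Because repeated matches are allowed, `[1, 2, 3]` and `[1, 2, 2, 3]` align at zero cost even though their lengths differ.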
Discovery Service for Jio Institute Digital Library