Li, Baichuan (author.), Lyu, Michael R. (thesis advisor.), Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering, (degree granting institution.)
Community Question Answering (CQA) services, such as Yahoo! Answers and Baidu Zhidao, provide a platform where a great number of users ask and answer questions to meet their own needs. In recent years, however, a sharp increase in the number of questions raised in these communities has challenged the efficiency of CQA services for question solving and knowledge learning. To facilitate answerers' access to suitable questions and to help askers obtain information more efficiently, this thesis proposes a computational framework for question processing in CQA services.

The framework consists of three components: popularity analysis and prediction, routing, and structuralization. The first component analyzes the factors affecting question popularity and observes that the interaction of users and topics leads to differences in question popularity.
Based on these findings, we propose a mutual reinforcement-based label propagation algorithm to predict question popularity using features of question texts and asker profiles. Empirical results demonstrate that our algorithm distinguishes high-popularity questions from low-popularity ones more effectively than state-of-the-art baselines.

The second component aims to route new questions to potential answerers in CQA services. The proposed question routing (QR) framework considers both answerer expertise and answerer availability. To estimate answerer expertise, we propose three models. The first is derived from the query likelihood language model, and the latter two use answer quality to refine the first. To estimate answerer availability, we employ an autoregressive model. Experimental results demonstrate that leveraging answer quality greatly improves QR performance. In addition, utilizing similar answerers' answer quality on similar questions provides more accurate expertise estimation and thus better QR performance. Moreover, answerer availability estimation further boosts QR performance.

Expertise estimation plays a key role in QR. However, current approaches employ full profiles to estimate all answerers' expertise, which is inefficient and time-consuming. To address this problem, we construct category-answerer indexes for filtering out irrelevant answerers and develop category-sensitive language models for estimating answerer expertise. Experimental results show that: first, category-answerer indexes produce a much shorter list of relevant answerers to be routed, with computational costs substantially reduced; second, category-sensitive language models obtain more accurate expertise estimation relative to state-of-the-art baselines.

In the third component, we propose a novel hierarchical entity-based approach to structuralize questions in CQA services.
Traditional list-based organization of questions is not effective for content browsing and knowledge learning because of the large volume of documents. To address this problem, we utilize a large-scale entity repository and construct a three-step framework to structuralize questions into “cluster entity trees” (CETs). Experimental results show the effectiveness of the framework in constructing CETs. We further evaluate the performance of CETs on knowledge organization from both the user and the system perspective. From the user perspective, our user study demonstrates that, with CET-based organization, users perform significantly better in knowledge learning than with the list-based approach. From the system perspective, CETs substantially boost question search performance through re-ranking.

In summary, this thesis contributes both a conceptual framework and an empirical foundation to question processing in CQA services.

Detailed summary in vernacular field only.
Li, Baichuan.
Thesis (Ph.D.) Chinese University of Hong Kong, 2014.
Includes bibliographical references (leaves 138-161).
Abstracts also in Chinese.
http://library.cuhk.edu.hk/record=b6115585
Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)