The evaluation of conversational agents, or chatterbot question answering systems, is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, once chatterbots began to become more domain specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions while still achieving high-quality responses. This paper discusses why the existing measures are inappropriate for evaluating response quality, and brings forward a call for new standard measures along with related considerations. As a short-term solution for evaluating the response quality of conversational agents, and to demonstrate the challenges in evaluating systems of differing nature, this research proposes a black-box approach that uses observation, a classification scheme, and a scoring mechanism to assess and rank three example systems: AnswerBus, START, and AINI.