David Lillis: HOTAIR

Introduction

My MSc thesis was based on work I performed as part of the HOTAIR (Highly Organised Team of Agents for Information Retrieval) project, which was affiliated with the IIRG (Intelligent Information Retrieval Group) in UCD.

Project Overview

Information overload is a well-recognised problem that applies not just to the Internet, but also within large organisations that maintain vast distributed information archives, which are often difficult to search in a cohesive fashion. The recently proposed Agent-Oriented Software Engineering (AOSE) paradigm [ 12 ] promotes the use of agent technologies [ 10 ] in the construction of complex software systems. This approach is considered most appropriate for problem domains in which data, control, expertise, or resources are distributed, and where agents provide a natural metaphor for delivering system functionality [ 11 ]. The case for agents is strengthened by a number of recent initiatives, such as Autonomic Computing by IBM [ 1 ][ 2 ] and Proactive Computing by Intel [ 3 ], which are aimed at managing the complexity that is often inherent in modern software systems. These initiatives offer a view of a new generation of agent-based software systems whose architectures include an autonomic backbone of software agents delivering a range of services that support self-optimisation, self-configuration, self-healing, and self-protection. Given the distributed nature of today's information systems, and the industry-recognised need for software solutions that are able to manage themselves, it is our belief that agent technologies will play a significant role in the delivery of the next generation of search architectures.

The Highly Organised Team of Agents for Information Retrieval (HOTAIR) project seeks to develop an Enterprise Search Application that is underpinned by a robust and extensible agent-based search engine architecture, which seamlessly integrates multiple Information Retrieval techniques to rapidly deliver high-quality search results. Specifically, this proposal aims to develop a proof-of-concept demonstrator that illustrates the core techniques that will underlie this architecture and to evaluate them within the context of the TREC (Text REtrieval Conference) Information Retrieval benchmarks [ 13 ]. Central to this architecture is the concept of dynamic voting schemes for integrating result sets from multiple Information Retrieval (IR) algorithms [ 14 ]. Within the architecture, a number of agents, known as the Panel of Experts (PoE), will encapsulate various IR algorithms. Each agent will generate an independent result set comprising the documents it judges most relevant to the user's query. Through negotiation, the experts will synthesise these result sets into a single result set. This will be achieved through the use of a dynamic weighting scheme that is customised to individual user preferences based upon a combination of implicit and explicit user feedback.
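The idea of merging the experts' result sets under per-expert weights can be illustrated with a small sketch. This is not the HOTAIR voting scheme itself, which is not specified here; it assumes a weighted Borda count as a stand-in fusion method and a simple click-based weight update as a stand-in for user feedback. The function names, expert names, document ids, and the learning rate are all hypothetical.

```python
from collections import defaultdict


def fuse_results(expert_rankings, weights, top_k=10):
    """Merge ranked lists from several experts using a weighted Borda count.

    expert_rankings: dict of expert name -> list of document ids, best first.
    weights: dict of expert name -> non-negative weight (assumed, not from HOTAIR).
    """
    scores = defaultdict(float)
    for expert, ranking in expert_rankings.items():
        weight = weights.get(expert, 1.0)
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            # The top document earns n points, the last document earns 1 point.
            scores[doc_id] += weight * (n - position)
    # Sort by descending score; break ties by document id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


def update_weights(weights, expert_rankings, clicked_doc, learning_rate=0.1):
    """Hypothetical feedback rule: reward experts that returned the clicked
    document, then renormalise so the weights sum to 1."""
    updated = dict(weights)
    for expert, ranking in expert_rankings.items():
        if clicked_doc in ranking:
            updated[expert] = updated.get(expert, 1.0) + learning_rate
    total = sum(updated.values())
    return {expert: value / total for expert, value in updated.items()}


if __name__ == "__main__":
    rankings = {
        "vector_space": ["d1", "d2", "d3"],
        "boolean": ["d3", "d1", "d4"],
    }
    weights = {"vector_space": 0.5, "boolean": 0.5}
    print(fuse_results(rankings, weights))  # ['d1', 'd3', 'd2', 'd4']
    # The user clicks d4, which only the "boolean" expert returned.
    print(update_weights(weights, rankings, clicked_doc="d4"))
```

A rank-based method is used here because rankings from different IR algorithms can be combined without normalising their raw scores; a score-based fusion method would be an equally valid choice for a sketch like this.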

The HOTAIR architecture will be implemented using Agent Factory, a four-layer framework, developed by Collier, that delivers structured support for the development and deployment of agent-oriented applications [ 6 ][ 7 ][ 8 ][ 9 ]. In addition, a number of pre-existing IR and Indexing libraries will be adapted for use in the architecture. These libraries have been developed by the UCD Intelligent Information Retrieval Group (IIRG), of which Dunnion and Toolan are senior members. Finally, the autonomic backbone of the architecture will be delivered through the refinement and implementation of a number of Autonomic Computing techniques that have been designed for agent-based applications [ 4 ][ 5 ].

For the evaluation of the HOTAIR Search Architecture, we intend to use the TREC collection, which comprises a document dataset and a set of benchmarks for analysing the performance of IR systems. The use of TREC will enable us to compare our architecture's performance directly with that of other IR systems. It is also envisaged that one means of evaluating the scalability of the architecture is to take part in the TREC task called "Terabyte Collection", which involves running techniques over a terabyte of documents. This call begins in February 2005 for completion by September 2005. We believe that our architecture will handle this volume of data well.

References

  1. Horn, P. (2001), Autonomic Computing: IBM's Perspective on the State of Information Technology, URL: http://www.research.ibm.com/autonomic/manifesto/.
  2. Kephart, J.O., and Chess, D.M. (2003), The Vision of Autonomic Computing, IEEE Computer Magazine, January 2003.
  3. Want, R., Pering, T., and Tennenhouse, D. (2003), Comparing autonomic and proactive computing, in IBM Systems Journal, Vol. 42, No. 1.
  4. Sterritt, R., Bustard, D.W. (2003), Towards an Autonomic Computing Environment, in Proceedings of 14th International Workshop on Database and Expert Systems Applications (DEXA '03), Prague, Czech Republic, September 2003.
  5. Collier, R., O'Grady, M. J., O'Hare, G.M.P, Muldoon, C., Phelan, D., Strahan, R., Tong, Y., (2004), Self-Organisation in Agent-Based Mobile Computing, Proceedings of the 2nd International Workshop on Self-Adaptive and Autonomic Computing Systems (SAACS 04), Zaragoza, Spain, August 30th-September 4th.
  6. Collier, R., Rooney, C., O'Hare, G.M.P., (2004), A UML-based Software Engineering Methodology for Agent Factory, Proceedings of the 16th International Conference on Software Engineering and Knowledge Engineering (SEKE-2004), Banff, Alberta, Canada, 20-25th June.
  7. Ross, R., Collier, R., O'Hare, G.M.P., (2004), AF-APL - Bridging Principles & Practice in Agent-Oriented Languages, Proceedings of the 2nd International Workshop on Programming Multi-Agent Systems Languages and Tools (PROMAS-2004), New York, 19-20th July.
  8. Collier, R.W., O'Hare G.M.P., Lowen, T., Rooney, C.F.B. (2003), Beyond Prototyping in the Factory of the Agents, 3rd Central and Eastern European Conference on Multi-Agent Systems (CEEMAS'03), Prague, Czech Republic.
  9. Rooney, C.F.B., Collier, R.W., O'Hare, G.M.P., VIPER: Visual Protocol Editor, in Proceedings of the 6th International Conference on Coordination Models and Language (COORDINATION 2004), Pisa, Italy, 24-27 February, 2004.
  10. Wooldridge, M. and Jennings, N.R. (1995), Intelligent Agents: Theory and Practice, Knowledge Engineering Review 10(2).
  11. Wooldridge, M. and Jennings, N.R. (1998), Pitfalls of Agent-Oriented Development, in K. P. Sycara and M. Wooldridge, editors: Agents '98: Proceedings of the Second International Conference on Autonomous Agents, ACM Press.
  12. Jennings, N. R. (2000), On agent-based software engineering, in Artificial Intelligence 117, pp 277-296.
  13. NIST Special Publication SP 500-255: Proceedings of the 12th Text Retrieval Conference (TREC), 2003.
  14. Baeza-Yates, R., and Ribeiro-Neto, B., (1999), Modern Information Retrieval, Addison-Wesley, ISBN 020139829X.

People