April 9-12, 2008
Montréal, Québec, Canada

The National Museums Online Learning Project Federated Collections Search: Searching Across Museum And Gallery Collections In An Integrated Fashion

Terry Makewell, Victoria & Albert Museum, United Kingdom

Abstract

The National Museums Online Learning Project has been developed by a consortium of 9 national museums and galleries within the UK and is a 3-year project funded by Treasury. The purpose of the project is to get the vast amount of content already on these national museum and gallery Web sites better used through the creation of on-line resources. To enable the Web sites to be better used, the outcomes will be based on the partner Web sites within a peer-to-peer environment rather than a central portal. The enabling of end-users to search efficiently across the partner collections was deemed crucial for these on-line resources. Substantial technical resources have gone into investigating the federated search.

Each possible solution has different strengths and weaknesses. This paper addresses the concepts and questions around different types of federated searching, and debates the models and possible solutions which best suit this project and the sector as a whole. It is hoped that the processes and models developed will go some way to providing future frameworks for searching across national museum collections.

Keywords: Searching, collections, federated, collaboration, Opensearch, OAI

Introduction

The National Museums Online Learning Project has been developed by a consortium of national museums and galleries within the United Kingdom and is a 3-year project funded by Treasury. The institutions involved in the project are the British Museum, Imperial War Museum, Natural History Museum, National Portrait Gallery, Royal Armouries Museum, Sir John Soane's Museum, Tate, Wallace Collection and the Victoria & Albert Museum (the lead partner).

The museums and galleries collaborating on the National Museums Online Learning Project each have different technical set-ups. The majority have different Content Management Systems and different Collection Management Systems. The sizes of the organisations also range from around 50 to around 1,000 employees. Each partner has a wealth of content, ranging from Rodin sculptures through to King Henry VIII’s armour, via William Morris textiles. Being able to search across all of the available on-line resources in an intuitive manner was deemed crucial to enabling a cohesive user experience for the Web resources that are being created as part of the project.

If we were to envision what we were trying to do outside of the Web sphere, a useful comparison would be to department stores. In the past, people had to walk up and down the high street visiting many different shops to obtain all that they required. The introduction of department stores enabled customers to go into one shop and get, in one place, all the furniture, food, electronics or other goods they required. Using this model, it is easy to see how creating a 'one-stop-shop' for obtaining resources and objects from each of the partners would benefit users of this project.

The aims of the project are to:

  • Increase the use of the existing digital collections of the 9 partners, without adding new content (e.g. more digitisation) or creating a new Web site or museum portal
  • Encourage a more creative and critical use of partners’ on-line resources
  • Increase confidence levels in the use of collections
  • Create sustainable high-quality on-line resources for each of the participating partners
  • Strengthen the partnership between the participating institutions & develop collaborative working models between partners.

It has been decided to do this through WebQuests and Creative Journeys.

WebQuests

WebQuests are guided, open-ended activities which are based around topics and subject areas mapped to the English national curriculum. At least 3 different partners collaborate on each WebQuest, with objects coming from across the partner collections. The project will create 100 WebQuests.

WebQuests were invented by Bernie Dodge and Tom March at San Diego State University in 1995. Dodge (1995) defines a WebQuest as "an inquiry-oriented activity in which some or all of the information that learners interact with comes from resources on the internet". While the working title of this part of the project is WebQuests, the structure and layout of the WebQuests created by the project have been significantly modified from the original model, although the underlying educational methodology remains the same.

Creative Journeys

Creative Journeys aim to show how people have made use of museum and gallery objects in their own personal activities – in sculpting, designing clothes or writing novels, for example. They give users an opportunity to personalise their relationship with the collections by using new technologies to record and share their journeys with each other. In doing so, they aim to increase user confidence and give users skill-building activities to encourage and support their participation.

With the development of ideas for both these resources, what became obvious to the project team and partners very early on was that the users, whether formal or lifelong learners, would need a flexible and intuitive process through which they could search across the partner collections to obtain the virtual objects they required. The option of having nine search boxes, or even nine pop-up windows, was deemed to be 'too 1990s'. Ensuring a good user experience is paramount to any successful Web project and central to this project.

Deep and Dark

The history of searching across the Internet is wide and varied, and any in-depth review or analysis of it is outside the scope of this paper. But it is important to give a brief background to enable an understanding of how recent this technology is. Gerard Salton is commonly known as the father of modern searching; his teams at Harvard and Cornell developed the SMART information retrieval system. Searching is still based upon many of the concepts he developed and tests he carried out, including Term Frequency (TF) and relevance feedback mechanisms (Salton, 1987). These concepts are relevant for the federated search for the project. From here we can jump forward a few years to when Larry met Sergey at Stanford in 1995. Three years later, their search engine called BackRub became Google. Modern Internet searching was born, and this is the environment we currently find ourselves in.

But how do search engines work? The majority of them create their indices by spidering Web pages. More often than not, a page must be static and linked to other pages to be discovered. Deep Web resources are not seen or retrieved. These deep Web resources take many different forms, from private Web sites that require a username and password to non-HTML content in formats not handled by search engines (Bergman, 2001). In the case of this project, the deep Web refers to dynamic content (pages which are created dynamically) and unlinked content (stand-alone pages). A certain percentage of the partner Collection Management Systems are hidden in the deep Web since the content is stored in searchable databases that return results dynamically. Determining how we were going to access this content was a major challenge of the project. We needed to come up with a way of retrieving, qualifying and organising both ‘deep’ and ‘surface’ content.

These two processes of searching for ‘deep’ and ‘surface’ content are shown in Figure 1 (reworked from a diagram by Bergman, 2001). The first process shows how the searching of the surface Web could be carried out by traditional search engines. The small and simple telescope can see and retrieve all of the near-earth objects. These are the collection objects and Web pages that are based on the surface Web. The second process shows the type of data that would be available for use in the project if we could gain access into the deep Web contained within our partner collections – in this case represented by a direct link to the Hubble telescope. By combining these two methods, a wealth of collections data can be made available.

While resources from the main partner Web sites would be used within the project, it was more important that users gain access to object level searching – contained within the deep Web. As detailed above, traditional search engines can only access the surface Web. This is where the partner Web sites would reside. The on-line collection databases were lurking in the deep Web.

Figure 1

Fig 1:  Showing how partner collections could be accessed

Effectively searching across these different resources without having to visit each database individually meant that federated searching needed to be investigated. Peter Jasco (2004) stated that a federated search consists of:

  • Transformation of a query and broadcast of it to a group of disparate databases with the appropriate syntax
  • Merging of the results collected from the databases
  • Presentation of them in a succinct and unified format with minimal duplication
  • Provision of a means, performed either automatically or by the portal user, to sort the merged result set
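The four steps listed above can be illustrated with a minimal sketch. Everything named here is hypothetical: the two partner search functions, their sample records and the upper-case query quirk are invented stand-ins for real collection databases, and de-duplicating by URL is only one possible merging rule.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Result = Dict[str, str]


def search_partner_a(query: str) -> List[Result]:
    """Hypothetical partner database that matches case-insensitively."""
    data = [
        {"title": "Rodin bronze", "url": "http://a.example/1"},
        {"title": "Morris textile", "url": "http://a.example/2"},
    ]
    return [r for r in data if query.lower() in r["title"].lower()]


def search_partner_b(query: str) -> List[Result]:
    """Hypothetical partner database that only understands upper-case queries."""
    data = [
        {"title": "RODIN BRONZE", "url": "http://a.example/1"},  # same object as partner A
        {"title": "RODIN SKETCH", "url": "http://b.example/7"},
    ]
    return [r for r in data if query in r["title"]]


# Step 1: each backend is paired with a function that rewrites the user's
# query into the syntax that backend expects.
BACKENDS = {
    "partner_a": (lambda q: q, search_partner_a),
    "partner_b": (lambda q: q.upper(), search_partner_b),
}


def federated_search(query: str) -> List[Result]:
    def run(item):
        name, (transform, search) = item
        return [dict(r, source=name) for r in search(transform(query))]

    # Step 1 (continued): broadcast the transformed query to every backend.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(run, BACKENDS.items()))

    # Steps 2 and 3: merge the batches, keeping the first record seen per URL
    # so that duplicates are removed.
    merged: Dict[str, Result] = {}
    for batch in batches:
        for record in batch:
            merged.setdefault(record["url"], record)

    # Step 4: sort the merged set (here alphabetically by title).
    return sorted(merged.values(), key=lambda r: r["title"].lower())


if __name__ == "__main__":
    for hit in federated_search("rodin"):
        print(hit["source"], hit["title"], hit["url"])
```

A real implementation would replace the alphabetical sort with a relevance score, which is exactly the part that simpler federated searches find hard to provide.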

Federated searching would allow for a "one stop searching shop". It would enable users to have a seamless experience with reduced time involved in searching for information and would take pressure off their having to decide which database/web site to search. For novice Web users or younger children, this would be invaluable. It would also allow for the less-accessed collections of some partners to have increased usage. One issue that was quickly identified was that with simpler versions of federated searches, the advanced search features are sometimes lost, and it is often hard to obtain relevancy ranking.

It was determined that a prototype needed to be developed to investigate the potential advantages and disadvantages of the chosen model. The project Technical Advisory Group analysed the possible solutions at the same time as determining the requirements of the prototype. It was deemed that the chosen technical solution must:

  • Provide an effortless, timely and cost effective implementation to ensure a lowering of barriers for partners
  • Allow for searching at both Web site and object/collections level across multiple partners
  • Allow for relevance ranking of results
  • Work through HTTP

Once the initial requirements were determined, we set about investigating the technology and processes that were being used, both inside and outside of the sector, for federated searching. The debates that arose regarding the different possible solutions proved to be engaging and thought provoking.

The Investigation

Throughout the investigation into which type of federated search to prototype, we had to integrate into the process certain rationales which were unique to the project. These included time scales and the resources available. The project is multi-partnered, and there are vast differences in terms of technical capability and capacity of the partners. It was critical to understand the extent of these differences and their potential impact when considering a federated search, since the technical work would have to be carried out with each of the different partners individually. However, an element of economy of scale would of course come into practice throughout the process. The National Museums Online Learning Project is the first time that these particular national museums within the United Kingdom have collaborated on a project. As such, the project is large and complex and requires a high level of management oversight to ensure that the best practices and procedures are used and a coherent and suitable project framework is utilised.

When investigating federated searching, it was decided that the correct choice of technology was imperative. This choice would have to be evaluated as to how it fitted in to the constraints of the project. It was apparent quite quickly that there were two main processes of federated searching which were applicable within the project. The first process involved collecting meta-data from each of the databases that needed to be searched and then storing them in a central repository. This central repository could then be searched. When users found the meta-data of the required results item, they would be able to access the full record on the partner Web site. The second process involved utilising all of the partners’ current infrastructures and search procedures by sending a single request out to all of them and then aggregating the results to return to the user. Both of these approaches have their advantages and disadvantages when aligned to the requirements of the project. We decided to investigate both of these processes further before making any final decision.

Open Archives Initiative

The first type of federated searching we investigated was the Open Archives Initiative Protocol for Meta-data Harvesting (OAI-PMH). OAI (as it is commonly known) is a process through which meta-data can be stored and accessed over the Internet. The processes that are set up collect the meta-data descriptions of the records into a central repository, and services are then built around this meta-data. This open standard supports any meta-data schema, with the base schema being Dublin Core. Figure 2 details the process which we studied for implementation of OAI within the project. First, the user enters a search term into the federated search. In the case of the project, the federated search would not reside in one place, but would be replicated across all the partner Web sites. The federated search application then relays this query to the meta-data repository. The objects in the repository are queried and the OAI results are returned to the origin of the search (in this case, one of the partner Web sites). The federated search application parses the results, and they are displayed to the user. The user then selects the required result and is taken to the full collection object record on the specific partner Web site.
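As a rough illustration of the harvest-then-search flow described above, the following sketch builds an OAI-PMH ListRecords request, parses a Dublin Core response into a small in-memory repository, and searches it. The base URL, the sample records and the helper names (list_records_url, harvest, search_repository) are assumptions made for the example; a real harvester would fetch the response over HTTP and handle resumption tokens.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the request a harvester would send to a partner's OAI endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Invented response standing in for what a partner data provider might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:partner.example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Armour of Henry VIII</dc:title>
          <dc:identifier>http://partner.example/object/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:partner.example:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Rodin sculpture</dc:title>
          <dc:identifier>http://partner.example/object/2</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest(xml_text: str) -> List[Dict[str, str]]:
    """Extract the meta-data 'skim' that would be stored in the central repository."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "url": rec.findtext(".//dc:identifier", default="", namespaces=NS),
        })
    return records


def search_repository(records: List[Dict[str, str]], term: str) -> List[Dict[str, str]]:
    """Naive title match over the harvested meta-data."""
    return [r for r in records if term.lower() in r["title"].lower()]


if __name__ == "__main__":
    print(list_records_url("http://partner.example/oai"))
    repository = harvest(SAMPLE_RESPONSE)
    for hit in search_repository(repository, "armour"):
        print(hit["title"], "->", hit["url"])
```

The search runs entirely against the harvested copy; the partner site is only contacted again when the user follows the link to the full record.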

This approach has been implemented with the People's Network Discover Service (http://www.peoplesnetwork.gov.uk/discover/) in the United Kingdom and has been well documented at numerous museum conferences. This same repository has also been implemented in the Exploring 20th Century London project (http://www.20thcenturylondon.org.uk/). Cole et al (2002) detail the processes and the lessons learned during the creation of OAI-compliant meta-data at the University of Illinois Library. Zilber and Marsh (2007) further expand this to detail how their project used OAI to show how “a deep understanding of users’ thought processes, combined with appropriate meta-data standards and an effective meta-data system, can expose content providers’ resources in a manner that makes them truly accessible and useful”. The complexities of implementing OAI within any sizeable project are well documented, and these lessons were fully analysed and utilised at the evaluation stage. They include all content providers mapping their data sets to a common standard and the implementation of the harvesting methods.

Figure 2

Fig 2:  Showing how OAI could be implemented in the project

Opensearch

The second type of federated searching we investigated was Opensearch. Opensearch is a light-weight Web protocol which was originally developed by Amazon.com as part of the A9.com project which went live in 2004. Since then, Opensearch has moved to a community process at www.opensearch.org. Opensearch allows users to search through distributed databases and Web sites through one interface without visiting each site. Importantly, it would make use of the current searching mechanisms implemented on each partner site. From a central search interface, queries are sent off to the separate Web sites or databases. These are then queried through their current search procedures and the results sent back via an RSS or Atom feed. Figure 3 shows the process which we studied concerning the possible implementation of Opensearch within the project, as detailed above.
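From the client side, the exchange described above might look like the following sketch. The URL template, the sample feed and the helper names (fill_template, parse_results) are assumptions for illustration; the {searchTerms} placeholder, the "?" marker for optional parameters and the totalResults element come from the Opensearch 1.1 specification.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import quote_plus

OS_NS = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}


def fill_template(template: str, search_terms: str, **optional: object) -> str:
    """Substitute Opensearch URL template parameters such as {searchTerms}."""
    values = {"searchTerms": search_terms, **optional}

    def substitute(match: "re.Match[str]") -> str:
        name, is_optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote_plus(str(values[name]))
        if is_optional:
            return ""  # optional parameters with no value become empty strings
        raise KeyError(f"missing required template parameter: {name}")

    return re.sub(r"\{(\w+)(\??)\}", substitute, template)


# Invented feed standing in for what a partner site might send back.
SAMPLE_RSS = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Partner search: morris</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <item><title>Morris textile</title><link>http://partner.example/object/5</link></item>
    <item><title>Morris wallpaper</title><link>http://partner.example/object/9</link></item>
  </channel>
</rss>"""


def parse_results(rss_text: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return the reported total and the (title, link) pairs in the feed."""
    channel = ET.fromstring(rss_text).find("channel")
    total = int(channel.findtext("opensearch:totalResults", default="0", namespaces=OS_NS))
    items = [(i.findtext("title"), i.findtext("link")) for i in channel.findall("item")]
    return total, items


if __name__ == "__main__":
    template = "http://partner.example/search?q={searchTerms}&page={startPage?}&format=rss"
    print(fill_template(template, "william morris"))
    print(parse_results(SAMPLE_RSS))
```

In a federated setting, the central interface would fill one template per partner, fetch each feed, and pass the parsed items on to be merged.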

Opensearch has not been implemented within the sector as much as OAI, although the Powerhouse Museum in Australia has implemented it in conjunction with their collections search. Chan (2007) details this process and how search results can be fed to users without needing to harvest records. These processes have also been implemented on a larger scale with the Collections Australia Network (http://www.collectionsaustralia.net).

Figure 3

Fig 3:  Showing how Opensearch could be implemented in the project

Another Option: SRU

Closely related to Opensearch, SRU (search/retrieve via URL) is a similarly lightweight protocol, much of which is derived from Z39.50. This older protocol is widely used by libraries and was first developed in the 1970s. With the advent of the Internet, and the development of XML, this older protocol could now be used over the Web. As a result, the best parts of the protocol were taken over and developed into SRU.

SRU has the capability to understand more of the search target than Opensearch. Opensearch simply enables keyword searches. The strength of SRU is in searching across the unstructured servers and files that a multi-partnered project such as this contains. It also allows for the use of advanced searching.
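To make the contrast with a plain keyword search concrete, the sketch below builds an SRU searchRetrieve request whose query is a fielded CQL expression. The endpoint and the helper names (cql_clause, sru_url) are hypothetical, and the dc.title and dc.creator indexes are assumed to be supported by the target server, which would need to be confirmed for any real partner system.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlparse


def cql_clause(index: str, term: str) -> str:
    """Build one fielded CQL clause, escaping backslashes and quotes in the term."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} = "{escaped}"'


def sru_url(base_url: str, clauses: List[str], maximum_records: int = 10) -> str:
    """Combine clauses with boolean AND into an SRU searchRetrieve URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": " and ".join(clauses),
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = sru_url(
        "http://partner.example/sru",  # hypothetical endpoint
        [cql_clause("dc.title", "armour"), cql_clause("dc.creator", "Henry VIII")],
    )
    print(url)
    print(parse_qs(urlparse(url).query)["query"][0])
```

The extra expressiveness comes at a cost: each partner's server has to understand CQL and map its indexes to the local database, which is the additional internal work an SRU deployment would demand.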

Evaluation

The possible processes through which a federated search could be carried out can be seen as lying on a continuum from simple technology with a simple interface through to complicated technology with an advanced interface. More often than not, when implementing digital projects, the more complicated technology and interface is deemed appropriate for differing reasons.

When evaluating the most appropriate solution for the project, we examined the processes which each partner would have to undertake to implement the federated search. For OAI, each partner would be a data provider and support OAI as a means of exposing meta-data from their collections. Each partner would also be a service provider, using the meta-data harvested via OAI for searching across the collections, because the project resources and solutions are spread across the partner Web sites and each would need to be able to run the federated search independently. Meta-data, representing a skim of the collections, would be stored in a central repository which would then be made searchable. This would involve a significant amount of work for each partner, from scoping through to actual implementation. If the decision were taken to implement OAI, one of the project partners would need to host the repository to store the meta-data. An external partner could also be used, but this would introduce an unknown element into the mix. All collections data would still be stored on the partner Web sites, but there would be a central repository with meta-data from the collections. This would mean one partner taking more of the load or working with a third party. This could be deemed not to fulfil the requirements of this project since the resources would no longer be shared throughout the original partnership.

For Opensearch, if a partner already had a collection search engine, then a micro-application would be used to return search results from that existing search in RSS/Atom with Opensearch response elements. In some cases the partners could use their current systems to serve out the feed rather than using an additional micro-application. An example is that numerous partners implement Lucene which provides the ability to do this via Nutch.
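A micro-application of the kind described could be as small as the following sketch, which wraps an existing search function and serialises its hits as RSS 2.0 with Opensearch response elements. The function existing_collection_search and its records are hypothetical stand-ins for a partner's current search; only the three opensearch elements and their namespace come from the Opensearch 1.1 specification.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
ET.register_namespace("opensearch", OPENSEARCH_NS)


def existing_collection_search(query: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a partner's existing collection search."""
    records = [
        {"title": "Armour of Henry VIII", "link": "http://partner.example/object/1"},
        {"title": "Tournament armour", "link": "http://partner.example/object/2"},
        {"title": "Flintlock pistol", "link": "http://partner.example/object/3"},
    ]
    return [r for r in records if query.lower() in r["title"].lower()]


def opensearch_rss(query: str, start_index: int = 1, per_page: int = 10) -> str:
    """Run the existing search and wrap one page of hits in an RSS feed."""
    hits = existing_collection_search(query)
    page = hits[start_index - 1:start_index - 1 + per_page]

    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Search results for {query}"
    ET.SubElement(channel, "link").text = "http://partner.example/search"
    ET.SubElement(channel, "description").text = "Collection search results"

    # Opensearch response elements tell the client how to page through results.
    paging = (("totalResults", len(hits)), ("startIndex", start_index), ("itemsPerPage", per_page))
    for tag, value in paging:
        ET.SubElement(channel, f"{{{OPENSEARCH_NS}}}{tag}").text = str(value)

    for hit in page:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = hit["title"]
        ET.SubElement(item, "link").text = hit["link"]

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(opensearch_rss("armour"))
```

Because the wrapper only calls the search that already exists, the partner's collection system itself does not need to change.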

Only the partner museums would be involved in implementing and using Opensearch since there is no central repository. The processes would be much simpler and easier for the partners to implement since the current infrastructure would be utilised within a simple search interface. SRU would advance upon this, but would involve a higher level of internal working for the partners and thus a higher level of resource allocation on their part.

The requirements of the prototype, along with specific project constraints, led the Technical Advisory Group to the decision that Opensearch would be prototyped. Opensearch fulfilled all the necessary requirements of the prototype, such as working through HTTP as well as providing a simple implementation process in comparison to either OAI or SRU. Both OAI and SRU would enable object data to be used in a more integrated fashion, but, as mentioned before, due to the time, resources and cost involved, these were not feasible solutions for this project. Opensearch also answered the constraints and sustainability questions of the project. On completion, the resources that remain must be fully supportable and sustainable by the partners.

The National Museums Online Learning Project has a three-year timescale and is due for completion in March 2009. At the end of 2007, following a strict EU defined timetable for procurement, two digital agencies were brought on board to facilitate the design and implementation of the two separate elements of the project – WebQuests and Creative Journeys. This included assistance with the development of the federated search (included as part of the WebQuest package). The introduction of the agencies marked a new phase in the project where the work of the previous year could start to be realised. A large amount of research and evaluation had already been carried out into the possible technical solutions for the project as a whole – of which the federated search was only a small part.

Implementation

The next stage involved undertaking the actual implementation of the federated search prototype. The realisation of such a prototype would have to be correctly scheduled for each of the partners due to the development time involved and the working partnership element. Three partners agreed to take part in the prototype of the Opensearch implementation, led by the project technical manager. The prototype is under development, and it is planned that it will be delivered by April 2008. This will allow for the integration of this search into the project’s main resources – WebQuests and Creative Journeys.

In planning the prototype, the main topics which needed direct attention included the lack of similar standards among the partners and the process through which the ranking of the search results would be carried out. There are different mechanisms of federated searching, and each of them allows for different levels of interactivity between the user and the collections. Opensearch does not provide the same possibilities as SRU or OAI at the level of advanced searching, but for the purposes of our project it fitted perfectly. Ease of implementation simply outweighed the option of advanced functionality.
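The ranking problem mentioned above arises because each partner returns results ordered by its own, non-comparable relevance score. The sketch below shows one common way of combining such lists, round-robin interleaving; it is offered purely as an illustration and is not the method chosen by the project, which is not specified here. The partner names and URLs are invented.

```python
from itertools import zip_longest
from typing import Dict, List


def interleave(ranked_lists: Dict[str, List[str]]) -> List[str]:
    """Take the top hit from each partner, then the second from each, and so on.

    Each list is assumed to be already ordered by that partner's own ranking;
    scores are never compared across partners. Duplicate URLs are kept only
    at their first (highest) position.
    """
    merged: List[str] = []
    seen = set()
    for tier in zip_longest(*ranked_lists.values()):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    per_partner = {
        "partner_a": ["a1", "a2", "a3"],
        "partner_b": ["b1"],
        "partner_c": ["c1", "a2"],
    }
    print(interleave(per_partner))
```

Interleaving gives every partner equal visibility on the first page, which suits the aim of increasing use of less-accessed collections, but it does not attempt to judge which partner's top hit is actually the most relevant.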

The implementation of Opensearch within the project is the second known time that this technology has been used within the sector on such a scale – certainly the first time in the northern hemisphere. Historically the cultural sector has implemented more projects involving OAI for federated searching. Opensearch is a fairly new implementation of established technology which has not yet really filtered down into the sector. Within the remit of the project it has fitted perfectly to what we wanted to accomplish. As mentioned previously, the Powerhouse Museum in Australia has also carried out analysis in the area of Opensearch. It was in the initial phase of their project with CAN (Collections Australia Network) that it was decided to implement Opensearch.

Conclusions

The federated search provides a simple and intuitive process through which users are able to search for ‘deep’ and ‘surface’ Web objects across the partners. The collection objects found through this search will be employed within WebQuests and Creative Journeys to great effect. The implementation of the federated search will contribute to the delivery of an extremely successful and ambitious project within the sector, providing users with a more accessible, direct and user-friendly route to the partner collections. By using open standards, the project partners will also be able to easily apply what they have learnt to other projects. This is an important element of the project – ensuring that the learning has been shared and the knowledge disseminated throughout the sector and beyond.

Following the completion of the federated search prototype in April 2008, a full analysis and evaluation will be carried out. If the prototype is deemed successful within the remit of the project, then a full federated search will be undertaken involving all partners. In parallel to the development of the federated search, the main resources of the project will be under development. The federated search will then be integrated into these resources.

The collaboration between the partners has allowed for successes to be shared and for knowledge to be reciprocated across and through networks which would not have existed otherwise. This atmosphere of collaboration on this multi-faceted project, of which the Opensearch implementation is only a small part, has allowed for knowledge to flow not only from the larger partners to the smaller ones, but also vice versa.

The production of a federated search across multiple partners would, until recently, have been an expensive undertaking. The implementation of an uncomplicated, scalable technology such as Opensearch will allow even the smaller museum partners to comfortably undertake this process. Utilising the current partner infrastructures provides a model of federated searching which does not create a proprietary system available to only a few, with large set-up and maintenance costs. It provides a simple and straightforward process which will enable other partners to join in the future and create a larger national, or even international, museum and gallery collection which is fully accessible to all.

Acknowledgments

The project Technical Advisory Group, consisting of a representative from each of the Museums and Galleries within the consortium, has been fully involved in the process of selecting an appropriate search solution for this project.

References

Bergman, M.K. (2001). “The Deep Web: Surfacing Hidden Value”. The Journal of Electronic Publishing, August 2001, Volume 7, Issue 1.

Chan, S. (2007). “Tagging and Searching – Serendipity and museum collection databases”. In J. Trant and D. Bearman (eds). Museums and the Web 2007: Proceedings. Toronto: Archives & Museum Informatics, published March 31, 2007 at http://www.archimuse.com/mw2007/papers/chan/chan.html

Cole, T.W., J. Kaczmarek, P.F. Marty, C.J. Prom, B. Sandore, S. Shreeves (2002). “Now That We’ve Found the ‘Hidden Web,’ What Can We Do With It?: The Illinois Open Archives Initiative Meta-data Harvesting Experience”. In D. Bearman and J Trant (eds.). Museums and the Web 2002: Proceedings. Archives & Museums Informatics, 2002. Consulted Jan 3, 2008. http://www.archimuse.com/mw2002/papers/cole/cole.html

Dodge, B. (1995). “WebQuests: A technique for Internet-based learning”. Distance Educator, 1(2), 10-13.

Jasco, Peter (2004). “Thoughts about Federated Information Retrieval”. Information Today, 21(9), October 2004, p.17, 19.

Salton, Gerard (1987). “A Theory of Indexing”. Society for Industrial Mathematics, Jan 1, 1987, ISBN-13: 978-0898710151

Zilber, J. and J. Marsh (2007). Access.ca: “Social Studies Resources for Canadian Teachers”. In International Cultural Heritage Informatics Meeting (ICHIM07): Proceedings. J. Trant and D. Bearman (eds). Toronto: Archives & Museum Informatics, 2007. Published September 30, 2007 at http://www.archimuse.com/ichim07/papers/zilber/zilber.html

Cite as:

Makewell, T., The National Museums Online Learning Project Federated Collections Search: Searching Across Museum And Gallery Collections In An Integrated Fashion, in J. Trant and D. Bearman (eds.). Museums and the Web 2008: Proceedings, Toronto: Archives & Museum Informatics. Published March 31, 2008. Consulted http://www.archimuse.com/mw2008/papers/makewell/makewell.html