Ryan Chute (1), Luydimilla Balakireva (2), Herbert Van de Sompel (3)
Digital Library Research & Prototyping Team
Research Library
Los Alamos National Laboratory
(1) rchute@lanl.gov (2) ludab@lanl.gov (3) herbertv@lanl.gov
Acknowledgments: Jeroen Bekaert, Patrick Hochstenbach, Henry Jerez, Xiaoming Liu
The aDORe Project: Background
Initial motivation: severe deficiencies in the information discovery environment developed for the LANL Research Library:
- Metadata-centric: descriptive metadata records are first-class citizens; actual digital assets are auxiliary data.
- Tens of millions of digital assets stored as files in a file system.
- Tight integration between application, content collection, and discovery, preventing other applications from leveraging the rich content base.
Obvious solution:
- Replace the metadata-centric approach with a compound object approach.
- Bundle digital assets into storage containers that dramatically reduce the number of files in a file system.
- Cleanly separate the storage repository from the applications that leverage the stored assets by providing the necessary machine interfaces.
Implementation of the obvious solution led to the aDORe R&D project (2003-2007).
The aDORe Project: Major Drivers
- Concrete need to design and implement a solution to ingest, store, and access the vast and growing collection of the LANL Research Library.
- Scale, scale, scale! Existing open source solutions (at that time) did not meet our scale requirements, e.g. static binding of disseminators to objects in Fedora.
- Interest in repository interoperability, cf. involvement in OAI-PMH, NISO OpenURL, OAI-ORE.
- Interest in digital preservation, cf. National Digital Information Infrastructure and Preservation Program (NDIIPP) funding.
The aDORe Project: Major Design Principles
- Leverage existing standards and technologies to make development and migration more straightforward. Read: laziness as a strategy.
- Use a distributed, component-based approach to meet challenges of scale.
- Use Digital Object, Datastream, and Surrogate abstractions to characterize content.
- Facilitate a uniform manner for client applications to discover and access content objects available in a group of distributed repositories.
- Provide single-repository behavior for a group of distributed repositories.
The aDORe Federation: Content Objects
Content is characterized into three types of Content Objects:
- Digital Objects: an identified aggregation of one or more Datastreams and properties pertaining to the Datastreams and to the aggregation itself.
- Datastreams: a retrievable bitstream, of any media type, made available by a repository to the federation.
- Surrogates: the serialization of a Digital Object into a machine-readable representation that is made accessible by a repository.
Supports multiple complex Digital Object serialization formats (e.g. MPEG-21 DIDL, METS, ORE Atom/XML, ORE RDF/XML).
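The three Content Object types can be pictured with a small data model. The following is a minimal sketch only: the class names, field names, identifiers, and the ad hoc XML layout are illustrative assumptions, and the XML is not MPEG-21 DIDL, METS, or any other format aDORe actually emits.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Datastream:
    """A retrievable bitstream of any media type (hypothetical fields)."""
    identifier: str
    media_type: str
    location: str  # where the bitstream can be retrieved


@dataclass
class DigitalObject:
    """An identified aggregation of Datastreams plus properties."""
    identifier: str
    properties: dict = field(default_factory=dict)
    datastreams: list = field(default_factory=list)


def to_surrogate(obj: DigitalObject) -> str:
    """Serialize a Digital Object into a machine-readable Surrogate.

    The element and attribute names are made up for illustration.
    """
    root = ET.Element("digitalObject", {"id": obj.identifier})
    for name, value in sorted(obj.properties.items()):
        ET.SubElement(root, "property", {"name": name}).text = str(value)
    for ds in obj.datastreams:
        ET.SubElement(
            root,
            "datastream",
            {"id": ds.identifier, "mediaType": ds.media_type, "ref": ds.location},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical example object with one property and one Datastream.
example = DigitalObject(
    identifier="info:example/object/1",
    properties={"title": "Sample record"},
    datastreams=[
        Datastream("info:example/ds/1", "application/pdf",
                   "http://repo.example.org/ds/1")
    ],
)
surrogate_xml = to_surrogate(example)
```

Keeping the Surrogate as a serialization function separate from the object model reflects the point that one Digital Object may be expressed in several serialization formats.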
The aDORe Federation: Architecture
A 3-tier architecture for the federation of distributed repositories:
Tier 1: the aDORe repositories
- Networked systems that host digital object content and that make that content accessible by exposing core service interfaces.
- Currently XMLtapes and ARCfiles (aDORe Archive).
- Other Content Management Systems can be turned into an aDORe repository by implementing the core service interfaces.
Tier 2: the aDORe federation management
- Networked systems that facilitate presenting the aDORe repositories as a single logical repository; these federation components expose core service interfaces to allow access to their content.
- Federation components are: Identifier Locator, Service Registry, Format Registry, Semantic Registry.
Tier 3: the aDORe front-ends
- Networked systems that make digital object content hosted in the multitude of physical aDORe repositories accessible by exposing core service interfaces that present those aDORe repositories as a single logical repository.
- aDORe front-ends are: OAI-PMH Federator, OpenURL Resolver.
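How the three tiers cooperate can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: the class and method names are hypothetical, real aDORe components are networked services, and only the Identifier Locator of Tier 2 is modeled.

```python
class Repository:
    """Tier 1: hosts content and exposes a core access interface."""

    def __init__(self, name, surrogates):
        self.name = name
        self._surrogates = dict(surrogates)  # identifier -> surrogate

    def identifiers(self):
        return list(self._surrogates)

    def get_surrogate(self, identifier):
        return self._surrogates[identifier]


class IdentifierLocator:
    """Tier 2: records which repository holds which identifier."""

    def __init__(self):
        self._index = {}

    def register(self, repository):
        for identifier in repository.identifiers():
            self._index[identifier] = repository

    def locate(self, identifier):
        return self._index.get(identifier)


class FrontEnd:
    """Tier 3: presents many repositories as one logical repository."""

    def __init__(self, locator):
        self._locator = locator

    def get_surrogate(self, identifier):
        repository = self._locator.locate(identifier)
        if repository is None:
            raise KeyError(f"unknown identifier: {identifier}")
        return repository.get_surrogate(identifier)


# Two hypothetical repositories behind a single front-end.
tape_a = Repository("tape-a", {"info:example/1": "<surrogate id='1'/>"})
tape_b = Repository("tape-b", {"info:example/2": "<surrogate id='2'/>"})

locator = IdentifierLocator()
locator.register(tape_a)
locator.register(tape_b)

front_end = FrontEnd(locator)
found = front_end.get_surrogate("info:example/2")
```

The client only ever talks to the front-end; which physical repository answers is decided by the locator lookup, which is what gives the federation its single-repository behavior.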
aDORe Federation
The aDORe Federation software
- Available today at: http://african.lanl.gov/aDORe/projects/adoreFederation
- Released under the GNU LGPL open source software license.
This is a major update to the aDORe Archive:
- Updates the Tier-1 aDORe Archive.
- Implements the 3 tiers of the architecture instead of only Tier 1.
Use cases:
- Large collections of relatively stable objects.
- A plug-in storage component for Institutional Repository and Archive solutions.
The aDORe Archive @ LANL, August 23, 2008
- In production at the LANL Research Library for over 1 year
- 90,000,000 Compound Digital Objects
- 216,000,000 stored bitstreams
- ~10,000 autonomous repositories:
  - ~4,500 XMLtapes: XML serializations of Digital Objects
  - ~5,500 ARCfiles: bitstreams
- > 617,000,000 identifiers
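The effect of bundling assets into storage containers can be read off these figures with some back-of-the-envelope arithmetic. The sketch below uses the approximate counts from the slide and assumes, as the listing suggests, that XMLtapes hold only Digital Object serializations and ARCfiles hold only bitstreams; the results are rough averages, not measured values.

```python
# Approximate counts taken from the slide above.
objects = 90_000_000
bitstreams = 216_000_000
xmltapes = 4_500
arcfiles = 5_500

# Average number of stored bitstreams per Compound Digital Object.
datastreams_per_object = bitstreams / objects

# Average container fill, assuming one kind of content per container type.
objects_per_xmltape = objects / xmltapes
bitstreams_per_arcfile = bitstreams / arcfiles

# Items (objects plus bitstreams) per container file overall.
items_per_container = (objects + bitstreams) / (xmltapes + arcfiles)

print(f"{datastreams_per_object:.1f} bitstreams per object")
print(f"{objects_per_xmltape:,.0f} objects per XMLtape")
print(f"{bitstreams_per_arcfile:,.0f} bitstreams per ARCfile")
print(f"{items_per_container:,.0f} items per container file")
```

On these assumptions, roughly 306 million stored items sit in about 10,000 container files rather than as individual files in a file system.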
The aDORe Project: Current / Future Work
djatoka: an open source JPEG 2000 image server
- Compression of JPEG 2000 files using the Kakadu JPEG 2000 library.
- Dynamic extraction of resolutions and regions from JPEG 2000 files.
- Support for a rich set of input/output formats (e.g. BMP, GIF, JPG, PNG, PNM, TIF, JPEG 2000).
- Extensible interfaces to request image services and manipulations (e.g. watermarking).
- A rich service framework, based on the OCLC OpenURL Resolver, to facilitate the transfer of service parameters via an OpenURL-compliant HTTP GET request.
- Pluggable as an aDORe Disseminator Service.
- Release: mid-Sept. 2008
adore-searcher: a SOLR-based search engine for aDORe repositories
- Currently used in the production aDORe instance (> 90M bib records).
- Release: ?
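The idea of passing service parameters in an OpenURL-compliant HTTP GET request can be illustrated with a short sketch. The `url_ver`, `rft_id`, and `svc_id` keys come from the OpenURL key/encoded-value convention; everything else here is an assumption for illustration: the base URL, the identifiers, the `svc.` prefix for service parameters, and the parameter names are hypothetical and are not taken from the djatoka documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_openurl(base_url, referent_id, service_id, service_params):
    """Build an OpenURL-style GET request URL.

    referent_id names the resource (here, an image); service_id names the
    requested service; service_params carries service-specific options.
    """
    query = [
        ("url_ver", "Z39.88-2004"),  # OpenURL version key
        ("rft_id", referent_id),     # referent: the thing being requested
        ("svc_id", service_id),      # service to apply to the referent
    ]
    # Hypothetical convention: prefix service parameters with "svc.".
    query += [(f"svc.{key}", str(value))
              for key, value in sorted(service_params.items())]
    return f"{base_url}?{urlencode(query)}"


# Hypothetical request for a JPEG rendering at a given resolution level.
request_url = build_openurl(
    "http://images.example.org/resolver",
    "info:example/image/42",
    "info:example/svc/getRegion",
    {"format": "image/jpeg", "level": 2},
)
print(request_url)
```

Because every parameter travels in the query string, such a request can be bookmarked, cached, or issued by any HTTP client without a dedicated API library.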
LANL Research Library Installation