07/22/10

Metadata Blog Has Moved!

Please check out our new home: http://www.alcts.ala.org/metadatablog/

All future posts will appear on the new site, so please update your bookmarks and readers.

kmarti . 08:54:12 pm . Metadata Blog

07/08/10

Metadata Interest Group Meeting ALA 2010: Linked Data

Metadata Interest Group Meeting
The Metadata Interest Group met on Sunday, June 27, and had two speakers.

Linked Data and Controlled Vocabularies on the Web
Rebecca Guenther, Library of Congress

Ms. Guenther described a project underway at the Library of Congress (LC) to provide access to its controlled vocabularies using the Resource Description Framework (RDF). First, she gave an overview of controlled vocabularies and their uses. Controlled vocabularies control values, reduce ambiguity, provide for synonym control, allow for validation, and establish formal relationships among terms. They can be simple, like enumerated lists (e.g., a drop-down menu), or complex (e.g., full thesauri with multiple relationships). LC maintains standards that contain controlled vocabularies, including:

  • LCSH/NAF
  • TGM
  • MARC controlled lists (e.g., ISO 639-2 – language codes)
  • MODS/METS/MIX/PREMIS controlled lists

Controlled vocabularies are currently represented in a variety of ways:

  • Metadata formats, such as MARC authority records
  • XML schemas (e.g., enumerated lists)
  • RDF/XML and RDFS (i.e., semantic web)
  • SKOS – Simple Knowledge Organization System
  • MADS (MODS for authority records)


Guenther focused on the use of SKOS at http://id.loc.gov. SKOS is an RDF application used to express knowledge organization systems such as classifications, thesauri, and taxonomies. It allows distributed, decentralized management of SKOS vocabularies through linked data-inspired applications, and it requires that concepts be identified by uniform resource identifiers (e.g., HTTP URIs). The data model in place at id.loc.gov provides a concept scheme; logical groupings of concepts; and labeling properties, annotation properties, and associative properties.

SKOS was selected because its defined element set is more relevant to controlled vocabularies than RDF or OWL (Web Ontology Language) alone. MARC authority records are easy to transform into SKOS, which can show broader and narrower relationships, and the URIs enable web services.
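To make the MARC-to-SKOS idea concrete, here is a minimal sketch in Python. It is not the conversion LC actually runs. The record layout (a plain dictionary), the example.org namespace, and the sample values are all invented for illustration. The only things taken from the standards are the SKOS namespace, the MARC authority tags 150 (heading), 450 (see from) and 550 (see also from), and the $w codes g (broader) and h (narrower).

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/authorities/"  # hypothetical namespace, not id.loc.gov

# MARC authority subfield $w, first position: g = broader term, h = narrower term
RELATION_CODES = {"g": "broader", "h": "narrower"}


def marc_to_skos(record, base=BASE):
    """Turn one simplified authority record into (subject, predicate, object) triples."""
    concept = base + record["id"]
    # 150: the authorized topical heading becomes the preferred label
    triples = [(concept, SKOS + "prefLabel", record["150"])]
    # 450: "see from" tracings become alternate labels
    for label in record.get("450", []):
        triples.append((concept, SKOS + "altLabel", label))
    # 550: "see also from" tracings become links to other concepts
    for tracing in record.get("550", []):
        relation = RELATION_CODES.get(tracing.get("w"), "related")
        triples.append((concept, SKOS + relation, base + tracing["id"]))
    return triples


# Invented sample record; the identifiers are placeholders
sample = {
    "id": "rec1",
    "150": "Metadata",
    "450": ["Data about data"],
    "550": [{"w": "g", "id": "rec2"}],
}
triples = marc_to_skos(sample)
for triple in triples:
    print(triple)
```

Each tracing turns into one statement about the concept, which is why the broader and narrower relationships already coded in MARC carry over so directly.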

Guenther also provided some additional information about “linked data,” a feature of the semantic web in which links are made between resources. It goes beyond hypertext links because it allows links between concepts. Wikipedia defines it as a “term used to describe a method of exposing, sharing, and connecting data via dereferenceable URIs on the Web.” Id.loc.gov is a web service for shared vocabularies. It should reduce maintenance and make comprehensive information about controlled terms openly available, and it has been a testing ground for semantic web technologies.
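A “dereferenceable” URI is one that a program can request over HTTP and get data back from, not just a web page. The sketch below, using only Python's standard library, builds (but does not send) such a request. The concept URI is a placeholder, and whether a given server honors the Accept header for RDF/XML is an assumption to check against that service's documentation.

```python
from urllib.request import Request


def rdf_request(concept_uri, media_type="application/rdf+xml"):
    """Build (but do not send) an HTTP GET that asks for RDF rather than HTML."""
    return Request(concept_uri, headers={"Accept": media_type})


# Placeholder URI, used only for illustration
request = rdf_request("http://example.org/vocab/concept/123")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the lookup. The same URI then serves as both the identifier of the concept and the address of its description.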

Id.loc.gov went live in April 2009, with more vocabularies added in 2010: LCSH, TGM, MARC code list for relators, and PREMIS controlled vocabularies. Data is open and continuously updated, and can be bulk downloaded in RDF. Searches can bring up terms by ID or label information, and multiple vocabularies can be searched at once. A demonstration of the site showed the visualization tree, the suggested terminology tab, and links to similar concepts in other vocabularies, such as the French RAMEAU subject headings.

The data from id.loc.gov has been put to use in several projects, including:

  • University of Pennsylvania online books
  • University of Virginia auto suggest feature
  • Freebase.org
  • National libraries of Sweden and France

In the future LC will be adding some additional vocabularies:

  • MARC code list for language (ISO 639-2) and other ISO 639 lists
  • MARC code list for countries and geographic areas
  • Additional PREMIS controlled vocabularies
  • Name authorities, which will be a challenge because they do not fit into SKOS very well, so LC is looking at a different mark-up

Other future avenues include a MADS OWL schema to enable identification of facets within names and subjects, expanded information on subdivisions, and additional relator terms to enhance existing relationships.

The technical infrastructure for id.loc.gov includes:

  • Django (Python)
  • LCSH uses MySQL, with SKOS RDF generated at the time of request; it operates mainly like a relational database, with MARC mapped to tables
  • Everything else uses an RDF triplestore (a Python library backed by MySQL), with XML converted to SKOS RDF/XML before ingest
  • Programmatic queries using SPARQL
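For readers who have not seen a programmatic SPARQL query, the sketch below shows the general shape of one. It asks for the concepts whose skos:broader value is a given concept, and encodes the query as a GET request URL in the manner of the SPARQL protocol (the query travels in a "query" parameter). The endpoint address and concept URI are placeholders, and the talk did not specify how id.loc.gov exposes its endpoint, so treat this as an assumption-laden illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address


def narrower_concepts_query(concept_uri):
    """SPARQL text asking for concepts whose skos:broader is the given concept."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?narrower ?label WHERE {\n"
        f"  ?narrower skos:broader <{concept_uri}> ;\n"
        "            skos:prefLabel ?label .\n"
        "}"
    )


def sparql_get_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = narrower_concepts_query("http://example.org/vocab/concept/123")
url = sparql_get_url(ENDPOINT, query)
print(query)
print(url)

# Decoding the URL again recovers the original query text
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The pattern inside the braces is the same kind of subject, predicate, object statement that the triplestore holds, with variables in the slots to be filled.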


VIVO: A Research-Focused Discovery Tool

Sara Russell-Gonzalez, University of Florida

Russell-Gonzalez discussed VIVO, an open source semantic web application that enables the discovery of research and researchers. It is designed for researchers, students, administrators, and donor/funding agencies, and it provides profiles for researchers. Originally developed at Cornell University Libraries to support the life sciences, it was redesigned in 2007 as a semantic web application and can now cover all disciplines. The University of Florida got involved through a 2009 NIH grant to create national networking with VIVO. VIVO is designed in part to answer the following questions:

  • With resources online, researchers no longer visit the library, so how do you know what your researchers are doing, and how can you be involved in the research process?
  • How can researchers form collaborations with researchers in other disciplines or students learn about potential advisors?
  • How can administrators know their strengths and weaknesses for strategic planning?

VIVO gathers data from a variety of sources, all of it public. As much as possible is done automatically, drawing from internal and external sources. Because each school is different, each school has its own local VIVO instance. Local sources can include the institutional repository, human resources databases, the institutional grants database, the faculty reporting tool, etc. National sources are mostly abstracts and indexes, like PubMed. All data is mapped into an RDF structure. Compliance with semantic web standards enables a national network across all VIVO instances around the country.

Data is stored in RDF triples, with reflexive relationships (i.e., relationships are reflected in both directions). Consequently, the data grows quickly in size. The VIVO core ontology is used to describe people, organizations, activities, publications, events, interests, grants, and other relationships, with support for local extensions and FOAF (Friend of a Friend). HTTP URIs identify objects, and the data is exposed through SPARQL endpoints.
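The effect of storing relationships in both directions can be seen in a toy triple store. This is a sketch only: the property names and identifiers below are invented for illustration and are not terms from the VIVO ontology, and a real triplestore does far more than this.

```python
# Hypothetical property pairs; VIVO's actual ontology terms may differ
INVERSE_OF = {"authorOf": "hasAuthor", "memberOf": "hasMember"}
INVERSE_OF.update({inverse: prop for prop, inverse in list(INVERSE_OF.items())})


class TripleStore:
    """A toy store that records every relationship in both directions."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        inverse = INVERSE_OF.get(predicate)
        if inverse is not None:
            # Store the mirrored statement so either end can be the starting point
            self.triples.add((obj, inverse, subject))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the sorted triples that agree with every value that is not None."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self.triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


store = TripleStore()
store.add("person:1", "authorOf", "article:9")
store.add("person:1", "memberOf", "dept:3")

print(len(store.triples))
print(store.match(subject="article:9"))
```

Two statements went in and four are stored, which is the growth described above. The payoff is that a profile page for the article or the department can be built from a lookup on its own identifier.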

There are multiple challenges to using the semantic web approach, such as determining the level of granularity, scalability as the database grows, provenance of the data, and keeping data up-to-date. Disambiguation, particularly of authors, may be one of the biggest challenges. From a political standpoint, determining when data should be removed is another issue (e.g., what happens when a faculty member leaves?).

There is one year left on the project, with some enhancements planned:

  • Adding the ability to produce CVs and biosketches
  • Forming collaborations with publishers to bring in additional external data sources
  • Developing visualization capabilities

VIVO is still looking for schools to get involved, for data providers, and for application developers to interface with VIVO. The first national conference for VIVO is August 12-13 at the New York Hall of Science. VIVO’s website is: http://vivoweb.org/

Reported by Kristin Martin

kmarti . 09:06:16 pm . ALA Annual 2010

06/28/10

Apples and Oranges

This is a report on the Metadata Interest Group program Converging Metadata Standards in Cultural Institutions: Apples & Oranges, which took place on Saturday, June 26, 2010, at 8:00 a.m. during the ALA Annual Meeting in Washington.

The three presentations provided some good insights into metadata challenges and possible courses of action. The first two dealt more closely with the theme suggested by the title of the session – aggregating heterogeneous metadata from differing sources including libraries, museums, and archives – while the third, a research report on an assessment of metadata from a cost/benefit perspective, brought in the strain of understanding user behavior and letting it guide metadata practice. This mixture came together like a good salad as evidenced by some lively question/answer at the end.

Danielle Plumer, Coordinator for the Texas Heritage Online project of the Texas State Library and Archives Commission, described the metadata education component of this cooperative digitization grant project. Over 30 institutions, partnering in 10 projects that are each creating at least 1,000 metadata records, were offered training at various locations in project management, legal issues, metadata standards and crosswalks (particularly content standards), controlled vocabularies, and digital preservation management. Much of the training was adapted from modules developed by the Library of Congress, Cornell, and others to suit the audience for this project, many of whom are not librarians and faced a steep learning curve. It will be further modified to an online learning format and made available to anyone through Amigos Library Services. Danielle had some interesting comments on discoveries arising from her work with these diverse institutions: there’s nothing wrong with using MARC to describe cultural objects; LCSH is the most commonly used standard vocabulary, but it is often poorly understood and some systems display it poorly; and metadata decisions are often driven not by the needs of a project but by the limitations of the system or software used to create and store the metadata.

Ching-Hsien Wang, Chief Information Officer at the Smithsonian Institution, presented “Striving in Library, Archives, & Museum Converging Landscape: The power of working together.” The Smithsonian Institution, which encompasses 20 libraries, 19 museums, and 14 archives, has recently launched a “one-stop” search center (http://collections.si.edu/search/) that provides, for the first time, the ability to search across all collections. It currently includes 4.6 million records and 445,000 images from 40 data sources, encompassing highly diverse types of materials (e.g., books, postage stamps, audio of interviews). In addition to simple search, the interface provides faceted browsing by object type, media, topic, name, date, place, and data source, along with many advanced features. “Metadata made it all happen.” They began by combining records from 8 Horizon databases, all MARC but with many differences, and as a result of that effort decided that metadata standardization needed to happen as they moved to incorporate data from the scientific and museum databases as well. They overcame the challenges of defining common data elements and data typing for those elements through collaborative discussions with data providers, while respecting the diverse perspectives and traditions of different institutions. Much of the work to create unified indexes and presentation was done by programmers massaging and transforming the metadata; some MARC fields were omitted or, as with LCSH, “taken apart.” Sometimes assumed values for a particular context had to be supplied for the aggregation. They are at the end of the first phase but have much more to do: they are working on iPhone and georeferencing applications and hierarchical facets, and are bringing each data source online one by one. Catalogers are learning and modifying their practices (and remedying existing metadata) as a result of seeing the outcomes.
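As a rough illustration of the kind of “massaging” described, the sketch below maps records from two sources onto a few common elements, takes subdivided subject headings apart, supplies an assumed value where a source leaves one out, and then counts facet values. It is not the Smithsonian’s code. The field names, the sample records, and the default value are all invented for illustration.

```python
from collections import Counter


def split_lcsh(heading):
    """Take a subdivided heading apart into its components."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def normalize(record, source, defaults):
    """Map one source record onto a small set of common elements."""
    topics = []
    for heading in record.get("subjects", []):
        topics.extend(split_lcsh(heading))
    return {
        "title": record.get("title", ""),
        # Supply an assumed value when the source context leaves it implicit
        "object_type": record.get("type") or defaults.get("object_type", "unknown"),
        "topics": topics,
        "data_source": source,
    }


def facet_counts(records, element):
    """Count how many times each value of one common element occurs."""
    counts = Counter()
    for rec in records:
        value = rec[element]
        counts.update(value if isinstance(value, list) else [value])
    return counts


# Invented sample records from two hypothetical sources
library_rec = {"title": "Guide to airmail", "subjects": ["Postage stamps--United States--History"]}
museum_rec = {"title": "Airmail stamp", "type": "Postage stamps", "subjects": ["Postage stamps"]}

unified = [
    normalize(library_rec, "library catalog", {"object_type": "Books"}),
    normalize(museum_rec, "museum database", {}),
]
topic_facet = facet_counts(unified, "topics")
type_facet = facet_counts(unified, "object_type")
print(topic_facet)
print(type_facet)
```

Once the heading is split, the library record and the museum record meet under the same topic value, which is what makes a single faceted browse across sources possible.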

Joyce Celeste Chapman, Library Fellow at NCSU Libraries, presented “Assessing metadata and incorporating user feedback,” a report on a research study she conducted to compare the time spent creating specific EAD elements with a study of both user behavior and opinion on the usefulness of those elements. This was a small-scale study, and Joyce was careful to state that the sample was not random and the results were not generalizable, but that the indications from them could nevertheless be useful and point to areas where a change of emphasis in metadata creation could benefit users. As a larger context, she also mentioned the recently released Final Report of the Task Force on Cost/Value Assessment of Bibliographic Control http://connect.ala.org/files/7981/costvaluetaskforcereport2010_06_18_pdf_77542.pdf . Discussion in the later question/answer period pointed out the difficulty of separating the effect of metadata practice from the effect of aspects of the discovery interface when trying to derive data on user impact. Chapman’s study was able to sidestep this problem to some degree by presenting a generic interface where each metadata element was chosen and presented separately, but she pointed out that the attempt to isolate metadata elements could result in a disjointed user experience.

Timing for EAD field creation (Abstract, Bio/historical note, Scope/Content note, Subject Headings, Collection Inventory, and Other) was collected from 9 metadata creators at two institutions. A sample of end users was given 5 different tasks and their choices were analyzed; some were also interviewed on the relative importance they would place on data elements. The most striking finding was that the Collection Inventory, while taking a lot of time to create, also had high importance to users, while the Biographical note, which was also time-consuming, was not ranked highly by users. There were also observations about the usefulness and duplication of information between the Abstract and the Scope/Content note. The user research group at NCSU is considering next steps for additional metadata assessment measures across different metadata schemes and methodologies.

The question/answer period brought out several points. There were requests for sharing of metadata massaging tools and code (even if very institution-specific, people thought there’d be value in seeing how others are doing things at the code level). There was general recognition that dates and subjects are among the most challenging elements to deal with in aggregating metadata, and that synchronizing and updating aggregations is difficult, especially when the source systems don’t provide timestamps. Interaction between library, museum, and archive folks often results in learning for all, and all contexts could benefit from more detailed research into metadata’s value for users, however difficult that may be to assess.

lakerman . 08:10:32 am . ALA Annual 2007

06/06/10

ALA Annual 2010: Best Bets for Metadata Librarians and Call for Bloggers

Below is a list of metadata and digital library-friendly sessions for ALA Annual 2010. Planning to attend a session or already reporting on a session? Think about blogging it here! If you would like to blog any of the sessions, please contact Kristin Martin at kmarti@uic.edu with your name, e-mail address, and preferred session. Fuller descriptions are linked when available. See a session not on here that you think would be of interest? Suggest it!

I've tried to be as inclusive as possible with the sessions, as metadata is a cross-disciplinary topic within library and information science. Sessions of interest include metadata, digital projects, digital technology, and cataloging, and come from all different groups within ALA. Note that many of the sessions are sponsored through LITA, which has its own blog and is also looking for bloggers. They are listed here for interest, and I will link to write-ups following the conference.

Friday Sessions

10:30 AM - 12:00 PM on 06/25
FRBR Interest Group
Location: MAY in Chinese BR
Unit/Subunit: ALCTS

3:30 PM - 5:15 PM on 06/25
Cataloging and Classification Forum (CCS)
Location: HIL in Lincoln
Unit/Subunit: ALCTS - CCS

4:00 PM - 5:15 PM on 06/25
Electronic Resources Management Interest Group
Location: HIL in Fairchild
Unit/Subunit: LITA, ALCTS

4:00 PM - 5:15 PM on 06/25
Competencies and Education for a Career in Cataloging Interest Group
Location: JW in Commerce
Unit/Subunit: ALCTS - CCS

Saturday Sessions

8:00 AM - 10:00 AM on 06/26
Technical Services Managers in Academic Libraries Interest Group Program
Location: MAD in Constitution
Unit/Subunit: ALCTS

8:00 AM - 10:00 AM on 06/26
Grassroot Prog.: Digital Initiative for College Libraries
Location: WCC in 141
Unit/Subunit: ALA

8:00 AM - 10:00 AM on 06/26
Converging Metadata Standards in Cultural Institutions: Apples & Oranges
Location: WCC in Ballroom B
Unit/Subunit: ALCTS
Blogger: Laura Akerman

10:30 AM - 12:00 PM on 06/26
Catalog Form and Function Interest Group
Location: HIL in Columbia 5
Unit/Subunit: ALCTS - CCS

10:30 AM - 12:00 PM on 06/26
Developing a Sustainable Digitization Workflow
Location: WCC in 146C
Unit/Subunit: LITA

1:30 PM - 3:30 PM on 06/26
Image Resources Interest Group
Location: CAP in Pan American
Unit/Subunit: ACRL

1:30 PM - 3:30 PM on 06/26
Catalog Management Interest Group
Location: HIL in Fairchild
Unit/Subunit: ALCTS - CCS

1:30 PM - 3:30 PM on 06/26
Cataloging Norms Interest Group
Location: HIL in Columbia 5
Unit/Subunit: ALCTS - CCS

1:30 PM - 3:30 PM on 06/26
Multiple Formats and Multiple Copies in a Digital Age: Acceptance, Tolerance, Elimination
Location: WCC in 147B
Unit/Subunit: ALCTS - CMDS, RUSA - CODES

1:30 PM - 3:30 PM on 06/26
Digital Conversion Interest Group
Location: JW Marriott Hotel (Capitol)- BR H/J
Unit/Subunit: ALCTS - PARS

4:00 PM - 5:30 PM on 06/26
MARC Format Interest Group
Location: HIL in Kalorama
Unit/Subunit: LITA, ALCTS

4:00 PM - 5:30 PM on 06/26
Holdings Update Forum: "Next Generation OPACs: Making the Most of Local Holdings Data"
Location: JW in Grand BR IV
Unit/Subunit: ALCTS - CRS

Sunday Sessions

8:00 AM - 10:00 AM on 06/27
Digital Preservation Interest Group
Location: JW in Grand BR I/II
Unit/Subunit: ALCTS - PARS

8:00 AM - 10:00 AM on 06/27
Digital Library Technology Interest Group
Location: HIL in Fairchild
Unit/Subunit: LITA

8:00 AM - 10:00 AM on 06/27
Cataloging and Beyond: The Year of Cataloging Research
Location: WCC in 147A
Unit/Subunit: ALCTS - CCS, RUSA - RSS, LITA

8:00 AM - 10:00 AM on 06/27
Metadata Interest Group
Location: WCC in 202A
Unit/Subunit: ALCTS - CCS, ALCTS
Blogger: Kristin Martin

8:00 AM - 12:00 PM on 06/27
Digitization: Preserving and Open Access to African American Collections
Location: WCC in 152B
Unit/Subunit: ACRL - AFAS

10:30 AM - 12:00 PM on 06/27
Cataloging and Classification Interest Group: Social tagging in libraries
Location: HIL in Columbia 2
Unit/Subunit: ALCTS - CCS

10:30 AM - 12:00 PM on 06/27
Intellectual Access to Preservation Metadata
Location: JW in Capitol BR H/J
Unit/Subunit: ALCTS - PARS

10:30 AM - 12:00 PM on 06/27
Open to Change: Open Source and Next Generation ILS and ERMS
Location: WCC in 146C
Unit/Subunit: ALCTS - AS, ALCTS - CRS

10:30 AM - 12:00 PM on 06/27
To Protect and Serve: Is Digitization Good for Your Historical Collections?
Location: REN in Renaissance West A/B
Unit/Subunit: RUSA - HS

10:30 AM - 12:00 PM on 06/27
Internet Resources and Services Interest Group
Location: HIL in Columbia 10
Unit/Subunit: LITA

10:30 AM - 12:00 PM on 06/27
With Great Power Comes Great Responsibility: Building a Support Infrastructure for an Open-Source ILS
Location: HIL in Fairchild
Unit/Subunit: LITA

10:30 AM - 12:00 PM on 06/27
MODS and MADS: Current implementations and future directions
Location: WCC in 143B/C
Unit/Subunit: LITA

1:30 PM - 3:30 PM on 06/27
RDA Update Forum
Location: JW in Grand BR I/II
Unit/Subunit: ALCTS - CCS

1:30 PM - 5:30 PM on 06/27
Authorized Genre, Forms and Facets in RDA
Location: HIL in Lincoln
Unit/Subunit: LITA, ALCTS

4:00 PM - 5:30 PM on 06/27
Standards Interest Group (LITA)
Location: HIL in Columbia 1
Unit/Subunit: LITA

4:00 PM - 5:30 PM on 06/27
Institutional Repositories in Action: Success Stories from the Federal World
Location: WCC in 202A
Unit/Subunit: FAFLRT

4:00 PM - 5:30 PM on 06/27
Creative Ideas in Technical Services Interest Group Meeting
Location: MAY in Chinese BR
Unit/Subunit: ALCTS

Monday Sessions

8:00 AM - 10:00 AM on 06/28
Boot Camp for the 21st Century Metadata Manager
Location: WCC in 150B
Unit/Subunit: AFL - OLAC, ALCTS - CCS

8:00 AM - 10:00 AM on 06/28
Heads of Cataloging Interest Group
Location: WCC in 140A/B
Unit/Subunit: ALCTS - CCS

10:30 AM - 12:00 PM on 06/28
Got Data? New Roles for Libraries in Shaping 21st Century Research/Presidents Program
Location: WCC in Ballroom B
Unit/Subunit: ALCTS

1:30 PM - 3:30 PM on 06/28
Next Generation Catalog Interest Group
Location: HIL in Columbia 8
Unit/Subunit: LITA

kmarti . 11:24:27 am . ALA Annual 2010
