The eyes of the network-that-seeks-a-tool-for-shared-cataloging are partly turned towards Sweden, which made the ambitious / brave / foolhardy / crazy choice (pick according to your point of view) to bet on the Semantic Web.
For their national union catalog LIBRIS, they are developing a new architecture, LIBRIS-XL, as well as their own shared cataloging tool to generate data in RDF. They are replacing MARC21 with JSON-LD as the internal storage format for the data. Note that this tool was planned to go into production in March 2014, but the deadline was pushed back to autumn 2014.
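To give an idea of what JSON-LD storage looks like, here is a minimal sketch in Python; the URIs, field names and context are illustrative assumptions of mine, not the actual LIBRIS-XL schema.

```python
import json

# A minimal, hypothetical JSON-LD record for a book.
# The @context maps short keys to vocabulary URIs, so the same document
# is both ordinary JSON and a set of RDF statements.
record = {
    "@context": {
        "dc": "http://purl.org/dc/terms/",
        "title": "dc:title",
        "creator": {"@id": "dc:creator", "@type": "@id"},
    },
    "@id": "http://example.org/bib/12345",   # made-up record identifier
    "@type": "dc:BibliographicResource",
    "title": "The white devil",
    "creator": "http://example.org/auth/webster-john",  # a link, not a string
}

print(json.dumps(record, indent=2, ensure_ascii=False))
```

The point is that the record stays readable as plain JSON while each key is anchored to a shared vocabulary (here Dublin Core, purely by way of example).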
Careful, though: the Swedish developers plan to offer an interface to produce bibliographic data in RDF, but also to maintain the possibility of cataloging in MARC21 for experienced catalogers. MARC and RDF should cohabit for a while to allow the international exchange of data.
Sweden has no intention of cutting herself off from the rest of the world.
Note that their catalog aims at implementing the FRBR model, and that they primarily use the vocabulary declared by Bibframe, a format under development at the Library of Congress intended to replace the MARC formats. Sweden has also chosen to adopt RDA in 2015, which is quite logical for countries applying the AACR2 standards, since RDA is a sort of AACR3.
What does the Swedish cataloging interface look like?
The Swedish cataloging interface can be tested (thank you, Google Translate): just enter « test » as username and password: http://devkat.libris.kb.se/login
No code to blind the cataloguers' eyes: they will face a fairly standard form with fields to fill in to describe the item (ISBN, number of pages, publication date, etc.).
Web of data and RDF do not mean that catalogers have to become IT specialists.
As stated in the Libris presentation at ELAG 2013, the difference with linked data is that we will primarily choose among existing entities rather than describe items with strings (which explains the idea that cataloguers will become “catalinkers” – “catalinking” is very trendy).
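The shift can be sketched as two versions of the same statement; the URIs below are made-up examples, not real identifiers from any catalog.

```python
# Describing an author as a string vs. linking to an existing entity.
# (All URIs are hypothetical examples.)
string_statement = (
    "http://example.org/bib/12345",          # the book
    "http://purl.org/dc/terms/creator",      # the property
    "Webster, John",                         # a literal: just characters
)
linked_statement = (
    "http://example.org/bib/12345",
    "http://purl.org/dc/terms/creator",
    "http://example.org/auth/webster-john",  # an entity the "catalinker" picked
)

# Same property either way, but only the second value can accumulate
# further links (dates, other works, other catalogs...).
print(string_statement[2])
print(linked_statement[2])
```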
So, is the “Swedish scenario” any good?
I willingly confess, the Swedish scenario makes me dream. Taking the future into our own hands, being bold, becoming a model of innovation (well, after the Swedes, OK): that alone is motivating.
But most importantly, to develop a tool that really suits the configuration and culture of our network, to overcome the limits imposed by proprietary systems, to create, adjust, adapt, maintain flexibility, to be independent, master of our data … that’s what I find most exciting about this scenario.
But there is a reality principle, someone whispers in my ear.
Is the “Swedish scenario” suited for France ?
We would not be starting from scratch, since we could rely on the Swedes' experience (see the approach presented by Kristin Olofsson at Jabes 2010), but there are notable differences.
The Swedish network comprises about 180 institutions / 350 libraries. The Sudoc network comprises about 3,100 libraries (half of them taking part only in Sudoc-PS, for periodicals). A larger network means, at the very least, greater complexity and higher server capacity.
This project is supported by the National Library of Sweden, which is part of the Ministry of Education and Research, and coordinates the work of all libraries, public and academic, in the Swedish union catalog, LIBRIS. Means and political support are unlikely to be the same here, and coordination is simpler there. On our side, we must consider not only the replacement of the Sudoc but also its relationship with the general catalog of the BnF, for starters.
Does the MARC to RDF conversion work?
I have not yet fully grasped the complexity of transforming one format into another; several people at ABES have tried to explain it to me, but it has not hit home yet.
If I understand correctly, expressing our UNIMARC data in XML poses no problem: granularity and accuracy are the same in our UNIMARC in CBS (the « heart » of the Sudoc) and in the XML mirror database.
Where it gets complicated is expressing these data in RDF. XML = Web and RDF = Web does not mean that XML = RDF.
Roughly, with UNIMARC as with XML, the data is organized and makes sense when placed in a certain order, an order dictated to us by the ISBD standard. This does not fit the structure of the RDF model, which does away with any idea of hierarchy in the bibliographic description.
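A small illustration of this order-blindness, as I understand it: an RDF graph is just a set of triples, so two listings in different orders describe exactly the same thing (all the names below are made up).

```python
# An RDF graph boils down to a set of (subject, predicate, object) triples.
# Sets have no order, so these two "records" are one and the same graph,
# whereas in ISBD/UNIMARC the sequence itself carries meaning.
graph_1 = {
    ("bib:1", "title", "The white devil"),
    ("bib:1", "creator", "auth:webster"),
}
graph_2 = {
    ("bib:1", "creator", "auth:webster"),
    ("bib:1", "title", "The white devil"),
}

print(graph_1 == graph_2)  # True: the listing order contributes nothing
```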
It is still a little fuzzy for me, but I understand that asking a machine to automatically express such strings as an RDF graph is a pain in the ass.
An example, with a book containing several works by different authors:
200 1#$aThe @white devil$bTexte imprimé$aThe duchess of Malfi$fby J. Webster$cThe atheist’s tragedy$aThe revenger’s tragedy$fby Tourneur$g[all] edited with an introduction and notes by J.A. Symonds
The machine cannot automatically differentiate the first 200$a from the second 200$a, put Title no. 1 and Title no. 2 in order, and deduce the link between each title and its rightful author. [Thanks @Iladpo for this example!]
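The ambiguity is easy to reproduce with a naive parser; this sketch simply splits the field above on subfield markers (a deliberately simplistic reading of mine, not ABES's actual conversion code).

```python
# Naive parse of the 200 field: split into (code, value) pairs.
field_200 = (
    "aThe @white devil$bTexte imprimé$aThe duchess of Malfi"
    "$fby J. Webster$cThe atheist's tragedy$aThe revenger's tragedy"
    "$fby Tourneur$g[all] edited with an introduction and notes by J.A. Symonds"
)
subfields = [(s[0], s[1:]) for s in field_200.split("$") if s]

titles = [value for code, value in subfields if code == "a"]
authors = [value for code, value in subfields if code == "f"]

print(titles)   # 3 titles from $a (and another one hiding in $c)
print(authors)  # only 2 $f statements: which $a goes with which $f?
```

Pairing "by J. Webster" with the first two titles, and Tourneur with the following two, takes a human reading of the statement of responsibility; the flat subfield sequence alone does not encode those links.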
The problem lies not so much with UNIMARC as with the ISBD upstream. There is an ISBD vocabulary developed by IFLA for the Semantic Web, but it would have to be infinitely more complex to cover all cases.
This is what I understand of the challenges posed by automating a UNIMARC-to-RDF-via-XML transformation. Please correct me in the comments if necessary!
How do the Swedes do it, they who seem to get the job done?
Well, truth is, they don't. Not completely. But they decided that it was OK. This is at least what I understand from a conversation with Martin Malmsten and Niklas Lindström (National Library of Sweden), as well as from Martin Malmsten's words in his article Making a library catalogue part of the Semantic Web (2008).
Their position is clear from the abstract:
“The focus is on links to and between resources and the mechanisms used to make available, rather than perfect description of the individual resources.”
The Swedes have decided that the important thing is to expose and link data and thus participate in the Semantic Web, even if it means giving up on a certain quality and completeness of the existing bibliographic records.
One argument is also that quality and completeness are quite relative in the existing catalog, due to frequent changes in rules and practices over the last 30 years, and that the new system will simply reshape and improve the data:
« Also, thirty years of continually changing cataloguing rules and practices have left some data in an inconsistent state. Our hope is that the result of the work described will help us work with data in a new and better way. »
We could imagine, like the Swedes, keeping a source of production in UNIMARC, converted into XML in a mirror database to serve web services and APIs, alongside a homemade production tool in RDF (I do not know which languages are currently used at ABES; I believe several cohabit). It would also be Bibframe-inspired and bound to supplant the UNIMARC production tool when everyone is ready. It is a gamble, but one that does not seem too risky: Bibframe will mature and might eventually replace the MARC formats; we would merely be anticipating (for once).
UNIMARC production could be done in the next-generation systems, such as Alma or WorldShare, for sites set on using these systems' workflows to their full potential, thereby cataloging in their own system.
How would the system work, schematically?
There are probably gaps and inaccuracies in this diagram, which is partly based on diagrams produced by the consultant Maurits van der Graaf and presented mainly during the « Ré-informatiser à l’heure du SGBM » conference organized by the ADBU in February.
I use the Alma, Sierra and WMS systems here, but only by way of example; there are others.
What’s the hurry?
At present, ABES does not have the human and financial resources to undertake a project of this magnitude, and this choice would require strong political involvement.
But our current system still has a few years ahead of it, say between 5 and 10.
There is no question of delaying the entire “SGBM” project, since the pilot sites' interest in the project is based on a short-term need to move to a new ILS. But the option chosen for the SGBM project was to separate the functions of local (meta)data production and management. It is now accepted that deciding the future of the union catalog will take more time than putting an SGBM in place.
Can we imagine a future development of our union catalog and shared cataloging tool based on the example of data.bnf.fr? That is, using an external provider specialized in the Semantic Web, as the BnF did with the company Logilab, relying on the CubicWeb software to develop data.bnf.fr? Can we imagine financial support for the project such as the “investissements d’avenir”, as was the case for the ISTEX project?
There are plenty of holes in my thinking, I am well aware, as I don’t master all the technical aspects underlying such a project. But I think it is too early to give up dreaming.
By the way, do we still need a shared cataloguing tool (or even a national cataloguing network)?
As I am very lazy, I would say yes. Not lazy as in too lazy to argue, but lazy in that I find it extremely comfortable to have a national agency to centralize / advocate / arbitrate, and peers in the network to do the job for me, er, share the workload.
French is still widely used by the academic community, all the more so by undergraduate students, so it seems necessary to continue to provide documentation and indexing in our language, as a minimum. Moreover, our union catalog also serves to provide access to local French publications for which we still have to produce the bibliographic descriptions ourselves. Even if only for PhD theses, we do need a common tool and a well-structured network.
Even if we ourselves produce less and less data and instead collect more from publishers, we will still need to rework those data for the specific needs of libraries. I remain convinced that the precision and accuracy of the information provided by library catalogs remain one of our strengths.
Upgrading the data does not mean handwriting each record, but automating as much as possible. Automation has its limits, however: bibliographic data experts (aka cataloguers) will be asked to participate in projects aimed at improving publishers' data. This requires strong coordination. As a team, with a coordinator at the national level, we can do a damn good job while wasting as little time as possible.
I appreciate that a national agency is there to harmonize practices, develop common standards, maintain a common catalog, negotiate content, etc. I think it is rather a good thing, and it leaves a lot less to deal with in each individual library.
The network is a great idea. Me like.
You did not seriously think you’d escape from my favorite group on this post?
A little more on the interface
Well, there may be some problems with deduplication in their authorities. I'm sure our national IdRef works better, though! 😉