30 December 2010

Name matching strategy using bibliographic data | Open Biblio (graphic) Projects

"One of the aims of an RDF representation of bibliographic data should be to have authors represented by unique, referenceable points within the data (as URIs), rather than as free-text fields. What steps can we take to match the text value representing an author’s name to another instance of their name in the data?

It’s not realistic to expect a match between, say, Mark Twain and Samuel Clemens without using some extra information that is typically not present in bibliographic datasets. What can be achieved, however, is the ‘fuzzy’ matching of alternate forms of names that arise from typos, mistakes, omitted initials and the like. It is important that these matches are understood to be fuzzy and not precise: they are statistical likelihoods, not definite assertions."
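
To make the idea concrete, here is a minimal sketch of what such fuzzy matching could look like. It is not the method used in the quoted post: it simply normalises two name strings and scores them with Python's standard `difflib.SequenceMatcher`. The helper names (`normalise`, `name_similarity`, `likely_same_author`) and the 0.85 threshold are my own assumptions for illustration, and the result is a similarity score, not proof that two strings refer to the same person.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Reduce a name string to a comparable form (hypothetical helper)."""
    # Turn "Surname, Forename" into "Forename Surname".
    if "," in name:
        surname, _, forenames = name.partition(",")
        name = f"{forenames} {surname}"
    # Strip accents by decomposing characters and dropping combining marks.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    # Lower-case, replace punctuation and digits with spaces, collapse whitespace.
    name = re.sub(r"[^a-z\s]", " ", name.lower())
    return " ".join(name.split())


def name_similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0; higher means more alike."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_same_author(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag a candidate match. The threshold is an arbitrary assumption
    and would need tuning against real bibliographic data."""
    return name_similarity(a, b) >= threshold


if __name__ == "__main__":
    pairs = [
        ("Twain, Mark", "Mark Twain"),      # same name, different ordering
        ("Mark Twain", "Mark Twian"),       # typo
        ("Mark Twain", "Samuel Clemens"),   # pseudonym: string similarity cannot help
    ]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: {name_similarity(a, b):.2f}")
```

A score like this only catches surface variation such as reordering and typos. The Twain and Clemens case scores low, which is exactly the limitation the quoted post describes, so any candidate pair should be recorded as a probable match with its score and not asserted as an identity.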