Mining relatedness graphs for data integration

Engle, J. T., Feng, Y., & Goldstone, R. L. (2012). Mining relatedness graphs for data integration. Proceedings of the Thirty-Fourth Annual Conference of the Cognitive Science Society (pp. 1524-1529). Sapporo, Japan: Cognitive Science Society.

In this paper, we present AbsMatcher, a schema matching system that uses a graph-based approach. The primary contributions of this paper are new types of relationships for generating graph edges and a demonstration of how effectively schemas can be integrated using the resulting graphs. AbsMatcher creates a graph of related attributes within each schema, mines similarity between attributes in different schemas, and then combines all of this information using the ABSURDIST graph matching algorithm. The attribute-to-attribute relationships this paper focuses on are semantic in nature and place few requirements on format or structure. These relationship sources provide a baseline that can be improved upon with relationships specific to particular formats, such as XML or a relational database. Simulations demonstrate that automatically mined graphs of within-schema relationships, when combined with cross-schema pairwise similarity, can yield matching accuracy not attainable by either source of information on its own.
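The idea of combining within-schema graphs with cross-schema similarity can be sketched with a toy relaxation loop. This is an illustrative sketch under stated assumptions, not the AbsMatcher implementation and not the published ABSURDIST equations: the function names, the attribute names, the relatedness and similarity values, the update rule, and the parameters alpha, beta, chi and rate are all hypothetical. Each candidate correspondence is strengthened by cross-schema similarity and by agreement between the two graphs, and weakened by competing correspondences.

```python
# Illustrative sketch only: a simplified relaxation inspired by the abstract.
# All names, values and parameters here are assumptions made for this example.

def match_schemas(rel_a, rel_b, sim, alpha=0.5, beta=0.2, chi=0.8,
                  rate=0.5, iters=50):
    """Return correspondence strengths between attributes of schemas A and B.

    rel_a, rel_b: within-schema relatedness matrices (values in [0, 1]).
    sim: cross-schema similarity matrix, sim[q][x] for A's q and B's x.
    """
    n_a, n_b = len(rel_a), len(rel_b)
    corr = [[0.5] * n_b for _ in range(n_a)]
    for _ in range(iters):
        updated = [[0.0] * n_b for _ in range(n_a)]
        for q in range(n_a):
            for x in range(n_b):
                # Structural support: other correspondences (r -> y) back
                # (q -> x) when edge q-r in A resembles edge x-y in B.
                support, pairs = 0.0, 0
                for r in range(n_a):
                    if r == q:
                        continue
                    for y in range(n_b):
                        if y == x:
                            continue
                        agreement = 1.0 - abs(rel_a[q][r] - rel_b[x][y])
                        support += corr[r][y] * agreement
                        pairs += 1
                support = support / pairs if pairs else 0.0
                # Inhibition: rival correspondences sharing q or x.
                rivals = [corr[q][y] for y in range(n_b) if y != x]
                rivals += [corr[r][x] for r in range(n_a) if r != q]
                inhibition = sum(rivals) / len(rivals) if rivals else 0.0
                net = alpha * sim[q][x] + beta * support - chi * inhibition
                c = corr[q][x]
                # Move toward 1 on positive input, toward 0 on negative input.
                c += rate * net * (1.0 - c) if net > 0 else rate * net * c
                updated[q][x] = c
        corr = updated
    return corr


def best_matches(corr, names_a, names_b):
    """Pick, for each attribute of A, the strongest correspondence in B."""
    return {names_a[q]: names_b[max(range(len(row)), key=row.__getitem__)]
            for q, row in enumerate(corr)}


# Hypothetical toy schemas with identical within-schema structure.
names_a = ["name", "street", "city"]
names_b = ["fullname", "addr_line", "town"]
rel_a = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.5], [0.8, 0.5, 0.0]]
rel_b = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.5], [0.8, 0.5, 0.0]]
# Similarity alone cannot separate street/city from addr_line/town.
sim = [[0.9, 0.1, 0.1], [0.1, 0.5, 0.5], [0.1, 0.5, 0.5]]

corr = match_schemas(rel_a, rel_b, sim)
best = best_matches(corr, names_a, names_b)
print(best)
```

In this toy input, similarity scores street against addr_line and town equally, so similarity alone leaves that pair ambiguous. The structural term breaks the tie: street relates to name in schema A the way addr_line relates to fullname in schema B, so that correspondence receives more support on every iteration.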