DISSERTATION, Business Administration: Collaborative Data Sharing with Mappings and Provenance [code: D-BIS-1]


ABSTRACT
A key challenge in science today involves integrating data from databases managed by different collaborating scientists. In this dissertation, we develop the foundations and applications of collaborative data sharing systems (CDSSs), which address this challenge. A CDSS allows collaborators to define loose confederations of heterogeneous databases, relating them through schema mappings that establish how data should flow from one site to the next. In addition to simply propagating data along the mappings, it is critical to record data provenance (annotations describing where and how data originated) and to support policies allowing scientists to specify whose data they trust, and when. Since a large data sharing confederation is certain to evolve over time, the CDSS must also efficiently handle incremental changes to data, schemas, and mappings. We focus in this dissertation on the formal foundations of CDSSs, as well as practical issues of their implementation in a prototype CDSS called Orchestra. We propose a novel model of data provenance appropriate for CDSSs, based on a framework of semiring-annotated relations. This framework elegantly generalizes a number of other important database semantics involving annotated relations, including ranked results, prior provenance models, and probabilistic databases. We describe the design and implementation of the Orchestra prototype, which supports update propagation across schema mappings while maintaining data provenance and filtering data according to trust policies. We investigate fundamental questions of query containment and equivalence in the context of provenance information. We use the results of these investigations to develop novel approaches to efficiently propagating changes to data and mappings in a CDSS. Our approaches highlight unexpected connections between the two problems and with the problem of optimizing queries using materialized views. Finally, we show that semiring annotations also make sense for XML and nested relational data, paving the way towards a future extension of CDSS to these richer data models.
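
As a rough illustration of what "semiring-annotated relations" means, the sketch below evaluates one small join-and-project query over relations whose tuples carry annotations from a semiring: joining multiplies annotations, and merging duplicate tuples under projection adds them. Swapping the semiring changes the semantics without changing the query. All names here (Semiring, join, project, query, BAG, SET, TROPICAL) are hypothetical and chosen for illustration only; this is not Orchestra code, and the tropical (min, +) semiring is used merely as one plausible way to model ranked or cost-based results.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (K, add, mul, zero, one); hypothetical helper type."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# A K-relation is modeled as a dict from tuple to annotation;
# tuples that are absent implicitly carry the annotation K.zero.
Relation = Dict[Tuple, Any]


def join(K: Semiring, r: Relation, s: Relation) -> Relation:
    """Join r(a, b) with s(b, c) on b; annotations of joined tuples are multiplied."""
    out: Relation = {}
    for (a, b), x in r.items():
        for (b2, c), y in s.items():
            if b == b2:
                t = (a, b, c)
                out[t] = K.add(out.get(t, K.zero), K.mul(x, y))
    return out


def project(K: Semiring, r: Relation, cols: Tuple[int, ...]) -> Relation:
    """Keep only the given columns; annotations of merged tuples are added."""
    out: Relation = {}
    for t, x in r.items():
        key = tuple(t[i] for i in cols)
        out[key] = K.add(out.get(key, K.zero), x)
    return out


def query(K: Semiring, r: Relation, s: Relation) -> Relation:
    """q(a, c) :- r(a, b), s(b, c), evaluated over annotations from K."""
    return project(K, join(K, r, s), (0, 2))


# Three example semirings (assumed for illustration).
BAG = Semiring(0, 1, lambda x, y: x + y, lambda x, y: x * y)            # multiplicities
SET = Semiring(False, True, lambda x, y: x or y, lambda x, y: x and y)  # set semantics
TROPICAL = Semiring(float("inf"), 0, min, lambda x, y: x + y)           # cheapest derivation

r = {("a", "b1"): 2, ("a", "b2"): 3}
s = {("b1", "c"): 5, ("b2", "c"): 1}

bag_result = query(BAG, r, s)        # 2*5 + 3*1 copies of ("a", "c")
cost_result = query(TROPICAL, r, s)  # min(2+5, 3+1) as the best cost
set_result = query(SET, {t: True for t in r}, {t: True for t in s})
```

Under these assumptions the same query yields a multiplicity of 13 with BAG, a best cost of 4 with TROPICAL, and plain membership with SET, which is the sense in which a single annotated-relation framework can cover several database semantics.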


Contents
1 Introduction
1.1 Overview of CDSS and Orchestra
1.2 Overview of Technical Contributions
1.3 Roadmap
2 Provenance Semirings
2.1 Queries on Annotated Relations
2.2 Positive Relational Algebra
2.3 Polynomials for Provenance
2.4 A Hierarchy of Provenance
2.5 Datalog on K-Relations
2.6 Formal Power Series for Provenance
2.7 Computing Provenance Series
2.8 Application to Incomplete/Probabilistic Databases
2.9 Provenance-Annotated Queries
3 Update Exchange in Orchestra
3.1 CDSS Update Exchange
3.2 Update Exchange Formalized
3.3 Performing Update Exchange
3.4 Implementation
3.5 Experimental Evaluation
4 Optimizing Queries on Annotated Relations
4.1 Preliminaries
4.2 Containment Mappings
4.3 Bounds from Semiring Homomorphisms
4.4 Main Results
4.5 Datalog
5 Ring-Annotated Relations and Differences
5.1 Applications of Differences
5.2 Z-Relations
5.3 Reformulation Using Views
5.4 Finding Query Rewritings
5.5 Applications to Bag and Set Semantics
5.6 Built-in Predicates
5.7 Z[X]-relations
6 Semiring-Annotated XML
6.1 Semiring Annotations
6.2 Annotated and Unordered XML
6.3 A Security Application
6.4 Incomplete and Probabilistic K-UXML
6.5 Semantics via Complex Values
6.6 Commutation with Homomorphisms
6.7 Semantics via Relations
7 Related Work
7.1 Paradigms for Data Integration
7.2 Provenance and Annotated Data Models
7.3 Update Exchange
7.4 Query Containment and Equivalence
7.5 Ring-Annotated Relations and Updates
7.6 Semiring-Annotated XML
7.7 Further Work on Orchestra
8 Conclusions and Future Directions
8.1 Immediate Next Steps
8.2 Longer-Term Directions
Bibliography