| Preface | 5 |
|---|---|

| Organization | 6 |
|---|---|

| Table of Contents | 7 |
|---|---|

| Queries over Unstructured Data: Probabilistic Methods to the Rescue (Keynote) | 8 |
|---|---|
| Unstructured Data in Enterprises | 8 |
| Probabilistic Models for Information Extraction | 10 |
| Representing Noisy Extractions as Imprecise Databases | 11 |
| Multi-attribute Extractions | 13 |
| Imprecise Data Models for Representing Uncertainty of De-duplication | 15 |
| Probability of Two Records Being Duplicates | 15 |
| Probability over Entity Groupings | 15 |
| Queries over Imprecise Duplicates | 16 |
| Concluding Remarks | 18 |
| References | 19 |

| Federated Stream Processing Support for Real-Time Business Intelligence Applications | 21 |
|---|---|
| Introduction | 21 |
| Related Work | 22 |
| The MaxStream Federated Stream Processing System | 24 |
| Architecture | 26 |
| Two Key Building Blocks | 28 |
| Hybrid Queries: Using Persistence with Streams | 30 |
| Using MaxStream in Real-Time BI Scenarios | 32 |
| Reducing Latency in Event-Driven Business Intelligence | 32 |
| Persistent Events in Supply-Chain Monitoring | 33 |
| Other Real-Time BI Applications | 34 |
| Feasibility Study | 34 |
| Conclusions and Future Directions | 36 |
| References | 37 |

| VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans | 39 |
|---|---|
| Introduction | 39 |
| Preliminaries | 42 |
| Review of the Chain Scheduling | 42 |
| Problem Definition | 43 |
| The VPipe Execution Scheme | 44 |
| Change of Operator Logic | 45 |
| Discussion | 47 |
| Stochastic Analysis of Chain | 47 |
| System Model Basis | 48 |
| Case 1: System Analysis for SOS Synchronization | 48 |
| Case 2: System Analysis for IDS Synchronization | 50 |
| Performance Study | 53 |
| Experiment 1: Response Time Comparison | 53 |
| Experiment 2: Broken Pipeline Probability | 54 |
| Related Work | 54 |
| Conclusion | 55 |
| References | 55 |

| Ad-Hoc Queries over Document Collections – A Case Study | 57 |
|---|---|
| Introduction | 57 |
| Query Planning and Query Plan Execution | 59 |
| Understanding “Human-Powered” Query Execution Strategies | 59 |
| Elementary Plan Operators | 60 |
| The Coverage-Join (CJ) and Density-Join (DJ) Operator | 64 |
| Example Query and Example Plans | 64 |
| Plan Enumeration | 65 |
| Case Study | 66 |
| Heuristics for Plan Selection | 66 |
| Results and Discussion | 67 |
| Related Work | 69 |
| Summary and Future Work | 70 |
| References | 71 |
| Appendix: Implementing the KEYWORD-Operator | 72 |

| ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP | 73 |
|---|---|
| Introduction | 73 |
| Grouping Analysis: A Retrospective | 74 |
| Group by | 75 |
| Cubes | 75 |
| Grouping Variables and the MD-Join | 76 |
| Windows | 76 |
| MapReduce | 77 |
| Associated Sets (ASSET) Queries | 77 |
| Definitions | 77 |
| SQL Syntax | 78 |
| DataMingler: A Spreadsheet-Like GUI | 79 |
| ASSET Queries and Data Streams (COSTES) | 80 |
| Financial Application Motivating Examples | 81 |
| COSTES: Continuous Spreadsheet-Like Computations | 83 |
| ASSET Queries and Persistent Data Sources (ASSET QE) | 84 |
| Social Networks: A Motivating Example | 84 |
| ASSET Query Engine (QE) | 86 |
| Conclusions and Future Work | 88 |
| References | 89 |

| Evaluation of Load Scheduling Strategies for Real-Time Data Warehouse Environments | 91 |
|---|---|
| Introduction | 91 |
| System Model and Problem Statement | 93 |
| System Architecture | 93 |
| Workload Model | 94 |
| Scheduling Performance Objective | 95 |
| Problem Statement | 96 |
| Scheduling Policies | 97 |
| Scheduling Algorithms for Push-Based Update Propagation | 97 |
| Evaluation and Discussion | 98 |
| Simulation Framework | 98 |
| Effect of the Data Production Process Length | 99 |
| Comparison of Local and Global Scheduling | 100 |
| Effects of Stage-Concurrent and Long-Running Updates | 101 |
| Ratio of Stage-Concurrent Updates | 102 |
| Pruning of Irretrievable Queries | 103 |
| Effects of Long-Running Update and Queries during Runtime | 103 |
| Related Work | 104 |
| Conclusion | 105 |
| References | 106 |

| Near Real-Time Data Warehousing Using State-of-the-Art ETL Tools | 107 |
|---|---|
| Near Real-Time Data Warehousing | 107 |
| Related Work | 108 |
| Data Wareh
|