Semantic data integration (also referred to as “semantic data fusion”) is the process of populating a semantic model (i.e., an ontology) with instance data coming from multiple heterogeneous sources. The result of this population is called a Knowledge Graph (KG). The data ingested into the ontology may be either raw data (e.g., sensor measurements) or higher-level data (e.g., analysis outputs from other software tools). Semantic data integration is a highly demanding and error-prone process, and the few existing tools face multiple challenges, such as efficiency, scalability, and redundancy in the generated outputs.
What is CASPAR?
CASPAR stands for “StruCtured DAta Semantic ExPloitAtion FRamework” and is Catalink’s powerful domain-agnostic framework for the automated retrieval and fusion of structured data from disparate sources into domain-specific semantic models, facilitating the discovery of new knowledge along with the extraction of actionable insights. CASPAR currently features connectors for acquiring data from third-party APIs, relational databases, and message buses, and can ingest the data into any RDF triplestore hosting an ontology. The Mapper, the core CASPAR component, transforms incoming data into SPARQL queries according to predefined mappings, which specify how every data field of interest in the input is associated with ontology concepts and relationships.
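To make the idea of a mapping-driven Mapper more concrete, here is a minimal sketch in Python. It is not CASPAR's implementation or its actual mapping format: the `MAPPING` structure, the `ex:` namespace, the field names, and the helper functions are all assumptions made for illustration. It only shows how a declarative mapping could turn one input record into a SPARQL `INSERT DATA` update.

```python
from string import Template

# Prefix declarations shared by every generated update.
# The ex: namespace is a placeholder, not a real CASPAR namespace.
PREFIXES = (
    "PREFIX ex: <http://example.org/>\n"
    "PREFIX sosa: <http://www.w3.org/ns/sosa/>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)

# Hypothetical mapping (NOT CASPAR's actual mapping format):
# each input field is tied to an ontology property and an optional datatype.
MAPPING = {
    "subject": Template("ex:observation_$id"),
    "type": "sosa:Observation",
    "fields": {
        "value": ("sosa:hasSimpleResult", None),
        "timestamp": ("sosa:resultTime", "xsd:dateTime"),
    },
}


def literal(value, datatype=None):
    """Render a Python value as a SPARQL literal (finite numbers are emitted bare)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^{datatype}' if datatype else f'"{escaped}"'


def to_sparql_insert(record, mapping):
    """Build a SPARQL INSERT DATA update for one record according to the mapping."""
    subject = mapping["subject"].substitute(record)
    parts = [f"{subject} a {mapping['type']}"]
    for field, (predicate, datatype) in mapping["fields"].items():
        if field in record:  # fields missing from the record are skipped
            parts.append(f"{predicate} {literal(record[field], datatype)}")
    triples = " ;\n    ".join(parts) + " ."
    return f"{PREFIXES}INSERT DATA {{\n  {triples}\n}}"


if __name__ == "__main__":
    record = {"id": 42, "value": 0.37, "timestamp": "2021-10-03T10:15:00Z"}
    print(to_sparql_insert(record, MAPPING))
```

Keeping the mapping as data rather than code means that, in a design like this, supporting a new source would only require a new mapping, not a new ingestion routine.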
CASPAR vs Competition
Compared to similar tools and frameworks (e.g., RML.io or Karma), CASPAR’s key advantage lies in using SPARQL itself, rather than custom ETL (Extract, Transform, Load) pipelines, to ingest the data. The mappings contain appropriate definitions that lead to corresponding SPARQL update queries, so when new information arrives for instances that already exist in the KG, CASPAR inserts only the new nodes into the graph. This way, no expensive “delete queries” are needed and redundancy in the KG is avoided altogether.
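One common SPARQL 1.1 pattern for adding a node only when it is not yet in the graph is an `INSERT ... WHERE` update guarded by `FILTER NOT EXISTS`. The sketch below generates such an update in Python. Whether CASPAR's mappings produce exactly this pattern is an assumption; the function name, the `ex:` namespace, and the identifiers are hypothetical.

```python
def guarded_insert(sensor, observation, value):
    """Build an update that adds an observation node only if it is absent.

    The sensor instance is assumed to exist in the KG already; the update
    links a new observation to it and never issues a DELETE.
    """
    return (
        "PREFIX ex: <http://example.org/>\n"  # placeholder namespace
        "PREFIX sosa: <http://www.w3.org/ns/sosa/>\n"
        "INSERT {\n"
        f"  ex:{observation} a sosa:Observation ;\n"
        f"    sosa:madeBySensor ex:{sensor} ;\n"
        f"    sosa:hasSimpleResult {value} .\n"
        "}\n"
        "WHERE {\n"
        # The guard: the template above is only instantiated when the
        # observation node is not already typed in the graph.
        f"  FILTER NOT EXISTS {{ ex:{observation} a sosa:Observation }}\n"
        "}"
    )


if __name__ == "__main__":
    print(guarded_insert("sensor_1", "obs_7", 0.8))
```

Running the same update twice leaves the graph unchanged the second time, so repeated deliveries of the same input would not create duplicate nodes in this sketch.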
Converting a Taxonomy into a SKOS Vocabulary
For organizations such as museums, converting their existing taxonomies into a SKOS-compliant format is a non-trivial and rather costly task, demanding experience in Semantic Web knowledge representation formats and relevant technologies. Towards addressing this challenge, this blog post presents the deployment of our in-house CASPAR semantic data integration platform that substantially speeds up and simplifies the process.
Semantic Data Integration in (Semi)Autonomous Vehicles
The first application domain we deployed CASPAR in was the ongoing CPSoSaware EU-funded project and, specifically, its use case on (semi)autonomous vehicles. This work was also presented at the SEMAPRO conference in 2021 (paper, presentation). Third-party components, developed by other partners in the project’s consortium, feed CASPAR with (a) quantitative trajectory evaluations (Absolute Trajectory Error – ATE, Relative Pose Error – RPE) of odometry algorithms, (b) the driver’s drowsiness levels, based on the Eye Aspect Ratio (EAR) and the PERcentage of Eye CLOSure (PERCLOS), and (c) an occupancy factor (generated by the vehicle’s LiDAR) indicating how clear the road is beyond the driver’s field of view. The W3C-recommended Semantic Sensor Network Ontology (SSN) serves as the core semantic model. Below is a subset of the KG representing ATE and RPE observations.
After ingesting the above input into the KG, dedicated rules running “on top” of the data can determine the robustness of odometry algorithms and can calculate the risk levels during a driving session. Below is a sample risk level summary, which could potentially constitute part of a report, e.g., after a traffic accident.
Semantic Data Integration in an e-Health Framework
Another (entirely different) domain CASPAR is being deployed in is the ALAMEDA e-Health EU-funded project, which aims to develop an AI-powered framework for bridging early diagnosis and treatment for people with brain diseases. The same approach is adopted, with third-party components developed by other project partners submitting their analysis outputs to CASPAR. Those outputs involve multimodal analyses of data coming from smartwatches, smart bracelets, the smartphone camera, smart insoles, smart mattresses, and belts. CASPAR, in turn, inserts the outputs into the underlying semantic model. In ALAMEDA, the latter is a custom-built ontology network (i.e., a set of interconnected ontologies) that is aligned with the project’s use cases. A preliminary progress report on this work was presented at the MTSR conference in 2021. A sample instantiation of the resulting KG based on the input presented above is shown below.
Deploying CASPAR in Other Use Cases
In the meantime, CASPAR was also deployed for generating a SKOS taxonomy for Tate’s subject hierarchy and will play a pivotal role in the upcoming SILVANUS research project, serving as the semantic data fusion framework for ingesting wildfire-related data into a semantic model. Across these deployments, from autonomous vehicles and e-Health to cultural heritage and wildfire management, CASPAR is proving in practice to be a domain-agnostic semantic data integration framework that can be deployed in very different domains. If you would like to know more, or if you wish to deploy CASPAR for your use case, please feel free to contact us. We will be more than happy to discuss your specific needs in depth!