knowledge-agents

Tractus-X Provisioning Agent (KA-PROV)

KA-PROV is a module of the Tractus-X Knowledge Agents Reference Implementations.

About this Module

This folder provides a FOSS implementation of a Data Binding (aka Provisioning) Agent.

Binding Agents are needed by any Agent-enabled dataspace provider to connect the dataspace protocol/representation (here: the SPARQL query language operating on RDF triple graphs) to the underlying business data and logic.

The Provisioning Agent in particular interacts with typical relational and structured backend sources via SQL interfaces. The SPARQL profile used is called KA-BIND (the Knowledge Agent Binding Profile).

Commercial alternatives to the FOSS Provisioning Agent are:

Architecture

The FOSS Provisioning Agent uses the Ontop Virtual Knowledge Graph system.

Provisioning Agent Architecture

According to their homepage: “… exposes the content of arbitrary relational databases as knowledge graphs. These graphs are virtual, which means that data remains in the data sources instead of being moved to another database.”

Ontop operates on four standards: three W3C standards and one ANSI standard (SQL). It translates incoming SPARQL queries into SQL queries over the mapped relational sources, using the ontology and mappings provided at startup.
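For illustration, a single mapping entry in Ontop's OBDA syntax ties a SPARQL-visible class to a SQL query. The table, columns, and IRI template below are invented for this sketch and do not come from the shipped sample data:

```
[PrefixDeclaration]
:       http://example.org/voc#
xsd:    http://www.w3.org/2001/XMLSchema#

[MappingDeclaration] @collection [[
mappingId  professors
target     :professor/{p_id} a :Professor ; :lastName {last_name}^^xsd:string .
source     SELECT p_id, last_name FROM professors
]]
```

With such a mapping in place, a query like `SELECT ?x WHERE { ?x a :Professor }` is answered by executing the source SQL against the backend; no data is materialized into a triple store.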

The Ontop CLI is a Java/Spring application which must be extended with an appropriate JDBC driver and which can host only one endpoint per port. We have therefore extended the original Docker entrypoint scripting resources such that multiple endpoints and ports can be hosted in a single container. Environment properties which were originally single-valued (see the environment variable table below) can now be set to space-separated arrays, where the last entry behaves as the original process to which the container liveness checks are tied.
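The array-valued variable pattern can be sketched as follows. This is a minimal illustration of the splitting logic, not the actual entrypoint script; the variable values are taken from the multi-endpoint example further below:

```shell
# Illustrative only: split two space-separated, array-valued variables
# and walk them in lockstep, one (port, mapping) pair per endpoint.
ONTOP_PORT="8080 8082"
ONTOP_MAPPING_FILE="/input/role1.obda /input/role2.obda"

# Turn the mapping list into positional parameters ($1, $2, ...).
set -- $ONTOP_MAPPING_FILE
started=""
for port in $ONTOP_PORT; do
  mapping="$1"; shift
  # Here the real entrypoint would launch an Ontop endpoint for this pair.
  started="${started}${port}=${mapping} "
done
echo "$started"
```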

Tractus-X focuses not only on accessing traditional SQL databases, such as PostgreSQL, but also on accessing modern cloud-based data lake/virtualization infrastructures, such as Dremio and Druid. For that purpose, we have added a few Ontop extensions. These extensions can be activated by setting the right properties in the files referenced by ONTOP_PROPERTIES_FILE:

com.dremio.jdbc.Driver-metadataProvider = it.unibz.inf.ontop.dbschema.impl.KeyAwareDremioDBMetadataProvider
com.dremio.jdbc.Driver-schemas = HI_TEST_OEM, TRACE_TEST_OEM
com.dremio.jdbc.Driver-tables.HI_TEST_OEM = CX_RUL_SerialPartTypization_Vehicle,CX_RUL_SerialPartTypization_Component,CX_RUL_AssemblyPartRelationship,CX_RUL_LoadCollective
com.dremio.jdbc.Driver-unique.HI_TEST_OEM.CX_RUL_SerialPartTypization_Vehicle = UC_VEHICLE
com.dremio.jdbc.Driver-unique.HI_TEST_OEM.CX_RUL_SerialPartTypization_Component = UC_COMPONENT
com.dremio.jdbc.Driver-unique.HI_TEST_OEM.CX_RUL_AssemblyPartRelationship = UC_ASSEMBLY
... 
# Use the Data Virtualization backend
jdbc.url=jdbc\:avatica\:remote\:url=http://data-backend:8888/druid/v2/sql/avatica/
jdbc.driver=org.apache.calcite.avatica.remote.Driver
org.apache.calcite.avatica.remote.Driver-metadataProvider = it.unibz.inf.ontop.dbschema.impl.DruidMetadataProvider
org.apache.calcite.avatica.remote.Driver-typeFactory = it.unibz.inf.ontop.model.type.impl.DefaultSQLDBTypeFactory
org.apache.calcite.avatica.remote.Driver-symbolFactory = it.unibz.inf.ontop.model.term.functionsymbol.db.impl.DefaultSQLDBFunctionSymbolFactory 

Security

Besides the authentication of the Ontop engine at the relational database via JDBC (one URL/user per endpoint), there are no additional (row-level) security mechanisms.

Hence, we recommend applying a role-based approach: each accessing role gets its own endpoint with its own (restricted) database user and mappings, as demonstrated in the multi-endpoint example below.
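As an illustration (file names, users, and the JDBC URL below are hypothetical), each role would get its own properties file pointing at a differently-privileged database user:

```
# role1.properties -- broad schema access (illustrative values)
jdbc.url=jdbc:h2:tcp://localhost/university
jdbc.user=role1_user
jdbc.password=role1_secret
jdbc.driver=org.h2.Driver

# role2.properties -- restricted database user (illustrative values)
jdbc.url=jdbc:h2:tcp://localhost/university
jdbc.user=role2_user
jdbc.password=role2_secret
jdbc.driver=org.h2.Driver
```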

Data Sources and Scalability

For the sample deployments, we use a single agent container with an embedded database (H2) and/or a second database virtualization container (Dremio Community Edition) using preloaded files.

Practical deployments will

Deployment

Compile, Test & Package

mvn package

This will generate

Containerizing (Provisioning Agent)

You could either call

mvn install -Pwith-docker-image

or invoke the following docker command after a successful package run

docker build -t tractusx/provisioning-agent:1.9.5-SNAPSHOT -f src/main/docker/Dockerfile .

The image contains

To run the docker image using some default data, you could invoke this command

docker run -p 8080:8080 \
  -v $(pwd)/resources/university.ttl:/input/ontology.ttl \
  -v $(pwd)/resources/university-role1.obda:/input/mapping.obda \
  -v $(pwd)/resources/university-role1.properties:/input/settings.properties \
  -v $(pwd)/resources/university.sql:/tmp/university.sql \
  tractusx/provisioning-agent:1.9.5-SNAPSHOT

Afterwards, you should be able to access the local SPARQL endpoint via the browser or by directly invoking a query:

curl --location --request POST 'http://localhost:8080/sparql' \
--header 'Content-Type: application/sparql-query' \
--header 'Accept: application/json' \
--data-raw 'PREFIX : <http://example.org/voc#>

SELECT ?x
WHERE {
   ?x a :Professor .
}'

You may manipulate any of the following environment variables to configure the image behaviour. Note that there is no built-in security (SSL/auth) for the exposed endpoints; this must be provided by hiding them behind an appropriate service network layer.

All variables are optional; those marked "list" accept a space-separated array with one entry per endpoint.

JAVA_TOOL_OPTIONS (list): JVM options, e.g. -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:8090 for debugging
ONTOP_PORT (list): port number to listen on; default: 8080
ONTOP_ONTOLOGY_FILE (list): path to the ontology file (ttl or xml); default: /input/ontology.ttl
ONTOP_MAPPING_FILE (list): path to the mapping file (obda); default: /input/mapping.obda
ONTOP_PROPERTIES_FILE (list): path to the settings file (properties); default: /input/settings.properties
ONTOP_PORTAL_FILE (list): path to the portal config (toml), e.g. /input/portal.toml
ONTOP_CORS_ALLOWED_ORIGINS: CORS domain name; default: *
ONTOP_DEV_MODE (list): redeploy the endpoint on file changes; default: true

Here is an example which exposes two endpoints for two different roles (different database users and restricted mappings, but the same ontology):

docker run -p 8080:8080 -p 8082:8082 \
  -v $(pwd)/resources/university.ttl:/input/ontology.ttl \
  -v $(pwd)/resources/university-role1.obda:/input/role1.obda \
  -v $(pwd)/resources/university-role1.properties:/input/role1.properties \
  -v $(pwd)/resources/university-role2.obda:/input/role2.obda \
  -v $(pwd)/resources/university-role2.properties:/input/role2.properties \
  -v $(pwd)/resources/university.sql:/tmp/university.sql \
  -e ONTOP_PORT="8080 8082" \
  -e ONTOP_ONTOLOGY_FILE="/input/ontology.ttl /input/ontology.ttl" \
  -e ONTOP_MAPPING_FILE="/input/role1.obda /input/role2.obda" \
  -e ONTOP_PROPERTIES_FILE="/input/role1.properties /input/role2.properties" \
  -e ONTOP_DEV_MODE="false false" \
  tractusx/provisioning-agent:1.9.5-SNAPSHOT

Accessing entities spanning two schemas using the first role/endpoint delivers a greater count

curl --location --request POST 'http://localhost:8080/sparql' \
--header 'Content-Type: application/sparql-query' \
--header 'Accept: application/json' \
--data-raw 'PREFIX : <http://example.org/voc#>

SELECT (COUNT(DISTINCT ?x) as ?count)
WHERE {
   ?x a :Course .
}'
{
  "head" : {
    "vars" : [
      "count"
    ]
  },
  "results" : {
    "bindings" : [
      {
        "count" : {
          "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
          "type" : "literal",
          "value" : "12"
        }
      }
    ]
  }
}

Accessing entities using the restricted role/endpoint delivers a smaller count

curl --location --request POST 'http://localhost:8082/sparql' \
--header 'Content-Type: application/sparql-query' \
--header 'Accept: application/json' \
--data-raw 'PREFIX : <http://example.org/voc#>

SELECT (COUNT(DISTINCT ?x) as ?count)
WHERE {
   ?x a :Course .
}'
{
  "head" : {
    "vars" : [
      "count"
    ]
  },
  "results" : {
    "bindings" : [
      {
        "count" : {
          "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
          "type" : "literal",
          "value" : "6"
        }
      }
    ]
  }
}
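The count can be pulled out of such a response on the command line; below is a minimal sketch using only POSIX tools, where the inline JSON string is abbreviated from the responses above:

```shell
# Extract the literal value of the first binding from a SPARQL JSON
# result (abbreviated). A real client would use jq or a SPARQL library.
json='{"results":{"bindings":[{"count":{"type":"literal","value":"12"}}]}}'
count=$(printf '%s' "$json" | sed -n 's/.*"value":"\([^"]*\)".*/\1/p')
echo "$count"   # → 12
```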

Notice for Docker Image

DockerHub: https://hub.docker.com/r/tractusx/provisioning-agent

Eclipse Tractus-X product(s) installed within the image:

GitHub: https://github.com/eclipse-tractusx/knowledge-agents/tree/main/provisioning
Project home: https://projects.eclipse.org/projects/automotive.tractusx
Dockerfile: https://github.com/eclipse-tractusx/knowledge-agents/blob/main/provisioning/src/main/docker/Dockerfile
Project license: Apache License, Version 2.0

Used base image

As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc from the base distribution, along with any direct or indirect dependencies of the primary software being contained).

As for any pre-built image usage, it is the image user’s responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.

Helm

A Helm chart for deploying the provisioning agent can be found under this folder.

It can be added to your umbrella Chart.yaml with the following snippet:

dependencies:
  - name: provisioning-agent
    repository: https://eclipse-tractusx.github.io/charts/dev
    version: 1.9.5-SNAPSHOT
    alias: my-provider-agent

and the dependency then fetched using

helm dependency update

In your values.yaml, you configure your specific instance of the provisioning agent like this:

#######################################################################################
# Data Binding Agent
#######################################################################################

my-provider-agent: 
  securityContext: *securityContext
  nameOverride: my-provider-agent
  fullnameOverride: my-provider-agent
  resources:
    requests:
      cpu: 500m
      # you should employ 512Mi per endpoint
      memory: 1Gi
    limits:
      cpu: 500m
      # you should employ 512Mi per endpoint
      memory: 1Gi
  bindings: 
    # disables the default sample binding
    dtc: null
    # real production mapping
    telematics2: 
      port: 8081
      path: /t2/(.*)
      settings: 
        jdbc.url: 'jdbc:postgresql://intradb:5432/schema'
        jdbc.user: <path:vaultpath#username>
        jdbc.password: <path:vaultpath#password>
        jdbc.driver: 'org.postgresql.Driver'
      ontology: cx-ontology.xml
      mapping: |-
        [PrefixDeclaration]
        cx-common:          https://w3id.org/catenax/ontology/common#
        cx-core:            https://w3id.org/catenax/ontology/core#
        cx-vehicle:         https://w3id.org/catenax/ontology/vehicle#
        cx-reliability:     https://w3id.org/catenax/ontology/reliability#
        uuid:		            urn:uuid:
        bpnl:		            bpn:legal:
        owl:		            http://www.w3.org/2002/07/owl#
        rdf:		            http://www.w3.org/1999/02/22-rdf-syntax-ns#
        xml:		            http://www.w3.org/XML/1998/namespace
        xsd:		            http://www.w3.org/2001/XMLSchema#
        json:               https://json-schema.org/draft/2020-12/schema#
        obda:		            https://w3id.org/obda/vocabulary#
        rdfs:		            http://www.w3.org/2000/01/rdf-schema#
        oem:                urn:oem:

        [MappingDeclaration] @collection [[
        mappingId	vehicles
        target		<{vehicle_id}> rdf:type cx-vehicle:Vehicle ; cx-vehicle:vehicleIdentificationNumber {van}^^xsd:string; cx-vehicle:worldManufaturerId bpnl:{localIdentifiers_manufacturerId}; cx-vehicle:productionDate {production_date}^^xsd:date.
        source		SELECT vehicle_id, van, 'BPNL0000000DUMMY' as localIdentifiers_manufacturerId, production_date FROM vehicles

        mappingId	partsvehicle
        target		<{gearbox_id}> cx-vehicle:isPartOf <{vehicle_id}> .
        source		SELECT vehicle_id, gearbox_id FROM vehicles

        mappingId	vehicleparts
        target		<{vehicle_id}> cx-vehicle:hasPart <{gearbox_id}> .
        source		SELECT vehicle_id, gearbox_id FROM vehicles

        ]]  
  ingresses:
    - enabled: true
      # -- The hostname to be used to precisely map incoming traffic onto the underlying network service
      hostname: "my-provider-agent.public.ip"
      annotations:
        nginx.ingress.kubernetes.io/rewrite-target: /$1
        nginx.ingress.kubernetes.io/use-regex: "true"
      # -- Agent endpoints exposed by this ingress resource
      endpoints:
        - telematics2
      tls:
        enabled: true
        secretName: my-provider-tls