This feature is only available starting from version 3.0.

OpenMetadata

Metadata Management

OpenMetadata is a unified open-source metadata management platform designed for data discovery, governance, data quality, observability, and collaboration.

OpenMetadata combines a central metadata repository with in-depth lineage and built-in team collaboration.

It is a fast-growing open-source project with an active community and adoption across diverse industries.

Based on Open Metadata Standards and APIs, and supporting connectors for a wide range of data services, OpenMetadata enables end-to-end metadata management.

The core idea is that metadata is a first-class asset and should be:

Easy to query and explore (discovery).
Reliable and up-to-date (observability and quality).
Contextualized (who uses it, for what purpose, and in which flow it fits).
Governed (with owners, policies, classifications).

Figure 1 - OpenMetadata

What it does

Cataloging and discovery of assets (tables, dashboards, pipelines, models).
Collaborative documentation: descriptions, comments, data dictionary.
Governance and quality: ownership assignment, policies, built-in data tests.
Visual lineage, including at column level.
Observability: freshness, volume, and quality metrics, plus alerts.
Collaboration and notifications via comments, webhooks, access control.

OpenMetadata features:

Licensed under the Apache License 2.0.
Simplified architecture: essentially four services (API server, relational database, Elasticsearch or OpenSearch, and ingestion).
Extensible ingestion framework: 90+ ready-to-use connectors, with simple paths to build new ones.
Schema-first: metadata contracts are versioned JSON Schemas.
Open APIs: full functionality exposed via REST, easing automation and CI/CD integration.
Modern UI: one interface for search, lineage visualization, quality dashboards, and ingestion workflows.
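Because the full functionality is exposed over REST, routine automation such as exporting the catalog can be scripted. The following is a minimal Python sketch that assumes the list endpoint GET /api/v1/tables with `limit` and `after` query parameters, a response shaped like `{"data": [...], "paging": {"after": ...}}`, and bearer-token authentication; verify these against the API documentation of your deployment. The helper names `iter_tables` and `http_fetch` are hypothetical, and the fetcher is injectable so the paging logic can be tested without a server.

```python
"""Sketch: list every table through the REST API using cursor paging.

Assumptions (verify against your server's API docs):
- tables are listed at GET {base}/v1/tables with `limit` and `after` params;
- each page is JSON shaped like {"data": [...], "paging": {"after": "..."}};
- authentication uses a bearer token (for example a bot JWT).
"""
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# A fetcher takes a URL plus headers and returns the decoded JSON body.
Fetcher = Callable[[str, dict], dict]


def http_fetch(url: str, headers: dict) -> dict:
    """Default fetcher: a plain GET with urllib (no retries, no error handling)."""
    with urlopen(Request(url, headers=headers)) as resp:
        return json.load(resp)


def iter_tables(base_url: str, token: str, page_size: int = 100,
                fetch: Fetcher = http_fetch) -> Iterator[dict]:
    """Yield table entities page by page until no `after` cursor is returned."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    after: Optional[str] = None
    while True:
        params: dict = {"limit": page_size}
        if after:
            params["after"] = after  # cursor pointing at the next page
        page = fetch(f"{base_url}/v1/tables?{urlencode(params)}", headers)
        yield from page.get("data", [])
        after = page.get("paging", {}).get("after")
        if not after:
            break
```

For example, iterating over `iter_tables("http://localhost:8585/api", token="<bot-jwt>")` against a local server on port 8585 would yield each table entity, from which a script could read `fullyQualifiedName`. The loop ends as soon as a page arrives without an `after` cursor.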

OpenMetadata architecture

OpenMetadata follows a layered architecture:

UI (Frontend): React-based web interface for exploration, collaboration, and governance.
API Server (Backend): implemented in Java (Dropwizard/Jetty), exposes REST APIs and enforces consistency.
Metadata Store (relational DB, typically MySQL/PostgreSQL): stores entities, relationships, and history.
Search Engine (Elasticsearch or OpenSearch): indexes all assets for fast, faceted search.
Ingestion Framework: implemented in Python; runs as pipelines (scheduled via Airflow or run standalone) that collect metadata from sources and send it to the API.
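To make the responsibility of each layer concrete, here is a toy, in-memory Python model of the flow above. It is not OpenMetadata code: every class is hypothetical, the relational database is reduced to a dictionary of version lists, and the search engine to a naive token index. What it illustrates is that ingestion only talks to the API server, which alone writes to the store and keeps the search index in sync.

```python
"""Toy, in-memory model of the layered flow (not OpenMetadata code).

Every class here is hypothetical and stands in for a real component:
- MetadataStore  ~ the relational DB (entities plus version history)
- SearchIndex    ~ Elasticsearch/OpenSearch (here: a naive token index)
- ApiServer      ~ the Java backend that validates, persists and indexes
- run_ingestion  ~ a Python ingestion pipeline pushing to the API
"""
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    # fully qualified name -> list of successive versions of the entity
    history: dict = field(default_factory=lambda: defaultdict(list))

    def save(self, fqn: str, entity: dict) -> int:
        self.history[fqn].append(dict(entity))
        # Simple 1-based counter, not OpenMetadata's own versioning scheme.
        return len(self.history[fqn])


@dataclass
class SearchIndex:
    # lowercase token -> set of fully qualified names
    postings: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, fqn: str, entity: dict) -> None:
        # Naive: tokens from older versions are never removed.
        text = f"{entity['name']} {entity.get('description', '')}"
        for token in text.lower().split():
            self.postings[token].add(fqn)

    def search(self, term: str) -> set:
        return set(self.postings.get(term.lower(), set()))


@dataclass
class ApiServer:
    store: MetadataStore
    index: SearchIndex

    def put_entity(self, fqn: str, entity: dict) -> int:
        # Single entry point: validate the payload, persist it,
        # then keep the search index in sync with the store.
        if "name" not in entity:
            raise ValueError("entity must have a name")
        version = self.store.save(fqn, entity)
        self.index.add(fqn, entity)
        return version


def run_ingestion(api: ApiServer, service: str, source_tables: list) -> None:
    # The pipeline never writes to the store directly; it only calls the API.
    # FQNs are simplified to "service.table" for this sketch.
    for table in source_tables:
        api.put_entity(f"{service}.{table['name']}", table)
```

For example, after `run_ingestion(api, "mysql_prod", [{"name": "orders", "description": "Customer orders"}])`, the call `api.index.search("orders")` returns `{"mysql_prod.orders"}`. Routing every write through one server-side entry point is what lets the backend keep the store and the index consistent.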

Figure 2 - OpenMetadata architecture

Its main components are:

Entities: table, column, user, pipeline, dashboard, etc.
Ingestion Workflows: connectors for databases, ETL, BI, ML.
Glossary/Taxonomy: business term dictionary and domains.
Policies & Roles: fine-grained access control.
Lineage Graph: engine that builds and visualizes dependencies.
Collaborative UI: comments, notifications, Slack/Jira integration.
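A lineage graph can be pictured as a directed graph in which nodes are tables or columns and each edge points from a source to the asset derived from it. The Python sketch below is illustrative only (the class and the table names are hypothetical, not OpenMetadata's lineage engine): it records column-level edges, derives the matching table-level edges, and answers upstream and downstream questions with a breadth-first search.

```python
"""Sketch of a lineage graph with column-level edges (illustrative only).

Nodes are "table" or "table.column" strings; all names are hypothetical.
An edge u -> v means "v is derived from u".
"""
from collections import defaultdict, deque


class LineageGraph:
    def __init__(self) -> None:
        self.downstream = defaultdict(set)  # node -> nodes derived from it
        self.upstream = defaultdict(set)    # node -> nodes it is derived from

    def add_edge(self, source: str, target: str) -> None:
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def add_column_edge(self, src_table: str, src_col: str,
                        dst_table: str, dst_col: str) -> None:
        # Record the column edge and the table-level edge it implies.
        self.add_edge(f"{src_table}.{src_col}", f"{dst_table}.{dst_col}")
        self.add_edge(src_table, dst_table)

    def _walk(self, start: str, edges: dict) -> set:
        # Breadth-first search; the `seen` set makes cycles harmless.
        seen, queue = set(), deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream_of(self, node: str) -> set:
        return self._walk(node, self.upstream)

    def downstream_of(self, node: str) -> set:
        return self._walk(node, self.downstream)
```

`downstream_of` answers the impact-analysis question (what could break if a column changes), while `upstream_of` supports root-cause analysis, for example when a quality test on a derived table fails.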

OpenMetadata capabilities

Ready-made connectors for databases (Snowflake, BigQuery, Redshift, Hive), BI (Looker, Superset), pipelines (Airflow, Dagster, Spark).
Detailed lineage with drill-down to individual columns.
Data Quality: create native tests (e.g., expected values, thresholds, consistency).
Observability: dashboards for freshness, volume, and popularity.
Governance: ownership assignment, sensitive-data classification, organizational domains.
Extensibility: easily build custom connectors and schemas.
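Native data-quality tests are essentially parameterized assertions, such as an allowed range or a row-count threshold. The sketch below evaluates three such checks locally in Python. Its function names loosely mirror OpenMetadata test definitions (`columnValuesToBeBetween`, `columnValuesToBeNotNull`, `tableRowCountToBeBetween`), but the code and the `QualityResult` type are simplified assumptions, not the platform's own test runner.

```python
"""Sketch: evaluating simple data-quality checks locally (not OpenMetadata's runner).

Names loosely mirror OpenMetadata test definitions; the logic and the
QualityResult type are simplified assumptions for illustration.
"""
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QualityResult:
    test: str
    status: str  # "Success" or "Failed"
    detail: str


def column_values_to_be_between(values: Sequence[Optional[float]],
                                min_value: float, max_value: float) -> QualityResult:
    # Nulls are skipped here; the not-null check below covers them.
    bad = [v for v in values if v is not None and not (min_value <= v <= max_value)]
    status = "Success" if not bad else "Failed"
    return QualityResult("columnValuesToBeBetween", status,
                         f"{len(bad)} value(s) out of range")


def column_values_to_be_not_null(values: Sequence[Optional[float]]) -> QualityResult:
    nulls = sum(v is None for v in values)
    status = "Success" if nulls == 0 else "Failed"
    return QualityResult("columnValuesToBeNotNull", status, f"{nulls} null value(s)")


def table_row_count_to_be_between(row_count: int, min_rows: int,
                                  max_rows: int) -> QualityResult:
    ok = min_rows <= row_count <= max_rows
    return QualityResult("tableRowCountToBeBetween", "Success" if ok else "Failed",
                         f"row count {row_count}")
```

For example, with `amounts = [10.0, 250.0, None, 999.5]`, `column_values_to_be_between(amounts, 0, 500)` fails with one value out of range and `column_values_to_be_not_null(amounts)` fails with one null. Keeping null handling in a separate check is a design choice of this sketch, so each failure points at a single cause.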

Good practices when using OpenMetadata

Start small: catalog critical sources and expand gradually.
Automate ingestion: schedule workflows (Airflow, cron).
Define clear ownership: assign owners for tables, dashboards, and pipelines.
Adopt a corporate glossary: business terms help non-technical users understand data.
Monitor quality: set minimum tests on critical datasets.
Integrate communications: Slack, Teams, Jira for alerts and collaboration.
Version and audit: keep history of metadata and changes.
Use RBAC and policies: restrict who can edit what.
Engage with the community: leverage existing extensions and contribute back.
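Role-based policies work by matching rules against a resource and an operation, so conflicting rules need a resolution strategy. The Python sketch below models one common strategy, deny-overrides with default deny; treat it as a simplified model under that assumption, not a reproduction of OpenMetadata's policy engine, whose rules also support conditions. The `Rule` fields and the operation names are illustrative.

```python
"""Sketch of rule-based access control with deny-overrides (simplified model).

Assumptions: a matching "deny" rule wins over any matching "allow" rule, and a
request with no matching rule is denied. The Rule fields are hypothetical and
much simpler than real policies.
"""
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Rule:
    effect: str                 # "allow" or "deny"
    resources: FrozenSet[str]   # e.g. {"table"}; "*" matches any resource
    operations: FrozenSet[str]  # e.g. {"EditDescription"}; "*" matches any op

    def matches(self, resource: str, operation: str) -> bool:
        return (("*" in self.resources or resource in self.resources)
                and ("*" in self.operations or operation in self.operations))


def is_allowed(rules: Iterable[Rule], resource: str, operation: str) -> bool:
    matched = [r for r in rules if r.matches(resource, operation)]
    if any(r.effect == "deny" for r in matched):
        return False  # deny overrides allow
    return any(r.effect == "allow" for r in matched)  # no match: default deny
```

With deny-overrides, a broad "allow everything" rule added to a role later cannot silently re-enable an operation that another rule explicitly denies, which keeps restrictions on who can edit what predictable.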

Project details

OpenMetadata’s backend is developed in Java with Dropwizard/Jetty.
The Ingestion Framework is implemented in Python, and the frontend (UI) in TypeScript/React.

TDP Kubernetes


This component is also available in the TDP Kubernetes edition since version 3.0.

The current OpenMetadata version is 1.12.4, distributed via the Helm chart tdp-openmetadata v3.0.1.

For configuration details, see the TDP Kubernetes documentation.

Source(s):

openmetadata.org
