This feature is only available starting from version 3.0.0.

OpenMetadata

Metadata Management

OpenMetadata is a unified open-source metadata management platform designed for data discovery, governance, data quality, observability, and collaboration.

OpenMetadata is powered by a central metadata repository, deep lineage, and built-in team collaboration.

It is one of the fastest-growing open-source projects, with a vibrant community and adoption across diverse industries.

Based on Open Metadata Standards and APIs, and supporting connectors for a wide range of data services, OpenMetadata enables end-to-end metadata management.

The core idea is that metadata is a first-class asset and should be:

  • Easy to query and explore (discovery).
  • Reliable and up-to-date (observability and quality).
  • Contextualized (who uses it, for what purpose, and in which flow it fits).
  • Governed (with owners, policies, classifications).
Figure 1 - OpenMetadata

What it does

  • Cataloging and discovery of assets (tables, dashboards, pipelines, models).
  • Collaborative documentation: descriptions, comments, data dictionary.
  • Governance and quality: ownership assignment, policies, built-in data tests.
  • Visual lineage, including at column level.
  • Observability: freshness, volume, and quality metrics, plus alerts.
  • Collaboration and notifications via comments, webhooks, access control.

OpenMetadata features:

  • Licensed under the Apache License 2.0.
  • Simplified architecture: essentially four services (API server, relational database, a search engine such as Elasticsearch or OpenSearch, and ingestion).
  • Extensible ingestion framework: 90+ ready-to-use connectors, with simple paths to build new ones.
  • Schema-first: metadata contracts are versioned JSON Schemas.
  • Open APIs: full functionality exposed via REST, easing automation and CI/CD integration.
  • Modern UI: one interface for search, lineage visualization, quality dashboards, and ingestion workflows.
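
As a minimal sketch of the Open APIs point, the snippet below builds (but does not send) a request that lists tables over REST. The host `http://localhost:8585`, the token placeholder, the `fields=tags` parameter, and the helper name `build_list_tables_request` are assumptions for illustration; check the endpoint and its query parameters against the API reference of your deployed version.

```python
import urllib.parse
import urllib.request


def build_list_tables_request(base_url: str, token: str, limit: int = 10) -> urllib.request.Request:
    """Build a GET request for the tables collection (hypothetical helper)."""
    # Query parameters are assumptions; adjust them to the API version in use.
    query = urllib.parse.urlencode({"limit": limit, "fields": "tags"})
    url = f"{base_url.rstrip('/')}/api/v1/tables?{query}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # JWT issued by the server or a bot account
            "Accept": "application/json",
        },
        method="GET",
    )


# Placeholder host and token; nothing is sent over the network here.
request = build_list_tables_request("http://localhost:8585", "<jwt-token>", limit=5)
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; the same pattern fits a CI/CD job that checks whether an asset is cataloged before a deployment proceeds.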

OpenMetadata architecture

OpenMetadata follows a layered architecture:

  • UI (Frontend): React-based web interface for exploration, collaboration, and governance.
  • API Server (Backend): implemented in Java (Dropwizard/Jetty), exposes REST APIs and enforces consistency.
  • Metadata Store (relational DB, typically MySQL/PostgreSQL): stores entities, relationships, and history.
  • Search Engine (OpenSearch or Elasticsearch): indexes all assets for fast, faceted search.
  • Ingestion Framework: implemented in Python; runs as pipelines (scheduled via Airflow or executed standalone) that collect metadata from sources and send it to the API.
Figure 2 - OpenMetadata architecture
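
To make the ingestion layer more concrete, here is a hedged sketch of what a workflow definition might look like when expressed as a Python dictionary: a source to read from, a sink that writes to the API server, and the server connection. The key names follow the layout of the project's published workflow examples as far as I know them; the service name, hosts, and token are placeholders, and `missing_sections` is a hypothetical helper, not part of the ingestion framework. Verify every field against the connector documentation for your version.

```python
import json

# Top-level sections this sketch expects in a workflow definition (assumption).
REQUIRED_SECTIONS = ("source", "sink", "workflowConfig")

workflow_config = {
    "source": {
        "type": "mysql",
        "serviceName": "example_mysql",  # hypothetical service name
        "serviceConnection": {
            "config": {
                "type": "Mysql",
                "username": "openmetadata_user",  # placeholder
                "hostPort": "mysql.example.internal:3306",  # placeholder
                # Credentials omitted on purpose; see the connector documentation.
            }
        },
        "sourceConfig": {"config": {"type": "DatabaseMetadata"}},
    },
    # The sink sends the collected metadata to the API server over REST.
    "sink": {"type": "metadata-rest", "config": {}},
    "workflowConfig": {
        "openMetadataServerConfig": {
            "hostPort": "http://localhost:8585/api",  # placeholder
            "authProvider": "openmetadata",
            "securityConfig": {"jwtToken": "<jwt-token>"},
        }
    },
}


def missing_sections(config: dict) -> list:
    """Return the required top-level sections that are absent (hypothetical check)."""
    return [section for section in REQUIRED_SECTIONS if section not in config]


print(json.dumps(workflow_config["sink"]))
print(missing_sections(workflow_config))
```

A definition like this is typically stored as YAML and handed to the ingestion framework by a scheduler; the check above only guards against structurally incomplete files before they reach that step.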

Its main components are:

  • Entities: table, column, user, pipeline, dashboard, etc.
  • Ingestion Workflows: connectors for databases, ETL, BI, ML.
  • Glossary/Taxonomy: business term dictionary and domains.
  • Policies & Roles: fine-grained access control.
  • Lineage Graph: engine that builds and visualizes dependencies.
  • Collaborative UI: comments, notifications, Slack/Jira integration.
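
The lineage graph idea can be illustrated with a toy example. The sketch below is plain Python, not OpenMetadata's lineage engine: it stores column-level edges in a dictionary and walks them breadth-first to find everything a column depends on. All column names and the function `upstream_columns` are made up for illustration.

```python
from collections import deque

# Toy column-level lineage: each column maps to its direct upstream columns.
COLUMN_LINEAGE = {
    "dash.revenue_chart.total": ["dw.fact_sales.amount"],
    "dw.fact_sales.amount": ["raw.orders.price", "raw.orders.quantity"],
    "raw.orders.price": [],
    "raw.orders.quantity": [],
}


def upstream_columns(column: str, lineage: dict) -> list:
    """Return all transitive upstream columns in breadth-first order."""
    seen = set()
    ordered = []
    queue = deque(lineage.get(column, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        ordered.append(current)
        queue.extend(lineage.get(current, []))
    return ordered


print(upstream_columns("dash.revenue_chart.total", COLUMN_LINEAGE))
```

The same traversal, run in the opposite direction over reversed edges, answers the impact-analysis question of which dashboards are affected when a source column changes.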

OpenMetadata capabilities

  • Ready-made connectors for databases (Snowflake, BigQuery, Redshift, Hive), BI (Looker, Superset), pipelines (Airflow, Dagster, Spark).
  • Detailed lineage with drill-down to individual columns.
  • Data Quality: create native tests (e.g., expected values, thresholds, consistency).
  • Observability: dashboards for freshness, volume, and popularity.
  • Governance: ownership assignment, sensitive-data classification, organizational domains.
  • Extensibility: easily build custom connectors and schemas.
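
As an illustration of what a threshold-style data quality test checks, the following sketch evaluates a range rule over a list of values. It is a stand-alone Python example under assumed names (`column_values_between` and the result keys are hypothetical); it does not use or reproduce OpenMetadata's own test definitions.

```python
def column_values_between(values: list, min_value: float, max_value: float) -> dict:
    """Check that every value lies in [min_value, max_value]; None counts as a failure."""
    failed = [v for v in values if v is None or not (min_value <= v <= max_value)]
    return {
        "passed": not failed,
        "failed_rows": len(failed),
        "total_rows": len(values),
    }


# Hypothetical sample of a "discount_percent" column.
sample = [0, 15, 30, None, 120]
result = column_values_between(sample, 0, 100)
print(result)
```

In a catalog, a result like this would be recorded against the table over time, so that a failing run can raise an alert and show up on the asset's quality view.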

Good practices when using OpenMetadata

  • Start small: catalog critical sources and expand gradually.
  • Automate ingestion: schedule workflows (Airflow, cron).
  • Define clear ownership: assign owners for tables, dashboards, and pipelines.
  • Adopt a corporate glossary: business terms help non-technical users understand data.
  • Monitor quality: set minimum tests on critical datasets.
  • Integrate communications: Slack, Teams, Jira for alerts and collaboration.
  • Version and audit: keep history of metadata and changes.
  • Use RBAC and policies: restrict who can edit what.
  • Engage with the community: leverage existing extensions and contribute back.

Project details

OpenMetadata’s backend is developed in Java with Dropwizard/Jetty.
The Ingestion Framework is implemented in Python, and the frontend (UI) in TypeScript/React.

TDP Kubernetes

Available in TDP Kubernetes

This component is also available in the TDP Kubernetes edition (from version 3.0.0), which ships OpenMetadata 1.9.11 deployed via the tdp-openmetadata Helm chart. For configuration details, see the OpenMetadata documentation in TDP Kubernetes.
