OpenMetadata
Metadata Management

OpenMetadata is a unified open-source metadata management platform designed for data discovery, governance, data quality, observability, and collaboration.
OpenMetadata is powered by a central metadata repository, deep lineage, and built-in team collaboration.
It is one of the fastest-growing open-source projects, with a vibrant community and adoption across diverse industries.
Based on Open Metadata Standards and APIs, and supporting connectors for a wide range of data services, OpenMetadata enables end-to-end metadata management.
The core idea is that metadata is a first-class asset and should be:
- Easy to query and explore (discovery).
- Reliable and up-to-date (observability and quality).
- Contextualized (who uses it, for what purpose, and in which flow it fits).
- Governed (with owners, policies, classifications).

What it does
- Cataloging and discovery of assets (tables, dashboards, pipelines, models).
- Collaborative documentation: descriptions, comments, data dictionary.
- Governance and quality: ownership assignment, policies, built-in data tests.
- Visual lineage, including at column level.
- Observability: freshness, volume, and quality metrics, plus alerts.
- Collaboration and notifications via comments, webhooks, access control.
OpenMetadata features
- Licensed under the Apache License 2.0.
- Simplified architecture: essentially 4 services (API Server, relational DB, search engine such as Elasticsearch or OpenSearch, Ingestion).
- Extensible ingestion framework: 90+ ready-to-use connectors, with simple paths to build new ones.
- Schema-first: metadata contracts are versioned JSON Schemas.
- Open APIs: full functionality exposed via REST, easing automation and CI/CD integration.
- Modern UI: one interface for search, lineage visualization, quality dashboards, and ingestion workflows.
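As an illustration of the "Open APIs" point, here is a minimal sketch of how a script might list catalog entities over REST. It assumes the server listens on http://localhost:8585, that entities are listed under /api/v1/<entity> (for example /api/v1/tables), and that a JWT is passed as a Bearer token; the helper names build_list_request and list_entities are hypothetical. Check these details against the API documentation of the deployed version.

```python
import json
import urllib.parse
import urllib.request


def build_list_request(base_url, entity, token, limit=10):
    """Build (but do not send) a GET request listing entities of one type.

    Assumption: entities are exposed under /api/v1/<entity> and the server
    accepts a JWT in the Authorization header.
    """
    query = urllib.parse.urlencode({"limit": limit})
    url = f"{base_url.rstrip('/')}/api/v1/{entity}?{query}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def list_entities(base_url, entity, token, limit=10):
    """Send the request and return the decoded JSON response body."""
    request = build_list_request(base_url, entity, token, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Against a running server, list_entities("http://localhost:8585", "tables", token) would return the first page of tables as a Python dictionary. Keeping request construction separate from sending makes the URL and headers easy to check in CI without network access.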
OpenMetadata architecture
OpenMetadata follows a layered architecture:
- UI (Frontend): React-based web interface for exploration, collaboration, and governance.
- API Server (Backend): implemented in Java (Dropwizard/Jetty), exposes REST APIs and enforces consistency.
- Metadata Store (relational DB, typically MySQL/PostgreSQL): stores entities, relationships, and history.
- Search Engine (Elasticsearch or OpenSearch): indexes all assets for fast, faceted search.
- Ingestion Framework: implemented in Python; runs as pipelines (scheduled via Airflow or standalone) that collect metadata from sources and send it to the API.
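To make the ingestion flow concrete (source, then sink, then API server), the sketch below assembles a workflow configuration as a Python dictionary. The key names (source, serviceConnection, sourceConfig, sink, workflowConfig, openMetadataServerConfig) follow the layout commonly shown in OpenMetadata connector examples, but they are stated here as assumptions, and the connection values are placeholders; verify both against the connector documentation for the version in use. The function name build_workflow_config is hypothetical.

```python
import json


def build_workflow_config(service_name, source_type, connection, server_url, jwt_token):
    """Assemble an ingestion workflow configuration.

    Assumption: the key names mirror the layout used in connector examples;
    they are not validated against any schema here.
    """
    return {
        # Where metadata is read from (e.g. a database service).
        "source": {
            "type": source_type,
            "serviceName": service_name,
            "serviceConnection": {"config": connection},
            "sourceConfig": {"config": {"type": "DatabaseMetadata"}},
        },
        # Where collected metadata is sent: the OpenMetadata REST API.
        "sink": {"type": "metadata-rest", "config": {}},
        # How the workflow reaches and authenticates with the API server.
        "workflowConfig": {
            "openMetadataServerConfig": {
                "hostPort": server_url,
                "authProvider": "openmetadata",
                "securityConfig": {"jwtToken": jwt_token},
            }
        },
    }


if __name__ == "__main__":
    config = build_workflow_config(
        service_name="sales_db",  # placeholder service name
        source_type="mysql",
        connection={"type": "Mysql", "hostPort": "db.example.com:3306"},  # placeholder
        server_url="http://localhost:8585/api",
        jwt_token="<token>",
    )
    print(json.dumps(config, indent=2))
```

Generating the configuration in code rather than by hand makes it straightforward to produce one workflow per source and keep them under version control.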

Its main components are:
- Entities: table, column, user, pipeline, dashboard, etc.
- Ingestion Workflows: connectors for databases, ETL, BI, ML.
- Glossary/Taxonomy: business term dictionary and domains.
- Policies & Roles: fine-grained access control.
- Lineage Graph: engine that builds and visualizes dependencies.
- Collaborative UI: comments, notifications, Slack/Jira integration.
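The idea behind a lineage graph can be sketched in a few lines: nodes are assets (here, fully qualified column names), edges point from a source to the asset derived from it, and "upstream" is a graph traversal. This is an illustrative model only, not OpenMetadata's actual lineage engine; the class LineageGraph and the column names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph: each node maps to the set of nodes it derives from."""

    def __init__(self):
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        """Record that `target` is derived from `source`."""
        self._upstream[target].add(source)

    def upstream_of(self, node):
        """Return every direct and indirect source of `node` (breadth-first)."""
        seen = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for parent in self._upstream.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


if __name__ == "__main__":
    graph = LineageGraph()
    graph.add_edge("raw.orders.amount", "staging.orders.amount_eur")
    graph.add_edge("raw.rates.eur_rate", "staging.orders.amount_eur")
    graph.add_edge("staging.orders.amount_eur", "mart.revenue.total_eur")
    print(sorted(graph.upstream_of("mart.revenue.total_eur")))
```

The same traversal run in the opposite direction (downstream) answers the impact-analysis question: which dashboards or tables are affected if a given column changes.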
OpenMetadata capabilities
- Ready-made connectors for databases (Snowflake, BigQuery, Redshift, Hive), BI (Looker, Superset), pipelines (Airflow, Dagster, Spark).
- Detailed lineage with drill-down to individual columns.
- Data Quality: create native tests (e.g., expected values, thresholds, consistency).
- Observability: dashboards for freshness, volume, and popularity.
- Governance: ownership assignment, sensitive-data classification, organizational domains.
- Extensibility: easily build custom connectors and schemas.
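The kinds of data quality tests mentioned above (expected values, thresholds) can be pictured with a small stand-alone sketch. It does not use OpenMetadata's test framework; the names CheckResult, check_allowed_values, check_between, and failed_checks are hypothetical and only show the shape of such checks.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one data quality check."""
    name: str
    passed: bool
    detail: str


def check_allowed_values(name, values, allowed):
    """Expected-values test: every value must belong to the allowed set."""
    unexpected = sorted(set(values) - set(allowed))
    detail = f"unexpected values: {unexpected}" if unexpected else "ok"
    return CheckResult(name, not unexpected, detail)


def check_between(name, value, minimum, maximum):
    """Threshold test: a metric (e.g. row count) must fall within bounds."""
    passed = minimum <= value <= maximum
    detail = "ok" if passed else f"{value} outside [{minimum}, {maximum}]"
    return CheckResult(name, passed, detail)


def failed_checks(results):
    """Return only the checks that did not pass, e.g. to raise an alert."""
    return [result for result in results if not result.passed]


if __name__ == "__main__":
    results = [
        check_allowed_values("status_values", ["open", "closed", "unknown"], {"open", "closed"}),
        check_between("row_count", 120, 100, 1000),
    ]
    for result in failed_checks(results):
        print(f"FAILED {result.name}: {result.detail}")
```

In practice, the failing results would feed the alerting and notification channels rather than being printed.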
Good practices when using OpenMetadata
- Start small: catalog critical sources and expand gradually.
- Automate ingestion: schedule workflows (Airflow, cron).
- Define clear ownership: assign owners for tables, dashboards, and pipelines.
- Adopt a corporate glossary: business terms help non-technical users understand data.
- Monitor quality: set minimum tests on critical datasets.
- Integrate communications: Slack, Teams, Jira for alerts and collaboration.
- Version and audit: keep history of metadata and changes.
- Use RBAC and policies: restrict who can edit what.
- Engage with the community: leverage existing extensions and contribute back.
Project details
OpenMetadata’s backend is developed in Java with Dropwizard/Jetty.
The Ingestion Framework is implemented in Python, and the frontend (UI) in TypeScript/React.
TDP Kubernetes
OpenMetadata is also available in the TDP Kubernetes edition (from version 3.0.0), where it ships as version 1.9.11 and is deployed via the tdp-openmetadata Helm chart.
For configuration details, see the OpenMetadata documentation in TDP Kubernetes.