PostGIS

ST_ClusterDBSCAN

Prev

8.16. Clustering Functions

Next

Name

ST_ClusterDBSCAN — Window function that returns a cluster id for each input geometry using the DBSCAN algorithm.

Synopsis

integer +`*`+ST_ClusterDBSCAN*(`geometry winset `geom, float8 eps, integer minpoints`)`;

Description

Returns cluster number for each input geometry, based on a 2D implementation of the Density-based spatial clustering of applications with noise (DBSCAN) algorithm. Unlike ST_ClusterKMeans, it does not require the number of clusters to be specified, but instead uses the desired distance (eps) and density (minpoints) parameters to construct each cluster.

An input geometry will be added to a cluster if it is either:

  • A "core" geometry, that is within eps distance of at least minpoints input geometries (including itself) or

  • A "border" geometry, that is within eps distance of a core geometry.

Note that border geometries may be within eps distance of core geometries in more than one cluster; in this case, either assignment would be correct, and the border geometry will be arbitrarily asssigned to one of the available clusters. In these cases, it is possible for a correct cluster to be generated with fewer than minpoints geometries. When assignment of a border geometry is ambiguous, repeated calls to ST_ClusterDBSCAN will produce identical results if an ORDER BY clause is included in the window definition, but cluster assignments may differ from other implementations of the same algorithm.

Note

Input geometries that do not meet the criteria to join any other cluster will be assigned a cluster number of NULL.

Availability: 2.3.0

Examples

Assigning a cluster number to each polygon within 50 meters of each other. Require at least 2 polygons per cluster

st_clusterdbscan01

within 50 meters at least 2 per cluster. singletons have NULL for cid

SELECT name, ST_ClusterDBSCAN(geom, eps := 50, minpoints := 2) over () AS cid
FROM boston_polys
WHERE name > '' AND building > ''
    AND ST_DWithin(geom,
        ST_Transform(
            ST_GeomFromText('POINT(-71.04054 42.35141)', 4326), 26986),
           500);
                name

bucket -------------------------------------+-------- Manulife Tower

0 Park Lane Seaport I

0 Park Lane Seaport II

0 Renaissance Boston Waterfront Hotel

0 Seaport Boston Hotel

0 Seaport Hotel & World Trade Center

0 Waterside Place

0 World Trade Center East

0 100 Northern Avenue

1 100 Pier 4

1 The Institute of Contemporary Art

1 101 Seaport

2 District Hall

2 One Marina Park Drive

2 Twenty Two Liberty

2 Vertex

2 Vertex

2 Watermark Seaport

2 Blue Hills Bank Pavilion

NULL World Trade Center West

Combining parcels with the same cluster number into a single geometry. This uses named argument calling

SELECT cid, ST_Collect(geom) AS cluster_geom, array_agg(parcel_id) AS ids_in_cluster FROM (
    SELECT parcel_id, ST_ClusterDBSCAN(geom, eps := 0.5, minpoints := 5) over () AS cid, geom
    FROM parcels) sq
GROUP BY cid;

See Also