PostGIS
Chapter 11. PostGIS Extras
Table of Contents
This chapter documents features found in the extras folder of the PostGIS source tarballs and source repository. These are not always packaged with PostGIS binary releases, but are usually PL/pgSQL based or standard shell scripts that can be run as is.
11.1. Address Standardizer
This is a fork of the PAGC standardizer (original code for this portion was PAGC PostgreSQL Address Standardizer).
The address standardizer is a single line address parser that takes an input address and normalizes it based on a set of rules stored in a table and helper lex and gaz tables.
The code is built into a single PostgreSQL extension library called
address_standardizer
which can be installed with
CREATE EXTENSION address_standardizer;
. In addition to the
address_standardizer extension, a sample data extension called
address_standardizer_data_us
extensions is built, which contains
gaz, lex, and rules tables for US data. This extensions can be installed
via: CREATE EXTENSION address_standardizer_data_us;
The code for this extension can be found in the PostGIS
extensions/address_standardizer
and is currently self-contained.
For installation instructions refer to: Section 2.3, “Installing and Using the address standardizer”.
11.1.1. How the Parser Works
The parser works from right to left looking first at the macro elements for postcode, state/province, city, and then looks micro elements to determine if we are dealing with a house number street or intersection or landmark. It currently does not look for a country code or name, but that could be introduced in the future.
- Country code
-
Assumed to be US or CA based on: postcode as US or Canada state/province as US or Canada else US
- Postcode/zipcode
-
These are recognized using Perl compatible regular expressions. These regexs are currently in the parseaddress-api.c and are relatively simple to make changes to if needed.
- State/province
-
These are recognized using Perl compatible regular expressions. These regexs are currently in the parseaddress-api.c but could get moved into includes in the future for easier maintenance.
11.1.2. Address Standardizer Types
Abstract
This section lists the PostgreSQL data types installed by Address Standardizer extension. Note we describe the casting behavior of these which is very important especially when designing your own functions.
stdaddr[.refpurpose]# — A composite
type that consists of the elements of an address. This is the return
type for standardize_address
function.#
11.1.3. Address Standardizer Tables
Abstract
This section lists the PostgreSQL table formats used by the address_standardizer for normalizing addresses. Note that these tables do not need to be named the same as what is referenced here. You can have different lex, gaz, rules tables for each country for example or for your custom geocoder. The names of these tables get passed into the address standardizer functions.
The packaged extension address_standardizer_data_us
contains data
for standardizing US addresses.
rules table[.refpurpose]# — The rules table contains a set of rules that maps address input sequence tokens to standardized output sequence. A rule is defined as a set of input tokens followed by -1 (terminator) followed by set of output tokens followed by -1 followed by number denoting kind of rule followed by ranking of rule.#
lex table[.refpurpose]# — A lex table is used to classify alphanumeric input and associate that input with (a) input tokens ( See the section called “Input Tokens”) and (b) standardized representations.#
gaz table[.refpurpose]# — A gaz table is used to standardize place names and associate that input with (a) input tokens ( See the section called “Input Tokens”) and (b) standardized representations.#
11.1.4. Address Standardizer Functions
debug_standardize_address[.refpurpose]# — Returns a json formatted text listing the parse tokens and standardizations#
parse_address[.refpurpose]# — Takes a 1 line address and breaks into parts#
standardize_address[.refpurpose]# — Returns an stdaddr form of an input address utilizing lex, gaz, and rule tables.#
11.2. Tiger Geocoder
Abstract
A plpgsql based geocoder written to work with the TIGER (Topologically Integrated Geographic Encoding and Referencing system ) / Line and Master Address database export released by the US Census Bureau.
There are four components to the geocoder: the data loader functions, the address normalizer, the address geocoder, and the reverse geocoder.
Although it is designed specifically for the US, a lot of the concepts and functions are applicable and can be adapted to work with other country address and road networks.
The script builds a schema called tiger
to house all the tiger
related functions, reusable lookup data such as road type prefixes,
suffixes, states, various control tables for managing data load, and
skeleton base tables from which all the tiger loaded tables inherit
from.
Another schema called tiger_data
is also created which houses all
the census data for each state that the loader downloads from Census
site and loads into the database. In the current model, each set of
state tables is prefixed with the state code e.g ma_addr
,
ma_edges
etc with constraints to enforce only that state data. Each
of these tables inherits from the tables addr
, faces
, edges
,
etc located in the tiger schema
.
All the geocode functions only reference the base tables, so there is no
requirement that the data schema be called tiger_data
or that data
can’t be further partitioned into other schemas — e.g a different
schema for each state, as long as all the tables inherit from the tables
in the tiger
schema.
For instructions on how to enable the extension in your database and also to load data using it, refer to Section 2.4.1, “Tiger Geocoder Enabling your PostGIS database”.
|
|
The Pagc_Normalize_Address function as a drop in replacement for in-built Normalize_Address. Refer to Section 2.3, “Installing and Using the address standardizer” for compile and installation instructions.
Design:
The goal of this project is to build a fully functional geocoder that can process an arbitrary United States address string and using normalized TIGER census data, produce a point geometry and rating reflecting the location of the given address and likeliness of the location. The higher the rating number the worse the result.
The reverse_geocode
function, introduced in PostGIS 2.0.0 is useful
for deriving the street address and cross streets of a GPS location.
The geocoder should be simple for anyone familiar with PostGIS to install and use, and should be easily installable and usable on all platforms supported by PostGIS.
It should be robust enough to function properly despite formatting and spelling errors.
It should be extensible enough to be used with future data updates, or alternate data sources with a minimum of coding changes.
|
Drop_Indexes_Generate_Script[.refpurpose]#
— Generates a script that drops all non-primary key and non-unique
indexes on tiger schema and user specified schema. Defaults schema to
tiger_data
if no schema is specified.#
Drop_Nation_Tables_Generate_Script[.refpurpose]#
— Generates a script that drops all tables in the specified schema that
start with county_all
, state_all
or state code followed by
county
or state
.#
Drop_State_Tables_Generate_Script[.refpurpose]#
— Generates a script that drops all tables in the specified schema that
are prefixed with the state abbreviation. Defaults schema to
tiger_data
if no schema is specified.#
Geocode[.refpurpose]# — Takes in an address as a string (or other normalized address) and outputs a set of possible locations which include a point geometry in NAD 83 long lat, a normalized address for each, and the rating. The lower the rating the more likely the match. Results are sorted by lowest rating first. Can optionally pass in maximum results, defaults to 10, and restrict_region (defaults to NULL)#
Geocode_Intersection[.refpurpose]#
— Takes in 2 streets that intersect and a state, city, zip, and outputs
a set of possible locations on the first cross street that is at the
intersection, also includes a geomout as the point location in NAD 83
long lat, a normalized_address
(addy) for each location, and the
rating. The lower the rating the more likely the match. Results are
sorted by lowest rating first. Can optionally pass in maximum results,
defaults to 10. Uses Tiger data (edges, faces, addr), PostgreSQL fuzzy
string matching (soundex, levenshtein).#
Get_Geocode_Setting[.refpurpose]# — Returns value of specific setting stored in tiger.geocode_settings table.#
Get_Tract[.refpurpose]# — Returns census tract or field from tract table of where the geometry is located. Default to returning short name of tract.#
Install_Missing_Indexes[.refpurpose]# — Finds all tables with key columns used in geocoder joins and filter conditions that are missing used indexes on those columns and will add them.#
Loader_Generate_Census_Script[.refpurpose]#
— Generates a shell script for the specified platform for the specified
states that will download Tiger census state tract, bg, and tabblocks
data tables, stage and load into tiger_data
schema. Each state
script is returned as a separate record.#
Loader_Generate_Script[.refpurpose]#
— Generates a shell script for the specified platform for the specified
states that will download Tiger data, stage and load into tiger_data
schema. Each state script is returned as a separate record. Latest
version supports Tiger 2010 structural changes and also loads census
tract, block groups, and blocks tables.#
Loader_Generate_Nation_Script[.refpurpose]# — Generates a shell script for the specified platform that loads in the county and state lookup tables.#
Missing_Indexes_Generate_Script[.refpurpose]# — Finds all tables with key columns used in geocoder joins that are missing indexes on those columns and will output the SQL DDL to define the index for those tables.#
Normalize_Address[.refpurpose]#
— Given a textual street address, returns a composite norm_addy
type
that has road suffix, prefix and type standardized, street, streetname
etc. broken into separate fields. This function will work with just the
lookup data packaged with the tiger_geocoder (no need for tiger census
data).#
Pagc_Normalize_Address[.refpurpose]#
— Given a textual street address, returns a composite norm_addy
type
that has road suffix, prefix and type standardized, street, streetname
etc. broken into separate fields. This function will work with just the
lookup data packaged with the tiger_geocoder (no need for tiger census
data). Requires address_standardizer extension.#
Pprint_Addy[.refpurpose]# —
Given a norm_addy
composite type object, returns a pretty print
representation of it. Usually used in conjunction with
normalize_address.#
Reverse_Geocode[.refpurpose]# — Takes a geometry point in a known spatial ref sys and returns a record containing an array of theoretically possible addresses and an array of cross streets. If include_strnum_range = true, includes the street range in the cross streets.#
Topology_Load_Tiger[.refpurpose]# — Loads a defined region of tiger data into a PostGIS Topology and transforming the tiger data to spatial reference of the topology and snapping to the precision tolerance of the topology.#
Set_Geocode_Setting[.refpurpose]# — Sets a setting that affects behavior of geocoder functions.#
There are a couple other open source geocoders for PostGIS, that unlike tiger geocoder have the advantage of multi-country geocoding support
-
Nominatim uses OpenStreetMap gazeteer formatted data. It requires osm2pgsql for loading the data, PostgreSQL 8.4+ and PostGIS 1.5+ to function. It is packaged as a webservice interface and seems designed to be called as a webservice. Just like the tiger geocoder, it has both a geocoder and a reverse geocoder component. From the documentation, it is unclear if it has a pure SQL interface like the tiger geocoder, or if a good deal of the logic is implemented in the web interface.
-
GIS Graphy also utilizes PostGIS and like Nominatim works with OpenStreetMap (OSM) data. It comes with a loader to load OSM data and similar to Nominatim is capable of geocoding not just US. Much like Nominatim, it runs as a webservice and relies on Java 1.5, Servlet apps, Solr. GisGraphy is cross-platform and also has a reverse geocoder among some other neat features.
ST_DFullyWithin |
stdaddr |