PostGIS
Name
debug_standardize_address — Returns a json formatted text listing the parse tokens and standardizations
Synopsis
text +`*`+debug_standardize_address
*(`text `lextab
, text
gaztab
, text rultab
, text micro
, text macro=NULL`
)`;
Description
This is a function for debugging address standardizer rules and lex/gaz mappings. It returns a json formatted text that includes the matching rules, mapping of tokens, and best standardized address stdaddr form of an input address utilizing lex table table name, gaz table, and rules table table names and an address.
For single line addresses use just micro
For two line address A micro
consisting of standard first line of
postal address e.g. house_num street
, and a macro consisting of
standard postal second line of an address e.g
city, state postal_code country
.
Elements returned in the json document are
- input_tokens
-
For each word in the input address, returns the position of the word, token categorization of the word, and the standard word it is mapped to. Note that for some input words, you might get back multiple records because some inputs can be categorized as more than one thing.
- rules
-
The set of rules matching the input and the corresponding score for each. The first rule (highest scoring) is what is used for standardization
- stdaddr
-
The standardized address elements stdaddr that would be returned when running standardize_address
Availability: 3.4.0
This method needs address_standardizer extension.
Examples
Using address_standardizer_data_us extension
CREATE EXTENSION address_standardizer_data_us; -- only needs to be done once
Variant 1: Single line address and returning the input tokens
SELECT it->>'pos' AS position, it->>'word' AS word, it->>'stdword' AS standardized_word,
it->>'token' AS token, it->>'token-code' AS token_code
FROM jsonb(
debug_standardize_address('us_lex',
'us_gaz', 'us_rules', 'One Devonshire Place, PH 301, Boston, MA 02109')
) AS s, jsonb_array_elements(s->'input_tokens') AS it;
position | word | standardized_word | token | token_code
----------+------------+-------------------+--------+------------
0 | ONE | 1 | NUMBER | 0
0 | ONE | 1 | WORD | 1
1 | DEVONSHIRE | DEVONSHIRE | WORD | 1
2 | PLACE | PLACE | TYPE | 2
3 | PH | PATH | TYPE | 2
3 | PH | PENTHOUSE | UNITT | 17
4 | 301 | 301 | NUMBER | 0
(7 rows)
Variant 2: Multi line address and returning first rule input mappings and score
SELECT (s->'rules'->0->>'score')::numeric AS score, it->>'pos' AS position,
it->>'input-word' AS word, it->>'input-token' AS input_token, it->>'mapped-word' AS standardized_word,
it->>'output-token' AS output_token
FROM jsonb(
debug_standardize_address('us_lex',
'us_gaz', 'us_rules', 'One Devonshire Place, PH 301', 'Boston, MA 02109')
) AS s, jsonb_array_elements(s->'rules'->0->'rule_tokens') AS it;
score | position | word | input_token | standardized_word | output_token
----------+----------+------------+-------------+-------------------+--------------
0.876250 | 0 | ONE | NUMBER | 1 | HOUSE
0.876250 | 1 | DEVONSHIRE | WORD | DEVONSHIRE | STREET
0.876250 | 2 | PLACE | TYPE | PLACE | SUFTYP
0.876250 | 3 | PH | UNITT | PENTHOUSE | UNITT
0.876250 | 4 | 301 | NUMBER | 301 | UNITT
(5 rows)