
address-standardizer

An address parser and standardizer in C++

The goal is to take an address string, parse it into tokens, classify and standardize those tokens using an appropriate lexicon, and then map the tokens to specific output fields based on rules.
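
The three stages described above (tokenize, classify and standardize against a lexicon, map to output fields by rule) can be illustrated with a minimal, self-contained sketch. This is not the project's actual API: the names `Token`, `TokenClass`, `streetTypeLexicon`, `tokenizeAndClassify`, and `mapToFields`, the toy lexicon entries, and the single mapping rule are all hypothetical and chosen only for illustration. The real code relies on Boost.Regex with ICU and externally defined lexicons and rules.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token classes; the project's real classification scheme may differ.
enum class TokenClass { Number, StreetType, Word };

struct Token {
    std::string text;  // standardized (upper-cased, lexicon-normalized) text
    TokenClass cls;    // class assigned during classification
};

// Toy lexicon (assumption): maps an upper-cased word to its standard form.
const std::map<std::string, std::string>& streetTypeLexicon() {
    static const std::map<std::string, std::string> lexicon = {
        {"ST", "STREET"},  {"STREET", "STREET"},
        {"AVE", "AVENUE"}, {"AVENUE", "AVENUE"},
        {"RD", "ROAD"},    {"ROAD", "ROAD"}};
    return lexicon;
}

// Stages 1 and 2: split on whitespace, drop punctuation, upper-case,
// then classify each token and replace it with its standard form.
std::vector<Token> tokenizeAndClassify(const std::string& address) {
    std::vector<Token> tokens;
    std::istringstream in(address);
    std::string raw;
    while (in >> raw) {
        std::string word;
        for (unsigned char c : raw) {
            if (std::isalnum(c)) word += static_cast<char>(std::toupper(c));
        }
        if (word.empty()) continue;  // token was punctuation only

        const bool allDigits = std::all_of(
            word.begin(), word.end(),
            [](unsigned char c) { return std::isdigit(c) != 0; });
        const auto hit = streetTypeLexicon().find(word);

        if (allDigits) {
            tokens.push_back({word, TokenClass::Number});
        } else if (hit != streetTypeLexicon().end()) {
            tokens.push_back({hit->second, TokenClass::StreetType});
        } else {
            tokens.push_back({word, TokenClass::Word});
        }
    }
    return tokens;
}

// Stage 3: one illustrative rule. A leading Number becomes house_number,
// a trailing StreetType becomes street_type, the rest becomes street_name.
std::map<std::string, std::string> mapToFields(const std::vector<Token>& tokens) {
    std::map<std::string, std::string> fields;
    std::size_t begin = 0;
    std::size_t end = tokens.size();

    if (begin < end && tokens[begin].cls == TokenClass::Number) {
        fields["house_number"] = tokens[begin].text;
        ++begin;
    }
    if (begin < end && tokens[end - 1].cls == TokenClass::StreetType) {
        fields["street_type"] = tokens[end - 1].text;
        --end;
    }

    std::string name;
    for (std::size_t i = begin; i < end; ++i) {
        if (!name.empty()) name += ' ';
        name += tokens[i].text;
    }
    if (!name.empty()) fields["street_name"] = name;
    return fields;
}
```

Under these assumptions, calling `mapToFields(tokenizeAndClassify("123 Main St."))` would yield `house_number` = `123`, `street_name` = `MAIN`, and `street_type` = `STREET`. A real lexicon and rule set would need to handle ambiguous tokens, multiple languages, and country-specific field orders, which this sketch ignores.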

This code is intended for use as part of a geocoding or record matching system. It is not useful for standardizing addresses for postal correctness or delivery point solutions.

Dependencies

  • boost/regex.hpp
  • boost/regex/icu.hpp
  • boost/lexical_cast.hpp
  • boost/algorithm/string.hpp
  • boost/algorithm/string/join.hpp
  • boost/algorithm/string/split.hpp
  • boost/algorithm/string/classification.hpp
  • boost/serialization/*
  • unicode/* (from icu packages)
  • PostgreSQL server development packages for 9.2, 9.3, 9.4, 9.5, and/or 9.6

For Ubuntu/Debian

sudo apt-get install libicu-dev libicu52 libboost1.55-dev libboost-serialization1.55-dev libboost-serialization1.55.0 libboost-regex1.55-dev libboost-regex1.55.0

Build

See file src/README.address_standardizer

Getting Started

See script tools/getting-started.sh

Funding Needed

This project has many pieces, and funding will help it along. There is a lot of code to write, test, and document. Beyond that, we need to develop lexicons and rules that cover the addressing standards of every country that one might want to support.

Please contact me if you are interested in this project and/or you can provide some amount of funding. woodbri (at) imaptools (dot) com.