Address Book Data Cleanup

Hi,

I’m working on an address book solution that will become the system of record for addresses used across multiple external systems. I have an export from another system with an incredible number of duplicates and near-duplicates: errors, typos, “Street” vs “St”, etc.
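For the “Street” vs “St” kind of variation, a cheap first pass before calling any external service is to normalize each address locally. Below is a minimal Python sketch. It assumes free-text, single-line addresses, and the abbreviation table is hypothetical and intentionally tiny:

```python
import re

# Hypothetical, deliberately small abbreviation map. A real cleanup would need
# a much longer list built from the actual data.
ABBREVIATIONS = {
    "st": "street",
    "rd": "road",
    "ave": "avenue",
    "av": "avenue",
    "blvd": "boulevard",
    "dr": "drive",
    "ln": "lane",
    "apt": "apartment",
}


def normalize_address(raw: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace and expand abbreviations."""
    text = raw.lower()
    # Replace punctuation with spaces. \w is Unicode-aware in Python 3,
    # so accented letters in foreign addresses are kept.
    text = re.sub(r"[^\w\s]+", " ", text)
    tokens = text.split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)
```

Blind expansion has pitfalls. For example, “St” can also mean “Saint”, so the table needs checking against the real records.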

The total dataset is ~14,000 records, and I suspect around 70% of them are duplicates or near-duplicates.
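Once addresses are normalized, fuzzy string matching can find the near-duplicates. Comparing all ~14,000 records pairwise would take roughly 98 million comparisons. The sketch below avoids that by only comparing records that share a postcode (a technique called blocking), and it merges matches with union-find. The field names `postcode` and `address` and the 0.9 threshold are assumptions. `difflib.SequenceMatcher` from the Python standard library stands in for a dedicated fuzzy-matching library:

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_duplicates(records, threshold=0.9):
    """Group record indices whose addresses look like the same place.

    records: list of dicts with hypothetical keys "postcode" and "address"
    (the address is assumed to be normalized already).
    Returns a sorted list of clusters, each a list of record indices.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Blocking: bucket records by a cleaned-up postcode so that only
    # records in the same bucket are ever compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec["postcode"].replace(" ", "").upper()].append(idx)

    for members in blocks.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if similarity(records[i]["address"], records[j]["address"]) >= threshold:
                    union(i, j)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

Blocking trades recall for speed. Two copies of the same address with a mistyped postcode land in different blocks and are never compared, so a second pass blocked on another field, such as street name, may be worthwhile.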

Have any of you tried using Glide or other tools to clean up something like this? I was also wondering whether an LLM called via its API might be a solution.

The standard features in Excel aren’t up to the job.

My best guess would be an LLM like that, an address autocomplete API, or maybe even a geocoding API.

Whichever I use would have to standardize all sorts of variations into a structured format, and ideally correct the typos as well.
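If an autocomplete, geocoding or LLM API does the standardizing, the high duplicate rate actually helps. Each distinct string only needs to be sent once, and rows can then be grouped by the canonical answer. Below is a sketch that assumes a hypothetical `standardize` function. It would wrap whichever API is chosen and return a canonical address string, or `None` when it can't resolve one:

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signature: takes one address string and returns a canonical
# address string, or None if the service could not resolve it. In practice
# this would wrap the geocoding, autocomplete or LLM API you choose.
Standardizer = Callable[[str], Optional[str]]


def group_by_canonical(
    addresses: List[str], standardize: Standardizer
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Group row indices by the canonical address the standardizer returns.

    The standardizer is called at most once per distinct input string, which
    matters when most rows are repeats and each API call costs money.
    Returns (groups, unresolved), where unresolved lists rows with no answer.
    """
    cache: Dict[str, Optional[str]] = {}
    groups: Dict[str, List[int]] = defaultdict(list)
    unresolved: List[int] = []
    for idx, addr in enumerate(addresses):
        if addr not in cache:
            cache[addr] = standardize(addr)  # only reached once per distinct string
        canonical = cache[addr]
        if canonical is None:
            unresolved.append(idx)
        else:
            groups[canonical].append(idx)
    return dict(groups), unresolved
```

Rows in `unresolved` are the ones to review by hand. Rows that share a canonical value are candidates to merge into a single address book entry.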
