How to get state from address

hello, im looking for help to pull just the state from an address i have address being verified by google and have complete address in a db but cant seem to pull out just the abbreviated state? any help would be great!

Sample data:

# Address Expected State
1 123 Main St, Anytown, CA 90210 CA
2 456 Oak Avenue, Springfield, IL 62704-5678 IL
3 789 Pine Ln Smallville NY 10001 NY
4 PO Box 100 Somewhere AZ 85001-9999 AZ
5 1600 Pennsylvania Avenue NW, Washington, DC 20500 DC
6 1 Mountain Rd, Boulder , CO 80302 CO
7 Unit NV, 100 Casino Way, Las Vegas, NV 89109 NV
8 1 Apple Park Way, Cupertino, California 95014 null
9 1 Canada Way, Ottawa, ON K1A 0A9 null
10 10 Downing St, London null

Actual implementation with a JavaScript column, address being p1.

function extractStateAbbreviation(addressString) {
  if (!addressString || typeof addressString !== 'string') {
    return null;
  }

  const regex = /,\s*([A-Z]{2})\s+\d{5}(-\d{4})?\s*$/;
  const match = addressString.match(regex);

  if (match && match[1]) {
    return match[1];
  }

  const regexNoComma = /\b([A-Z]{2})\s+\d{5}(-\d{4})?\s*$/;
  const matchNoComma = addressString.match(regexNoComma);
   if (matchNoComma && matchNoComma[1]) {
     const potentialStateIndex = matchNoComma.index;
     if (potentialStateIndex > 0) {
        const charBefore = addressString[potentialStateIndex-1];
        if (charBefore === ' ' || charBefore === ',') {
             return matchNoComma[1];
        }
     }
  }

  return "";
}

return extractStateAbbreviation(p1)

What the function does (and it relies on you having a ZIP code in the address, it can return false positives if we adjust the function to not require the ZIP code though):

  • First, it quickly checks if you actually gave it an address string. If the input is empty, missing, or not text at all, it immediately stops and returns null (which basically means “I couldn’t find anything”).

The Main Search (Looking for “, ST ZIP”):

  • It then uses a specific search pattern (this is the regex part) designed to find the most common way a state appears in a US address.

  • It looks for this sequence near the end of the address string:

    • A comma (,)

    • Followed by maybe some spaces (\s*)

    • Followed by exactly two uppercase letters ([A-Z]{2}) - this is what it hopes is the state.

    • Followed by one or more spaces (\s+)

    • Followed by exactly five numbers (\d{5}) - the main ZIP code.

    • Optionally, followed by a hyphen and four more numbers ((-\d{4})?) - the ZIP+4 code.

    • And maybe some more spaces at the very end (\s*$).

  • If it finds this exact pattern, it grabs those two uppercase letters it found and returns them as the result.

The Backup Search (Looking for " ST ZIP" without a comma):

  • If the first search didn’t find anything (maybe the address was formatted like “Anytown ST 12345” without a comma before the state), it tries a second, slightly different search pattern.

  • This pattern looks for:

    • The start of a “word” (\b) - basically making sure it’s not in the middle of another word.

    • Followed by exactly two uppercase letters ([A-Z]{2}).

    • Followed by one or more spaces (\s+).

    • Followed by the five-digit ZIP code (\d{5}), optionally with the ZIP+4 ((-\d{4})?).

    • All happening near the end of the address (\s*$).

  • Extra Check: If it finds this pattern, it does one small extra check: it looks at the character immediately before the two letters it found. If that character is a space or a comma, it’s more confident that it actually found the state (and not just two letters that happen to be at the end of the city name). If this check passes, it returns the two letters.

1 Like

Using Glide computed columns only (without using JavaScript).

I never have to deal with such cases, but I would attempt to approach the problem as follows:

  1. I would split the address using the comma as a delimiter.
  2. I would assume that the state abbreviation was in the last element of the array, so maybe depending on the number of elements I would try to isolate the last element or remove the first ones. I’m not sure I’d be able to achieve this, but possibly.
  3. I might have to convert the array to text and then I’d remove any trailing spaces at the beginning and end.
  4. I’d split again using space as the delimiter.
  5. I would assume that the state abbreviation would be the first element of the array. And just like previously I would try to isolate it.

In the experimental columns there is a group of columns to work with arrays. Those would come in handy.

There are a few assumptions here on the position of the abbreviation. I suppose there could be a final verification to assess the length of the resulting string. If >3, then it would not be the state abbreviation.

This is how I would approach it. I’m not saying it’s a good way, but it could be a way.

2 Likes