I am looking to extract parts of an address. I need the direction and street name only. I’ve tried a hundred ways to get ChatGPT to help with this and looked through other posts. Here is where ChatGPT has gotten me so far. It seems to work unless the user leaves out the street type or doesn’t capitalize it.
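For reference, here is a minimal sketch of how a regex can tolerate both of those cases. The pattern from the original attempt is not shown, so the street-type list and the helper name `stripStreetType` below are assumptions made up for illustration, not the code in question. The `i` flag handles capitalization, and `replace` simply leaves the text alone when no street type is present.

```javascript
// Assumed, shortened list of street types; extend it to match your data.
// The "i" flag makes the match case-insensitive.
const TRAILING_STREET_TYPE = /\s+(street|st|avenue|ave|road|rd|drive|dr)\.?$/i;

// Hypothetical helper: drop a trailing street type if one is present,
// and return the input unchanged (apart from trimming) if it is not.
function stripStreetType(streetPart) {
  return streetPart.trim().replace(TRAILING_STREET_TYPE, "");
}
```

This only covers a street type at the very end of the string, so a direction written after the street type would still need separate handling.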
I’m not good with regex, and I question whether AI is either, especially when there are so many variations in addresses. I’m curious, though, what you expect if the direction is written after the street name and type, which is common. I’m working on a JavaScript solution for this and trying to cover as many variations as possible.
This is what I have so far. Pass the address in the p1 parameter. Let me know if anything is still slipping through or not giving results as expected.
function extractStreetNameWithFlexibleDirection(address) {
  // Common street types and abbreviations, stored in lowercase so that
  // matching is case-insensitive.
  const streetTypes = new Set([
    "road", "rd",
    "street", "st",
    "avenue", "ave",
    "boulevard", "blvd",
    "drive", "dr",
    "lane", "ln",
    "court", "ct",
    "place", "pl",
    "way",
    "parkway", "pkwy",
    "circle", "cir",
    "highway", "hwy",
    "trail", "trl",
    "terrace", "ter",
    "alley", "aly",
    "cove", "cv",
    "expressway", "expy",
    "square", "sq",
    "loop",
    "plaza", "plz",
    "ridge",
    "row",
    "park",
    "gardens",
    "glen",
    "grove",
    "commons",
    "center", "centre",
    "crossing"
  ]);
  // Valid directional tokens (compared in uppercase).
  const directions = new Set([
    "N", "S", "E", "W", "NE", "NW", "SE", "SW",
    "NORTH", "SOUTH", "EAST", "WEST"
  ]);
  // Helper: proper-case a string (capitalize the first letter of each word),
  // keeping two-letter compass abbreviations such as "NE" fully uppercase.
  function toProperCase(str) {
    return str.replace(/\w\S*/g, function (txt) {
      const upper = txt.toUpperCase();
      if (upper.length === 2 && directions.has(upper)) {
        return upper;
      }
      return txt.charAt(0).toUpperCase() + txt.slice(1).toLowerCase();
    });
  }
  // Replace commas with spaces, split the address into tokens, and drop
  // empty tokens left by leading, trailing, or repeated whitespace.
  const tokens = String(address || "").replace(/,/g, " ").split(/\s+/).filter(Boolean);
  // Remove the house number (first token) if it is entirely numeric.
  if (tokens.length && /^\d+$/.test(tokens[0])) {
    tokens.shift();
  }
  // Check for a directional token at the beginning.
  let prefixDir = "";
  if (tokens.length && directions.has(tokens[0].toUpperCase())) {
    prefixDir = tokens.shift().toUpperCase() + " ";
  }
  // The remaining tokens may include parts of the street name, street type
  // words that belong to the name, and a final street type.
  // Scan for the last occurrence of any recognized street type.
  let finalStreetTypeIndex = -1;
  for (let i = 0; i < tokens.length; i++) {
    if (streetTypes.has(tokens[i].toLowerCase())) {
      finalStreetTypeIndex = i; // keep scanning so the last match wins
    }
  }
  let streetNameTokens = [];
  let suffixDir = "";
  if (finalStreetTypeIndex !== -1) {
    // Use everything BEFORE the final street type as the complete street name.
    // (This preserves any street type that appears earlier as part of the name.)
    streetNameTokens = tokens.slice(0, finalStreetTypeIndex);
    // Check for a directional token immediately after the final street type.
    if (tokens.length > finalStreetTypeIndex + 1 &&
        directions.has(tokens[finalStreetTypeIndex + 1].toUpperCase())) {
      suffixDir = " " + tokens[finalStreetTypeIndex + 1].toUpperCase();
    }
  } else {
    // If no street type is found, assume the remaining tokens are all part of the street name.
    streetNameTokens = tokens;
  }
  // Assemble the final result and proper-case it.
  const result = (prefixDir + streetNameTokens.join(" ") + suffixDir).trim();
  return toProperCase(result);
}
return extractStreetNameWithFlexibleDirection(p1);
WOW! Thank you. As for the direction, I have only seen one scenario in our data where the direction was entered at the end. I am considering changing my form so that, going forward, the user has to choose a direction from a choice menu, which would make it more consistent.
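If the direction does move to its own field, existing free-text values could be mapped onto the same fixed set the menu would offer. The sketch below is only an illustration: `normalizeDirection` and the abbreviation table are hypothetical names, and the set of accepted spellings is an assumption.

```javascript
// Assumed mapping from spelled-out directions to their abbreviations.
const DIRECTION_ABBREVIATIONS = new Map([
  ["NORTH", "N"], ["SOUTH", "S"], ["EAST", "E"], ["WEST", "W"],
  ["NORTHEAST", "NE"], ["NORTHWEST", "NW"],
  ["SOUTHEAST", "SE"], ["SOUTHWEST", "SW"]
]);
const VALID_ABBREVIATIONS = new Set(DIRECTION_ABBREVIATIONS.values());

// Hypothetical helper: return the canonical abbreviation for a direction,
// or an empty string when the input is not a recognized direction.
function normalizeDirection(input) {
  const key = (input == null ? "" : String(input))
    .trim()
    .toUpperCase()
    .replace(/\./g, ""); // tolerate "N." or "S.E."
  if (DIRECTION_ABBREVIATIONS.has(key)) {
    return DIRECTION_ABBREVIATIONS.get(key);
  }
  return VALID_ABBREVIATIONS.has(key) ? key : "";
}
```

Returning an empty string for unknown input makes it easy to flag rows that still need a manual fix.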
That worked perfectly for what we have in our system now. Thank you so much!!!
The JavaScript will account for direction either before or after the street.
Let me know if you come across anything that doesn’t give the correct result. I’m only testing with a handful of addresses, so I wouldn’t be surprised if it’s missing something.
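To make that kind of spot-checking repeatable, a small table-driven check can help. This is a rough sketch: `findMismatches` is a hypothetical helper, and the sample addresses are made up for illustration, with expected values worked out by hand from the logic of the function posted above.

```javascript
// Hypothetical helper: run an extractor over [input, expected] pairs and
// return the cases whose actual output differs from the expected one.
function findMismatches(extract, cases) {
  const mismatches = [];
  for (const [input, expected] of cases) {
    const actual = extract(input);
    if (actual !== expected) {
      mismatches.push({ input, expected, actual });
    }
  }
  return mismatches;
}

// Made-up sample addresses paired with the result expected from the posted function.
const sampleCases = [
  ["123 N Main St", "N Main"],   // direction before the street name
  ["456 main street", "Main"],   // lowercase street type
  ["789 Oak Ave S", "Oak S"],    // direction after the street type
  ["12 Park Place", "Park"],     // street type word used as the name
  ["10 Elm", "Elm"]              // no street type at all
];

// Only runs where the posted function is in scope.
if (typeof extractStreetNameWithFlexibleDirection === "function") {
  console.log(findMismatches(extractStreetNameWithFlexibleDirection, sampleCases));
}
```

Adding a row for each address that misbehaves in real data keeps earlier fixes from regressing.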
Do you have access to Call API? There are APIs out there that offer this exact service.