Let’s ask ChatGPT.
Task: Given an array of 10,000 strings, and a user supplied search term, write a JavaScript algorithm to efficiently search the array and return the 5 closest matches.
Response: To efficiently search for the 5 closest matches in an array of 10,000 strings, you can use the Levenshtein distance algorithm to measure the similarity between strings. The Levenshtein distance calculates the minimum number of single-character edits (insertion, deletion, or substitution) required to transform one string into another. The lower the Levenshtein distance between two strings, the more similar they are.
Here’s a JavaScript algorithm to achieve this:
function levenshteinDistance(s1, s2) {
const m = s1.length;
const n = s2.length;
const dp = Array.from(Array(m + 1), () => Array(n + 1).fill(0));
for (let i = 0; i <= m; i++) {
for (let j = 0; j <= n; j++) {
if (i === 0) dp[i][j] = j;
else if (j === 0) dp[i][j] = i;
else if (s1.charAt(i - 1) === s2.charAt(j - 1)) dp[i][j] = dp[i - 1][j - 1];
else {
dp[i][j] = 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
}
}
}
return dp[m][n];
}
function findClosestMatches(searchTerm, array, numMatches = 5) {
const sortedArray = array.slice().sort((a, b) => {
const distanceA = levenshteinDistance(searchTerm, a);
const distanceB = levenshteinDistance(searchTerm, b);
return distanceA - distanceB;
});
return sortedArray.slice(0, numMatches);
}
// Example usage:
const array = [...]; // Your array of 10,000 strings
const searchTerm = "user_supplied_search_term";
const closestMatches = findClosestMatches(searchTerm, array, 5);
console.log(closestMatches);
Replace [...]
with your array of 10,000 strings. The findClosestMatches
function takes the user-supplied search term and the array as input and returns an array of the numMatches
(default is 5) closest matches in the array, sorted by their Levenshtein distance to the search term. The result will contain the top 5 closest strings to the search term.
Please note that the Levenshtein distance algorithm has a time complexity of O(m * n), where m and n are the lengths of the two input strings. However, since the number of strings is fixed (10,000 in this case), the time complexity for this scenario would be O(10,000 * average_string_length). In practice, this algorithm should perform well for a reasonable average string length.