Seeking Efficient Method to Search a Large Array Column in Glide

Hello Glide Community,

I’m dealing with a single array cell (note, all in a single cell) containing around 10k values. I need to enable a search where users can enter a value in a textbox, and the top 5 closest matches appear. Selecting a match would print the value in a different cell.

From a UX perspective, users enters value, list sorts matches, filter limits results to 5, selecting a match is a list action to print the value.

Could the Joined Lists column and regular expressions be the answer, or might there be a more efficient way? Your insights are highly valued, and your thoughts would be greatly appreciated!

I’m thinking- maybe even the data structure integration…

set to elements and set contains elements

Cc @Darren_Murphy

Is related to your other question where you’re parsing a JSON payload?

I think I’d be inclined to use JavaScript. Just need to be a little bit wary with regular expressions. If not crafted well, they can be horribly inefficient.

Yes, it is related. I’m looking to avoid partial searches, I will have all available values loaded locally and perform the search locally-

I will then execute the call when the user selects a value.

I think I’d be inclined to use JavaScript. Just need to be a little bit wary with regular expressions. If not crafted well, they can be horribly inefficient.

Yes, I’m aware, they can be quite finicky…
Is there a JavaScript function you would recommend?
It’s a clean array of values.

The JavaScript column won’t accept arrays, so you’d have to pass the whole thing as a joined list anyway.

I think the trickiest part will be returning “the top 5 matches”. That suggests that you’re going to have to rank/score each match in some way. Have you thought about how you’ll do that? How well will something need to match to be considered “good enough”, and how will you evaluate that?

wait,
what about having a list of comma seperated values in a text component, and then using query and self relations to feed a choice component virtual rows of data? I then use the componenets built-in search capability

Sorry, I’m not following your thinking.
If you use a query on a single row of data, it’s either going to return one row or zero rows.

Am wondering whether there’s a way to feed the choice components from split text values/virtual arrays and not actual rows…

Nope, I’m afraid not. There is a feature request for that floating around somewhere…

Maybe you could use one of the AI integrations?

what about creating a virtual list instead- so as to utilize the collection component and use the native search module?

Sure, you could do that - with a helper table. But I thought you were trying to avoid that?

what do you mean by a helper table?
i’m trying to avoid creating an actual table with 10k rows. But I’m ok with a virtual list that is derived from a single cell of strings.

I meant a Helper Table with actual rows and actual columns.

You seem to be fixated on this virtual list idea. And sure, you can create that. In fact, that’s basically what a joined list is. But there are limits to what you can do with that. You can’t use it as the source of a Choice Component, and you can’t use it as the source of a Collection. To do either of those things, you need to transpose your list across multiple rows. And for this you normally use a Helper Table.

And so we’ve come full circle :slight_smile:

Getting back to your original question, I still think JavaScript is probably the best approach. You just need to be clear about your goal and matching criteria.

Yeah, fixated as if I’m on a mission :))
I’ll explain the fundamental reasoning-

I’m diligently exploring a method for storing data flexibly and syncing it efficiently. I’ve mostly succeeded and implemented it beautifully, but my roadblock is searching data.

Given that I can only import limited rows, I need a system to search all values (e.g., 10k organization names). My proposed solution is to sync a single, lightweight array/joined-list containing all values. However, the challenge lies in indexing/searching this list and highlighting the closest matches for user selection-

Which essentially triggers the appropriate data fetch of said value.

Let’s ask ChatGPT.

Task: Given an array of 10,000 strings, and a user supplied search term, write a JavaScript algorithm to efficiently search the array and return the 5 closest matches.

Response: To efficiently search for the 5 closest matches in an array of 10,000 strings, you can use the Levenshtein distance algorithm to measure the similarity between strings. The Levenshtein distance calculates the minimum number of single-character edits (insertion, deletion, or substitution) required to transform one string into another. The lower the Levenshtein distance between two strings, the more similar they are.

Here’s a JavaScript algorithm to achieve this:

function levenshteinDistance(s1, s2) {
  const m = s1.length;
  const n = s2.length;
  const dp = Array.from(Array(m + 1), () => Array(n + 1).fill(0));

  for (let i = 0; i <= m; i++) {
    for (let j = 0; j <= n; j++) {
      if (i === 0) dp[i][j] = j;
      else if (j === 0) dp[i][j] = i;
      else if (s1.charAt(i - 1) === s2.charAt(j - 1)) dp[i][j] = dp[i - 1][j - 1];
      else {
        dp[i][j] = 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
      }
    }
  }

  return dp[m][n];
}

function findClosestMatches(searchTerm, array, numMatches = 5) {
  const sortedArray = array.slice().sort((a, b) => {
    const distanceA = levenshteinDistance(searchTerm, a);
    const distanceB = levenshteinDistance(searchTerm, b);
    return distanceA - distanceB;
  });

  return sortedArray.slice(0, numMatches);
}

// Example usage:
const array = [...]; // Your array of 10,000 strings
const searchTerm = "user_supplied_search_term";
const closestMatches = findClosestMatches(searchTerm, array, 5);
console.log(closestMatches);

Replace [...] with your array of 10,000 strings. The findClosestMatches function takes the user-supplied search term and the array as input and returns an array of the numMatches (default is 5) closest matches in the array, sorted by their Levenshtein distance to the search term. The result will contain the top 5 closest strings to the search term.

Please note that the Levenshtein distance algorithm has a time complexity of O(m * n), where m and n are the lengths of the two input strings. However, since the number of strings is fixed (10,000 in this case), the time complexity for this scenario would be O(10,000 * average_string_length). In practice, this algorithm should perform well for a reasonable average string length.

2 Likes

I brainstormed all this with GPT-4 as well, tried finding a method without JS, but I do not have good enough reason to avoid this time!

There’s always a way, Will try this tomorrow! Truly appreciate you :pray:

Looks like it works okay, although not so sure how good that algorithm is. Might need to find a better one. But conceptually, I think this could work for you.

Here is the JavaScript I used:

function levenshteinDistance(s1, s2) {
  const m = s1.length;
  const n = s2.length;
  const dp = Array.from(Array(m + 1), () => Array(n + 1).fill(0));

  for (let i = 0; i <= m; i++) {
    for (let j = 0; j <= n; j++) {
      if (i === 0) dp[i][j] = j;
      else if (j === 0) dp[i][j] = i;
      else if (s1.charAt(i - 1) === s2.charAt(j - 1)) dp[i][j] = dp[i - 1][j - 1];
      else {
        dp[i][j] = 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
      }
    }
  }

  return dp[m][n];
}

function findClosestMatches(searchTerm, array, numMatches = 5) {
  const sortedArray = array.slice().sort((a, b) => {
    const distanceA = levenshteinDistance(searchTerm, a);
    const distanceB = levenshteinDistance(searchTerm, b);
    return distanceA - distanceB;
  });

  return sortedArray.slice(0, numMatches);
}

const array = p1.split(',');
return findClosestMatches(p2, array, 5).join('\n');
2 Likes

Wow!! And it’s blazing fast… :sparkles:

One last missing component, user selection of their desired match (rather than having to type it out).

Why? User selection will result in action to print the value for a fetch column to execute a query for that exact match.

Was gonna make that happen with a collection but I guess that’s impossible…

That’s the easy part. The JavaScript column returns a joined list of matches. All you need to do is split that and then transpose it, and you can use it as the source of a choice component.