Seeking Efficient Method to Search a Large Array Column in Glide

Eden · July 29, 2023, 4:55pm

Hello Glide Community,

I’m dealing with a single array cell (note, all in a single cell) containing around 10k values. I need to enable a search where users can enter a value in a textbox, and the top 5 closest matches appear. Selecting a match would print the value in a different cell.

From a UX perspective, users enters value, list sorts matches, filter limits results to 5, selecting a match is a list action to print the value.

Could the Joined Lists column and regular expressions be the answer, or might there be a more efficient way? Your insights are highly valued, and your thoughts would be greatly appreciated!

Eden · July 29, 2023, 5:18pm

I’m thinking- maybe even the data structure integration…

set to elements and set contains elements

Cc @Darren_Murphy

Darren_Murphy · July 29, 2023, 5:25pm

Is related to your other question where you’re parsing a JSON payload?

I think I’d be inclined to use JavaScript. Just need to be a little bit wary with regular expressions. If not crafted well, they can be horribly inefficient.

Eden · July 29, 2023, 5:32pm

Yes, it is related. I’m looking to avoid partial searches, I will have all available values loaded locally and perform the search locally-

I will then execute the call when the user selects a value.

I think I’d be inclined to use JavaScript. Just need to be a little bit wary with regular expressions. If not crafted well, they can be horribly inefficient.

Yes, I’m aware, they can be quite finicky…
Is there a JavaScript function you would recommend?
It’s a clean array of values.

Darren_Murphy · July 29, 2023, 5:40pm

The JavaScript column won’t accept arrays, so you’d have to pass the whole thing as a joined list anyway.

I think the trickiest part will be returning “the top 5 matches”. That suggests that you’re going to have to rank/score each match in some way. Have you thought about how you’ll do that? How well will something need to match to be considered “good enough”, and how will you evaluate that?

Eden · July 29, 2023, 6:34pm

wait,
what about having a list of comma seperated values in a text component, and then using query and self relations to feed a choice component virtual rows of data? I then use the componenets built-in search capability

Darren_Murphy · July 29, 2023, 6:45pm

Sorry, I’m not following your thinking.
If you use a query on a single row of data, it’s either going to return one row or zero rows.

Eden · July 29, 2023, 6:51pm

Am wondering whether there’s a way to feed the choice components from split text values/virtual arrays and not actual rows…

Darren_Murphy · July 29, 2023, 7:02pm

Nope, I’m afraid not. There is a feature request for that floating around somewhere…

Maybe you could use one of the AI integrations?

Eden · July 29, 2023, 8:18pm

what about creating a virtual list instead- so as to utilize the collection component and use the native search module?

Darren_Murphy · July 29, 2023, 8:41pm

Sure, you could do that - with a helper table. But I thought you were trying to avoid that?

Eden · July 29, 2023, 8:45pm

what do you mean by a helper table?
i’m trying to avoid creating an actual table with 10k rows. But I’m ok with a virtual list that is derived from a single cell of strings.

Darren_Murphy · July 30, 2023, 3:57am

I meant a Helper Table with actual rows and actual columns.

You seem to be fixated on this virtual list idea. And sure, you can create that. In fact, that’s basically what a joined list is. But there are limits to what you can do with that. You can’t use it as the source of a Choice Component, and you can’t use it as the source of a Collection. To do either of those things, you need to transpose your list across multiple rows. And for this you normally use a Helper Table.

And so we’ve come full circle

Getting back to your original question, I still think JavaScript is probably the best approach. You just need to be clear about your goal and matching criteria.

Eden · July 30, 2023, 8:58am

Yeah, fixated as if I’m on a mission :))
I’ll explain the fundamental reasoning-

I’m diligently exploring a method for storing data flexibly and syncing it efficiently. I’ve mostly succeeded and implemented it beautifully, but my roadblock is searching data.

Given that I can only import limited rows, I need a system to search all values (e.g., 10k organization names). My proposed solution is to sync a single, lightweight array/joined-list containing all values. However, the challenge lies in indexing/searching this list and highlighting the closest matches for user selection-

Which essentially triggers the appropriate data fetch of said value.

Darren_Murphy · July 30, 2023, 9:08am

Let’s ask ChatGPT.

Task: Given an array of 10,000 strings, and a user supplied search term, write a JavaScript algorithm to efficiently search the array and return the 5 closest matches.

Response: To efficiently search for the 5 closest matches in an array of 10,000 strings, you can use the Levenshtein distance algorithm to measure the similarity between strings. The Levenshtein distance calculates the minimum number of single-character edits (insertion, deletion, or substitution) required to transform one string into another. The lower the Levenshtein distance between two strings, the more similar they are.

Here’s a JavaScript algorithm to achieve this:

function levenshteinDistance(s1, s2) {
  const m = s1.length;
  const n = s2.length;
  const dp = Array.from(Array(m + 1), () => Array(n + 1).fill(0));

  for (let i = 0; i <= m; i++) {
    for (let j = 0; j <= n; j++) {
      if (i === 0) dp[i][j] = j;
      else if (j === 0) dp[i][j] = i;
      else if (s1.charAt(i - 1) === s2.charAt(j - 1)) dp[i][j] = dp[i - 1][j - 1];
      else {
        dp[i][j] = 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
      }
    }
  }

  return dp[m][n];
}

function findClosestMatches(searchTerm, array, numMatches = 5) {
  const sortedArray = array.slice().sort((a, b) => {
    const distanceA = levenshteinDistance(searchTerm, a);
    const distanceB = levenshteinDistance(searchTerm, b);
    return distanceA - distanceB;
  });

  return sortedArray.slice(0, numMatches);
}

// Example usage:
const array = [...]; // Your array of 10,000 strings
const searchTerm = "user_supplied_search_term";
const closestMatches = findClosestMatches(searchTerm, array, 5);
console.log(closestMatches);

Replace [...] with your array of 10,000 strings. The findClosestMatches function takes the user-supplied search term and the array as input and returns an array of the numMatches (default is 5) closest matches in the array, sorted by their Levenshtein distance to the search term. The result will contain the top 5 closest strings to the search term.

Please note that the Levenshtein distance algorithm has a time complexity of O(m * n), where m and n are the lengths of the two input strings. However, since the number of strings is fixed (10,000 in this case), the time complexity for this scenario would be O(10,000 * average_string_length). In practice, this algorithm should perform well for a reasonable average string length.

Eden · July 30, 2023, 9:17am

I brainstormed all this with GPT-4 as well, tried finding a method without JS, but I do not have good enough reason to avoid this time!

There’s always a way, Will try this tomorrow! Truly appreciate you

Darren_Murphy · July 30, 2023, 9:57am

Looks like it works okay, although not so sure how good that algorithm is. Might need to find a better one. But conceptually, I think this could work for you.

https://www.loom.com/share/45097a8edc88440c9ce132191a3044bb

Here is the JavaScript I used:

function levenshteinDistance(s1, s2) {
  const m = s1.length;
  const n = s2.length;
  const dp = Array.from(Array(m + 1), () => Array(n + 1).fill(0));

  for (let i = 0; i <= m; i++) {
    for (let j = 0; j <= n; j++) {
      if (i === 0) dp[i][j] = j;
      else if (j === 0) dp[i][j] = i;
      else if (s1.charAt(i - 1) === s2.charAt(j - 1)) dp[i][j] = dp[i - 1][j - 1];
      else {
        dp[i][j] = 1 + Math.min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]);
      }
    }
  }

  return dp[m][n];
}

function findClosestMatches(searchTerm, array, numMatches = 5) {
  const sortedArray = array.slice().sort((a, b) => {
    const distanceA = levenshteinDistance(searchTerm, a);
    const distanceB = levenshteinDistance(searchTerm, b);
    return distanceA - distanceB;
  });

  return sortedArray.slice(0, numMatches);
}

const array = p1.split(',');
return findClosestMatches(p2, array, 5).join('\n');

Eden · July 30, 2023, 4:33pm

Wow!! And it’s blazing fast…

One last missing component, user selection of their desired match (rather than having to type it out).

Why? User selection will result in action to print the value for a fetch column to execute a query for that exact match.

Was gonna make that happen with a collection but I guess that’s impossible…

Darren_Murphy · July 30, 2023, 6:19pm

That’s the easy part. The JavaScript column returns a joined list of matches. All you need to do is split that and then transpose it, and you can use it as the source of a choice component.

Topic		Replies	Views
Splitting large data entries into multiple single value columns Ask for Help actions	10	787	October 28, 2023
Query Array or Joined List Ask for Help	12	1027	June 15, 2022
Request for a better "search bar" capability Feature Requests	18	1417	May 29, 2022
Alternative Search Bar Approach Ask for Help	22	1148	March 18, 2022
Collection Component: How to Limit Search to Single Column and Word Beginnings? Ask for Help	13	78	November 17, 2024

Seeking Efficient Method to Search a Large Array Column in Glide

Related topics