Finding and Deleting Duplicate Entries

I’m reporting this as “Ask for Help” because I’m likely missing something obvious, but what’s happening, and is repeatable, makes no sense to me.

I have an app/gsheet that collects leads, and that sheet looks for duplicate names and addresses – if a single entry has both, it’s likely a duplicate lead. When viewing that lead in Glide I have a *hint that shows at the top of the screen indicating it may be a duplicate, with the optional action to delete it. There is also a filter option for the leads list to just show entries flagged as possible duplicates.

The action to delete the duplicate entry is simply “delete row this item.”

What happens when clicked is the data is removed from the sheet row, but the line itself isn’t removed. Since there is a RowID turned on, the row simply existing means that the original ID is deleted along with all the data, but then a new blank row still exists and that column is backfilled with a new RowID.

What happens on the frontend is weird… the “lead” disappears, initially, as you would expect, but then on future screen refreshes it reappears, and still, somehow, has all it’s data intact. Mind you, the row is gone, left only with a different, new, RowID.

If you enter that lead again, and again click delete on the “duplicate” notice at the top, it again disappears. However, as soon as the RowID refreshes with the blank row that’s left, the lead reappears. This time I think the cache is gone and while it retains the name and data displayed on the lead list, clicking it a third time results in nothing. You cannot “enter” the lead as before. It’s just dead. Manually deleteing the entire row in sheets makes it disappear for good.

Lastly, if I manually add a line in gsheets and copy in intentionally duplicate data, but let Glide generate the RowID, the duplicate is flagged as duplicate in gsheets (TRUE), but the filter on the leads screen in the app does not show it. Turning off that filter, and back on, does not solve it. It will not show any additional duplicates.

Help? :melting_face:

This suggests to me that you most likely have an arrayformula in the attached Google Sheet?
Perhaps that’s used to flag your duplicates?
If that’s the case, my suggestion would be to get rid of it and instead build that logic in Glide.
It won’t solve the empty row issue in your GSheet, but it should prevent new RowIDs being added to those empty rows. Which may in turn solve the bigger issue.

That said, there have been reports recently of weird stuff going on with Google Sheet syncing. See below:

2 Likes

I do have arrayformulas, a plenty. However, if the RowID is empty, the arrayformulas essentially do nothing. Maybe they aren’t actually outputting “nothing”? lol

That does not explain how Glide is retaining information that no longer exists. The leads page lists each lead/row, and the “deleted row” still has the original lead name, relative time, and status that it did before the deletion.

I did make a query column to see if I could test the same logic, but while I’m fairly wizard with sheets, I’m still learning Glide… Here is what I tried:

  • Last Name is This Row → Last Name, AND
  • Service Address is This Row → Service Address

The thought here is that if the same last name AND the service address both match, likely dupe. The data I get returned is the First Name value, which is a completely different column – however, that First Name shows up twice if there is a dupe.

Am I approaching this correctly?

Also, why doesn’t “delete row this item” actually delete the row? It just deletes the data, leaving otherwise a blank row in the middle of the sheet. :man_shrugging:

It does appear your linked bug may be related if the google syncs are messing up. I’ll link this issue there as well…

It’s usually been the case that Glide will see rows affected by array formulas as filled rows, they typically will not touch them. This is apparent when adding a new row in Glide. It will skip over the existing rows, even if they appear empty in your google sheet.

As @Darren_Murphy pointed out, there is probably a bigger sync issue at play here, which is causing all of these issues.

1 Like

Ok, so here is what I have so far. Looking for protips on a) using less columns, and/or b) getting at the actual data that I want.

The first 3 columns are the sheets columns – check dupe last name, check dupe service address, if both dupes – probably a dupe. Flag the /second/ entry as the possible dupe.

The last 3 columns is replicating this in Glide. Same-ish result, same number of columns, but it flags /both/ records as the dupes.

Please say I’m missing something obvious!

I assume you want only the second record to be marked as a duplicate?

I think what you can do is get rid of the countDupes column because you don’t need it. Then add a Single Value column that gets the first RowID from from your Query. Finally, change your isDupe IF column to check if the RowID for the current row is not equal to the Single Value column RowID. If it is not a match, then return ‘true’.

With this setup, only the second, third, fourth, etc. duplicates will be marked as true. Any that are the first occurrence or those that do not have a duplicate will be marked as ‘false’

2 Likes

Brilliant! So still 3 columns, but replicating desired outcome from Glide.

Let me try the delete lead (row) action again and see if this makes a difference…

1 Like

Ok, so an update to this for everyone…

It seems like this solution is working as expected. The key takeaways here, I think, are to do what you have to do when working with Google Sheets, but as you iterate through a new app do ensure that you convert as many array formulas to glide native as you can.

In doing this I have had to add a couple data points to ensure /some/ data that needs to be visible in Sheets is available, but for the most part converting everything to Glide has solved the update and delete row issues mentioned here.

I still notice that Glide is simply deleting the data, but the row remains. That doesn’t seem to cause the “caching and synching” glitch mentioned above, however, empty rows will inevitably exist throughout the dataset over time due to this. I assume this will not affect billing or app operation.

Cheers!

2 Likes

I don’t know if there’s a technical reason but Glide has always done this with Google Sheets since their first days.

There are different endpoints in the Google Sheets API to clear specific ranges in a row, and delete the row altogether.

If I recall right, when clearing a row, they always keep the rowID value? Might be a way to deal with having to specify row index when clearing/deleting rows, but honestly I’m not sure.