How can I strip the text in front of the https for this url?

Note this url:

The Orphan Collector: A Heroic Novel of Survival During the 1918 Influenza Pandemic https://www.amazon.com/dp/1496715861/ref=cm_sw_r_cp_api_glt_fabc_YXJ2TJYZ14F5T705NC3T

For some reason, Amazon adds descriptive text before the URL when I copy/post from their website on my phone. I need to strip out everything before the https. What’s the best way to do that in Glide?

I would probably create a split column to split on ‘://’. Then create a single value column to get the second array item from the split array. Then create a template column to join ‘https://’ and the single value, which should be the rest of the URL. This should work whether or not there is extra text concatenated to the beginning of the link.
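For anyone who wants to see the logic outside of Glide, here is a rough JavaScript sketch of those three columns. The function name `stripLeadingText` is made up for illustration, and the empty-string fallback for text with no ‘://’ is my own assumption, not something the columns do for you.

```javascript
// Hypothetical helper mirroring the split / single value / template columns.
function stripLeadingText(text) {
  // Split column: break the text on "://".
  const parts = text.split("://");
  // Single value column: take the second array item (index 1).
  const rest = parts[1];
  // Assumption: if there is no "://" at all, return an empty string.
  if (rest === undefined) return "";
  // Template column: put the scheme back in front of the remainder.
  return "https://" + rest;
}

const amazonText =
  "The Orphan Collector: A Heroic Novel of Survival During the 1918 Influenza Pandemic " +
  "https://www.amazon.com/dp/1496715861/ref=cm_sw_r_cp_api_glt_fabc_YXJ2TJYZ14F5T705NC3T";

console.log(stripLeadingText(amazonText));
```

Note that this always rebuilds the link with ‘https://’, so a plain http link would come out as https.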

5 Likes

Thanks, Jeff

1 Like

Another option could be the Extract Matching Text plugin…

One thing to note is that both solutions would fail if you had extra text after the URL. Should be possible to craft a regex to deal with that, but my regex-fu is a bit rusty.

Update: (https:[^\s]+) seems to work when there is trailing text. It just stops capturing when it hits the first whitespace character.
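A quick way to check that pattern outside of Glide is a few lines of JavaScript. This is only a sketch: `extractHttpsUrl` is a hypothetical name, and I am assuming the plugin returns the first captured group, which is what this helper does.

```javascript
// Pattern from the post above: capture from "https:" up to the first whitespace.
const httpsPattern = /(https:[^\s]+)/;

// Hypothetical helper; returns "" when nothing matches.
function extractHttpsUrl(text) {
  const match = text.match(httpsPattern);
  return match ? match[1] : "";
}

console.log(
  extractHttpsUrl("Book title https://www.amazon.com/dp/1496715861 and some trailing text")
);
```

Leading and trailing text are both dropped because the match starts at "https:" and ends at the first whitespace character.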

Of course, writing regex to match URLs is a bit of a slippery slope… :smiley:

8 Likes

You could also use a Code/Javascript column with this code:

return p1.substr(p1.toUpperCase().indexOf("HTTP"));
7 Likes

Thanks. See below. Where does the JS go?

This is what mine looks like. I don’t know why the Function Error shows up, but it works anyway.

1 Like

I’d guess the error would be triggered by rows where the column being referenced is empty.

2 Likes

I added a type check. Interestingly, this code still gives the error initially, but if you tap Reload it goes away.

return (typeof p1 === "string" ? 
 p1.substr(p1.toUpperCase().indexOf("HTTP")) : "")
1 Like
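A slightly more defensive variant, as a sketch only: it keeps the type check, and also returns an empty string when the text contains no "HTTP" at all (in that case `indexOf` returns -1, and `substr(-1)` would return just the last character). The wrapper function `fromHttp` is hypothetical and is there so the snippet runs on its own; in the JavaScript column only the body would be used, with `p1` as the first parameter. Whether this changes the Function Error behaviour in Glide is not something I have confirmed.

```javascript
// Hypothetical standalone wrapper around the column code.
function fromHttp(p1) {
  // Empty rows arrive as a non-string value, so bail out early.
  if (typeof p1 !== "string") return "";
  // Case-insensitive search for the start of the link.
  const start = p1.toUpperCase().indexOf("HTTP");
  // -1 means no link was found; otherwise keep everything from that point on.
  return start === -1 ? "" : p1.slice(start);
}

console.log(fromHttp("Some title https://www.amazon.com/dp/1496715861"));
console.log(fromHttp(undefined));
```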

That’s right, the error goes away once you reload. It does that with all return functions. But all work.

Hola Wiz and George,

The function error is caused by an empty row in your Test column. This happens with any plugin or EC.

The plugin runs fine when every row in the column passed as a parameter holds a valid value. I don’t think it is an error at all; it looks more like a warning :warning:

Bye

1 Like

Thank you :slightly_smiling_face:

If you want to extract just the url without having to worry about leading and trailing texts or don’t want to use JavaScript, then use this Regex

((http|https)://[a-zA-Z0-9-.]+\.[a-zA-Z]{2,3}(/\S*)?)

Steps:

  1. Add the extract matching text plugin
  2. For text, select your amazon column
  3. In the regular expression field copy paste the above regex
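To try the pattern outside of Glide, here is a small JavaScript sketch. Two assumptions: the dot before the top-level domain is written escaped (`\.`), which I take to be the intended form, and the hypothetical helper `extractUrl` returns the first captured group, which may or may not be exactly what the plugin does.

```javascript
// Same pattern as a JavaScript regex literal; the slashes must be escaped here.
const urlPattern = /((http|https):\/\/[a-zA-Z0-9-.]+\.[a-zA-Z]{2,3}(\/\S*)?)/;

// Hypothetical helper; returns "" when nothing matches.
function extractUrl(text) {
  const match = text.match(urlPattern);
  return match ? match[1] : "";
}

console.log(extractUrl("Book title https://www.amazon.com/dp/1496715861/ref=abc trailing text"));
console.log(extractUrl("see http://example.org for details"));
```

One limitation worth knowing: `{2,3}` only allows two or three letters after the last dot, so a host ending in a longer top-level domain such as `.info` would be cut short.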

Note

From what I have discovered so far, regular expressions in Glide need to be wrapped inside () to yield results.

2 Likes

erm…

:stuck_out_tongue:

1 Like

I’ve noticed this too

1 Like

In regex speak, they are referred to as “capturing parentheses” :wink:

3 Likes

Love that you know this. This is why you’re such an asset in this community!

3 Likes

I had a mis-spent youth as a Perl hacker, so regular expressions are not entirely a new concept :wink:

5 Likes

Sorry missed it :relaxed:

1 Like

The only problem with your regex is that it will not catch http.
The one I suggested would catch both http/s