How can I use regex to clean/normalize user-submitted URLs?

My bookmarking app allows users to submit any URL. I want to clean/normalize these URLs using regex, e.g. strip off query params or any characters before http*.


is cleaned to:

using regex = ([^?]+)(\?.*)?

Bonus: Once I can do this for a single regex, I'd like to chain regexes for more complex cleaning and standardization, so the result of applying regex1 feeds into regex2, etc.
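For reference, the chaining idea can be sketched outside the app in Python. This is only an illustration under assumptions: the `STEPS` list, the `clean_url` name, and the two patterns are hypothetical, chosen to match the two cleanups mentioned above (dropping anything before http* and dropping the query string).

```python
import re

# Hypothetical cleaning steps, applied in order: (compiled pattern, replacement).
STEPS = [
    # regex1: drop any characters before the first http:// or https://
    (re.compile(r"^.*?(?=https?://)", re.IGNORECASE), ""),
    # regex2: drop the first "?" and everything after it
    (re.compile(r"\?.*$"), ""),
]

def clean_url(url: str) -> str:
    """Feed the output of each regex step into the next one."""
    for pattern, replacement in STEPS:
        url = pattern.sub(replacement, url)
    return url
```

Adding another rule means appending another pair to the list. Note that the second step also removes a `#fragment` that follows the query string, which may or may not be what you want.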

If you want to do it with regex, I think you can try this.

If you simply want to remove all the query parameters, then I think you can try using a split text column on the "?" character, then a single value column to retrieve the 1st part.
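The same split-and-take-first logic, written as a small Python sketch for comparison (the `strip_query` name is just an illustration, not something the app provides):

```python
def strip_query(url: str) -> str:
    """Split on the first "?" and keep only the part before it."""
    return url.split("?", 1)[0]
```

A URL with no "?" comes back unchanged, since the split then yields a single part.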


Works as expected.

