As the title says, I'm trying to extract data from Yelp (I inspected the search results page to get my HTML/JSON) and parse it out.
The JSON is within the HTML and I’m having a hard time getting my script to work to extract just the JSON. Any pointers are appreciated!
Code below:
window.function = function (p1) {
  if (!p1) {
    return undefined;
  }
  let html = p1;
  let startMarker = '<script type="application/json"data-hypernova-key="yelpfrontend__56414__yelpfrontend__GondolaSearch__dynamic"data-hypernova-id="dc346d70-c1bc-4741-b827-4ea0ab1ee84e">';
  let endMarker = '-->';
  let startIndex = html.indexOf(startMarker) + startMarker.length;
  let endIndex = html.indexOf(endMarker, startIndex);
  if (startIndex !== -1 && endIndex !== -1 && endIndex > startIndex) {
    let jsonString = html.substring(startIndex, endIndex).trim();
    return jsonString;
  }
};
Can you give us a sample of p1’s value?
Definitely!
I'm attaching 2 screenshots, since p1's value is far too large to copy and paste. They show the beginning and the end of the JSON object. Let me know if there is anything specific you'd need to see. Thank you!!
I think you could do this quite effectively without writing any code at all, by using a combination of the Get Webpage Source column and an AI column to extract the JSON.
I tried that but the page source is too large for AI to handle. I get an error saying I maxed out the context length.
I see.
Okay, well just looking at your code the first thing I would try is removing that enclosing function.
So try something like the following:
if (!p1) {
return undefined;
}
let html = p1;
let startMarker = ‘<script type="application/json"data-hypernova-key="yelpfrontend__56414__yelpfrontend__GondolaSearch__dynamic"data-hypernova-id=“dc346d70-c1bc-4741-b827-4ea0ab1ee84e”>’;
let endMarker = ‘–>’;
let startIndex = html.indexOf(startMarker) + startMarker.length;
let endIndex = html.indexOf(endMarker, startIndex);
if (startIndex !== -1 && endIndex !== -1 && endIndex > startIndex) {
let jsonString = html.substring(startIndex, endIndex).trim();
return jsonString;
}
Just got this error:
Function Error
SyntaxError: Invalid or unexpected token
Argh, I think when I copy/pasted your code it picked up some smart quotes, which would have broken it.
Try this:
if (!p1) {
  return undefined;
}
let html = p1;
let startMarker = `<script type="application/json"data-hypernova-key="yelpfrontend__56414__yelpfrontend__GondolaSearch__dynamic"data-hypernova-id="dc346d70-c1bc-4741-b827-4ea0ab1ee84e">`;
let endMarker = '-->';
let startIndex = html.indexOf(startMarker) + startMarker.length;
let endIndex = html.indexOf(endMarker, startIndex);
if (startIndex !== -1 && endIndex !== -1 && endIndex > startIndex) {
  let jsonString = html.substring(startIndex, endIndex).trim();
  return jsonString;
}
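One caveat with the snippet above: because `startMarker.length` is added before the check, `startIndex !== -1` can never fail, so a missing marker goes unnoticed. If you want something a bit more defensive, here is a rough sketch. The helper names `extractBetween` and `extractEmbeddedJson` are made up for illustration, and I'm assuming (from the end marker) that the payload sits inside an HTML comment within the script tag; adjust if your page source differs.

```javascript
// Hypothetical helper: return the text between two markers, or undefined
// if either marker is missing.
function extractBetween(html, startMarker, endMarker) {
  if (!html) return undefined;

  // Check for "not found" BEFORE adding the marker length.
  const markerIndex = html.indexOf(startMarker);
  if (markerIndex === -1) return undefined;

  const startIndex = markerIndex + startMarker.length;
  const endIndex = html.indexOf(endMarker, startIndex);
  if (endIndex === -1) return undefined;

  return html.substring(startIndex, endIndex).trim();
}

// Hypothetical helper: pull the script body, strip an optional HTML comment
// wrapper (an assumption about the page), and parse the result as JSON.
function extractEmbeddedJson(html, startMarker) {
  let text = extractBetween(html, startMarker, '</script>');
  if (text === undefined) return undefined;

  if (text.startsWith('<!--')) text = text.slice(4);
  if (text.endsWith('-->')) text = text.slice(0, -3);

  try {
    return JSON.parse(text);
  } catch {
    return undefined; // the extracted text was not valid JSON
  }
}
```

You would call it as `extractEmbeddedJson(p1, startMarker)` with the same `startMarker` as above. If the column needs text rather than an object, wrap the result in `JSON.stringify(...)`.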
You’re the man! Thank you!! It worked!!