Google Custom Search Results

Want to find out what Google has on your web site? Want to tabulate a list of all the pages Google has crawled and indexed? Here's how to document it! Caveat: you must have a Google business account to use the API and to get results back without ads.

To view what Google has indexed, go to Google, of course, and type in your site's domain prefixed with "site:", e.g., "site:mysite.com". This will give you the first 10 results. You could copy and paste, but you are too smart and lazy for that.

This tutorial will NOT go into how to set up Google Custom Search or access the API. I'm assuming you either already know how to do that or will look it up on your own. The Google Custom Search API returns a whole bunch of stuff, but in this case we only want the URL of every indexed page.

I have the CFM template in its entirety below, but we'll step through what's going on first. We start with the form, which calls itself for the action.

<p class="welcome">Google Custom Search</p>
<!--- method="post" so the field arrives in the form scope --->
<form action="joe-google-custom-search.cfm" method="post">
    <label>
        Search Terms:
        <input type="text" name="googlesearchterms" id="googlesearchterms" size="30" />
    </label>
    <button id="googlesearch" name="googlesearch" type="submit">Search</button>
</form>

To submit the form, enter "site:yourdomain.com", or whatever other search terms you wish. Once the form is submitted, we format the search terms by converting the spaces to plus signs (+), since the terms will be appended to the API call's URL as a query string.

<cfset searchterms = Replace(trim(form.googlesearchterms), " ", "+", "all") />
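Swapping spaces for plus signs covers simple queries, but characters such as & or # in the search terms would still break the query string. A minimal sketch of an alternative, assuming the same form field, is to let ColdFusion's built-in URLEncodedFormat() encode the whole value:

```cfml
<!--- stand-in for form.googlesearchterms so the snippet runs on its own --->
<cfset rawTerms = "site:mysite.com news & events" />

<!--- encode the whole query, not just the spaces --->
<cfset searchterms = URLEncodedFormat(trim(rawTerms)) />

<cfoutput>#searchterms#</cfoutput>
```

Spaces come out as %20 rather than +, which is equally valid in a query string.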

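Next, we set our counter and loop until we have asked for a maximum of 1000 results, if there are that many. As written, the loop will error out at the end: once Google has no more results to give, the response no longer contains an items array. We go in increments of 10 because that's the default number of results Google returns per request. Note that Google's documentation for the Custom Search JSON API caps a single query at 100 results, so the loop may well stop long before 1000.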

<!--- set our result counter --->
<cfset CountVar = 1 />
<!--- limit returns to 1000 --->
<cfloop condition="CountVar LESS THAN 1000">

This is where the action happens. We add our query string, our API key and our incremented count.

<!--- hit up Google --->
<cfhttp url="https://www.googleapis.com/customsearch/v1?key=#myGoogleAPIKey#&q=#searchterms#&start=#CountVar#"
        resolveURL="yes"
        result="variables.response"
        method="get" />

Next, we convert the returned JSON into a native ColdFusion struct that is easy to work with:

<cfset variables.GoogleSearchResults = deserializeJSON(variables.response.Filecontent) />

Then pluck out our array of returned result items:

<cfset variables.GoogleArray = variables.GoogleSearchResults.items />
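The items key is not guaranteed to be there: it is absent when a request returns no results or when the API answers with an error. One way to avoid the error at the end of the run is a small helper that always hands back an array. This is a sketch only, and getResultItems is a hypothetical name, not part of ColdFusion or the Google API.

```cfml
<cfscript>
// hypothetical helper: returns the items array, or an empty array if there is none
function getResultItems(required string json) {
    // a non-JSON body (e.g. an HTML error page) yields no items
    if (!isJSON(arguments.json)) {
        return arrayNew(1);
    }
    var parsed = deserializeJSON(arguments.json);
    if (isStruct(parsed) && structKeyExists(parsed, "items") && isArray(parsed.items)) {
        return parsed.items;
    }
    return arrayNew(1);
}
</cfscript>

<!--- quick check with hand-written sample responses --->
<cfoutput>
    #arrayLen(getResultItems('{"items":[{"link":"https://mysite.com/"}]}'))#
    #arrayLen(getResultItems('{"error":{"code":400}}'))#
</cfoutput>
```

Inside the cfloop, you would set variables.GoogleArray from getResultItems(variables.response.Filecontent) and use cfbreak as soon as arrayIsEmpty(variables.GoogleArray) is true.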

The results come back in groups of 10, so we loop over each set and append every link to our file, which should already be created and in place. You can output to the screen too, but for a full run that's a bit much. For testing, I capped the loop at 100 and wrote the links to the screen instead, a reasonable amount for verification.

<!--- loop thru and add to list (dump to screen) --->
<cfloop array="#variables.GoogleArray#" index="arrayitem">
    <cffile file="C:\path\to\my\text\file\googlesearchtext.txt" action="append" output="#arrayitem.link#" />
    <!--- <cfoutput>#arrayitem.link#</cfoutput><br/> --->
</cfloop>
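Appending once per link means one file operation per result. If that bothers you, a variation is to collect the links in an array and write the file in a single call. The sketch below uses made-up sample data in place of a real page of results; the allLinks variable and the expandPath() location are assumptions for illustration.

```cfml
<!--- sample data standing in for one page of results --->
<cfset variables.GoogleArray = [
    { link = "https://mysite.com/" },
    { link = "https://mysite.com/about" }
] />

<!--- in the full template, create this array once, before the page loop --->
<cfset variables.allLinks = arrayNew(1) />

<cfloop array="#variables.GoogleArray#" index="arrayitem">
    <cfset arrayAppend(variables.allLinks, arrayitem.link) />
</cfloop>

<!--- one write, one link per line; expandPath() puts the file next to this template --->
<cffile action="write"
        file="#expandPath('./googlesearchtext.txt')#"
        output="#arrayToList(variables.allLinks, chr(10))#" />
```

Note that action="write" replaces the file on every run, whereas action="append" keeps adding to it.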

Check your text file; it should be much larger now. From here you can cut and paste into Excel or whatever format you need. If desired, you could generate SQL insert statements too.

Here's all the code in one piece. The file is called joe-google-custom-search.cfm, so if you change the name, don't forget to update the form's action attribute.

<p class="welcome">Google Custom Search</p>
<!--- method="post" so the field arrives in the form scope --->
<form action="joe-google-custom-search.cfm" method="post">
    <label>
        Search Terms:
        <input type="text" name="googlesearchterms" id="googlesearchterms" size="30" />
    </label>
    <button id="googlesearch" name="googlesearch" type="submit">Search</button>
</form>

<!--- form submitted? --->
<cfif StructKeyExists(form, "googlesearchterms")>

    <!--- format search terms --->
    <cfset searchterms = Replace(trim(form.googlesearchterms), " ", "+", "all") />
    <cfset myGoogleAPIKey = "123412342342341234:oiuo12341234" />

    <!--- set our result counter --->
    <cfset CountVar = 1 />
    <!--- limit returns to 1000 --->
    <cfloop condition="CountVar LESS THAN 1000">

        <!--- hit up Google --->
        <cfhttp url="https://www.googleapis.com/customsearch/v1?key=#myGoogleAPIKey#&q=#searchterms#&start=#CountVar#"
                resolveURL="yes"
                result="variables.response"
                method="get" />

        <!--- pick out what we want --->
        <cfset variables.GoogleSearchResults = deserializeJSON(variables.response.Filecontent) />
        <cfset variables.GoogleArray = variables.GoogleSearchResults.items />

        <!--- loop thru and add to list (dump to screen) --->
        <cfloop array="#variables.GoogleArray#" index="arrayitem">
            <cffile file="C:\path\to\my\text\file\googlesearchtext.txt" action="append" output="#arrayitem.link#" />
            <!--- <cfoutput>#arrayitem.link#</cfoutput><br/> --->
        </cfloop>

        <!--- Google returns results in groups of 10 --->
        <cfset CountVar = CountVar + 10 />
    </cfloop>
</cfif>
