jQuery Autocomplete using Solr and ColdFusion

Everyone who has ever set up a search interface for a client has heard it: "We want it to automatically fill in like Google." It sounds simple enough, but I definitely experienced a bit of a learning curve setting this up. There was a lot of conflicting information out there, and it took a lot of trial and error. This is what I finally got to work. There are a couple of ways to accomplish this type of UI with Solr, the two most popular being jQuery UI Autocomplete and Bootstrap Typeahead. Today, we're going to discuss jQuery UI Autocomplete.

The first thing we need to do is set up Solr to build a dictionary of words and phrases that it can offer as completions for our text field. We start by adding a field type and a field to our schema.

schema.xml


<!-- Auto Suggest Field Type -->

<fieldType class="solr.TextField" name="text_auto">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

<!-- Auto Suggest Field -->

<field name="content_autosuggest" type="text_auto" indexed="true" stored="true" multiValued="false"/>

<!-- Tell Solr to copy contents of indexed documents to our Auto Suggest Field -->

<copyField source="content" dest="content_autosuggest"/>

When initially setting this up, I found a variety of articles on how to set up the field type to return phrases instead of single words. A lot of the articles described the way it was SUPPOSED to work, but for me it never did. Through some more research and tinkering, I found that using the Standard Tokenizer Factory with the Shingle Filter Factory in the index analyzer did the trick. Also, be sure to use the Remove Duplicates filter in your query analyzer so you don't get duplicate results at query time. Finally, we add a copyField tag to tell Solr to copy whatever goes into the "content" field into our dictionary field. This occurs at index time. You can change the source to whichever field you want your results to come from. For example, if your users will be searching on the "title" field, you'll want to copy title data into the content_autosuggest field. In my case, they're searching for text within indexed documents, so I'm using the content field.
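To get a feel for why shingles give us phrases, here is a rough JavaScript sketch of what the index analyzer above does to a piece of text. It is only a toy model, not Solr's implementation: the function name is made up for illustration, and the real tokenizer handles punctuation and Unicode far more carefully.

```javascript
// Toy model of the index analyzer: lowercase the text, split it into words,
// then emit every run of 1..maxShingleSize consecutive words.
// Including runs of size 1 corresponds to outputUnigrams="true".
function shingles(text, maxShingleSize) {
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const out = [];
  for (let start = 0; start < tokens.length; start++) {
    for (let size = 1; size <= maxShingleSize && start + size <= tokens.length; size++) {
      out.push(tokens.slice(start, start + size).join(" "));
    }
  }
  return out;
}

console.log(shingles("Quick brown fox", 4).join(" | "));
// quick | quick brown | quick brown fox | brown | brown fox | fox
```

Every one of those entries ends up in the dictionary, which is why typing "quick b" can be completed to "quick brown" rather than just "brown".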

Next, we need to set up a search component and request handler in our solrconfig to handle our Auto Suggest requests. Technically, auto suggest is a spell check component since it's actually taking our keystrokes and suggesting possible alternate spellings to complete the word or phrase we're typing in. We set it up like so.

solrconfig.xml


<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">content_autosuggest</str> <!-- the indexed field to derive suggestions from -->
        <str name="buildOnCommit">true</str>
        <str name="storeDir">C:\AutoSuggestDictionary</str>
    </lst>
    <str name="queryAnalyzerFieldType">text_auto</str>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="df">content_autosuggest</str>
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">25</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

We name our component "suggest" so we know that when we query against this component, we're getting back our auto complete "suggestions". Really you can name it whatever you want, but I found "suggest" to make the most sense. There is a bit of mixed information out there on what to set in the "lookupImpl" attribute. This is the lookup class used to make matches. Some information I have read says to always use FSTLookup due to its performance. In my case, after some tinkering, I found TSTLookup worked better for me. Information on the different available classes is on the Solr Wiki page. In the "field" attribute, we list the name of the field we're using for our auto suggest data, and we set "buildOnCommit" to true. This ensures that as new content is indexed and committed, it is made available to the suggester component. You can set this to false to save resources, but you will have to run the build command manually to get any new data into the dictionary. The "storeDir" attribute tells Solr where to build the dictionary file. If you do not specify this attribute, the dictionary will be built and stored in memory. This eats up A LOT of memory. Finally, "queryAnalyzerFieldType" specifies the field type used for the auto suggest data, which we set to text_auto in the schema.xml file above.

The request handler is relatively straightforward. We set up a search handler called "/suggest" and give it some default values. Set the "df" (default field) value to content_autosuggest, the field we use exclusively for auto suggest data. Since our component is a spellcheck component, we also have to set up some default values for the spellchecker. First, setting "spellcheck" to true lets Solr know we are using a spellcheck component. The "spellcheck.dictionary" attribute specifies which dictionary (or spellchecker) we're pulling our results from. We're setting this to the "suggest" spellchecker we defined above the request handler. The "spellcheck.onlyMorePopular" attribute has a different meaning here than in regular spell checking: with the suggester, it returns results sorted by frequency rather than alphabetically. "spellcheck.count" is simply how many results to return per request. Setting "spellcheck.collate" to true has Solr also return a collation, which is the query rewritten using the top suggestion for each term. Finally, we tie our request handler to our suggest component by adding it to the "components" section of the request handler.
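If the interplay between "spellcheck.onlyMorePopular" and "spellcheck.count" is hard to picture, the following JavaScript sketch mimics the behavior described above on a tiny, made-up dictionary. The entries, weights, and function name are all hypothetical, and this is a toy model rather than how Solr's lookup classes work internally.

```javascript
// Hypothetical dictionary entries: a phrase and how often it appears in the index.
const dictionary = [
  { phrase: "solr", weight: 40 },
  { phrase: "solr config", weight: 12 },
  { phrase: "solr schema", weight: 25 },
  { phrase: "search", weight: 90 }
];

// Return up to `count` phrases starting with `prefix`, ordered either by
// frequency (onlyMorePopular = true) or alphabetically (false).
function suggest(entries, prefix, count, onlyMorePopular) {
  const p = prefix.toLowerCase();
  const matches = entries.filter(e => e.phrase.startsWith(p));
  matches.sort((a, b) => {
    if (onlyMorePopular) return b.weight - a.weight; // most frequent first
    return a.phrase < b.phrase ? -1 : a.phrase > b.phrase ? 1 : 0;
  });
  return matches.slice(0, count).map(e => e.phrase);
}

console.log(suggest(dictionary, "sol", 2, true).join(", "));  // solr, solr schema
console.log(suggest(dictionary, "sol", 2, false).join(", ")); // solr, solr config
```

With a small count, the sort order decides which suggestions the user ever sees, which is why sorting by frequency tends to feel more natural in an autocomplete box.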

That's it for the Solr setup. I know that was a lot to take in. There's definitely A LOT of configuration to do and it's easy to make a mistake here or there getting all of the pieces tied together. Take it slow and pay attention to the details. As always if you hit a snag, the guy with the green hair is here to lend a hand.

Now, on to the ColdFusion side of the house.

In the latest release of CFSolrLib, there's a method in cfsolrlib.cfc called "getAutoSuggestResults".

getAutoSuggestResults method:


<cffunction name="getAutoSuggestResults" access="remote" returntype="any" output="false">
    <cfargument name="term" type="string" required="no" default="">
    <cfif Len(trim(ARGUMENTS.term)) gt 0>
        <!--- Remove leading and trailing spaces from the search term --->
        <cfset ARGUMENTS.term = trim(ARGUMENTS.term)>
        <cfscript>
            local.h = new http();
            local.h.setMethod("get");
            // URL-encode the term so spaces and special characters survive the request
            local.h.setURL("#THIS.solrURL#/suggest?q=#URLEncodedFormat(ARGUMENTS.term)#");
            local.suggestResponse = local.h.send().getPrefix().Filecontent;
            if (isXML(local.suggestResponse)){
                local.XMLResponse = XMLParse(local.suggestResponse);
                local.wordList = "";
                // lst[2] is the spellcheck section; its nested lst elements hold one entry per query word
                if (ArrayLen(local.XMLResponse.response.lst) gt 1 AND structKeyExists(local.XMLResponse.response.lst[2].lst, "lst")){
                    local.wordCount = ArrayLen(local.XMLResponse.response.lst[2].lst.lst);
                    for (local.j = 1; local.j LTE local.wordCount; local.j = local.j + 1){
                        if (local.j eq local.wordCount){
                            // Last word: prefix each of its suggestions with the words already typed
                            local.resultCount = local.XMLResponse.response.lst[2].lst.lst[local.j].int[1].XmlText;
                            local.resultList = arrayNew(1);
                            for (local.i = 1; local.i LTE local.resultCount; local.i = local.i + 1){
                                arrayAppend(local.resultList, local.wordList & local.XMLResponse.response.lst[2].lst.lst[local.j].arr.str[local.i].XmlText);
                            }
                        } else {
                            // Earlier words: keep them as typed
                            local.wordList = local.wordList & local.XMLResponse.response.lst[2].lst.lst[local.j].XMLAttributes.name & " ";
                        }
                    }
                    // Sort results alphabetically
                    if (ArrayLen(local.resultList)){
                        ArraySort(local.resultList, "textnocase", "asc");
                    }
                } else {
                    local.resultList = "";
                }
            } else {
                local.resultList = "";
            }
        </cfscript>
    <cfelse>
        <cfset local.resultList = "">
    </cfif>
    <cfreturn local.resultList />
</cffunction>

There are lots of loops in there that are basically building lists of suggestions. The CFC sets up the http call to the suggester request handler and parses the XML that gets returned. I've done a little work so that when we're typing our second or third word, the words we've already typed are kept and placed in front of each suggestion for the word we're currently typing, instead of the suggester only looking at that last word in isolation. It gets back the top valid results from Solr and then alphabetizes the list to make it a little more pleasing to the user's eye. It also does a little error checking to make sure we're getting a valid result back from Solr. If not, it simply returns a blank result rather than throwing back an error to the form and blowing the whole business up in the user's face.
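Stripped of the XML handling, the multi-word logic in that method boils down to something like the JavaScript sketch below. The input shape (an array of objects with "word" and "suggestions") is an assumption made for illustration; the real method reads the same information straight out of Solr's XML response.

```javascript
// `terms` is a hypothetical, already-parsed version of Solr's response:
// one entry per word in the query, in the order the words were typed.
function buildSuggestions(terms) {
  if (terms.length === 0) return [];
  const last = terms[terms.length - 1];
  // Words typed earlier are kept as they are and used as a fixed prefix.
  const prefix = terms.slice(0, -1).map(t => t.word + " ").join("");
  // Only the word currently being typed is replaced by its suggestions.
  const results = last.suggestions.map(s => prefix + s);
  // Case-insensitive alphabetical sort, like ArraySort(..., "textnocase", "asc").
  return results.sort((a, b) => {
    const x = a.toLowerCase();
    const y = b.toLowerCase();
    return x < y ? -1 : x > y ? 1 : 0;
  });
}

const parsed = [
  { word: "apache", suggestions: ["apache", "apache solr"] },
  { word: "so", suggestions: ["solr", "software", "sorting"] }
];
console.log(buildSuggestions(parsed).join(" | "));
// apache software | apache solr | apache sorting
```

Notice that the suggestions Solr returned for "apache" are ignored. Once the user has moved on to the next word, we treat the earlier words as final.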

Now onto our form.


<html>
<head>
    <title>CFSolrLib 3.0 | Auto-Suggest example</title>
    <script src="js/jquery-1.7.2.js"></script>
    <script src="js/jqueryui/jqueryui-1.8.22.js"></script>
    <link rel="stylesheet" href="css/jqueryui/jqueryui-1.8.22.css" type="text/css" />
    <script type="text/javascript">
    $(function() {
        $("#keyword").autocomplete({
            source: "components/cfsolrlib.cfc?method=getAutoSuggestResults&returnformat=json"
        });
    });
    </script>
</head>
<body>

    Keyword: <input id="keyword" />

</body>
</html>

First, we include jQuery and jQuery UI to make sure jQuery's autocomplete methods are available. For this example, I just created an input with the id "keyword" that we will use to generate our results. In the script block at the top, we're binding our input to the CFC method that makes the call to Solr and specifying that we want JSON as our return format. jQuery UI Autocomplete sends whatever the user has typed as a "term" URL parameter, which is why the method's argument is named "term", and it expects JSON back as its data.
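If you want more control than a plain URL gives you, jQuery UI Autocomplete also accepts a function as its "source". The sketch below is one possible way to do that, not part of CFSolrLib: makeSuggestSource is a made-up helper name, and the minLength and delay values are just example settings. It guards against the empty string the CFC returns when there are no suggestions.

```javascript
// Build a jQuery UI Autocomplete `source` callback around any JSON fetcher.
// `fetchJson(url, params, onSuccess)` has the same shape as jQuery's $.getJSON.
function makeSuggestSource(fetchJson, url) {
  return function (request, response) {
    fetchJson(url, { term: request.term }, function (data) {
      // The CFC returns an array on success and an empty string otherwise.
      response(Array.isArray(data) ? data : []);
    });
  };
}

// Browser-only wiring; skipped when jQuery UI is not loaded.
if (typeof $ !== "undefined" && $.fn && $.fn.autocomplete) {
  $(function () {
    $("#keyword").autocomplete({
      minLength: 2, // wait for two characters before asking Solr
      delay: 300,   // milliseconds to wait after the last keystroke
      source: makeSuggestSource(
        function (url, params, onSuccess) { return $.getJSON(url, params, onSuccess); },
        "components/cfsolrlib.cfc?method=getAutoSuggestResults&returnformat=json"
      )
    });
  });
}
```

Raising minLength and delay cuts down on the number of requests hitting Solr while someone is still typing.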

As long as you already have information in your index, Solr will build the dictionary when you start it up. If not, start Solr and index a few things. Since we set buildOnCommit = "true", the items will be added to our dictionary when we commit our changes to the index. You can always manually rebuild your dictionary at any time like so.


<cfscript>
h = new http();
h.setMethod("get");
h.setURL("http://localhost:8983/solr/suggest?spellcheck.build=true");
h.send();
</cfscript>

You can simplify this further by just typing that URL into a browser to rebuild the dictionary, but this code snippet works well if you want to insert a button or link into an application to rebuild your dictionary on the fly while debugging.
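While debugging, I find it handy to be able to build these URLs quickly. Here is a small JavaScript helper sketch for that. The function name is made up, and the default base URL is simply the local address used in the snippet above, so adjust it to your own setup.

```javascript
// Build a URL for the /suggest request handler from a set of parameters.
// `baseUrl` is optional and defaults to the local Solr address used in this post.
function suggestUrl(params, baseUrl) {
  const base = baseUrl || "http://localhost:8983/solr";
  const query = Object.keys(params)
    .map(k => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
    .join("&");
  return base + "/suggest?" + query;
}

// Rebuild the dictionary:
console.log(suggestUrl({ "spellcheck.build": true }));
// http://localhost:8983/solr/suggest?spellcheck.build=true

// Ask for suggestions, overriding the default count from solrconfig.xml:
console.log(suggestUrl({ q: "apache so", "spellcheck.count": 5 }));
// http://localhost:8983/solr/suggest?q=apache%20so&spellcheck.count=5
```

Pasting the second URL into a browser shows the raw XML that the CFC has to parse, which is a quick way to check whether a problem is on the Solr side or the ColdFusion side.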

If all went well and you have all of your bits and pieces set up correctly, you should be able to run this in a browser and see results drop down as you begin to type in the input box.

There's a fully functional example of this code, including a properly set up Solr 4.0 instance in the latest CFSolrLib, available on GitHub.

A working example can be viewed at http://jimleether.com/solrexample/autoSuggestExample.cfm

This is definitely a lot of information to take in. If you get it all working on your first try, well kudos to you. When you get everything customized to your application, this is a very powerful tool.

I plan on writing a post about Bootstrap Typeahead and Solr very soon. Enjoy!

Comments
Thanks for sharing. It's very useful.
# Posted By Laksma | 2/20/13 3:26 PM
Got a link for a working solution?
# Posted By Richard Hughes | 2/22/13 6:20 PM
I actually tried to set one up for this post, but this web server is running ColdFusion 8 and gets pissy about some of the code in CFSolrLib, namely the scripted http calls. I'm in the process of setting up a new web server and plan on adding one there. I'll see if I can find a temporary place for an example in the meantime. There's also a full example of this exact code and Solr setup in the latest CFSolrLib download on GitHub that will run on ColdFusion 9 and up.
# Posted By Jim Leether | 2/22/13 6:47 PM
A working example can be viewed at http://jimleether.com/solrexample/autoSuggestExamp...
# Posted By Jim Leether | 2/22/13 7:56 PM
Copyright © 2008 - Jim Leether