jQuery Autocomplete using Solr and ColdFusion

Everyone who has ever set up a search interface for a client has heard it: "We want it to automatically fill in like Google." It sounds simple enough, but I definitely hit a learning curve setting this up. There was a lot of conflicting information out there, and it took plenty of trial and error. This is what I finally got to work. There are a couple of ways to build this type of UI on top of Solr, the two most popular being jQuery UI Autocomplete and Bootstrap Typeahead. Today, we're going to discuss jQuery UI Autocomplete.

The first thing we need to do is set up Solr to build a dictionary of words and phrases it can offer as completions for our text field. We start by adding a field type and a field to our schema.

schema.xml


<!-- Auto Suggest Field Type -->

<fieldType class="solr.TextField" name="text_auto">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

<!-- Auto Suggest Field -->

<field name="content_autosuggest" type="text_auto" indexed="true" stored="true" multiValued="false"/>

<!-- Tell Solr to copy contents of indexed documents to our Auto Suggest Field -->

<copyField source="content" dest="content_autosuggest"/>

When initially setting this up, I found a variety of articles on how to set up the field type to return phrases instead of single words. A lot of them described the way it was SUPPOSED to work, but for me it never did. After more research and tinkering, I found that using the Standard Tokenizer Factory with the Shingle Filter Factory in the index analyzer did the trick. The shingle filter combines adjacent words into phrases (up to four words here, because maxShingleSize is 4), which is what lets the suggester offer phrases. Also, be sure to use the Remove Duplicates filter in your query analyzer so you don't get duplicate results at query time. Finally, we add a copyField tag that tells Solr to copy whatever goes into the "content" field into content_autosuggest, which becomes our dictionary. The copy happens at index time. You can change the source to whatever field you want your results to come from. For example, if your users will be searching on the "title" field, you'll want to copy title data into the content_autosuggest field. In my case, they're searching for text within indexed documents, so I'm using the content field.
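
To get a feel for what the index analyzer puts into content_autosuggest, here is a rough sketch in plain JavaScript. It is not Solr code: the function name "shingles" is made up for this illustration, and the split on non-alphanumeric characters is only a simple stand-in for the Standard Tokenizer (it ignores non-English letters, for one thing).

```javascript
// Rough illustration of what the "text_auto" index analyzer produces.
// NOT Solr code: the tokenizer below is a simplified stand-in.
function shingles(text, maxShingleSize) {
  // Lowercase, then split on anything that is not a letter or digit.
  var tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter(function (t) {
    return t.length > 0;
  });
  var terms = [];
  for (var start = 0; start < tokens.length; start++) {
    // size 1 is the single word (outputUnigrams="true"); larger sizes are phrases.
    for (var size = 1; size <= maxShingleSize && start + size <= tokens.length; size++) {
      terms.push(tokens.slice(start, start + size).join(" "));
    }
  }
  return terms;
}

console.log(shingles("Canon EOS 650D kit", 4));
```

For a four-word title, this yields the single words plus phrases such as "canon eos" and "eos 650d kit", which is why the suggester can complete whole phrases and not just one word at a time.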

Next, we need to set up a search component and request handler in solrconfig.xml to handle our auto suggest requests. Technically, auto suggest is built on the spell check component: it takes our keystrokes and suggests ways to complete the word or phrase we're typing, much as a spell checker suggests alternate spellings. We set it up like so.

solrconfig.xml


<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">content_autosuggest</str> <!-- the indexed field to derive suggestions from -->
        <str name="buildOnCommit">true</str>
        <str name="storeDir">C:\AutoSuggestDictionary</str>
    </lst>
    <str name="queryAnalyzerFieldType">text_auto</str>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="df">content_autosuggest</str>
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">25</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

We name our component "suggest" so we know that when we query against it, we're getting back our auto complete "suggestions". Really you can name it whatever you want, but I found "suggest" to make the most sense. There is a bit of mixed information out there on what to set for "lookupImpl", the lookup class used to make matches. Some of what I have read says to always use FSTLookup because of its performance. In my case, after some tinkering, I found TSTLookup worked better for me. Information on the available lookup classes can be found on the Solr Wiki.

For "field", we give the name of the field we're using for our auto suggest data, and we set "buildOnCommit" to true. This ensures that as new content is indexed and committed, it is made available to the suggester component. You can set this to false to save resources, but you will have to run the build command manually to get any new data into the dictionary. "storeDir" tells Solr where to build the dictionary file. If you do not specify it, the dictionary will be built and stored in memory, which eats up A LOT of memory. Finally, "queryAnalyzerFieldType" specifies the field type used for the auto suggest data, which we set to the text_auto type defined in schema.xml above.

The request handler is relatively straightforward. We set up a search handler called "/suggest" and give it some default values. Set "df" (default field) to content_autosuggest, the field we use exclusively for auto suggest data. Since our component is a spellcheck component, we also have to set some defaults for the spellchecker. First, setting "spellcheck" to true lets Solr know we are using a spellcheck component. "spellcheck.dictionary" specifies which dictionary (or spellchecker) we're pulling our results from; we set it to the "suggest" spellchecker defined in the search component above. "spellcheck.onlyMorePopular" is not used in its normal spell check sense here; with the suggester, it returns results sorted by frequency rather than alphabetically. "spellcheck.count" is simply how many results to return per request. Setting "spellcheck.collate" to true modifies the query slightly, which just ensures that we get the top results for our search term by ordering them properly. Finally, we tie our request handler to our suggest component by adding it to the "components" section of the request handler.
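
Because these values live in the handler's "defaults" list, any of them can be overridden on an individual request by passing the same parameter name in the query string. Here is a small sketch of building such a request URL. The helper name "buildSuggestUrl" and the base URL are assumptions for illustration only; they are not part of Solr or CFSolrLib.

```javascript
// Build a URL for the "/suggest" handler.
// buildSuggestUrl and the base URL used below are hypothetical, for illustration.
function buildSuggestUrl(solrUrl, term, overrides) {
  var params = { q: term };
  // Parameters sent with the request take precedence over the handler defaults.
  for (var key in overrides || {}) {
    params[key] = overrides[key];
  }
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
  });
  // Drop any trailing slash on the base URL before appending the handler path.
  return solrUrl.replace(/\/+$/, "") + "/suggest?" + pairs.join("&");
}

console.log(buildSuggestUrl("http://localhost:8983/solr", "canon e", { "spellcheck.count": 10 }));
```

Note that the search term is URL-encoded, so a space typed between two words reaches Solr intact.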

That's it for the Solr setup. I know that was a lot to take in. There's definitely A LOT of configuration to do, and it's easy to make a mistake here or there getting all of the pieces tied together. Take it slow and pay attention to the details. As always, if you hit a snag, the guy with the green hair is here to lend a hand.

Now, on to the ColdFusion side of the house.

In the latest release of CFSolrLib, there's a method in cfsolrlib.cfc called "getAutoSuggestResults".

getAutoSuggestResults method:


<cffunction name="getAutoSuggestResults" access="remote" returntype="any" output="false">
    <cfargument name="term" type="string" required="no" default="">
    <cfif Len(trim(ARGUMENTS.term)) gt 0>
        <!--- Remove any leading or trailing spaces in the search term --->
        <cfset ARGUMENTS.term = trim(ARGUMENTS.term)>
        <cfscript>
            local.h = new http();
            local.h.setMethod("get");
            // URL-encode the term so spaces and special characters survive the request
            local.h.setURL("#THIS.solrURL#/suggest?q=#URLEncodedFormat(ARGUMENTS.term)#");
            local.suggestResponse = local.h.send().getPrefix().Filecontent;
            if (isXML(local.suggestResponse)){
                local.XMLResponse = XMLParse(local.suggestResponse);
                local.wordList = "";
                // The second top-level lst is expected to hold the spellcheck results, one nested lst per word typed
                if (ArrayLen(local.XMLResponse.response.lst) gt 1 AND structKeyExists(local.XMLResponse.response.lst[2].lst, "lst")){
                    local.wordCount = ArrayLen(local.XMLResponse.response.lst[2].lst.lst);
                    for (local.j = 1; local.j LTE local.wordCount; local.j = local.j + 1){
                        if (local.j eq local.wordCount){
                            // Last word typed: prefix each of its suggestions with the earlier words
                            local.resultCount = local.XMLResponse.response.lst[2].lst.lst[local.j].int[1].XmlText;
                            local.resultList = arrayNew(1);
                            for (local.i = 1; local.i LTE local.resultCount; local.i = local.i + 1){
                                arrayAppend(local.resultList, local.wordList & local.XMLResponse.response.lst[2].lst.lst[local.j].arr.str[local.i].XmlText);
                            }
                        }else{
                            // Earlier words are kept exactly as typed
                            local.wordList = local.wordList & local.XMLResponse.response.lst[2].lst.lst[local.j].XMLAttributes.name & " ";
                        }
                    }
                    // sort results alphabetically
                    if (ArrayLen(local.resultList)){
                        ArraySort(local.resultList, "textnocase", "asc");
                    }
                }else{
                    local.resultList = "";
                }
            }else{
                local.resultList = "";
            }
        </cfscript>
    <cfelse>
        <cfset local.resultList = "">
    </cfif>
    <cfreturn local.resultList />
</cffunction>

There are lots of loops in there, and they are basically building the list of suggestions. The method makes the HTTP call to the suggester request handler and parses the XML that comes back. I've done a little work so that when we're typing our second or third word, the words already typed are kept in front of each suggestion instead of the results covering only the word currently being typed. It takes the top valid results from Solr and then alphabetizes the list to make it a little more pleasing to the user's eye. It also does a little error checking to make sure we're getting a valid result back from Solr. If not, it simply returns a blank result rather than throwing an error back to the form and blowing the whole business up in the user's face.
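
If the looping is hard to follow, the same idea can be sketched in a few lines of JavaScript. This is only an illustration of the logic, not code from CFSolrLib: "buildSuggestionList" is a made-up name, and "entries" is a simplified, already-parsed structure I'm assuming here (one item per word typed, each with the word and the suggestions returned for it), not Solr's raw response format.

```javascript
// Mirrors the looping logic of getAutoSuggestResults on a simplified input.
// "entries" is a hypothetical structure: [{ word: "...", suggestions: ["...", ...] }, ...]
function buildSuggestionList(entries) {
  if (!entries || entries.length === 0) {
    return [];
  }
  // Every word except the last is kept as typed and becomes the prefix.
  var prefix = entries.slice(0, -1).map(function (entry) {
    return entry.word + " ";
  }).join("");
  // Only the last word's suggestions are used, each with the prefix in front.
  var last = entries[entries.length - 1];
  var results = last.suggestions.map(function (suggestion) {
    return prefix + suggestion;
  });
  // Case-insensitive alphabetical sort, like ArraySort(..., "textnocase", "asc").
  results.sort(function (a, b) {
    var x = a.toLowerCase();
    var y = b.toLowerCase();
    return x < y ? -1 : (x > y ? 1 : 0);
  });
  return results;
}

console.log(buildSuggestionList([
  { word: "canon", suggestions: ["canon"] },
  { word: "eo", suggestions: ["eos kit", "eos 650d"] }
]));
```

Here the user has typed "canon eo", so "canon " is held as the prefix and only the suggestions for "eo" are expanded.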

Now onto our form.


<script src="js/jquery-1.7.2.js"></script>
<script src="js/jqueryui/jqueryui-1.8.22.js"></script>
<link rel="stylesheet" href="css/jqueryui/jqueryui-1.8.22.css" type="text/css" />
<script type="text/javascript">
$(function() {
    $("#keyword").autocomplete({
        source: "components/cfsolrlib.cfc?method=getAutoSuggestResults&returnformat=json"
    });
});
</script>

<html>
<head>
    <title>CFSolrLib 3.0 | Auto-Suggest example</title>
</head>
<body>

    Keyword: <input id="keyword" />

</body>
</html>

First, we include jQuery and jQuery UI so the autocomplete methods are available. For this example, I just created an input with the id "keyword" that we will use to generate our results. In the script block at the top, we're binding our input to the CFC method that makes the call to Solr and specifying that we want JSON as our return format. jQuery UI sends whatever the user has typed as a URL parameter named "term", which is why the CFC argument is called term. jQuery UI Autocomplete expects JSON as its data.
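
If the drop-down fires too eagerly, jQuery UI Autocomplete also accepts "minLength" and "delay" options. The sketch below wraps them in a small helper so the settings live in one place. The helper name "autocompleteOptions" and the values chosen are my own assumptions, not something taken from CFSolrLib.

```javascript
// Returns an options object for jQuery UI Autocomplete.
// autocompleteOptions is a hypothetical helper; tune the values to taste.
function autocompleteOptions(endpoint) {
  return {
    source: endpoint, // jQuery UI adds a term=<typed text> parameter to this URL
    minLength: 2,     // wait for two characters before asking for suggestions
    delay: 300        // milliseconds to wait after the last keystroke
  };
}

// Usage on the page, once jQuery and jQuery UI are loaded:
// $("#keyword").autocomplete(
//     autocompleteOptions("components/cfsolrlib.cfc?method=getAutoSuggestResults&returnformat=json")
// );
```

A higher minLength cuts down on requests for one-letter prefixes, which tend to return long and not very useful lists.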

As long as you already have information in your index, Solr will build the dictionary when you start it up. If not, start Solr and index a few things. Since we set buildOnCommit = "true", the items will be added to our dictionary when we commit our changes to the index. You can always manually rebuild your dictionary at any time like so.


<cfscript>
h = new http();
h.setMethod("get");
h.setURL("http://localhost:8983/solr/suggest?spellcheck.build=true");
h.send();
</cfscript>

You can simplify this further by just typing that URL into a browser to rebuild the dictionary, but this code snippet works well if you want to insert a button or link into an application to rebuild your dictionary on the fly while debugging.

If all went well and you have all of your bits and pieces set up correctly, you should be able to run this in a browser and see results drop down as you begin to type in the input box.

There's a fully functional example of this code, including a properly set up Solr 4.0 instance, in the latest CFSolrLib, available on GitHub.

A working example can be viewed at http://jimleether.com/solrexample/autoSuggestExample.cfm

This is definitely a lot of information to take in. If you get it all working on your first try, well, kudos to you. Once you get everything customized to your application, this is a very powerful tool.

I plan on writing a post about Bootstrap Typeahead and Solr very soon. Enjoy!

Comments
Thanks for sharing. It's very useful.
# Posted By Laksma | 2/20/13 3:26 PM
Got a link for a working solution?
# Posted By Richard Hughes | 2/22/13 6:20 PM
I actually tried to set one up for this post, but this web server is running ColdFusion 8 and gets pissy about some of the code in CFSolrLib, namely the scripted http calls. I'm in the process of setting up a new web server and plan on adding one there. I'll see if I can find a temporary place for an example in the meantime. There's also a full example of this exact code and Solr setup in the latest CFSolrLib download on GitHub that will run on ColdFusion 9 and up.
# Posted By Jim Leether | 2/22/13 6:47 PM
A working example can be viewed at http://jimleether.com/solrexample/autoSuggestExamp...
# Posted By Jim Leether | 2/22/13 7:56 PM
Hello Lee,
We are developing a search engine for our ecommerce platform, but we have a problem with the autocomplete connected to Solr. For example, if I search for "650D" (a well-known camera model), the autosuggestion does not return any data until I reach the letter "d", and then the results only concern "d"; typing 6, 5, 0 returns nothing. We have installed both version 4.0 and 4.6 of Solr, but I do not think it is a version problem. Can you help solve this? I've seen others around the web run into it, but no one was able to give a solution. Thanks, and excuse my English.
# Posted By Spiderdev | 12/11/13 5:54 AM
The fact that you're getting results at all tells me that you have the autocomplete code written properly. This sounds like a field type or indexing issue to me. Do you mind if I take a look at your schema.xml file?
# Posted By Jim Leether | 12/11/13 11:31 AM
Thanks Jim. I also think the problem is the field type, but I have been trying to solve it for two days without success. Here is the link: https://www.dropbox.com/s/x48nb2n9ha6imux/schema.x...
# Posted By spiderdev | 12/11/13 12:46 PM
This definitely stems from the types of tokenizers and filters being used. Keywords starting with numbers can be tricky. I will get some information for you as quickly as I can.
# Posted By Jim Leether | 12/11/13 2:03 PM
You're really kind. We look forward to your help. In the meantime, we'll get on with the other changes. Thanks.
# Posted By spiderdev | 12/11/13 2:13 PM
Try adding this to your solrconfig.xml

<queryConverter name="queryConverter" class="org.apache.solr.spelling.SuggestQueryConverter"/>

It changes the way the search terms are analyzed as they're sent to the suggester. I placed it between the suggest search component and request handler (around line 1331 in the default config file), but I don't believe it matters where it goes. This may be an easy fix if it works.
# Posted By Jim Leether | 12/11/13 2:22 PM
You are great!
In 5 minutes I've solved a problem that had cost me a whole day.

At this point, since you are a Solr phenomenon, I'll also ask you this: how can I give more or less importance to terms in suggestions?
# Posted By spiderdev | 12/11/13 3:14 PM
Can you give me an example of the types of things you find more or less important? I'm pretty sure you can do it with weighting, but I want to make sure it fits your situation.
# Posted By Jim Leether | 12/11/13 4:17 PM
For example:

- Canon pfi 5630 (cartridge): 0 sales, but it is a product with 100 variations, "canon pfi 56 **"
- Canon EOS 650D (SLR): 1500 sales, but with 5 variants, "Canon EOS 650D, Canon EOS 650D kit, etc."

If I write "canon", autosuggest shows "canon pfi" with 0 sales first instead of "Canon EOS 650D", which is very famous and has 1500 sales. I added a rating field via "setDocumentBoost", but it does not change anything. Can you also help us with this?
# Posted By Spiderdev | 12/12/13 3:09 PM
I think you may be getting a little TOO granular here. Broken down to its most basic level, Solr analyzes text. While you can boost a field so a product's name is considered more relevant to your search than its part number or category, Solr isn't designed to hold certain index records in a higher light than others.

Right now, your setup has:

<str name="spellcheck.onlyMorePopular">true</str>

This is telling Solr to look at the index and put the terms that show up most often in the index at the top. The method in CFSolrLib.cfc then alphabetizes those results. Since the Canon pfi 56** shows up 100 times, but the Canon EOS 650D only shows up 5 times, Solr considers the pfi 5630 to be a much more important match than the EOS 650D. If you set the onlyMorePopular to false, it will just sort them alphabetically and ignore the terms' frequency in the index.

If you want to put the camera with the most sales at the top, I do have some ideas of how you can accomplish this with a combination of Solr and ColdFusion. I want to try out a few things on my test server to see what works best, then I'll get back to you with my findings.

By the way, I appreciate your kind words. I'm glad I can use what knowledge I have to help out.
# Posted By Jim Leether | 12/13/13 2:33 AM
This post really helped me. Thanks a lot. I would like to know if we can use more than one field, like Title and Author, for autocompletion?
# Posted By Deepthi | 12/13/13 7:16 PM
Sure,
you have to use this in schema.xml:

<copyField source="cat" dest="content_autosuggest"/>
<copyField source="name" dest="content_autosuggest"/>

Note that the destination field "content_autosuggest" has to be defined as well, usually above the copyField tags:

<field name="content_autosuggest" type="text_auto" indexed="true" stored="true" multiValued="true"/>

"content_autosuggest" is the name that I gave to my suggest field; yours is very likely called something else.
# Posted By Spiderdev | 12/14/13 5:33 AM
Spiderdev is correct.

Your field containing your auto-suggest terms is populated using copyField tags in your schema.xml file. If you add a copyField for both fields with your auto-suggest field as the destination, both will be included in your auto-suggest results.
# Posted By Jim Leether | 12/14/13 3:46 PM
Hello Jim, with QueryElevationComponent I have not found a solution. However, while looking I found this: http://goo.gl/DCNfcz What do you think?
# Posted By Spiderdev | 12/23/13 11:28 AM
Thank You. That helped :)
# Posted By Deepthi | 12/27/13 2:19 PM
Copyright © 2008 - Jim Leether