Solr Highlighting - Enterprise Solr vs ColdFusion's Built-in Solr

First, a big THANK YOU to everyone who attended my session at CF Objective. It was very rewarding to be able to spread the information rattling around in my head and answer your questions about Solr. One of the questions I received has inspired a blog post.

This particular attendee needed to display highlighting from multiple areas of a document when the matching search term appears more than once in that document. Using the Solr instance built into ColdFusion, she was unable to achieve this. In this post, I'll outline the highlighting options available in both ColdFusion's built-in Solr and the full version of Apache Solr.

First, I am in no way bashing Adobe's rollout of Solr. It works fine for applications that only require basic search and indexing features. If you need more detail out of your search results, however, the full version available from Apache is the way to go.

So let's see what ColdFusion has to offer us when we want highlighting results with our search.

I created a document containing 5 paragraphs of "Lorem Ipsum" and placed the word "dog" at four random locations within the document. I then used CFIndex to add the document to my ColdFusion collection and performed a search using CFSearch. In the "context" column of the returned query object I get:

ColdFusion provides us with a short snippet of text with the search term surrounded by HTML emphasis tags. Although the search term appears four times in the document text, we only get the first match here. The HTML tags surrounding the matched term can be customized using the contextHighlightBegin and contextHighlightEnd attributes of the CFSearch tag, in case you want to change the background color or mark your match a different way.
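For reference, here's a minimal sketch of that search call. This is a sketch rather than my exact test code; the collection name "myCollection" is an assumption, so swap in your own:

```cfml
<!--- Search a CF Solr collection and customize the highlight markup --->
<cfsearch
    collection="myCollection"
    criteria="dog"
    contextPassages="1"
    contextHighlightBegin="<span style='background-color: yellow;'>"
    contextHighlightEnd="</span>"
    name="searchResults">

<cfoutput query="searchResults">#context#<br /></cfoutput>
```

Here the matched term comes back wrapped in the span instead of the default emphasis tags.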

Okay, so if we want ALL of the highlighting data, that doesn't help us. Let's see what we get with the full version of Solr 4.8.0 with CFSolrLib to integrate it into our CF application.

I indexed the same text file, using Apache Tika to extract the text from the document and the "add" method of CFSolrLib to index my data. I placed the text in the "title" field since I already have that field set up for highlighting. There are two highlighters available in Apache Solr: a "simple" highlighting component, much like what you see in ColdFusion's instance, and the "Fast Vector Highlighter", which is much faster and far more capable. For this example, I will be using the Fast Vector Highlighter.

First thing to note, when using the Fast Vector Highlighter, the field you intend to pull your highlighting data from has to have Term Offsets, Term Positions and Term Vectors set to true. In the Solr instance distributed with CFSolrLib you'll find this line in your schema:


<field name="title" type="text_general" indexed="true" stored="true" termOffsets="true" termPositions="true" termVectors="true" />

This field has the proper attributes set to use the Fast Vector Highlighter.

Now let's look at our search example (searchExample.cfm distributed with CFSolrLib). Taking a look at the top of the example, there are highlighting attributes being passed to Solr.


<cfset local.params = structNew()>
<cfset local.params["hl"] = "on">
<cfset local.params["hl.fl"] = "title">
<cfset local.params["hl.fragListBuilder"] = "simple">
<cfset local.params["hl.fragsize"] = 20>
<cfset local.params["hl.snippets"] = 10>
<cfset local.params["hl.useFastVectorHighlighter"] = true>
<cfset local.params["hl.fragmentsBuilder"] = "colored">
<cfset local.params["hl.boundaryScanner"] = "default">
<cfset local.params["hl.usePhraseHighlighter"] = true>
<cfset searchResponse = sampleSolrInstance.search(URL.q,0,100,"title",local.params) />

hl.fl contains the list of fields we want highlighting results from, and we're telling Solr to use the simple frag list builder. hl.fragsize tells Solr how long to make the snippets of text returned in the results, and hl.snippets caps the number of snippets that will be returned. We're telling Solr we want the Fast Vector Highlighter and that we want it to automatically change the background color behind the matching text; it will use a new color for each matching term if you're searching for more than one word. We've also enabled phrase highlighting in case we search for a phrase, which must be surrounded by quotes. Finally, we pass the field name "title" along with our parameters to the search function, since we're expecting highlighting results from that field. Currently CFSolrLib is configured to return highlighting results from only one field, but it could easily be modified to accept a list of fields.
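For a sanity check outside of ColdFusion, the parameters above translate to a raw Solr request along these lines (the localhost URL and /solr/select path are assumptions based on a default single-core install):

http://localhost:8983/solr/select?q=dog&hl=on&hl.fl=title&hl.fragListBuilder=simple&hl.fragsize=20&hl.snippets=10&hl.useFastVectorHighlighter=true&hl.fragmentsBuilder=colored&hl.usePhraseHighlighter=true

Pasting a URL like that into a browser lets you inspect the highlighting section of Solr's response before any CFML gets involved.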

If I run searchExample.cfm in a browser and search for "dog" with highlighting enabled, I'll get my document back with just the first instance of "dog" highlighted. If we take a look at the code we'll see:


<cfif structKeyExists(currentResult,"highlightingResult")>#currentResult.highlightingResult[1]#</cfif>

If we get highlighting results back, it displays only the first value in the array. This was done for simplicity in the example. If you wanted to display all of the results, the array could easily be looped, or a specific row could be displayed. If we dump out our result we see:
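If you do want every snippet, a loop along these lines would do it (a sketch; it assumes it sits inside the example's existing output block, where currentResult is defined):

```cfml
<cfif structKeyExists(currentResult,"highlightingResult")>
    <cfloop from="1" to="#arrayLen(currentResult.highlightingResult)#" index="snippetIndex">
        #currentResult.highlightingResult[snippetIndex]#<br />
    </cfloop>
</cfif>
```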

You'll notice that instead of a query object, we get back a ColdFusion structure containing an array of results. In our highlighting node of the struct we get back an array containing each of the highlighting snippets Solr generated. This array can easily be accessed to display the highlighting result from any section of the document. The fragsize attribute can be adjusted to make these snippets of text longer or shorter. I recommend playing with the example some to see how you can change the way your highlighting results are returned.

As always, I'm happy to answer any questions you have about this post. And keep in mind: just because you're not getting what you want from ColdFusion's Solr doesn't mean you should give up on Solr itself. Solr is fast and feature-rich. Chances are, it will do more than you think.

CFSolrLib can be downloaded at https://github.com/iotashan/cfsolrlib. The version available for download there is distributed with Solr 4.0.0, but I'm working on updating my fork of the repository, https://github.com/VWRacer/cfsolrlib, with Solr 4.8.0. I'll update this post when it's done.

My Proposed Topic for CF Objective 2014

This time around, I decided to throw my hat in the ring to potentially be a speaker at CF Objective and, of course, it involves Apache Solr. My topic is named "Beyond CFIndex - Apache Solr Enterprise Search Server and ColdFusion Integration". I did a similar presentation in October 2012 at CF.Objective(ANZ) in Melbourne, Australia. Solr 4.0 had just been released, and I was able to highlight some of the new shiny parts, especially the redesigned web admin interface. My goal with this topic for 2014 is to show off some of the powerful features of Apache Solr that just aren't available when using ColdFusion's built-in version of Solr, and how to harness them within one's CF application. In addition, I'll be highlighting some of the breakthroughs that have come about in Solr 4 since its initial release. Most recently, Solr 4.5 has introduced some VERY useful features, including the ability to manage your schema through the API or run in schema-less mode, where Solr makes its best guess at the field type based on what you send it for indexing.

With all of that said, voting is now open for proposed topics, and we need the CF community's opinions to keep CF Objective the great conference it has always been. I have never had the means to go on my own, so speaking would open a lot of doors for me. Speaker or not, I will find some way to be there this time around.

Voting on topics can be done at:

https://trello.com/b/4M6JSoyL/cf-objective-call-for-speakers-2014

You will have to sign up for a Trello username to vote, or you can link Trello to your Google+ account.

I thank everyone for their support and will see you out there!

Bootstrap Typeahead using Solr and ColdFusion

As promised all the way back in February, (not really as soon as I would have liked) here is my entry on using Apache Solr and ColdFusion with Bootstrap Typeahead. Previously, I described how to accomplish this with JqueryUI, but with the increasing popularity of Bootstrap, I thought it would be good to add it to the examples.

This example will use the same Solr setup and CFC method (included in CFSolrLib 3.0) as our previous example. If you want instructions on how to set those up and build the dictionary your auto-suggest terms will come from, you can see them here.

I'm using Bootstrap 2.3.2 and JQuery 1.10.2 in this sample. The JavaScript to call Typeahead on my input is as follows:


$(document).ready(function() {
    $('#keyword').typeahead({
        minLength: 1,
        source: function(query, process) {
            $.post('components/cfsolrlib.cfc?method=getAutoSuggestResults&returnformat=json&term=' + query, { limit: 8 }, function(data) {
                var parJSON = JSON.parse(data);
                var suggestions = [];
                $.each(parJSON, function (i, suggestTerm) {
                    suggestions.push(suggestTerm);
                });
                process(suggestions);
            });
        }
    });
});

In the JS, I'm telling Bootstrap we require the user to enter at least one character before we show any results. I'm posting to my CFC, passing the string the user has typed into the text box as the "query" variable and making sure to specify a return format of JSON. We're also specifying a limit of 8 results. In the JQuery UI example, we were relying on Solr to limit results; we have a little more control here. I then parse the returned JSON and use the push method to add the results to the suggestions array before calling Bootstrap's process method, returning the results to the input. There are a few more options I could pass to the typeahead method to update multiple inputs, do some custom highlighting, etc., but for this example we'll keep it basic.

The HTML for our input is pretty basic:


Keyword: <input type="text" id="keyword" name="keyword" class="typeahead" data-provide="typeahead" autocomplete="off" />

I'm using the default "typeahead" style and specifying data-provide="typeahead" while making sure to turn autocomplete off so the browser doesn't try to trump my results.

A working example of the Bootstrap Typeahead version of the auto complete can be viewed at http://jimleether.com/solrexample/typeaheadExample.cfm.

For more ColdFusion and Solr fun, keep your browser pointed here!

JQuery Autocomplete using Solr and ColdFusion

Everyone who has ever set up a search interface for a client has heard it. "We want it to automatically fill in like Google." It sounds simple enough, but I definitely experienced a bit of a learning curve setting this up. There was a lot of conflicting information out there and there was a lot of trial and error. This is what I finally got to work. There are a couple ways to accomplish this type of UI with Solr, the two most popular being JQuery UI Autocomplete and Bootstrap Type Ahead. Today, we're going to discuss JQuery UI Autocomplete.

The first thing we need to do is set up Solr to create a library of words to make available to fill in our text field. We start by adding a field and field type to our schema.

Schema.xml


<!-- Auto Suggest Field Type -->
    
<fieldType class="solr.TextField" name="text_auto">
<analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
</analyzer>
<analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>

<!-- Auto Suggest Field -->

<field name="content_autosuggest" type="text_auto" indexed="true" stored="true" multiValued="false"/>

<!-- Tell Solr to copy contents of indexed documents to our Auto Suggest Field -->

<copyField source="content" dest="content_autosuggest"/>

When initially setting this up, I found a variety of articles on how to set up the field type to return phrases instead of single words. A lot of them described the way it was SUPPOSED to work, but for me it never did. Through some more research and tinkering, I found that using the Standard Tokenizer Factory with the Shingle Filter Factory in the index analyzer did the trick. Also, be sure to use the Remove Duplicates filter in your query analyzer so you don't get duplicate results at query time. Finally, we add a copyField tag to tell Solr to copy data from the "content" field into our dictionary field; this happens at index time. You can change the source to whatever field you want your results to come from. For example, if your users will be searching on the "title" field, you'll want to copy title data into the content_autosuggest field. In my case, they're searching for text within indexed documents, so I'm using the content field.

Next, we need to set up a search component and request handler in our solrconfig to handle our Auto Suggest requests. Technically, auto suggest is a spell check component since it's actually taking our keystrokes and suggesting possible alternate spellings to complete the word or phrase we're typing in. We set it up like so.

solrConfig.xml


<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">content_autosuggest</str> <!-- the indexed field to derive suggestions from -->
<str name="buildOnCommit">true</str>
<str name="storeDir">C:\AutoSuggestDictionary</str>
</lst>
<str name="queryAnalyzerFieldType">text_auto</str>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
<str name="df">content_autosuggest</str>
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">25</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>

We name our component "suggest" so we know that when we query against it, we're getting back our auto-complete "suggestions". Really, you can name it whatever you want, but I found "suggest" to make the most sense. There is a bit of mixed information out there on what to set in the "lookupImpl" attribute. This is the lookup class used to make matches. Some information I have read says to always use FSTLookup due to its performance. In my case, after some tinkering, I found TSTLookup worked better for me. Information on the different available classes is on the Solr Wiki page. In the "field" attribute, we list the name of the field we're using for our auto-suggest data and set "buildOnCommit" to true. This ensures that as new content is indexed and committed, it is made available to the suggester component. You can set this to false to save resources, but then you'll have to run the build command manually to get any new data into the dictionary. The "storeDir" attribute tells Solr where to build the dictionary file. If you do not specify this attribute, the dictionary will be built and stored in memory, which eats up A LOT of memory. We then specify the field type used for the auto-suggest data, which we set to text_auto in the schema.xml file above.

The request handler is relatively straightforward. We set up a search handler called "/suggest" and give it some default values. Set the "df" (default field) value to content_autosuggest, the field we use exclusively for auto-suggest data. Since our component is a spellcheck component, we also have to set some spellchecker defaults. First, setting "spellcheck" to true lets Solr know we are using a spellcheck component. The "spellcheck.dictionary" attribute specifies which dictionary (or spellchecker) we're pulling our results from; we set this to the "suggest" component we defined above the request handler. The "onlyMorePopular" attribute is not used for spell check, but with the suggester it returns results sorted by frequency rather than alphabetically. "Count" is simply how many results to return per request. Setting "collate" to true modifies the query slightly, ensuring we get the top results for our search term by ordering them properly. Finally, we tie our request handler to our suggest component by adding it to the "components" section of the request handler.
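With the search component and request handler in place, you can sanity-check the suggester straight from a browser before writing any CFML (the host and port here are Solr's defaults):

http://localhost:8983/solr/suggest?q=appl

Solr responds with a spellcheck section in XML listing suggested completions for the partial term; that response is what we'll parse on the ColdFusion side.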

That's it for the Solr setup. I know that was a lot to take in. There's definitely A LOT of configuration to do and it's easy to make a mistake here or there getting all of the pieces tied together. Take it slow and pay attention to the details. As always if you hit a snag, the guy with the green hair is here to lend a hand.

Now, on to the ColdFusion side of the house.

In the latest release of CFSolrLib, there's a method in cfsolrlib.cfc called "getAutoSuggestResults".

getAutoSuggestResults method:


<cffunction name="getAutoSuggestResults" access="remote" returntype="any" output="false">
    <cfargument name="term" type="string" required="no">
    <cfif Len(trim(ARGUMENTS.term)) gt 0>
        <!--- Remove any leading spaces in the search term --->
        <cfset ARGUMENTS.term = trim(ARGUMENTS.term)>
        <cfscript>
            h = new http();
            h.setMethod("get");
            h.setURL("#THIS.solrURL#/suggest?q=#ARGUMENTS.term#");
            local.suggestResponse = h.send().getPrefix().Filecontent;
            if (isXML(local.suggestResponse)) {
                local.XMLResponse = XMLParse(local.suggestResponse);
                local.wordList = "";
                if (ArrayLen(XMLResponse.response.lst) gt 1 AND structKeyExists(XMLResponse.response.lst[2].lst, "lst")) {
                    local.wordCount = ArrayLen(XMLResponse.response.lst[2].lst.lst);
                    for (j=1; j LTE local.wordCount; j=j+1) {
                        if (j eq local.wordCount) {
                            local.resultCount = XMLResponse.response.lst[2].lst.lst[j].int[1].XmlText;
                            local.resultList = arrayNew(1);
                            for (i=1; i LTE local.resultCount; i=i+1) {
                                arrayAppend(local.resultList, local.wordList & XMLResponse.response.lst[2].lst.lst[j].arr.str[i].XmlText);
                            }
                        } else {
                            // completed words before the one being typed are kept as a prefix
                            local.wordList = local.wordList & XMLResponse.response.lst[2].lst.lst[j].XMLAttributes.name & " ";
                        }
                    }
                    // sort results alphabetically
                    if (ArrayLen(local.resultList)) {
                        ArraySort(local.resultList, "textnocase", "asc");
                    }
                } else {
                    local.resultList = "";
                }
            } else {
                local.resultList = "";
            }
        </cfscript>
    <cfelse>
        <cfset local.resultList = "">
    </cfif>
    <cfreturn local.resultList />
</cffunction>

There are lots of loops in there, basically building lists of suggestions. The CFC sets up the HTTP call to the suggester request handler and parses the XML that comes back. I've done a little work so that if we're typing our second or third word, the suggester takes the earlier words into account when looking for suggestions, instead of just looking at the word we're currently typing. It gets the top valid results back from Solr and then alphabetizes the list to make it a little easier on the user's eye. It also does a little error checking to make sure we're getting a valid result back from Solr. If not, it simply returns a blank result rather than throwing an error back to the form and blowing the whole business up in the user's face.

Now onto our form.


<script src="js/jquery-1.7.2.js"></script>
<script src="js/jqueryui/jqueryui-1.8.22.js"></script>
<link rel="stylesheet" href="css/jqueryui/jqueryui-1.8.22.css" type="text/css" />
<script type="text/javascript">
$(function() {
    $("#keyword").autocomplete({
        source: "components/cfsolrlib.cfc?method=getAutoSuggestResults&returnformat=json"
    });
});
</script>

<html>
<head>
    <title>CFSolrLib 3.0 | Auto-Suggest example</title>
</head>
<body>

    Keyword: <input id="keyword" />

</body>
</html>

First, we include JQuery and JQuery UI to make sure JQuery's autocomplete methods are available. For this example, I just created an input called "keyword" that we will be using to generate our results. In the script block at the top, we're binding our input to the CFC that makes the call to Solr and specifying that we want JSON as our return format. JQuery Autocomplete expects JSON as its data.

As long as you already have information in your index, Solr will build the dictionary when you start it up. If not, start Solr and index a few things. Since we set buildOnCommit = "true", the items will be added to our dictionary when we commit our changes to the index. You can always manually rebuild your dictionary at any time like so.


<cfscript>
h = new http();
h.setMethod("get");
h.setURL("http://localhost:8983/solr/suggest?spellcheck.build=true");
h.send();
</cfscript>

You can simplify this further by just typing that URL into a browser to rebuild the dictionary, but this code snippet works well if you want to insert a button or link into an application to rebuild your dictionary on the fly while debugging.

If all went well and you have all of your bits and pieces set up correctly, you should be able to run this in a browser and see results drop down as you begin to type in the input box.

There's a fully functional example of this code, including a properly set up Solr 4.0 instance in the latest CFSolrLib, available on GitHub.

A working example can be viewed at http://jimleether.com/solrexample/autoSuggestExample.cfm

This is definitely a lot of information to take in. If you get it all working on your first try, well kudos to you. When you get everything customized to your application, this is a very powerful tool.

I plan on writing a post about Bootstrap Type Ahead and Solr very soon. Enjoy!

Apache Solr, Tika and Those Dreaded "X-Files"

I had a great post written on multicore mode, indexing and searching in multicore mode and doing a distributed search over several cores, but good ole Skype locked up my computer and I lost the entire thing. So instead I decided to touch on this subject.

So we all know Solr is nifty by now. If you've done any in-depth reading on its capabilities, you know the ways you can break up and analyze text to make your searches relevant are endless. Solr uses Apache Tika to parse many types of documents, and Tika does a great job extracting content and metadata from a large number of different kinds of files.

Those of you using Solr with your CF applications probably know that it's smart to load Tika on the ColdFusion side and parse a file's content before sending it over to Solr. That way you're not streaming an entire file over HTTP, but simply sending a string. This frees up resources and bandwidth for other things, and is also a lot faster. There is an example of how to use Tika on the ColdFusion side in the index example in the latest CFSolrLib.

In an application whose Solr server and code I maintain, things were humming along just fine until I tried to parse a .docx file. All of a sudden the application choked and I received a ColdFusion error. This became a huge thorn in my side, and it happened with any of the "newer" Microsoft Office files with file extensions ending in "X" (aka Open XML). They became known around the office as "X-Files". In the stack trace of the error was:

Caused by: java.lang.ClassCastException: org.dom4j.DocumentFactory cannot be cast to org.dom4j.

So what the heck does that mean???

To put it in English, Tika processes Open XML formatted documents (docx, xlsx, pptx, etc) in a different way and as a result, must use a different context class loader. This used to be difficult and required a lot of code, but thanks to Mark Mandel, a switchThreadContextClassLoader method was added to Javaloader that does this for us automatically.

In the application I work on, the file extension is stored in a database, so it's very easy for me to make a comparison and switch the context class loader when needed:


<cfscript>
// I probably do not have all the file formats in this list, but these are some common ones.
if (listFindNoCase("docx,xlsx,pptx,docm,xlsm,pptm,ppsx",arguments.fileExtension)) {
    // parsing OpenXML files must be done using a different context class loader
    var fileObject = application.javaloader.switchThreadContextClassLoader(processOpenXmlFile, { filePath = arguments.filePath });
    SolrInstance.add([{name="content",value=fileObject},{name="attr_fileName",value=ARGUMENTS.fileName},{name="id",value=ARGUMENTS.id}]);
} else {
    // use our cached copy of tika and parse the file
    application.tika.setMaxStringLength(-1);
    var fileObject = application.tika.parseToString(createObject("java","java.io.File").init(arguments.filePath));
    SolrInstance.add([{name="content",value=fileObject},{name="attr_fileName",value=ARGUMENTS.fileName},{name="id",value=ARGUMENTS.id}]);
}
</cfscript>

The code above reads the file extension and decides how to process the incoming file based on whether or not it's an Open XML formatted file. If it is an Open XML file, it calls the switchThreadContextClassLoader method and loads a new instance of Tika in the processOpenXMLFile method, where the file content is parsed and returned.

Here's a look at the processOpenXMLFile method:


<cffunction name="processOpenXMLFile" access="private" returntype="string">
<cfargument name="filepath" type="string" required="yes">

<cfscript>
    // grab a new instance of tika
    var tika = application.javaloader.create("org.apache.tika.Tika").init();
        
    // parse the file
    tika.setMaxStringLength(-1);
    var returnValue = tika.parseToString(createObject("java","java.io.File").init(arguments.filePath));
        
// return the parsed string
    return returnValue;
        
</cfscript>

</cffunction>

Take note of the tika.setMaxStringLength(-1); setting. By default, Tika will only extract the first 1000 characters from a document. You can set this to as many characters as you want, but setting it to -1 will remove the restriction altogether.

Using the code above will allow your application to handle any Open XML file you want to throw at it, just make sure the file extensions you need are listed in the function. If you need to place the code in more than one location, you could store the list of file extensions in a database or variable, that way you only have to maintain it in one place.

This error was a huge pain for me and I want to thank Mark Mandel and Jeff Coughlin for helping me to flush out the issue. I'm a big fan of "passing along the love", so hopefully this will help someone who encounters this problem flush it out in a timely manner. I wasn't able to find much on the subject, and what I was able to find was more for the Java programmer, which I am not.

Have fun out there and keep on indexing!

Dynamically Creating Solr Cores From ColdFusion 9

As you may or may not know, Solr allows you to create and maintain several "cores". Each core can have its own configuration, schema and index. This comes in very handy if you maintain several applications, or different divisions of a single application. You can manually create a core on the Solr server itself by creating a folder to contain the config information and index and adding the name of the core to the solr.xml file, but physically accessing the server and manually creating all of those files isn't always an option. So what now? Glad you asked.

First, you're going to want to make sure your Solr server is set up to store your created cores properly. Pop open solr.xml in your multicore folder. You'll probably see something like this:


<solr persistent="false">
<cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
<core name="core0" instanceDir="core0" />
<core name="core1" instanceDir="core1" />
</cores>
</solr>

The important part is persistent="false". By changing this to true, we're telling Solr that we want cores we create "on the fly" to be permanent additions to our list of cores. When this is set to false, the cores we dynamically create will be deleted the next time the server is restarted. Set this to true and, if Solr is already running, restart Solr.
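With that change in place, the opening tag of solr.xml simply reads:

<solr persistent="true">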

Solr makes it very easy to create a new core with a simple http request.

http://{IP.of.Solr.Server}:8983/solr/admin/cores?action=CREATE&name={NameOfNewCore}&instanceDir={DirectoryContainingConfig}&dataDir={WhereToStoreIndex}

Breakdown of parameters:

8983 - The default port that Solr communicates over. This can be changed when Solr is started if needed, but I haven't had to change it myself.
action=CREATE - Tells Solr we're creating a new core
name - What we want to name our new core
instanceDir - This one requires a bit of explanation...

This will be a directory on the server that contains the config and schema information needed for the core. Naturally, we're not going to be dynamically generating these files. What I have done for my application is manually create a core called "Template". The Template folder lives in the multicore folder and contains the config and schema files set up for my application. If you're creating cores for a specific application, the schema will be set up to match the fields you need. Otherwise, you can create a generic set of fields that will work in a variety of applications, or use dynamic fields, which are a bit more advanced and I won't get into now. When I create a core, I set my instanceDir to "Template" and it uses the files in this folder as a template for the new core. This method has worked very well for my needs.

dataDir - The folder that will contain the core's index. This folder does not have to already exist. Solr will create it when the core loads.

There are two additional parameters that I typically don't use:
config - the name of the core's config file
schema - the name of the core's schema file

By default, these are named solrconfig.xml and schema.xml. I tend to just stick to the defaults. If you want to name them something else, you'll have to add the parameters when you create your core.
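If you do rename them, the create call just grows two parameters; for example (the file names here are hypothetical):

http://localhost:8983/solr/admin/cores?action=CREATE&name=MyNewCore&instanceDir=Template&config=mySolrConfig.xml&schema=mySchema.xml&dataDir=MyCoreData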

You can create a core from ColdFusion by doing something like this:


<cfscript>
newCoreRequest = new http();
newCoreRequest.setMethod("get");
newCoreRequest.setURL("http://localhost:8983/solr/admin/cores?action=CREATE&name=MyNewCore&instanceDir=Template&dataDir=MyCoreData");
response = newCoreRequest.send().getPrefix();
</cfscript>

The response will contain a structure with information about the success or failure of the new core's creation.
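The CoreAdmin handler can also tell you whether a core already exists before you try to create it, via the STATUS action:

http://localhost:8983/solr/admin/cores?action=STATUS&core=MyNewCore

If the named core exists, the response contains a populated status block for it; if not, that block comes back empty.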

If you don't want to write it yourself, I have good news for you. I got ambitious last night and added two methods to my CFSolrLib for Solr 4.0 GitHub branch. There's a method called checkForCore that checks to see if a core already exists and another called createNewCore that does just that. I've also added an example cfm that shows how to check for a core and then create a new one based on whether or not it already exists. All of this is available on GitHub at https://github.com/VWRacer/cfsolrlib.

In summary, multicore mode is a very useful way to maintain several indexes on one Solr installation. With a simple HTTP call you can create cores as needed from ColdFusion (or any other kind of application, for that matter). The new methods added to CFSolrLib will allow you to easily plug this functionality into any existing CF application. As always, I want to thank Shannon Hicks for writing CFSolrLib to begin with. He created the base version, and without his initial hard work, I wouldn't have base code to improve upon. If you're working with a 3.X version of Solr, have a look at Shannon's GitHub repository at https://github.com/iotashan/cfsolrlib for the latest code. I haven't tested my version with any of the Solr 3 versions yet. It may very well work, but I know for a fact Shannon's code works with Solr 3.

**EDIT - My modifications to CFSolrLib have been rolled into the original repository. CFSolrLib is now distributed with Solr 4.0.

More Solr goodness to come...

A Note About ColdFusion and Solr 4.0 Using CFSolrLib

For those of you who saw my presentation at CF.Objective(ANZ), you know that I had updated CFSolrLib to work with Solr 4.0. It was a bit buggy and SolrJ was still in Beta, but it worked. When I got home, I forked the original CFSolrLib branch on GitHub and made my changes public. Now that SolrJ has a release version for version 4.0, the final changes have been committed.

I had some challenges getting this all to work, and I figured I'd share what I discovered in the hope of saving you some headaches. I thought I was being a "good little code monkey" by upgrading all of the jar files SolrJ uses to communicate with Solr. Some of the updates were necessary, as quite a few of the methods used in the SolrJ API for previous versions of Solr had been deprecated. After upgrading, however, I ran into a flurry of problems. To make a long story short, what I discovered is that ColdFusion 9 uses slf4j-log4j12-1.5.6.jar. I had upgraded the log4j jar files in CFSolrLib to version 1.6.6. These two versions are incompatible and cannot communicate with each other. Shiny and new is tempting, but apparently not always the best path.

The error messages in the stack trace were not the most descriptive.

org.slf4j.impl.Log4jLoggerAdapter.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V

I went round and round quite a few times before I figured out which files were causing the issue. I downgraded the log4j files in CFSolrLib to 1.5.6, and all was well in the CF and Solr world again.
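If you ever need to confirm which slf4j binding jar ColdFusion itself is loading, a quick check from CFML is to ask the JVM where the class came from. This is a sketch using standard Java reflection, assuming org.slf4j.LoggerFactory is on the server classpath (as it is in CF9):

<cfscript>
// Ask the JVM which jar the slf4j LoggerFactory class was loaded from
slf4jClass = createObject("java", "java.lang.Class").forName("org.slf4j.LoggerFactory");
writeOutput(slf4jClass.getProtectionDomain().getCodeSource().getLocation().toString());
</cfscript>

The output is the full path to the jar file, which makes version mismatches like the one above much easier to spot.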

For those of you wishing to tinker with ColdFusion and the standalone Solr 4.0.0 server, you can download the newest version of CFSolrLib from my GitHub at https://github.com/VWRacer/cfsolrlib/. The download includes the most recent version of Javaloader as well as a version of Solr 4.0 set up to work with the example. It also includes examples for indexing, searching, using result highlighting and setting up auto-complete suggestions as your users type in a search string. I have a pull request in place, so hopefully my changes will become part of the original CFSolrLib repository soon. Documentation on setting up your own custom instance of Solr 4.0 is available on the Apache Solr Wiki, although I HIGHLY recommend the Solr Enterprise Search Server and Solr Cookbook books. They were a huge help to me when I was getting spun up. The Cookbook has released a version for 4.0, but the Enterprise Search Server book is still only available for version 3. It's still very valuable, but I expect a version 4 edition will be released shortly.

Keep your eyes here for more Solr and ColdFusion examples in the future. I plan on writing up a multicore example very soon.

Ledet Flex Training

Recently, I was informed that I was going to take over development on an AIR application that I wrote the original prototype for back in 2007. Back then my skill set was limited and the code definitely had its issues, but I managed. The problem now is that I haven't written a line of MXML or AS3 since then. The other issue is that many of the newer Spark components were not available then, and I have no experience with them. The application, which has been in the hands of a few other talented Flex developers in the meantime, is in great shape. I'm taking over where the last developer left off, and he left very good notes on what was being worked on when he stopped. I just needed a refresher to get me going again.

I'm currently about halfway through a three-day Intro to Flex course from Ledet Graphics Training. So far, I'm very impressed. It's being held at Zenith Technologies, whose staff have gone out of their way to make us comfortable at the training facility. The instructor is extremely knowledgeable in his field and is actively developing, which has kept his skills fresh. He was able to show us some real-world examples of his work.

My lunch break is over....back to work for now.

Thank you CF.Objective(ANZ)!

I returned late last night from CF.Objective(ANZ) in Melbourne and had a wonderful time. Great sessions, great friends and colleagues and a great overall experience. I especially want to thank Kai Konig, Mark Mandel and Julie Allen, without whom this event would not have been possible.

The keynote was extremely inspiring to the CFML developer. Mark Drew presented on some of the great new features in Railo 4 and showcased future support for the language. I think everyone left with that "warm fuzzy" feeling. The sessions that followed highlighted technologies like Clojure, FW/1, CFBuilder, PhoneGap, digital video, using SCRUM efficiently and, of course, Apache Lucene Solr. The information provided in these sessions was invaluable. A very solid group of professionals was represented at CF.Objective(ANZ) and it was a privilege to present alongside them.

I want to thank all that attended my workshop and my session on integrating Solr with ColdFusion applications. It felt great to pass along my enthusiasm for this technology to others.

For those interested, slides and code samples can be downloaded here.

I know there are a couple questions some of you had that I need to research. I will post the answers to those shortly.

Thank you again and I hope to see you all at a future conference.

Happy "How I Got Started In ColdFusion" Day!!

Steve Bryant had the great idea of making August 1st "How I Got Started In ColdFusion" Day. He proposed that every blogger write a post about their beginnings in ColdFusion. This is my story.

Back in 2007 (yes, I'm actually quite a noob. Kind of speaks to how easy ColdFusion is to learn) I was holding down a job cutting glass in a resort beach area in Ocean City, MD. I had a strong background with computers, networking and programming in several languages, but since I didn't have a degree in this field, getting a job was a bit difficult. I wasn't really making enough money to support my family. I was driving 90 miles a day to work and back, the price of gas was killing me, my home was in foreclosure, my car was days away from getting repossessed and I just generally wanted to go play in traffic. I was having a conversation with my good friend, and now co-worker, Yancy Wharton about the state of my affairs. I was miserable in my current job and my home life certainly wasn't a ray of sunshine. He asked, "Do you think you could learn ColdFusion?". I thought about it for a few seconds. Before I could get a word out, Yancy was explaining what ColdFusion was, the benefits of learning the language and that my background in other languages should make it very easy for me to pick up. I agreed to give it a shot, and Yancy went to talk to his employer about picking me up part time.

A few weeks later, I found myself sitting in front of a laptop setting up a work environment. I was still working my full-time days at the glass company, but was working another four hours in the evening as a programmer. Initially, I was hired to write a prototype for an offline/online version of the company's application. Thus I began my journey with Apollo, which most of you know now as Adobe AIR. This was my first experience with MXML and ActionScript. I ended up with an application that did what needed to be done, but my lack of programming experience with the language made for some very hard to read code and lengthy files. The application was later rewritten by some other members of our company, who made it into a very impressive product. My first ColdFusion work was writing the components to handle the back end of the AIR application.

Four months later, I was hired full time and moved to strictly ColdFusion code. I was amazed at how simple it was to work in CFML. I spent a lot of my time away from work browsing through tutorials and training videos, but the real CF addiction started when I attended my first CFUnited. When I got to see the industry professionals creating wonderful things and giving these presentations showing how easily they can be done, I was hooked. I became bound and determined that I was going to immerse myself in this and become the best I could be at it.

So, in a sense, ColdFusion saved me. I very quickly got a job making enough money to save my house, catch up on my car payments and start dragging myself out of debt. My mood improved greatly. I was doing something I truly enjoyed and felt secure in my position. I was watching my creations come to life on the screen.

Today, I'm still happily plugging away at ColdFusion. I'm working full time for a government contractor and I have my own small development and consulting company at home. I still play around with the occasional AIR app, but for the most part, ColdFusion is my language of choice. It has brought me into new technologies like jQuery (which I know works with several languages, but it works DAMN well with CFML), which I'm truly enjoying as well. I'm also the Co-Manager for the Eastern Shore of Maryland User Group.

So I'd like to thank everyone who has helped to mold me into the CFDude I am today. Thank you Ray Camden, Ben Nadel, Simon Free, Yancy Wharton, Sean Corfield, Dan Wilson, Mark Drew, Aaron West, Jason Dean, Dee Sadler, Adam Lehman.....the list goes on and on. Sorry if I forgot anyone, the post was going to become an endless list of names eventually. Thank you to all, and thank you ColdFusion.

More Entries

Copyright © 2008 - Jim Leether BlogCFC was created by Raymond Camden. This blog is running version 5.9.1.001. Contact Jim