Recently we had to look for a solution to the ever-growing problem of searching huge amounts of data.

Requirements
Initially the dataset will contain about 1,000,000 records, linked in complex ways through many ternary relations.

I had used Lucene for search problems before, but was never really satisfied with the robustness of a 'raw' Lucene installation. In complex situations (think heavy load and clustered environments) it is very hard to configure correctly. There is also the matter of keeping the indexes synchronized with the data they are based on.

While searching for a solution, we were bound by the following requirements:

  • Integration with Hibernate
  • Support for running in a clustered environment
  • Good performance
  • Open Source
  • Preferably avoid the use of specific value objects
  • Provide 'fuzzy' search functionality
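The 'fuzzy' requirement is about typo-tolerant matching. Lucene supports this through fuzzy queries (for example `maas~` in its query syntax), which match indexed terms that lie within a small edit distance of the query term. The plain-Java sketch below shows only that underlying edit-distance idea; the class name, the `maxEdits` threshold and the assumption that indexed terms are already lower-cased are all illustrative choices, and Lucene's own FuzzyQuery uses its own similarity threshold and scoring.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: edit-distance matching, the idea behind Lucene's fuzzy
// queries. This is not Lucene's implementation or its scoring.
final class FuzzyMatch {

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b's prefix costs j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a's prefix into "" costs i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns every indexed term within maxEdits of the query term.
    // Assumes the indexed terms are lower-cased, as an analyzer would do.
    static List<String> fuzzyTerms(String query, List<String> indexedTerms, int maxEdits) {
        String q = query.toLowerCase();
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (editDistance(q, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

With `maxEdits` set to 1, a search for "Maas" would also match the terms "mass" and "mars", which is exactly the kind of forgiving behaviour users expect from a search box.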

Selection
We tested some frameworks (Hibernate's Lucene integration, Spring's Lucene integration) and a database solution (TSearch2). The Lucene frameworks seemed to lack maturity and should be considered experimental, and TSearch2 proved to be a bit too slow for the type of searches we wanted to execute. This is when we found out about Compass, a Lucene-based framework which is part of the OpenSymphony project.

Compass is a first-class open source Java search engine framework, licensed under the Apache License (V2), which enables developers to declaratively add search capabilities to an application. A really strong aspect of Compass is the way it integrates with all the leading ORM frameworks and with Spring. Also, Compass does not try to hide Lucene's features: all of Lucene's functionality is available through Compass.

Compass consists of three modules: Compass Core, Compass GPS and the Compass Spring integration. Compass Core is the most fundamental part of Compass. It holds Lucene extensions for transactional indexing, a search engine abstraction, an ORM-like API and transaction management integration. The Compass GPS module contains the functionality needed to integrate with all supported data sources. And, as the name probably gave away, the Spring integration module contains everything needed to configure and integrate Compass with Spring.

The GPS module for Hibernate 3 uses the Hibernate event system to mirror changes to the data into the index in real time. When configured from Spring, Compass and Hibernate can share the same transaction manager (through SpringSyncTransaction), which reduces discrepancies between the index and the actual data to an absolute minimum.
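The mirroring described above follows a simple pattern: listeners receive insert, update and delete events from the ORM, and the matching index changes only become visible when the surrounding transaction commits. The plain-Java sketch below illustrates that pattern with a hypothetical `IndexMirror` class and an in-memory map standing in for the Lucene index. It is not how Compass GPS is implemented internally, and the wiring to Hibernate's event listeners is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of event-driven index mirroring. None of these types
// are Compass or Hibernate classes; they only illustrate the idea.
final class IndexMirror {

    // Stand-in for the search index: entity id -> indexed text.
    private final Map<Long, String> index = new TreeMap<>();

    // Changes recorded during the current transaction, not yet visible.
    private final List<Runnable> pending = new ArrayList<>();

    // Would be called from (hypothetical) post-insert / post-update listeners.
    void onSaveOrUpdate(long id, String text) {
        pending.add(() -> index.put(id, text));
    }

    // Would be called from a (hypothetical) post-delete listener.
    void onDelete(long id) {
        pending.add(() -> index.remove(id));
    }

    // Commit: apply the buffered changes together with the database commit.
    void commit() {
        for (Runnable change : pending) {
            change.run();
        }
        pending.clear();
    }

    // Rollback: the database changes are undone, so the index never sees them.
    void rollback() {
        pending.clear();
    }

    // Ids whose indexed text contains the word (case-insensitive).
    List<Long> search(String word) {
        String w = word.toLowerCase();
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, String> e : index.entrySet()) {
            if (e.getValue().toLowerCase().contains(w)) {
                ids.add(e.getKey());
            }
        }
        return Collections.unmodifiableList(ids);
    }
}
```

Buffering changes until commit is what keeps the index and the database from disagreeing after a rolled-back transaction; sharing one transaction manager gives Compass a single, well-defined moment to apply them.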

Examples
Objects can be prepared for indexing in various ways. Since we already used Hibernate annotations, we decided to go for the annotations provided by Compass. Compass provides a lot of useful annotations out of the box; they can be found in the org.compass.annotations package.

The following piece of code (taken from a nice article on InfoQ) illustrates how the annotations are used:

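JAVA:
    import java.util.Date;
    import java.util.List;

    import org.compass.annotations.Searchable;
    import org.compass.annotations.SearchableComponent;
    import org.compass.annotations.SearchableId;
    import org.compass.annotations.SearchableProperty;
    import org.compass.annotations.SearchableReference;

    @Searchable
    public class Author {

        @SearchableId
        private Long id;

        // Name is a separate class; its properties are indexed as part of the Author
        @SearchableComponent
        private Name name;

        // the author's books, indexed as references to separate searchable entities
        @SearchableReference
        private List books;

        // dates are stored in the index using this pattern
        @SearchableProperty(format = "yyyy-MM-dd")
        private Date birthdate;
    }

    // ...

    @Searchable
    public class Name {

        @SearchableProperty
        private String firstName;

        @SearchableProperty
        private String lastName;
    }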
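Compass reads annotations like these to decide which fields end up in the index and how they are formatted. The self-contained sketch below shows the general mechanism with Java reflection. The `@Indexed` and `@IndexedField` annotations, the `Person` entity and the `DocumentBuilder` class are hypothetical stand-ins, not part of Compass, and a real Compass mapping also deals with ids, components and references.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotations (NOT part of Compass) playing the roles of
// @Searchable and @SearchableProperty in this illustration.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Indexed {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexedField {
    String format() default "";  // date pattern, only applied to Date values
}

// Example entity: only annotated fields end up in the "document".
@Indexed
final class Person {
    @IndexedField
    private final String lastName;

    @IndexedField(format = "yyyy-MM-dd")
    private final Date birthdate;

    private final String internalNote;  // not annotated, so never indexed

    Person(String lastName, Date birthdate, String internalNote) {
        this.lastName = lastName;
        this.birthdate = birthdate;
        this.internalNote = internalNote;
    }
}

final class DocumentBuilder {

    // Converts an @Indexed object into field-name -> text pairs, roughly the
    // shape of a search engine document. Null values are skipped.
    static Map<String, String> toDocument(Object entity) {
        Class<?> type = entity.getClass();
        if (!type.isAnnotationPresent(Indexed.class)) {
            throw new IllegalArgumentException(type.getName() + " is not @Indexed");
        }
        Map<String, String> doc = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            IndexedField meta = field.getAnnotation(IndexedField.class);
            if (meta == null) {
                continue;  // field was not marked for indexing
            }
            try {
                field.setAccessible(true);
                Object value = field.get(entity);
                if (value == null) {
                    continue;
                }
                String text = (value instanceof Date && !meta.format().isEmpty())
                        ? new SimpleDateFormat(meta.format()).format((Date) value)
                        : value.toString();
                doc.put(field.getName(), text);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
        }
        return doc;
    }
}
```

Keeping the mapping in annotations means the entity classes stay the single source of truth: adding a searchable field is a matter of annotating it, with no separate mapping file to keep in sync.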

When Compass is used in conjunction with Spring, entities can be searched using CompassDaoSupport and CompassTemplate:

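JAVA:
    import org.compass.core.CompassCallback;
    import org.compass.core.CompassHits;
    import org.compass.core.CompassSession;
    import org.compass.spring.CompassDaoSupport;

    public class ExampleDao extends CompassDaoSupport {

        // Returns the first Author among the hits, or null if no Author matches.
        public Author findFirstMatchingAuthor(final String query) {
            return (Author) getCompassTemplate().execute(new CompassCallback() {
                public Object doInCompass(CompassSession session) {
                    CompassHits hits = session.find(query);
                    // hits may be empty or contain other searchable types,
                    // so check before casting instead of calling hits.data(0) blindly
                    for (int i = 0; i < hits.length(); i++) {
                        Object data = hits.data(i);
                        if (data instanceof Author) {
                            return data;
                        }
                    }
                    return null;
                }
            });
        }
    }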
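CompassTemplate follows the same template/callback style as Spring's other templates: the template takes care of the session and transaction around your code, and the callback only contains the search logic. The sketch below mimics that flow with hypothetical, simplified types (`SearchSession`, `SessionCallback`, `SimpleSearchTemplate`) and a list of strings as the "index"; it is not the Compass implementation, just an illustration of the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the Compass interfaces.
interface SearchSession {
    List<String> find(String query);
}

interface SessionCallback<T> {
    T doInSession(SearchSession session);
}

final class SimpleSearchTemplate {

    private final List<String> documents;        // plays the role of the index
    final List<String> log = new ArrayList<>();  // records the lifecycle steps

    SimpleSearchTemplate(List<String> documents) {
        this.documents = documents;
    }

    // Template method: begin, run the caller's code, commit or roll back, close.
    <T> T execute(SessionCallback<T> callback) {
        log.add("begin");
        // Naive case-insensitive substring search over the documents.
        SearchSession session = query -> {
            List<String> hits = new ArrayList<>();
            for (String doc : documents) {
                if (doc.toLowerCase().contains(query.toLowerCase())) {
                    hits.add(doc);
                }
            }
            return hits;
        };
        try {
            T result = callback.doInSession(session);
            log.add("commit");
            return result;
        } catch (RuntimeException e) {
            log.add("rollback");
            throw e;
        } finally {
            log.add("close");
        }
    }
}
```

A caller then writes only the search logic, for example `template.execute(session -> session.find("bloch"))`, much like the `doInCompass` callback in the DAO above; the resource handling lives in one place instead of being repeated in every DAO method.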

There is much more to Compass than these short examples can demonstrate; most of it can be found in the online documentation or in the example applications included in the download.

Conclusion
Although the framework has some quirks (documentation can be a bit scarce), we are very pleased with the functionality provided by Compass; it really succeeds in helping developers use Lucene in a sensible way.

--update--
Lately we had a lot of unexpected errors with Compass, which were caused by a nasty bug concerning the caching of blobs within Compass. My colleague managed to track it down and fix it, but the fix is not yet available in Compass itself.


1 Response to “Compass: guiding your applications through data”

  Hes

    Sounds pretty cool!

    Especially the annotation stuff, and doInCompass. If it all works, it will speed up integration a lot.

    But if it can manage your distributed data updates in a good way it will be a real winner!

    I will check this out the next time I need a Lucene-based framework.

About

Welcome to the weblog of Peter Maas. Here you'll find various posts related to stuff I like (like my kids and espresso) and stuff I do (like developing software).
