Write QueryParser in JavaScript and use Lucene JAVA Indexer for searching

This is a reminder, as the mail was sent over the weekend. Please go through it and suggest a solution.

Thanks in advance, /KasunBG

On Sun, Jul 4, 2010 at 3:44 AM, Kasun Gajasinghe wrote:

Lucene Scoring

You are calling the explain method incorrectly. You need something like

System.out.println(indexSearcher.explain(query, 0));

See the javadocs for details.

Lucene and Chinese language

Hi!

We are using Lucene in our project to search through information objects, which works fine. For indexing we use the StandardAnalyzer. Now we have to support the Chinese language. I found that Chinese words and characters are stored correctly in the index, but the query to search for them does not work. Example: in English the query is “text”, which we parse to “*text*”. If we search for Chinese words or phrases like “佛山东方书城”, the query is “*佛山东方书城*” but there are no search results. If the query places blanks between the individual characters, like this: “*佛 山 东 方 书 城*”, we get results. Does the StandardAnalyzer interpret each Chinese character as one word? What are the best practices for this case? Should we use another analyzer (a Chinese analyzer)? Or is it better to replace the query parser in this case?

Regards, Jacqueline.
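To make the symptom above concrete, here is a minimal sketch in plain Java (no Lucene dependency). It assumes, as the behaviour described suggests, that the index holds one term per Chinese character; the class and method names are hypothetical. It shows why the unbroken phrase is not itself an indexed term while the space-separated characters are, and what overlapping two-character terms (the scheme used by bigram-based CJK analyzers) would look like.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkTokenSketch {

    // Assumption: one term per character, whitespace dropped.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!Character.isWhitespace(cp)) {
                tokens.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return tokens;
    }

    // Overlapping two-character terms built from the unigram stream.
    static List<String> bigrams(String text) {
        List<String> uni = unigrams(text);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < uni.size(); i++) {
            tokens.add(uni.get(i) + uni.get(i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "佛山东方书城";
        List<String> indexed = unigrams(phrase);
        System.out.println(indexed);
        // The whole phrase is not one of the indexed terms.
        System.out.println(indexed.contains(phrase));
        // Each space-separated character is an indexed term.
        System.out.println(indexed.containsAll(unigrams("佛 山 东 方 书 城")));
        System.out.println(bigrams(phrase));
    }
}
```

Under that assumption, a wildcard term built from the whole phrase has nothing to match against, which is consistent with the empty result set reported in the post.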

A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Hi,

I know the indexed content contains the following text: “This is a test”. The search phrase I used is “This is a formal test”, and I set the slop of the PhraseQuery to 2 with setSlop(2), but I cannot get a search result. If I set the search phrase to “This is formal test”, then I do get a search result.

So what is the problem here? Thanks in advance.

Attached is the Javadoc for the setSlop method:

public void setSlop(int s)

Sets the number of other words permitted between words in query phrase. If zero, then this is an exact phrase search. For larger values this works like a WITHIN or NEAR operator.

The slop is in fact an edit-distance, where the units correspond to moves of terms in the query phrase out of position. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two.

More exact matches are scored higher than sloppier matches, thus search results are sorted by exactness.

The slop is zero by default, requiring exact matches.
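The edit-distance description above can be illustrated with a simplified model in plain Java. This is a sketch, not Lucene's scorer: it looks only at the first occurrence of each term, ignores analysis (stop words, position increments), and the names `SlopSketch` and `requiredSlop` are hypothetical. The one property it relies on that does hold for PhraseQuery is that every term of the phrase must occur in the document; slop relaxes positions, it does not allow a query term to be absent.

```java
import java.util.Arrays;
import java.util.List;

public class SlopSketch {

    /**
     * Smallest slop under which the phrase would match in this simplified
     * model, or -1 if it can never match.
     */
    static int requiredSlop(List<String> docTerms, List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            return -1;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < queryTerms.size(); i++) {
            int pos = docTerms.indexOf(queryTerms.get(i));
            if (pos < 0) {
                return -1; // a query term missing from the document rules out any match
            }
            int shift = pos - i; // displacement of the term from its place in the phrase
            min = Math.min(min, shift);
            max = Math.max(max, shift);
        }
        return max - min;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("this", "is", "a", "test");
        System.out.println(requiredSlop(doc, Arrays.asList("this", "is", "a", "test")));
        System.out.println(requiredSlop(doc, Arrays.asList("this", "test")));
        System.out.println(requiredSlop(doc, Arrays.asList("is", "this")));
        System.out.println(requiredSlop(doc, Arrays.asList("this", "is", "a", "formal", "test")));
    }
}
```

In this model, swapping two adjacent words needs a slop of two, in line with the Javadoc, and a phrase containing a word that is absent from the document is reported as unmatchable whatever the slop.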

Arguments in favour of Lucene over commercial competition

Hi,

I am trying to compile some arguments in favour of Lucene, as management is deciding whether to standardize on Lucene or a competing commercial product (we have a couple of products, one using Lucene, another using a commercial product; imagine which one I am using). I searched the lists but could not find any post, although I remember seeing such posts in the past.

Has anybody kept such posts linked or something? Or does someone know of a page that would help me?

I would like to show:

- traction of Lucene, which has really improved a lot in the last couple of years
- rich ecosystem (Solr…)
- references of other companies choosing Lucene/Solr over commercial products (be it Fast or whatever)

thanks

Last Call: Lucene Revolution CFP Closes Tomorrow Wednesday, June 23, 2010, 12 Midnight PDT

Lucene Revolution Call For Participation – Boston, Massachusetts October 7 & 8, 2010

The first US conference dedicated to Lucene and Solr is coming to Boston, October 7 & 8, 2010. The conference is sponsored by Lucid Imagination with additional support from community and other commercial co-sponsors. The audience will include those experienced in Solr and Lucene application development, along with those experienced in other enterprise search technologies who are interested in becoming more familiar with Solr and Lucene technologies and the opportunities they present.

We are soliciting 45-minute presentations for the conference.

Key Dates:

- May 12, 2010: Call For Participation Open
- June 23, 2010: Call For Participation Closes
- June 28, 2010: Speaker Acceptance/Rejection Notification
- October 5-6, 2010: Lucene and Solr Pre-conference Training Sessions
- October 7-8, 2010: Conference Sessions

Topics of interest include:

- Lucene and Solr in the Enterprise (case studies, implementation, return on investment, etc.)
- “How We Did It” Development Case Studies
- Spatial/Geo search
- Lucene and Solr in the Cloud (Deployment cases as well as tutorials)
- Scalability and Performance Tuning
- Large Scale Search
- Real Time Search
- Data Integration/Data Management
- Lucene & Solr for Mobile Applications

All accepted speakers will qualify for discounted conference admission. Financial assistance is available for speakers that qualify.

To submit a 45-minute presentation proposal, please send an email to cfp@lucenerevolution.org with Subject containing: , Topic containing the following information in plain text.

If you have more than one topic proposed, send a separate email. Do not attach Word or other text file documents.

Return all fields completed as follows:

1. Your full name, title, and organization
2. Contact information, including your address, email, phone number
3. The name of your proposed session (keep your title simple, interesting, and relevant to the topic)
4. A 75-200 word overview of your presentation; in addition to the topic, describe whether your presentation is intended as a tutorial, description of an implementation, a theoretical/academic discussion, etc.
5. A 100-200-word speaker bio that includes prior conference speaking or related experience

To be considered, proposals must be received by 12 Midnight PDT Wednesday, June 23, 2010.

Please email any general questions regarding the conference to info@lucenerevolution.org. To be added to the conference mailing list, please email signup@lucenerevolution.org. If your organization is interested in sponsorship opportunities, email sponsor@lucenerevolution.org.

We look forward to seeing you in Boston!

Lucene 2.4.0 vs 2.9.1 Query Grouping

Came across a 2.4.0 bug. The attached unit test demonstrates that if you have 1000 documents in the query and use grouping, the query doesn’t work as expected (returns 92 results). The same query in 2.9.1 (see pom.xml) works as expected. Is there a documented defect on this issue?

Thanks,

Ivan Provalov

GroupingQueryTest.zip

Finding the position of search hits from Lucene

Hello

How can I find and save the position of search hits from Lucene? Like this: doc1: 1, doc2: 2, … doc100: 100

I use Lucene 3.0.

Thank you

Titus

Are there any resources that explain the detailed implementation of Lucene?

For example, the detailed process of storing data structures, indexing, searching and sorting, not just the APIs. Thanks.

Lucene Newbie Questions

Hello all, I’m considering Lucene for a specific application and am trying to ensure that it is the right tool for what I’m trying to accomplish.

At a high level I have a list of restaurants in a database and a list of tags related to the restaurant (e.g. Italian, Formal, Expensive, etc). Each restaurant also has a location (longitude/latitude).

My primary goal using Lucene is to conduct searches where the user can do things like:

- Misspell the name of the restaurant (by a few chars)
- Type “Italian Food” instead of just Italian, or perhaps “Great Italian”
- Or even use some synonyms (e.g. Deli and Delicatessen); of course I’d define these terms.

Are these types of use cases something that can be done with Lucene? Or is there a more appropriate API that I haven’t found?

Thanks.
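For orientation, the misspelling and synonym use cases above can be sketched in plain Java without any Lucene dependency. Edit distance is the idea behind Lucene's FuzzyQuery; everything else here (the class `FuzzyTagSketch`, the method names, the sample restaurant name and the user-defined synonym map) is hypothetical and only illustrates the matching logic, not how Lucene implements it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FuzzyTagSketch {

    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the typed name is within maxEdits of the stored name, ignoring case.
    static boolean fuzzyMatch(String typed, String name, int maxEdits) {
        return editDistance(typed.toLowerCase(Locale.ROOT), name.toLowerCase(Locale.ROOT)) <= maxEdits;
    }

    // Maps a word to its canonical tag using a user-defined synonym table.
    static String canonicalTag(String word, Map<String, String> synonyms) {
        String key = word.toLowerCase(Locale.ROOT);
        return synonyms.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        Map<String, String> synonyms = new HashMap<>();
        synonyms.put("delicatessen", "deli");
        System.out.println(fuzzyMatch("Luigis Pizzaria", "Luigi's Pizzeria", 2));
        System.out.println(canonicalTag("Delicatessen", synonyms));
    }
}
```

In a search index the synonym mapping would typically be applied while analyzing text, so that both spellings end up as the same term; the map above stands in for that step.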