Unsupported operation in TermDocs.next() when migrating from 2.4 to 2.9

That is spooky. It certainly sounds like a regression.

It’s odd that your MultiTermEnum is pulling an AllTermDocs under the hood — this should only happen if you did a .seek(null) on it, but your code seems to first check that term != null, so it should never pass a null term.

Can you add a temporary assert to DirectoryReader.java, in 29x, around line 1191? It should go in this method:

protected TermDocs termDocs(IndexReader reader) throws IOException {
  return term==null ? reader.termDocs(null) : reader.termDocs();
}

Add an assert term != null, run your code w/ assertions on, and see if it trips (the assert is not safe in general, but it should not trip given how I think you are using it). If it does trip… try to track down how a null term got in there?
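A minimal sketch of what such a temporary assert looks like, using stand-in types rather than Lucene's real classes so that it compiles on its own. The Reader interface, the stubReader() helper and the class name AssertSketch are hypothetical; only the ternary expression is taken from the method quoted above. Java skips assert statements unless the JVM is started with -ea, so without that flag a null term passes through silently.

```java
// Stand-alone sketch: the types below are NOT Lucene classes. They only mimic
// the shape of the quoted method so the assert can be shown in isolation.
final class AssertSketch {

    // Hypothetical stand-in for IndexReader.
    public interface Reader {
        String termDocs();             // normal, per-term path
        String termDocs(Object term);  // path taken when term is null
    }

    // Hypothetical stub that reports which path was taken.
    public static Reader stubReader() {
        return new Reader() {
            public String termDocs() { return "per-term"; }
            public String termDocs(Object term) { return "all-docs"; }
        };
    }

    public static String termDocs(Reader reader, Object term) {
        // Temporary diagnostic: only evaluated when assertions are enabled (-ea).
        assert term != null : "null term reached termDocs()";
        return term == null ? reader.termDocs(null) : reader.termDocs();
    }

    public static void main(String[] args) {
        Reader r = stubReader();
        System.out.println(termDocs(r, "body:foo"));
        try {
            // With -ea this trips the assert; without it, the null path is taken.
            System.out.println(termDocs(r, null));
        } catch (AssertionError e) {
            System.out.println("assert tripped: " + e.getMessage());
        }
    }
}
```

If the assert trips in the real code, the stack trace of the AssertionError shows which caller passed the null term.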

Mike

On Tue, Jun 29, 2010 at 5:24 AM, Jerven Bolleman wrote:

MultiPhraseQuery.toString() throws null pointer exception

I opened LUCENE-2526 for this…

Mike

On Thu, Jul 1, 2010 at 2:19 PM, Woolf, Ross wrote:

search for a string which begins with a ‘$’ character

“$” won’t survive StandardAnalyzer. You can use WhitespaceAnalyzer instead.
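To illustrate the difference, here is a rough stand-alone sketch. It does not call Lucene: letterDigitTokens() is only a crude approximation of an analyzer that discards punctuation, and whitespaceTokens() mimics splitting on whitespace alone. Both method names and the class name are made up for this example, and the real analyzers apply more rules than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: neither method reproduces a Lucene analyzer exactly.
final class DollarTokenSketch {

    // Split on runs of whitespace; punctuation such as '$' stays in the token.
    public static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Lowercase, then keep only runs of letters and digits; '$' is dropped.
    public static List<String> letterDigitTokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "price $100 today";
        System.out.println(whitespaceTokens(text));   // [price, $100, today]
        System.out.println(letterDigitTokens(text));  // [price, 100, today]
    }
}
```

Whichever analyzer is chosen, the same one has to be used at index time and at query time, otherwise the indexed term and the query term will not match.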

CFP for Surge Scalability Conference 2010

A quick reminder that there’s one week left to submit your abstract for this year’s Surge Scalability Conference. The event is taking place on Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies that address production failures and the re-engineering efforts that led to victory in Web Applications or Internet Architectures.

Our Keynote speakers include John Allspaw and Theo Schlossnagle. We are currently accepting submissions for the Call For Papers through July 9th. You can find more information, including suggested topics and our current list of speakers, online:

http://omniti.com/surge/2010

I’d also like to urge folks who are planning to attend, to get your session passes sooner rather than later. We have limited seating and we are on track to sell out early. For more information, including the CFP, sponsorship of the event, or participating as an exhibitor, please visit the Surge website or contact us at surge@omniti.com.

Thanks,

about contrib instantiated

It is said that “At a few thousand ~160 characters long documents InstantiatedIndex outperforms RAMDirectory some 50x, 15x at 100 documents of 2000 characters length, and is linear to RAMDirectory at 10,000 documents of 2000 characters length.” I have an index of about 8,000,000 documents, and the current index size is about 30GB. Is it possible to use this contrib to speed up my search? I have enough memory for it. Thank you.

Lucene and Chinese language

Hi!

We are using Lucene in our project to search through information objects, which works fine. For indexing we use the StandardAnalyzer. Now we have to support the Chinese language. I found out that the Chinese words and letters are correctly saved in the index, but the query to search for them does not work. Example: in English the query is “text”, which we parse to “*text*”. If we search for Chinese words / phrases like “佛山东方书城”, the query is “*佛山东方书城*”, but there are no search results. If the query places blanks between the single letters / symbols, like this: “*佛 山 东 方 书 城*”, we get results. Does the StandardAnalyzer interpret each Chinese letter as one word? What are the best practices for this case? Should we use another analyzer (a Chinese analyzer)? Or is it better to replace the query parser in this case?

Regards, Jacqueline.

inverted index in clucene

Hello,

Can we use a subset of documents for searching?

Let's say I have a hash map of:

P1 - 1,2,3,4

P2 - 3,4,5

P3 - 7,5,3

Now I have documents in the Lucene index stored as:

1-P1

2-P1

3-P1,P2,P3

4-P1,P2

5-P2,P3

7-P3

..

..

When I search docs with P2 I get 3,4,5.

Now I want my search to be restricted to just docs 3,4,5, so that I can search only these docs for further parameters.

1. How do I go about it?

2. Is there any other searching mechanism I should use, or is Lucene a better fit?

3. Should I keep my hash map in Lucene indexes as well, and is there then a method to link it to another Lucene index?

Regards,

Suman
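The restriction described above can be pictured as intersecting posting lists. The following is a plain-Java sketch of that idea, not CLucene or Lucene code: the class name and the postings() and searchWithin() helpers are hypothetical, and the data is just the P1/P2/P3 map from the message. In Lucene Java the same effect is usually obtained by adding the restricting term as a required clause of a BooleanQuery or by passing a Filter to the search; whether CLucene exposes equivalent classes is not checked here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of document ids (data from the post).
final class SubsetSearchSketch {

    public static Map<String, TreeSet<Integer>> postings() {
        Map<String, TreeSet<Integer>> idx = new HashMap<String, TreeSet<Integer>>();
        idx.put("P1", new TreeSet<Integer>(Arrays.asList(1, 2, 3, 4)));
        idx.put("P2", new TreeSet<Integer>(Arrays.asList(3, 4, 5)));
        idx.put("P3", new TreeSet<Integer>(Arrays.asList(7, 5, 3)));
        return idx;
    }

    // Documents matching 'term', restricted to the documents matching 'restrictTo'.
    public static TreeSet<Integer> searchWithin(Map<String, TreeSet<Integer>> idx,
                                                String restrictTo, String term) {
        TreeSet<Integer> result = new TreeSet<Integer>();
        TreeSet<Integer> subset = idx.get(restrictTo);
        TreeSet<Integer> hits = idx.get(term);
        if (subset == null || hits == null) return result;  // unknown term: no docs
        result.addAll(subset);
        result.retainAll(hits);  // set intersection
        return result;
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Integer>> idx = postings();
        System.out.println(searchWithin(idx, "P2", "P3"));  // [3, 5]
        System.out.println(searchWithin(idx, "P2", "P1"));  // [3, 4]
    }
}
```

Because the P-to-document mapping is already stored as a field on each document, this sketch keeps no separate hash map: the restriction and the follow-up search both read from the same index.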

Document Order in IndexWriter.addIndexes

When calling addIndexes or addIndexesNoOptimize, can any guarantee be given about the document order in the new index, given that the order of the directories/IndexReaders is fixed?

So is it the case that the i-th document coming from the j-th IndexReader will always have some fixed position x(i,j) in the final merged index?

example of processing tokens from query results?

Can someone point me to a code example that demonstrates processing tokens from a query result? I want to iterate over TermPositions but can’t find my way to an object that instantiates that interface.

Thank you for the assistance, Peter