Saturday, September 17, 2011

[CSE 494] Term appears twice in the index

Hello,
  A number of you have emailed me asking why a term sometimes appears twice in the index.
  In addition to indexing the contents of a document, Lucene also indexes metadata such as the title, URL, etc. Each field is indexed separately, so a term that appears in both the contents and a metadata field is listed once per field; this is why some terms occur twice. I suggest you either use only the first occurrence of the term, or check the value returned by term.field() and consider only those terms for which it is "contents".
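
  If it helps, here is a minimal sketch of the second option. To keep it runnable without the Lucene jar, it uses a small stand-in Term class (hypothetical, written just for this example) that mimics the field() and text() accessors of Lucene's Term; the class name ContentsTermFilter and the sample terms are also made up for illustration. In your project you would apply the same check to the terms you get while enumerating the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Lucene's Term (a field name plus the term text),
// so this sketch runs without Lucene on the classpath.
final class Term {
    private final String field;
    private final String text;

    Term(String field, String text) {
        this.field = field;
        this.text = text;
    }

    String field() { return field; }
    String text() { return text; }
}

class ContentsTermFilter {
    // Keep only the text of terms that were indexed under the "contents" field.
    static List<String> contentsOnly(List<Term> terms) {
        List<String> kept = new ArrayList<>();
        for (Term t : terms) {
            if ("contents".equals(t.field())) {
                kept.add(t.text());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "lucene" occurs in both the contents and the title, so it is listed twice.
        List<Term> terms = Arrays.asList(
            new Term("contents", "lucene"),
            new Term("title", "lucene"),
            new Term("contents", "index"));
        System.out.println(contentsOnly(terms)); // prints [lucene, index]
    }
}
```

  Comparing with "contents".equals(...) rather than == matters here, since field names should be compared by value.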
 
Thanks and Regards,
Sushovan De
