Hello,
A number of you have emailed me asking why a term sometimes occurs twice in the index.
In addition to indexing the contents of a document, Lucene also indexes metadata such as the title and URL. Terms that appear in these metadata fields are indexed separately, each under its own field, so some terms occur twice. I suggest you either use only the first occurrence of the term, or check term.field() and consider only those terms for which it returns "contents".
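Here is a minimal sketch of the second option. To keep it runnable without the Lucene jar, it uses a small stand-in `Term` class that mimics only the `field()` and `text()` accessors. The class name `ContentTermFilter`, the helper `contentTerms`, and the sample terms are made up for illustration. In your project you would apply the same check to the terms you enumerate from the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContentTermFilter {

    // Hypothetical stand-in for Lucene's Term: only field() and text() are mimicked.
    public static class Term {
        private final String field;
        private final String text;

        public Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public String field() { return field; }
        public String text() { return text; }
    }

    // Keep only the terms that were indexed from the document body ("contents").
    public static List<String> contentTerms(List<Term> terms) {
        List<String> result = new ArrayList<>();
        for (Term t : terms) {
            if ("contents".equals(t.field())) {
                result.add(t.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample: "lucene" appears both in the body and in the title.
        List<Term> terms = Arrays.asList(
            new Term("contents", "index"),
            new Term("contents", "lucene"),
            new Term("title", "lucene"));

        System.out.println(contentTerms(terms)); // prints [index, lucene]
    }
}
```

Filtering on the field name is a little more robust than taking the first occurrence, since it does not depend on the order in which the fields are enumerated.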
Thanks and Regards,
Sushovan De