Filters: Status (Open)

#14004: How to configure TieredMergePolicy for very low segment count?
36.4 minutes ago  2 comments  0 votes  0 watches  jpountz
Description: I have been experimenting with configuring TieredMergePolicy to keep the segment ... Interestingly, an index that is less than 1GB can still have 10 segments with the above merge ... E.g. consider the following segment sizes: 100kB, 300kB, 800kB, 2MB, 5MB, 12MB, 30MB, 70MB, 150MB, ...
    jpountz 36.4 minutes ago:  I confirmed that #266 seems to help for this case. It does find merges to run with the above example (100kB, 300kB, 800kB, 2MB, 5MB, 12MB, 30MB, 70MB, ... And when I run our simulation tests in BaseMergePolicyTestCase, it ends up with a similar number of ...
    jpountz 5.4 hours ago:  Interestingly, it looks like this PR https://github.com/apache/lucene/pull/266 would do what I'm ...
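
As a rough check on the example in this issue, the listed segment sizes can be summed and compared pairwise. This is only a sketch of the arithmetic, not of TieredMergePolicy's selection logic. The class and method names are hypothetical, only the nine sizes visible in the truncated description are used, and decimal units (1 MB = 1000 kB) are assumed.

```java
public class SegmentSizeCheck {
    // Segment sizes from the example, in kB (assumption: 1 MB = 1000 kB).
    // The original list is truncated after 150MB, so only these nine are used.
    static final long[] SIZES_KB = {
        100, 300, 800, 2_000, 5_000, 12_000, 30_000, 70_000, 150_000
    };

    // Total size of all listed segments, in kB.
    static long totalKb(long[] sizes) {
        long total = 0;
        for (long s : sizes) {
            total += s;
        }
        return total;
    }

    // Smallest ratio between a segment and the next smaller one.
    static double minGrowthRatio(long[] sizes) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 1; i < sizes.length; i++) {
            min = Math.min(min, (double) sizes[i] / sizes[i - 1]);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println("segments: " + SIZES_KB.length);
        System.out.println("total kB: " + totalKb(SIZES_KB));
        System.out.println("min growth ratio: " + minGrowthRatio(SIZES_KB));
    }
}
```

The nine listed segments total about 270 MB, and each one is more than twice the size of the previous one, so no two of them are close in size.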

#13525 PR: [WIP] Multi-Vector support for HNSW search
1.4 hours ago  83 comments  0 votes  0 watches  benwtrent, cpoerschke, github-actions[bot], jimczi, krickert, mikemccand, msokolov, navneet1v, vigyasharma
Adds support for multi-valued vectors to Lucene. In addition to max-similarity aggregations like parent-block joins, this change supports ColBERT ... Documents can have a variable number of vector values, but to support distance function ...
    krickert 1.4 hours ago:  > Chunk-Based Highlighting – Interesting. With getAllVectorValues(), we can find all vector values with similarity above a separate ... Not sure.
    vigyasharma 3.5 hours ago:  Thank you for sharing these use-cases @krickert ! 1. **Aggregate Scoring** – I think we can do this today by joining the child doc hits with their ...

#13985 PR: Introduces IndexInput#updateReadAdvice to change the ReadAdvice while merging vectors
2 hours ago  36 comments  0 votes  0 watches  ChrisHegarty, jpountz, navneet1v, shatejas, uschindler
The change is needed to be able to reduce the force merge time. Lucene99FlatVectorsReader is opened with IOContext.RANDOM, this optimizes searches with madvise as ... For merges we need sequential access and ability to preload pages to be able to shorten the merge ...
    shatejas 2.3 hours ago:  > The org.apache.lucene.index.TestConcurrentMergeScheduler.testNoWaitClose test hits a new assert ... I need to look to see if it is a test issue or more of a design issue with finishMerge.
    ChrisHegarty 6.4 hours ago:  The org.apache.lucene.index.TestConcurrentMergeScheduler.testNoWaitClose test hits a new assert ... I need to look to see if it is a test issue or more of a design issue with finishMerge.

#13387: Support for criteria based DWPT selection inside DocumentWriter
3.3 hours ago  23 comments  0 votes  0 watches  RS146BIJAY, jpountz, mikemccand, vigyasharma
Description: Today, Lucene internally creates multiple DocumentWriterPerThread (DWPT) ... When documents are indexed by the same DWPT, they are grouped into the same segment post flush. As DWPT assignment to documents is only concurrency based, it’s not possible to predict or control ...
    vigyasharma 3.3 hours ago:  There is a lot of good work here @RS146BIJAY . Some preliminary questions to understand this better: 1. Does the OpenSearch client directly work with 'n' different log-group specific IndexWriters?
    RS146BIJAY 6.6 hours ago:  @vigyasharma @jpountz @mikemccand Any thoughts on the above approach on using multiple IndexWriter ...

#266 PR: LUCENE-10073: Reduce merging overhead of NRT by using a greater mergeFactor on tiny segments.
4 hours ago  10 comments  0 votes  0 watches  github-actions[bot], jpountz, mikemccand
#11111
    jpountz 4 hours ago:  You are right, these were duplicate variables!
    jpountz 4 hours ago:  Thank you, fixed.

#14003 PR: Only consider clauses whose cost is less than the lead cost to compute block boundaries in ...
5.3 hours ago  1 comment  0 votes  0 watches  jpountz
WANDScorer implements block-max WAND and needs to recompute score upper bounds whenever it moves to ... Thus it's important for these blocks to be large enough to avoid re-computing score upper bounds ... With this commit, WANDScorer no longer uses clauses whose cost is higher than the cost of the ...
    jpountz 7.7 hours ago:  The speedup is not as good, but still significant: ` TaskQPS ...
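
The filtering described in this PR can be sketched roughly as follows, assuming each clause exposes a cost and the end of its current block. `Clause`, `blockEnd`, and `blockBoundary` are hypothetical names for illustration, not WANDScorer's actual fields or methods, and treating a clause whose cost equals the lead cost as eligible is an assumption.

```java
import java.util.List;

public class BlockBoundarySketch {
    // Hypothetical stand-in for a scorer clause: its cost and the last
    // doc ID covered by its current block.
    static final class Clause {
        final long cost;
        final int blockEnd;

        Clause(long cost, int blockEnd) {
            this.cost = cost;
            this.blockEnd = blockEnd;
        }
    }

    // The block boundary is the smallest block end among the clauses
    // whose cost does not exceed the lead cost. Clauses with a higher
    // cost are ignored, so their blocks cannot shrink the boundary.
    static int blockBoundary(List<Clause> clauses, long leadCost) {
        int boundary = Integer.MAX_VALUE;
        for (Clause c : clauses) {
            if (c.cost <= leadCost) {
                boundary = Math.min(boundary, c.blockEnd);
            }
        }
        return boundary;
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
            new Clause(10, 500), new Clause(1000, 120), new Clause(8, 900));
        System.out.println(blockBoundary(clauses, 10));
    }
}
```

In this sketch, the clause with cost 1000 would otherwise force a boundary at doc 120. Ignoring it leaves a larger block, which is the effect the description is after.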

#13998 PR: Add IndexInput isLoaded
5.8 hours ago  19 comments  0 votes  0 watches  ChrisHegarty, jpountz, navneet1v, rmuir
This commit adds IndexInput::isLoaded to help determine if the contents of an input is resident in ... The intent of this new method is to help build inspection and diagnostic infrastructure on top. The initial requirement is to help understand if vector data and more specifically the HNSW graph ...
    ChrisHegarty 5.8 hours ago:  > This works for me. Maybe implement this API on our in-memory index inputs to return true, e.g. ByteBuffersIndexInput?
    ChrisHegarty 5.8 hours ago:  I added a note about the time complexity. I'd like to keep the tri-state of the return type, at least for now, since I think it will be useful to know whether the isLoaded-ness is determinable or not.
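
One way to picture the tri-state result discussed in these comments is an `Optional<Boolean>`, where an empty value means residency cannot be determined. This is a minimal sketch under that assumption; the snippets above do not state the actual return type, and `ResidencyAware`, `InMemoryInput`, and `UnknownInput` are hypothetical names, not Lucene classes.

```java
import java.util.Optional;

public class IsLoadedSketch {
    // Hypothetical interface: present(true) = loaded, present(false) = not
    // loaded, empty = cannot be determined.
    interface ResidencyAware {
        Optional<Boolean> isLoaded();
    }

    // An in-memory input is always resident, so it can simply answer true.
    static final class InMemoryInput implements ResidencyAware {
        @Override
        public Optional<Boolean> isLoaded() {
            return Optional.of(true);
        }
    }

    // An input that has no way to inspect residency reports "unknown".
    static final class UnknownInput implements ResidencyAware {
        @Override
        public Optional<Boolean> isLoaded() {
            return Optional.empty();
        }
    }

    // Callers can distinguish all three states explicitly.
    static String describe(ResidencyAware input) {
        return input.isLoaded()
            .map(loaded -> loaded ? "loaded" : "not loaded")
            .orElse("undeterminable");
    }

    public static void main(String[] args) {
        System.out.println(describe(new InMemoryInput()));
        System.out.println(describe(new UnknownInput()));
    }
}
```

Keeping the third state lets diagnostic tooling tell "not resident" apart from "cannot tell", which a plain boolean would conflate.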
