_recovery_source sometimes remains after merge #82595

Open
jtibshirani opened this issue Jan 14, 2022 · 9 comments

@jtibshirani
Contributor

If _source is disabled or filtered in the mappings, we add a _recovery_source field to support shard recoveries and CCR. Once it's no longer needed, then future merges will drop the _recovery_source field to reclaim space.
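
For illustration, a minimal mapping along these lines reproduces the setup (the index and field names here are hypothetical, not taken from the rally track):

PUT /my-index
{
  "mappings": {
    "_source": { "enabled": false },
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}

With _source disabled like this, every indexed document carries a _recovery_source stored field containing the original JSON until a merge is permitted to drop it.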

In certain cases, it appears that _recovery_source can stick around even after a merge. I noticed this issue through the dense vector rally track. This command indexes 100,000 documents with _source disabled, then force merges to 1 segment:

esrally race --track=dense_vector --challenge=index-and-search --track-params="ingest_percentage:10" --on-error abort

At the end, the shard was larger than expected:

195M	data/indices/gPefBjHjTCCxU_EnbSuGrQ/0/index

Using the disk usage API, we see this is due to recovery source:

   "_recovery_source" : {
        "total" : "149.9mb",
        "total_in_bytes" : 157209753,
        ....

There are no replicas, so the force merge should have removed recovery source. I can reproduce this with both 1 and 2 shards. I haven't found a small-scale reproduction yet.

@jtibshirani jtibshirani added >bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Jan 14, 2022
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team (obsolete) label Jan 14, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@ruslaniv

ruslaniv commented Nov 30, 2022

I'm seeing the exact same behavior when trying to exclude a dense_vector field from being stored in _source:

"mappings": {
        "_source": {"excludes": ["title_vector"]},
        "properties": {
            "title_vector": {
                "type": "dense_vector",
                "dims": 1024,
                "index": true,
                "similarity": "dot_product"
            },        
}

Upon inspecting the index, and based on the size of the _recovery_source field and the number of documents I indexed, it looks like the field stores the vectors as plain floats without any compression.
FYI, one 1024-dim vector stored as plain floats takes approximately 21-22 KB, whereas the same vector compressed and optimized by ES takes about 4 KB. So that's quite a difference!
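
To put rough numbers on that difference (a back-of-the-envelope estimate, not a measurement): the indexed dense_vector representation needs 1024 dims × 4 bytes per float32 ≈ 4 KB per vector, whereas _recovery_source keeps the original JSON text, in which each float is typically printed as a decimal string of roughly 20 characters including the separator, i.e. around 1024 × 20 bytes ≈ 20 KB per vector. That lines up with the ~21-22 KB observed above.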

@ruslaniv

ruslaniv commented Dec 6, 2022

@elasticmachine is there any progress on this issue?
Right now this problem is causing our index to grow to 250 GB instead of the estimated 50 GB, and the index no longer fits in available RAM, which severely degrades search performance.

@DaveCTurner
Contributor

DaveCTurner commented Dec 6, 2022

The _recovery_source field should be removed at merge time in all docs that are in the latest safe commit and are not retained for recovery by any retention lease. There is no real coordination between merges and retention lease movements so it's possible for documents including _recovery_source to end up in a large-ish segment that doesn't see another merge for a long time. But there shouldn't normally be very many documents like that, at least not unless there are some retention leases which lag a long way behind the max seqno for some reason.
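
To make that concrete with made-up numbers: if a shard's max_seq_no is 14,000,000 but a retention lease is still retaining operations from seq_no 10,000,000, then every document indexed in between keeps its _recovery_source until the lease advances and a later merge happens to rewrite the segments containing those documents.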

You can get information about the retention leases and sequence numbers with the following command:

GET /<INDEX>/_stats?level=shards&filter_path=indices.*.shards.*.retention_leases,indices.*.shards.*.seq_no,indices.*.shards.*.commit

Can you share the output of that command here?

Edit to add: could you also share the full breakdown of disk usage for your index:

POST /<INDEX>/_disk_usage?run_expensive_tasks=true

@ruslaniv

ruslaniv commented Dec 7, 2022

David, thank you for your help!

Here is the info on the retention leases for the index in question:

Output
{
  "indices": {
    "proposals.proposals.vector_20221119": {
      "shards": {
        "0": [
          {
            "commit": {
              "id": "FOt/dGPF1B6NxwFV1EWNlw==",
              "generation": 682,
              "user_data": {
                "local_checkpoint": "14491789",
                "es_version": "8.4.1",
                "min_retained_seq_no": "14484118",
                "max_seq_no": "14491789",
                "history_uuid": "8ZXxEfCfRVibbeCI0-hD2Q",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "FyoeUjQ1T8yBf-fZQ8gqsQ"
              },
              "num_docs": 3939193
            },
            "seq_no": {
              "max_seq_no": 14491789,
              "local_checkpoint": 14491789,
              "global_checkpoint": 14491789
            },
            "retention_leases": {
              "primary_term": 1,
              "version": 78196,
              "leases": [
                {
                  "id": "peer_recovery/vGiOGPHoQnSNmndhy2Np1A",
                  "retaining_seq_no": 14491790,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/23uocumgThOiaraPXP_JNA",
                  "retaining_seq_no": 14491790,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                }
              ]
            }
          },
          {
            "commit": {
              "id": "0tJKDmvaz43/5YosYvy9nQ==",
              "generation": 678,
              "user_data": {
                "local_checkpoint": "14491789",
                "es_version": "8.4.1",
                "min_retained_seq_no": "14487455",
                "max_seq_no": "14491789",
                "history_uuid": "8ZXxEfCfRVibbeCI0-hD2Q",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "YU3iTTrPSumU5E7tgGwPlw"
              },
              "num_docs": 3939193
            },
            "seq_no": {
              "max_seq_no": 14491789,
              "local_checkpoint": 14491789,
              "global_checkpoint": 14491789
            },
            "retention_leases": {
              "primary_term": 1,
              "version": 78196,
              "leases": [
                {
                  "id": "peer_recovery/vGiOGPHoQnSNmndhy2Np1A",
                  "retaining_seq_no": 14491790,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/23uocumgThOiaraPXP_JNA",
                  "retaining_seq_no": 14491790,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                }
              ]
            }
          }
        ],
        "1": [
          {
            "commit": {
              "id": "wlSNJOgD2Jm4Ms4eMN8n1w==",
              "generation": 682,
              "user_data": {
                "local_checkpoint": "14526770",
                "min_retained_seq_no": "14521481",
                "es_version": "8.4.1",
                "max_seq_no": "14526770",
                "translog_uuid": "7txDNt4ITMarfIm-NbPGZw",
                "max_unsafe_auto_id_timestamp": "-1",
                "history_uuid": "E-gKvtUtSTS0Ff5ABOv0lQ"
              },
              "num_docs": 3941107
            },
            "seq_no": {
              "max_seq_no": 14526770,
              "local_checkpoint": 14526770,
              "global_checkpoint": 14526770
            },
            "retention_leases": {
              "primary_term": 2,
              "version": 78123,
              "leases": [
                {
                  "id": "peer_recovery/3fQTuJNpQOeCg9k2zQI6Rg",
                  "retaining_seq_no": 14526771,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/vGiOGPHoQnSNmndhy2Np1A",
                  "retaining_seq_no": 14526771,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                }
              ]
            }
          },
          {
            "commit": {
              "id": "0tJKDmvaz43/5YosYvy9ng==",
              "generation": 683,
              "user_data": {
                "local_checkpoint": "14526770",
                "es_version": "8.4.1",
                "min_retained_seq_no": "14524098",
                "max_seq_no": "14526770",
                "history_uuid": "E-gKvtUtSTS0Ff5ABOv0lQ",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "1rZPK20OQ66Jk7rHtjAoAg"
              },
              "num_docs": 3941107
            },
            "seq_no": {
              "max_seq_no": 14526770,
              "local_checkpoint": 14526770,
              "global_checkpoint": 14526770
            },
            "retention_leases": {
              "primary_term": 2,
              "version": 78123,
              "leases": [
                {
                  "id": "peer_recovery/3fQTuJNpQOeCg9k2zQI6Rg",
                  "retaining_seq_no": 14526771,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/vGiOGPHoQnSNmndhy2Np1A",
                  "retaining_seq_no": 14526771,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                }
              ]
            }
          }
        ],
        "2": [
          {
            "commit": {
              "id": "wlSNJOgD2Jm4Ms4eMN8nrQ==",
              "generation": 678,
              "user_data": {
                "local_checkpoint": "14375247",
                "min_retained_seq_no": "14370437",
                "es_version": "8.4.1",
                "max_seq_no": "14375247",
                "translog_uuid": "wmeJxF9lSOaUJ0hDigqE1g",
                "max_unsafe_auto_id_timestamp": "-1",
                "history_uuid": "UCaiMhAXR4-zJHNO54tC4A"
              },
              "num_docs": 3940119
            },
            "seq_no": {
              "max_seq_no": 14375247,
              "local_checkpoint": 14375247,
              "global_checkpoint": 14375247
            },
            "retention_leases": {
              "primary_term": 3,
              "version": 78244,
              "leases": [
                {
                  "id": "peer_recovery/3fQTuJNpQOeCg9k2zQI6Rg",
                  "retaining_seq_no": 14375248,
                  "timestamp": 1670374825724,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/23uocumgThOiaraPXP_JNA",
                  "retaining_seq_no": 14375248,
                  "timestamp": 1670374825724,
                  "source": "peer recovery"
                }
              ]
            }
          },
          {
            "commit": {
              "id": "FOt/dGPF1B6NxwFV1EWNmA==",
              "generation": 678,
              "user_data": {
                "local_checkpoint": "14375247",
                "es_version": "8.4.1",
                "min_retained_seq_no": "14374549",
                "max_seq_no": "14375247",
                "history_uuid": "UCaiMhAXR4-zJHNO54tC4A",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "hefqLf7iQXaY0ryQuLUAGQ"
              },
              "num_docs": 3940119
            },
            "seq_no": {
              "max_seq_no": 14375247,
              "local_checkpoint": 14375247,
              "global_checkpoint": 14375247
            },
            "retention_leases": {
              "primary_term": 3,
              "version": 78244,
              "leases": [
                {
                  "id": "peer_recovery/3fQTuJNpQOeCg9k2zQI6Rg",
                  "retaining_seq_no": 14375248,
                  "timestamp": 1670374825724,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/23uocumgThOiaraPXP_JNA",
                  "retaining_seq_no": 14375248,
                  "timestamp": 1670374825724,
                  "source": "peer recovery"
                }
              ]
            }
          }
        ]
      }
    }
  }
}

Unfortunately the disk analysis command would not complete due to a 504 Gateway Time-out error after about 60,000 milliseconds.

What I can do is create another index with exactly the same mapping, index say 10,000 documents, and then run the disk analysis. I was able to run this command on smaller indices; that's actually how I found out about the _recovery_source field.

@nik9000
Member

nik9000 commented Dec 7, 2022

Unfortunately the disk analysis command would not complete due to a 504 Gateway Time-out error after about 60,000 milliseconds.


Darn proxy.

I was able to run this command on smaller indices; that's actually how I found out about the _recovery_source field.

Watch out: using smaller indices with _disk_usage can show _recovery_source where in a bigger index it would have been merged away. At least, that's been my experience with smaller indices, mostly because I can build them so quickly that the merge process doesn't clean the field up until the merge after replication.

@DaveCTurner
Contributor

We haven't seen anything to suggest that there's a problem with the logic to remove the _recovery_source field on merge once it's safe to do so. Instead it seems that this issue comes about because we don't today schedule merges (or even just the rewrite of individual segments) in order to clean this field up once it becomes unnecessary. Since that's more of a low-level Lucene-interaction question related to the merge scheduling logic I'm going to relabel this for the attention of the search team.
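
One possible workaround, assuming the retention leases have since advanced past the retained documents, is to trigger the rewrite manually with another force merge, for example:

POST /<INDEX>/_forcemerge?max_num_segments=1

Note that if the shard is already down to a single segment with no deleted documents, Lucene may treat this as a no-op and leave the segment (and the field) untouched.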

@DaveCTurner DaveCTurner added :Search/Search Search-related issues that do not fall into other categories and removed :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed Meta label for distributed team (obsolete) labels Apr 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Apr 16, 2024
@javanna javanna added :StorageEngine/Logs You know, for Logs and removed :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team labels Apr 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

8 participants