Add support for GCS storage to 'solrbackup' #301

Closed
gerlowskija opened this issue Aug 5, 2021 · 5 comments · Fixed by #302
@gerlowskija (Contributor)

Currently the 'solrbackup' resource assumes that users want backups stored "locally" (i.e. stored on a PV or mounted drive using Solr's LocalFileSystemRepository). These local backups can then optionally be "persisted" - which involves compressing them and shipping them to a different PV or S3 bucket.

But no support exists for using other backup destinations that Solr supports natively, such as GCS (as of 8.9).

We should add this support. Users can configure their GCS-backup settings under solrcloud's backupRestoreOptions object, leaving the solrbackup object relatively untouched (except that any "persistence" section on 'solrbackup' would now be ignored, as we can only easily compress files that are stored locally).
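For reference, on the Solr side GcsBackupRepository is enabled through a repository entry in solr.xml. A rough sketch of what the operator would be generating is below; the parameter names should be double-checked against the Solr 8.9 ref guide, and the bucket/path values are placeholders:

```xml
<backup>
  <repository name="gcs_backup"
              class="org.apache.solr.gcs.GCSBackupRepository"
              default="false">
    <!-- Placeholder values; see the Solr 8.9 ref guide for the full option list -->
    <str name="gcsBucket">some-gcs-bucket</str>
    <str name="gcsCredentialPath">/path/to/service-account-key.json</str>
  </repository>
</backup>
```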

@gerlowskija (Contributor, Author)

Some questions this raises:

  • Should backup mechanisms be mutually exclusive in a given 'solrcloud' definition, or should we allow users to configure multiple and choose between them in each 'solrbackup'?
  • What's the best way to surface an error if a user attempts to configure GCS backups while using a Solr version that doesn't support them?
  • Should the operator attempt to create any missing GCS paths/buckets using the provided credentials, or require everything to be created up front by users?
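On the version question, one option is for the operator to validate the SolrCloud's image tag before accepting GCS config. A minimal sketch of that check follows; the function and constant names are illustrative only, not actual solr-operator code:

```python
# Hypothetical version-gate sketch. Names are illustrative; this is not
# solr-operator code.

GCS_MIN_VERSION = (8, 9)  # GcsBackupRepository ships with Solr 8.9+

def parse_solr_version(image_tag: str):
    """Best-effort parse of a 'major.minor[.patch]' Solr image tag."""
    try:
        return tuple(int(p) for p in image_tag.split(".")[:2])
    except ValueError:
        return None  # non-numeric tag (e.g. 'latest'); can't validate

def validate_gcs_backup(image_tag: str):
    """Return an error message if the tag is known to predate GCS support."""
    version = parse_solr_version(image_tag)
    if version is not None and version < GCS_MIN_VERSION:
        return f"GCS backups require Solr 8.9+, but this SolrCloud runs {image_tag}"
    return None  # supported, or unknown tag (give it the benefit of the doubt)
```

An error string like this could be surfaced on the solrcloud's status, or the config could be rejected outright by a validating webhook.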

gerlowskija added a commit to gerlowskija/solr-operator that referenced this issue Aug 5, 2021
This commit adds first-pass support for exposing Solr's
GcsBackupRepository through our operator configuration.  This WIP
support has a number of caveats and downsides:

  - GCS backups eschew the "persistence" step that currently follows
    normal backups
  - GCS backups are only included in Solr 8.9+, but there's no check for
    this currently.
  - operator logic currently assumes that exactly 1 type of backup
    config will be provided on a given solrcloud object (i.e. GCS
    backups and 'local' PV backups are mutually exclusive for a
    solrcloud)
  - no automated tests have been added
  - no documentation has been added, beyond the examples on issue
    apache#301
@gerlowskija (Contributor, Author)

gerlowskija commented Aug 5, 2021

I've attached a rough PR that shows how this could be done. Below are an example 'solrcloud' and 'solrbackup' that use the proposed functionality:

SolrCloud

apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: jasons_cluster
spec:
  dataStorage:
    persistent:
      reclaimPolicy: Delete
      pvcTemplate:
        spec:
          resources:
            requests:
              storage: "5Gi"
    backupRestoreOptions:
      gcsStorage:
        bucket: "solr-log-test"
        gcsCredentialSecret: "my-gcs-secret"
        baseLocation: "logs"
    ...

The most noteworthy addition in this snippet is .Spec.dataStorage.backupRestoreOptions.gcsStorage.gcsCredentialSecret. This required property holds the name of a secret created by the user. This secret must have a key "service-account-key.json" whose value is the user's Google service account key (JSON).
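For concreteness, a secret matching that contract could be defined as below; the secret and key names follow the example above, and the key contents are a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-gcs-secret
type: Opaque
stringData:
  # The proposal above expects this exact key name.
  service-account-key.json: |
    { ...contents of the Google service account JSON key file... }
```

Equivalently: `kubectl create secret generic my-gcs-secret --from-file=service-account-key.json=<path-to-key-file>`.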

SolrBackup

apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: gcs_techproducts_backup
  namespace: default
spec:
  solrCloud: jasons_cluster
  collections:
    - techproducts

(Note that there's no new configuration in 'solrbackup', just the removal of the 'persistence' section for gcs-backups.)

I'm not wedded to these syntaxes by any means - just wanted to get some examples up here as a concrete starting point for discussion.

@HoustonPutman (Contributor)

So I just want to make sure I understand correctly.

The only thing the operator should do for these "native" backup options is to call the Solr API, right?
(And possibly set up paths at the resulting location if necessary.)

I'll need to think on this a bit more, but to me it sounds like the only real benefit would be to have the operator be able to do this on a schedule. (and possibly delete old backups if necessary). So instead of facilitating the backup mechanism, it would just be in charge of managing the backups. The more I type this out, the more I'm starting to like it. It would also allow the Solr operator, in the future, to do automatic-rollbacks if it detects failures in a Collection.

I think we could change the SolrBackup to do either "managed" or "remote" backups, and in the case of remote, let the user provide the repository and location arguments for the Backup command. I do like the idea of requiring users to manage the directories themselves, as we don't really want to build S3, GCS, HDFS, etc. behavior into the operator. But it would make sense if we wanted to support setting up the location to make sure it is ready initially.

So in that case your example of the backupRestoreOptions in the SolrCloud object would be spot on, but the SolrBackup object would need the backupRepositoryName (I guess unless there is a default one specified...), and additional options such as location.

@gerlowskija (Contributor, Author)

gerlowskija commented Aug 10, 2021

The only thing the operator should do for these "native" backup options, is to call the Solr API right?

That's what I'm proposing, yep - the operator wouldn't be doing any of the compression or relocation features for GCS that it currently supports for 'local' backups. It's "just" calling the Solr API. (Which, I'd contend, isn't "nothing". That still saves Ops folks from crafting their own solr.xml, from needing to learn Solr's backup and async-polling APIs, etc.)
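For readers unfamiliar with those APIs, the operator would essentially be building Collections API requests like the ones sketched below. The parameter names follow Solr's Collections API, but the helper functions themselves are hypothetical:

```python
from urllib.parse import urlencode

def backup_url(base_url, backup_name, collection,
               repository=None, location=None, async_id=None):
    """Build a Collections API BACKUP request URL."""
    params = {"action": "BACKUP", "name": backup_name, "collection": collection}
    if repository:
        params["repository"] = repository  # e.g. a GCS repo configured in solr.xml
    if location:
        params["location"] = location
    if async_id:
        params["async"] = async_id  # run asynchronously; poll with REQUESTSTATUS
    return f"{base_url}/admin/collections?{urlencode(params)}"

def request_status_url(base_url, async_id):
    """Build the REQUESTSTATUS URL used to poll an async backup request."""
    return f"{base_url}/admin/collections?" + urlencode(
        {"action": "REQUESTSTATUS", "requestid": async_id})
```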

the only real benefit would be to have the operator be able to do this on a schedule

Definitely agree. As I said above, I think there's value in this ticket alone. But GCS-support gets much more appealing as the operator's backup featureset generally gets more robust. I love the idea of a "backupschedule" entity that creates individual solrbackup objects in turn. I'll file an issue for that as a placeholder for discussion.
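(To make the idea concrete, a hypothetical 'backupschedule' object might look roughly like the sketch below; none of these fields exist today, and the kind/field names are purely illustrative.)

```yaml
# Hypothetical sketch only - no such CRD exists yet.
apiVersion: solr.apache.org/v1beta1
kind: SolrBackupSchedule
metadata:
  name: nightly-techproducts
spec:
  schedule: "0 2 * * *"        # cron syntax
  backupTemplate:              # spec for the SolrBackup objects it would create
    solrCloud: jasons_cluster
    collections:
      - techproducts
```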

I think we could change the SolrBackup to do either "managed" or "remote" backups

I think I agree with your suggestions here, but let me restate a few of them to make sure I'm understanding you correctly. There's a point or two I'm unclear on.

  1. I see what you're getting at with the "managed" vs "remote" distinction, but I'm not sure whether and where you imagine that appearing in the yaml configs. Are you suggesting an explicit setting on 'solrbackup'? Or that it be implicit based on the value of the 'repository' setting you mention?
  2. Letting users specify a "backupRepositoryName" on their 'solrbackup' makes sense to me. And further it implies that a user should be able to configure multiple sets of backup configs in their solrcloud's backupRestoreOptions setting. (i.e. configuring local backup settings and gcs backup settings aren't mutually exclusive - we'd support use of both within the same solrcloud.)
  3. It seems like you're on the fence about having the operator bootstrap required buckets/locations and don't have a strong opinion there. I lean towards skipping it because of the complexity of bringing S3, GCS, etc. clients into the operator, at least until we get feedback from users that it'd be worth it, though I have mixed feelings about it too.

So taking those suggestions, our new example CRDs would look something like:

solrcloud

...
  dataStorage:
    ...
    backupRestoreOptions:
      gcsRepositories:
        - name: "customer-data"
          bucket: "customer-data-bucket"
          gcsCredentialSecret: "my-gcs-secret"
          defaultLocation: "customer_data"
      managedRepositories:
        - name: "local-log-data"
          volume:
            persistentVolumeClaim:
              claimName: "log-pvc"
    ...

solrbackup (gcs)

apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: gcs_techproducts_backup
  namespace: default
spec:
  solrCloud: jasons_cluster
  repository: "customer-data"
  location: "customer_data_alt"
  collections:
    - techproducts

solrbackup (local)

apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: local_techproducts_backup
  namespace: default
spec:
  solrCloud: jasons_cluster
  repository: "local-log-data"
  location: "logs_alt"
  persistence:  # Ignored if the 'local-log-data' repository isn't of a type that supports "managed" backups (i.e. type=local)
    volume:
      source:
        persistentVolumeClaim:
          claimName: "pvc-test"
  collections:
    - applicationlogs

Those could be off a bit based on what you meant about exposing the "managed" v. "remote" flag. Does this look closer to what you were thinking? @HoustonPutman

@gerlowskija (Contributor, Author)

Two additional notes:

  1. If solrcloud's backupRestoreOptions changes to allow configuring multiple backup repositories, how should backcompat be handled? The README mentions that we don't make any strict guarantees while we're a 0.x release, but I wasn't sure whether we still try to avoid breaking changes in practice.
  2. I created a ticket for backupschedules here: Add scheduling support for backups #303
