Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parameters): AppConfigProvider to return the last valid value when the API returns empty value on subsequent calls #1365

Conversation

brnkrygs
Copy link
Contributor

Description of your changes

Fixes #1363

When AppConfig parameters were requested beyond the cache expiration, they were returning empty. This was due to AppConfig GetLatestConfiguration returning an empty response if the active session already has the latest value.

This change introduces a local cache inside the AppConfigProvider that returns the most recent non-empty value in this scenario, matching the behavior of the Python PowerTools Repository.

How to verify this change

I added unit and e2e tests that cover this scenario. The simplest way is to develop a Lambda that uses the Parameters library with the AppConfig Provider, then invoke it a series of times, waiting beyond the maxAge of the configured provider. Confirm that you get the same return value in cold start, within maxAge, and beyond maxAge scenarios.

Related issues, RFCs

Issue number:
#1363

Checklist

  • My changes meet the tenets criteria
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in areas that should be flagged with a TODO, or hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the examples
  • My changes generate no new warnings
  • The code coverage hasn't decreased
  • I have added tests that prove my change is effective and works
  • New and existing unit tests pass locally and in Github Actions
  • Any dependent changes have been merged and published
  • The PR title follows the conventional commit semantics

Breaking change checklist

Is it a breaking change?: NO

  • I have documented the migration process
  • I have added, implemented necessary warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

brnkrygs added 3 commits March 9, 2023 20:59
Add a local cache to AppConfigProvider to handle the scenario where the maxAge is expired, but the AppConfig session still has the latest configuration. In this case, AppConfig returns an empty value. We don't want to return empty values to our callers.

This uses the same data structure we have in place to track NextPollConfigurationToken.

This approach roughly the Python library's behavior.
Includes update to e2e lambda config type that allows test writers
to specify a test function duration. Needed because to wait for
parameter expiration, we needed a longer-than-default timeout.
@pull-request-size pull-request-size bot added the size/L PRs between 100-499 LOC label Mar 10, 2023
Copy link
Contributor

@dreamorosi dreamorosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Brian, thank you so much for opening the PR and contributing a fix.

I have left a few comments here and there but essentially I don't think we should add an additional layer of caching and the fix could be as simple as converting the maxAge input to seconds (number * 1000) in the current logic.

@brnkrygs
Copy link
Contributor Author

Hi @dreamorosi, I appreciate the review and feedback.

I can definitely pivot this back to just looking at maxAge ttl calculation if you prefer.

I tried the simple fix and still saw unexpected behavior where until maxAge expired, I received values from the AppConfig Provider, and after it expired I received empty values.

But, maybe this isn't actually unexpected?

You're right that removing the extra cache exposes AppConfig-specific behavior to clients of the library. Maybe that is a good thing.

But as I understood it, part of the purpose of the library is to simplify this exposure, or at least generalize interaction across Parameter providers. If I swap from DynamoDB to AppConfig for example, I'll have to change client code as well and manage my own secondary cache, because they will act very differently when maxAge expires, depending on my AppConfig session lifecycle (which I have no visibility to).

Python implements this secondary cache pattern here, which inspired this approach:
https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/aws_lambda_powertools/utilities/parameters/appconfig.py#L121

@dreamorosi
Copy link
Contributor

Thank you for pushing back on this, after re reading your comment I went back to check the implementation in Python and you're indeed right. The discussion that led to implementing that second cache layer comes from these two messages (here and here).

I now agree with you that we should do the same. However, if you don't mind I'd prefer to keep the PR scoped to the issue at hand, also and in addition to the fact that we are still unsure whether the initial caching is fixed or not.


Once we know that maxAge and the cache behavior in the BaseProvider is fixed, I'm more than open to address this other part. I'll open an issue to track this, and we can add this second caching in a second PR (contribution from you is welcome).

Would this work?

@brnkrygs
Copy link
Contributor Author

Absolutely! I appreciate the careful consideration!

I probably should have broken the changes out in the first place to not confuse the issue, sorry about that.

I'd say we should HOLD on this PR until after we verify maxAge correctness. If we do need a change I'll slim this way down just for that.

Then I'm happy to help with a new PR on a new issue tackling the AppConfig behavior.

@dreamorosi
Copy link
Contributor

As discussed on a quick call, we are going to:

  • Repurpose the original issue & PR to address this second layer of cache
  • We can leave the ttl calculation as-is given it's correct
  • We can add unit tests around ExpirableValue in the same AppConfigProvider test file
  • We refactor the integration test around this new cache layer to have a single log emitted and to use the following (which should work the same):
const param1 = provider.get('hello', { maxAge: 0 });
const param2 = provider.get('hello');

// test that param1 === param2, where param2 is from this second cache
  • Document the new test case in the comments

This reverts commit 3fea0d7.

Further testing shows that the original algorithm was correct.
@shdq
Copy link
Contributor

shdq commented Mar 12, 2023

Hey guys!

I see you've covered the second empty response from AppConfig. It was one of the two open questions for the current implementation. Another one in my notes is whether we should propagate forceFetch and start a new configuration session. Users would expect to get the latest configuration but will get value from the second cache. Should the second cache be pruned in this case?

@brnkrygs
Copy link
Contributor Author

Interesting idea @shdq. It makes sense to me to tackle that, maybe in a separate issue/PR?

It definitely would have implications for what forceFetch means, and what the commitments of the library are.

I do like giving clients a way to completely force-flush out all state, and it makes sense to me to use that parameter to do so.

For what it's worth, I don't see this implemented in the Python repository (unless I'm missing something), so it may need some additional discussion with the wider team.

Drop wait, use maxAge instead.

Add comment for Test case 7.

Fix test case naming (test '4' was skipped on one of the files).
Not needed anymore since we're not waiting for Test 7.
@brnkrygs brnkrygs marked this pull request as draft March 12, 2023 21:06
Doesn't refactor ExpirableValue to take an injection of Date.now, just implements tests based on what we can do with the existing interface.
@brnkrygs brnkrygs changed the title fix(parameters): appconfig provider local cache for maxAge feat(parameters): AppConfigProvider to return the last valid value when the API returns empty value on subsequent calls Mar 12, 2023
@brnkrygs brnkrygs marked this pull request as ready for review March 12, 2023 22:17
@github-actions github-actions bot added the feature PRs that introduce new features or minor changes label Mar 13, 2023
@brnkrygs brnkrygs requested a review from dreamorosi March 13, 2023 12:13
Copy link
Contributor

@dreamorosi dreamorosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for addressing all the comments.

I think we are very close to getting this in shape to be merged. I have also run the integration tests on your branch and are passing.

I've left a couple of very minor comments and expect this to be the last round of reviews.

@@ -111,4 +111,24 @@ export const handler = async (_event: unknown, _context: Context): Promise<void>
error: err.message
});
}

// Test 7
// get parameter twice, but wait long enough that cache expires count SDK calls and return values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to reword this to reflect that we are no longer using the delay but instead avoiding cache with maxAge: 0.

try {
providerWithMiddleware.clearCache();
middleware.counter = 0;
const expiredResult1 = await providerWithMiddleware.get(freeFormBase64encodedPlainText, { maxAge: 0, transform: 'base64' });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const expiredResult1 = await providerWithMiddleware.get(freeFormBase64encodedPlainText, { maxAge: 0, transform: 'base64' });
const expiredResult1 = await providerWithMiddleware.get(
freeFormBase64encodedPlainText, {
maxAge: 0,
transform: 'base64'
}
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I've just put the line on multiple lines so that it's not too long

const expiredResult2 = await providerWithMiddleware.get(freeFormBase64encodedPlainText, { maxAge: 0, transform: 'base64' });
logger.log({
test: 'get-expired',
value: middleware.counter, // should be 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance that we can keep the original test/value structure? i.e.

logger.log({
  test: 'get-expired',
  value: {
    counter: middleware.counter,
    results: [
      expiredResult1,
      expiredResult2,
    ]
  },
});

or similar


test('when session returns an empty configuration on the second call, it returns the last value', async () => {

client.reset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only test using this reset. We should possibly add a afterEach() block to the whole test fixture that does it so that it gets applied after every test.

Maybe we can do this for this file, and then possibly in a future PR we can also apply the change to all other tests under packages/parameters/tests/unit/*.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that works for me. I added it here to be surgical - the resolvesOnce chain was acting weirdly without it. I like the consistency of using an afterEach instead.

@dreamorosi
Copy link
Contributor

I see you've covered the second empty response from AppConfig. It was one of the two open questions for the current implementation.

Indeed, this was an open topic/question and I should take ownership of the fact that I misunderstood both the problem and the question originally. Brian was conducting some internal testing before the imminent beta launch and noticed this behavior. So after he compared the implementation with the one in Python he proposed this PR.

Another one in my notes is whether we should propagate forceFetch and start a new configuration session. Users would expect to get the latest configuration but will get value from the second cache. Should the second cache be pruned in this case?

I think it's an interesting proposal. At the moment I'd like us to focus on launching the utility with feature parity and equivalent behavior with its Python corresponding one. After getting enough usage and feedback we can start iterating and proposing new features.

@brnkrygs brnkrygs requested a review from dreamorosi March 13, 2023 18:53
Copy link
Contributor

@dreamorosi dreamorosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work Brian, appreciate the time you've put into this.

Let's 🚢 it!

@brnkrygs
Copy link
Contributor Author

Sounds great to me! Happy to help, and appreciate the feedback and patience!

@dreamorosi dreamorosi merged commit 97339d9 into aws-powertools:main Mar 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature PRs that introduce new features or minor changes size/L PRs between 100-499 LOC
Projects
None yet
3 participants