Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.24.0 conftest regression: invalid character 't' after top-level value #3438

Closed
jrobison-sb opened this issue May 23, 2023 · 24 comments
Closed
Labels
bug Something isn't working conftest-policy never-stale regression Bug introduced in a new version

Comments

@jrobison-sb
Copy link

jrobison-sb commented May 23, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

After upgrading to v0.24.0, conftest policy checking results in this error: invalid character 't' after top-level value. Policy checking worked fine in v0.23.5 and lower.

Screenshot 2023-05-23 at 9 46 17 AM

Reproduction Steps

workflows:
  init-plan-apply:
    plan:
      steps:
        - init
        - plan:
            extra_args: ["-lock-timeout=300s"]
    policy_check:
      steps:
        - show # this is the equivalent of `terraform show -json $PLANFILE > $SHOWFILE`
        - run: >
            conftest test -p ../../modules/aws/application/policies/ --all-namespaces $SHOWFILE
    apply:
      steps:
        - init
        - show # this is the equivalent of `terraform show -json $PLANFILE > $SHOWFILE`
        - run: >
            conftest test -p ../../modules/aws/application/policies/ --all-namespaces $SHOWFILE
        - apply:
            extra_args: ["-lock-timeout=300s"]

Logs

{
  "caller": "terraform/terraform_client.go:422",
  "json": {
    "pull": "4478",
    "repo": "SOME_COMPANY/devops"
  },
  "level": "info",
  "msg": "Successfully ran \"/usr/local/bin/terraform show -json /home/atlantis/.atlantis/repos/SOME_COMPANY/devops/4478/default/terraform/SOME_COMPANY/environments/THIS_ENVIRONMENT/THIS_ENVIRONMENT-default.tfplan\" in \"/home/atlantis/.atlantis/repos/SOME_COMPANY/devops/4478/default/terraform/SOME_COMPANY/environments/THIS_ENVIRONMENT\"",
  "ts": "2023-05-23T13:40:21.563Z"
}
{
  "caller": "events/instrumented_project_command_runner.go:77",
  "json": {
    "pull": "4478",
    "repo": "SOME_COMPANY/devops"
  },
  "level": "error",
  "msg": "Error running policy check operation: invalid character 't' after top-level value",
  "stacktrace": "github.com/runatlantis/atlantis/server/events.RunAndEmitStats\n\tgithub.com/runatlantis/atlantis/server/events/instrumented_project_command_runner.go:77\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandRunner).PolicyCheck\n\tgithub.com/runatlantis/atlantis/server/events/instrumented_project_command_runner.go:42\ngithub.com/runatlantis/atlantis/server/events.runProjectCmds\n\tgithub.com/runatlantis/atlantis/server/events/project_command_pool_executor.go:48\ngithub.com/runatlantis/atlantis/server/events.(*PolicyCheckCommandRunner).Run\n\tgithub.com/runatlantis/atlantis/server/events/policy_check_command_runner.go:65\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run\n\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:282\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:290\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:298",
  "ts": "2023-05-23T13:40:22.944Z"
}
{
  "caller": "models/shell_command_runner.go:156",
  "json": {
    "pull": "4478",
    "repo": "SOME_COMPANY/devops"
  },
  "level": "info",
  "msg": "successfully ran \"conftest test -p ../../modules/aws/application/policies/ --all-namespaces $SHOWFILE\\n\" in \"/home/atlantis/.atlantis/repos/SOME_COMPANY/devops/4478/default/terraform/SOME_COMPANY/environments/THIS_ENVIRONMENT\"",
  "ts": "2023-05-23T13:40:22.944Z"
}

Environment details

If not already included, please provide the following:

  • Atlantis version: v0.24.0 (and v0.23.5 works fine)
  • Deployment method: AWS ECS (Fargate)
  • If not running the latest Atlantis version have you tried to reproduce this issue on the latest version: NA
  • Atlantis flags: We don't specify any command for running the Atlantis server, so our usage of Atlantis would inherit whatever the default ENTRYPOINT is in the Docker image.
@jrobison-sb jrobison-sb added the bug Something isn't working label May 23, 2023
@nitrocode nitrocode added the regression Bug introduced in a new version label May 23, 2023
@rkstrickland
Copy link
Contributor

Fairly confident that this error block is being hit: https://github.com/runatlantis/atlantis/blob/main/server/events/project_command_runner.go#L467

@jrobison-sb
Copy link
Author

I was able to unblock this with a big assist from @rkstrickland on Slack.

I added a policy_sets list to my ATLANTIS_REPO_CONFIG_JSON with a dummy path which doesn't contain any real policies at all:

        {
          name = "ATLANTIS_REPO_CONFIG_JSON",
          value = jsonencode({
            policies = {
              owners = {
                users = [...],
              },
              policy_sets = [
                {
                  name   = "policies"
                  path   = "/tmp" # This is a dummy path, the actual policies are specified in atlantis.yml in each repository
                  source = "local"
                }
              ]
            },
...

Then in atlantis.yml in each repo which uses Atlantis, I got rid of running conftest directly as a Linux command and replaced it with a proper policy_check step:

    policy_check:
      steps:
        - show # this is the equivalent of `terraform show -json $PLANFILE > $SHOWFILE`
        - policy_check:
            extra_args:
              - "-p ../../modules/aws/application/policies/"
              - "--all-namespaces"

I'm unblocked now so this issue no longer directly affects me. But I'll leave this ticket open in case @rkstrickland or others consider this to be actionable. Ya'll can close this if there is no value in fixing/documenting this on the behalf of the wider user community.

@albertorm95
Copy link
Contributor

albertorm95 commented May 24, 2023

I want to verify that my metric fix is not the one causing this issue.
This is my first commit: 96cf388

The bug was introduced in:

{
    "level": "error",
    "ts": "2023-05-24T15:22:09.013+0200",
    "caller": "events/instrumented_project_command_runner.go:77",
    "msg": "Error running policy check operation: unexpected end of JSON input",
    "json": {
        "repo": "foo/atlantis-test",
        "pull": "7"
    },
    "stacktrace": "github.com/runatlantis/atlantis/server/events.RunAndEmitStats\n\t/Users/alberto.rojas/Projects/Personal/atlantis/server/events/instrumented_project_command_runner.go:77\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandRunner).PolicyCheck\n\t/Users/alberto.rojas/Projects/Personal/atlantis/server/events/instrumented_project_command_runner.go:42\ngithub.com/runatlantis/atlantis/server/events.runProjectCmds\n\t/Users/alberto.rojas/Projects/Personal/atlantis/server/events/project_command_pool_executor.go:48\ngithub.com/runatlantis/atlantis/server/events.(*PolicyCheckCommandRunner).Run\n\t/Users/alberto.rojas/Projects/Personal/atlantis/server/events/policy_check_command_runner.go:65\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run\n\t/Users/alberto.rojas/Projects/Personal/atlantis/server/events/plan_command_runner.go:282\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\t/Users/alberto.rojas/Projects/Personal/atlantis/server/events/plan_command_runner.go:290\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\t/Users/alberto.rojas/Projects/Personal/atlantis/server/events/command_runner.go:297"
}
github.com/runatlantis/atlantis/server/events.RunAndEmitStats
    atlantis/server/events/instrumented_project_command_runner.go:77
github.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandRunner).PolicyCheck
    atlantis/server/events/instrumented_project_command_runner.go:42
github.com/runatlantis/atlantis/server/events.runProjectCmds
    atlantis/server/events/project_command_pool_executor.go:48
github.com/runatlantis/atlantis/server/events.(*PolicyCheckCommandRunner).Run
    atlantis/server/events/policy_check_command_runner.go:65
github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run
    atlantis/server/events/plan_command_runner.go:282
github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run
    atlantis/server/events/plan_command_runner.go:290
github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand
    atlantis/server/events/command_runner.go:297

It doesn't show up in d88857b which is the previous commit.

@jamengual
Copy link
Contributor

@pseudomorph any change you can look at this?

@pseudomorph
Copy link
Contributor

@jamengual - That's right. The metric fix is not the cause of this issue.

The root of this boils down to the fact that changes introduced with granular policies depend upon data being passed between the internal conftest step runner and the project command runner. When conftest is invoked using a custom run command, this falls apart.

I think it would be difficult to account for both cases where conftest is invoked directly, and where atlantis supports granular policies. Also, I'm not sure how common a direct invocation pattern is for most use cases.

I wanted to open a conversation about the best path forward here. We could try to account for both scenarios, or put an opinion into Atlantis that conftest cannot be invoked outside of the step runner for check_policies.

Thoughts?

@jamengual
Copy link
Contributor

@nitrocode @GenPage what do you guys think?

@nitrocode
Copy link
Member

This isn't a feature i have used before so i defer to the subject matter experts on this

@nitrocode
Copy link
Member

It would be nice to have a solution for now that can get around this error and a solution for later for a better implementation for both use cases

@pseudomorph
Copy link
Contributor

We were able to solve for the previous use-case without relying on the custom invocation of conftest. So, I'm not sure there is any immediate issue to solve here, other than looking at aesthetics and/or a long-term plan. The main driver for the previous custom invocation is due to policies residing in-line with the repo code itself. Again, I'm not sure how common of a pattern this is.

The major drawback of the current workaround is having to specify a dummy directory for the policy sets which are conditionally enabled per-repo. This would be less of an issue if policies resided in a separate repo/package and namespaced for their respective repos/use cases. The namespaces could then be configured as extra args directly in the workflows.

I could see it as possible to enforce not allowing a custom run invocation of conftest for check_policies if policy set configuration was made a bit more flexible. Perhaps add per-repo enablement or whitelist/blacklist on policy sets, and clear up documentation to convey that relpaths can be used for in-line policies, while highlighting the security and other implications of doing so. This could solve for use cases like this one.

@albertorm95
Copy link
Contributor

albertorm95 commented May 30, 2023

This is still failing for us :(
It looks to be a race condition since it works sometimes

server-side:

policies:
  conftest_version: 0.38.0
  owners:
    users:
      - albertorm95
  policy_sets:
    - name: policya
      path: /home/atlantis/policya.rego
      source: local
    - name: policyb
      path: /home/atlantis/policyb.rego
      source: local
workflows:
  test:
        - init
        - run: date
        - plan:
            extra_args:
              - '-var-file=vars/${ENV_NAME}.tfvars'
              - '-var=sts_session_name=${AWS_ROLE_SESSION_NAME}'
              - '-var=created_by=${CREATED_BY}'
        - run: date
    policy_check:
      steps:
        - show
        - policy_check
        - run: date
{
    "level": "error",
    "ts": "2023-05-30T13:00:55.749Z",
    "caller": "events/instrumented_project_command_runner.go:78",
    "msg": "Error running policy_check operation: invalid character 'T' after top-level value",
    "json": {
        "repo": "test/foo",
        "pull": "1234"
    },
    "stacktrace": "github.com/runatlantis/atlantis/server/events.RunAndEmitStats\n\tgithub.com/runatlantis/atlantis/server/events/instrumented_project_command_runner.go:78\ngithub.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandRunner).PolicyCheck\n\tgithub.com/runatlantis/atlantis/server/events/instrumented_project_command_runner.go:42\ngithub.com/runatlantis/atlantis/server/events.runProjectCmds\n\tgithub.com/runatlantis/atlantis/server/events/project_command_pool_executor.go:48\ngithub.com/runatlantis/atlantis/server/events.(*PolicyCheckCommandRunner).Run\n\tgithub.com/runatlantis/atlantis/server/events/policy_check_command_runner.go:65\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan\n\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:165\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:288\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:174"
}

@pseudomorph
Copy link
Contributor

@albertorm95 - This one makes sense. The custom run command at the end is appending output to json string which is used to send data between conftest and the doPolicyCheck step. I understand the desire to be able to add custom information which is visible in the PR output. Though, I'm not sure what exactly the best path forward for addressing this is.

If I understand correctly, you're trying to observe how long each workflow step takes. Is this something that could be achieved through log parsing?

@GenPage @jamengual @nitrocode - There seem to be a couple usage patterns here which I failed to account for in crafting this feature. I'm not sure what the best path forward is. Addressing some of these likely invoke a re-thinking of how this is implemented or how the project command runner works. Are any of you available in the next few days to discuss on a video call? We can of course continue async.

@albertorm95
Copy link
Contributor

albertorm95 commented May 31, 2023

@pseudomorph we are now facing this error: we are using v0.24.2

{
    "level": "error",
    "ts": "2023-05-31T12:09:53.791Z",
    "caller": "events/instrumented_project_command_runner.go:78",
    "msg": "Error running policy_check operation: unexpected end of JSON input",
    "json": {
        "repo": "foo/test",
        "pull": "1234"
    },
    "stacktrace": "github.com/runatlantis/atlantis/server/events.RunAndEmitStats
        github.com/runatlantis/atlantis/server/events/instrumented_project_command_runner.go:78
    github.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandRunner).PolicyCheck
        github.com/runatlantis/atlantis/server/events/instrumented_project_command_runner.go:42
    github.com/runatlantis/atlantis/server/events.runProjectCmds
        github.com/runatlantis/atlantis/server/events/project_command_pool_executor.go:48
    github.com/runatlantis/atlantis/server/events.(*PolicyCheckCommandRunner).Run
        github.com/runatlantis/atlantis/server/events/policy_check_command_runner.go:65
    github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan
        github.com/runatlantis/atlantis/server/events/plan_command_runner.go:165
    github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run
        github.com/runatlantis/atlantis/server/events/plan_command_runner.go:288
    github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand
        github.com/runatlantis/atlantis/server/events/command_runner.go:174"
}

This show up after:
terraform0.12.31 show -json /home/atlantis/.atlantis/repos/foo/test/1234/default/test/value.tfplan

Which generates a valid .json plan with the path and name of:
/home/atlantis/.atlantis/repos/foo/test/1234/default/test/ value.json

(repo-side config) The workflow running this was:

    test:
        plan:
            steps:
                - run: python3 code_that_runs_tfplan.py
        policy_check:
            steps:
                - show
                - policy_check:
                    extra_args: ["--update", "[email protected]:foo/test2.git//atlantis/policies"]

server-side config:

policies:
    owners:
      users:
        - albertorm95

We fixed adding the policy_sets instead of only relying on extra_args: ["--update", "[email protected]:foo/test2.git//atlantis/policies"]
server-side config:

policies:
    owners:
      users:
        - albertorm95
  policy_sets:
    - name: test
      path: /home/atlantis/test.rego
      source: local

@pseudomorph
Copy link
Contributor

@albertorm95 - The bugfix in 24.2 only addressed a panic due to a separate issue. It would not have addressed issues with additional content being added to the output from conftest.

In the above example, are there still custom run workflow steps following policy_check?

@albertorm95
Copy link
Contributor

@albertorm95 - The bugfix in 24.2 only addressed a panic due to a separate issue. It would not have addressed issues with additional content being added to the output from conftest.

In the above example, are there still custom run workflow steps following policy_check?

Hello, not really, the following was the apply stage, so looks like you can't have a empty policy_set if you want to use policy_check stage/step in the workflow.

Our scenario was, no policy_set in the server-side config but instead pull custom policies from an external repo as shown in the extra_args

@albertorm95
Copy link
Contributor

Hello, me again.

we have another scenario :awesome:

        policy_check:
            steps:
                - run: >
                    if [ ! -f "$PLANFILE" ]; then
                        echo "PLAN FAILED."
                        exit 1
                    fi
                - show
                - policy_check

We have a process where we always want the plan to succeed,
When the plan process fails (there is not plan file) we are checking if the $PLANFILE does not exists and if thats the case then we echo something and make the process exit

Even tho we are exit 1 the process we get this message from the policy_check:

Policy Check Error

unexpected end of JSON input

Before v0.24.0:
Policy Check Error

running "if [ ! -f \"$PLANFILE\" ]; then\n    echo \"PLAN FAILED\"\n    exit 1\nfi\n" in "/home/atlantis/.atlantis/repos/foo/test/1234/default/test_dir": exit status 1: running "if [ ! -f \"$PLANFILE\" ]; then\n    echo \"PLAN FAILED\"\n    exit 1\nfi\n" in "/home/atlantis/.atlantis/repos/foo/test/1234/default/test_dir": 
PLAN FAILED


@nitrocode
Copy link
Member

We might need to revert the prs if this is still an issue

@pseudomorph
Copy link
Contributor

@nitrocode - I think I can put together a quick fix for output not getting parsed when custom run steps are used.

Further discussion is needed for whether custom conftest invocations should be supported, and if so, how.

@Banders2
Copy link

We are having a similar issue to @albertorm95 in 0.24.2, where we are using custom policy_check and getting following stacktrace:

{
    "level": "error",
    "ts": "2023-06-13T11:39:16.320Z",
    "caller": "events/instrumented_project_command_runner.go:78",
    "msg": "Error running policy_check operation: invalid character 'D' looking for beginning of value",
    "json": {
        "repo": "Main/XXX/YYYY",
        "pull": "10000"
    },
    "stacktrace":  "
github.com/runatlantis/atlantis/server/events.RunAndEmitStats
  /usr/src/app/server/events/instrumented_project_command_runner.go:78
github.com/runatlantis/atlantis/server/events.(*InstrumentedProjectCommandRunner).PolicyCheck
  /usr/src/app/server/events/instrumented_project_command_runner.go:42
github.com/runatlantis/atlantis/server/events.runProjectCmds
  /usr/src/app/server/events/project_command_pool_executor.go:48
github.com/runatlantis/atlantis/server/events.(*PolicyCheckCommandRunner).Run
  /usr/src/app/server/events/policy_check_command_runner.go:65
github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run
  /usr/src/app/server/events/plan_command_runner.go:282
github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run
  /usr/src/app/server/events/plan_command_runner.go:290
github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand
  /usr/src/app/server/events/command_runner.go:298
"
}

With the following atlantis server config setup:

policies:
  owners:
  policy_sets:
    - name: opa-policies
      path: policy
      source: local

workflows:
  terraform-atlantis-infra:
    policy_check:
      steps:
        - show
        - policy_check:
            extra_args: ["--update", "oci://opa.xxxxxx.net/opa", "--all-namespaces"]

@nitrocode
Copy link
Member

Thank you @Banders2 for showing another example scenario

@pseudomorph since we want to take a step back to figure out how to best implement this feature, we have 2 options

  1. Keep existing changes and keep applying fixes for each of the use cases in this thread, prior to the next release
  2. Revert feat(policies): Add granular policy_sets #3086 (if this is possible since a few policy prs went in) and rethink on the implementation with each of the above scenarios in mind

If we go with option 1 then we'd need some prs to go in within a certain timeframe. If that's not possible then we need to look into option 2 as this is affecting a lot of users.

cc @jamengual @GenPage

@pseudomorph
Copy link
Contributor

@Banders2 - Does this happen on every policy run? Could this be an issue where the show step fails? If so, the linked fix I'm working on should address the issue.

@nitrocode - At least one of these issues stems from output from other steps not being parsed into JSON separately. I have a fix linked here which just needs some unit tests. It currently only has e2e tests, and I'm unsure of how to implement the unit tests.

@pseudomorph
Copy link
Contributor

With regard to the other matter in this thread - what to do about running Conftest as a custom run step, rather than with policy_check, perhaps we can discuss during the office hours tomorrow?

@Banders2
Copy link

@pseudomorph - Yes, it happens on every run.

I just tested out your branch and that does indeed solve our issue running with above custom policy_check.
It also blocks the 2*apply bug so users can no longer bypass conftest, which was our primary reason for upgrading.

In below screenshots "OPA policies can be found here: xxxxx" this is from run echo command between show and policy_check, so that also looks perfect from my point of view.

Passing:
image
Failing:
image

@GenPage
Copy link
Member

GenPage commented Aug 1, 2023

@pseudomorph Ping me on slack and let's setup a call to talk through this. Idk how I was not getting notified of pings on this but we definitely need to talk this out so that this regression is mitigated.

@GenPage
Copy link
Member

GenPage commented Aug 7, 2023

Should be resolved by #3502

@GenPage GenPage closed this as completed Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working conftest-policy never-stale regression Bug introduced in a new version
Projects
None yet
Development

No branches or pull requests

8 participants