[BUG] -TypeError: Object of type KeyValueDict is not JSON serializable #2819

Open
mcg1969 opened this issue Nov 1, 2024 · 7 comments
Labels
area: nebari-cli area: schema needs: investigation 🔍 Someone in the team needs to find the root cause and replicate this bug provider: Existing type: bug 🐛 Something isn't working

mcg1969 commented Nov 1, 2024

Describe the bug

I am attempting to run nebari deploy on an existing k3s cluster. I hit a separate problem with the Traefik CRDs, which I will raise separately. Once I get past that, I see this:

[terraform]: After stage=03-kubernetes-initialize kubernetes initialized successfully
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/subcommands/deploy.py:92 │
│ in deploy                                                                                        │
│                                                                                                  │
│   89 │   │   │   msg = "Digital Ocean support is currently being deprecated and will be remov    │
│   90 │   │   │   typer.confirm(msg)                                                              │
│   91 │   │                                                                                       │
│ ❱ 92 │   │   deploy_configuration(                                                               │
│   93 │   │   │   config,                                                                         │
│   94 │   │   │   stages,                                                                         │
│   95 │   │   │   disable_prompt=disable_prompt,                                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/deploy.py:53 in          │
│ deploy_configuration                                                                             │
│                                                                                                  │
│   50 │   │   with contextlib.ExitStack() as stack:                                               │
│   51 │   │   │   for stage in stages:                                                            │
│   52 │   │   │   │   s = stage(output_directory=pathlib.Path.cwd(), config=config)               │
│ ❱ 53 │   │   │   │   stack.enter_context(s.deploy(stage_outputs, disable_prompt))                │
│   54 │   │   │   │                                                                               │
│   55 │   │   │   │   if not disable_checks:                                                      │
│   56 │   │   │   │   │   s.check(stage_outputs, disable_prompt)                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:526 in enter_context             │
│                                                                                                  │
│   523 │   │   except AttributeError:                                                             │
│   524 │   │   │   raise TypeError(f"'{cls.__module__}.{cls.__qualname__}' object does "          │
│   525 │   │   │   │   │   │   │   f"not support the context manager protocol") from None         │
│ ❱ 526 │   │   result = _enter(cm)                                                                │
│   527 │   │   self._push_cm_exit(cm, _exit)                                                      │
│   528 │   │   return result                                                                      │
│   529                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__                 │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/base.py:72 in     │
│ deploy                                                                                           │
│                                                                                                  │
│    69 │   │   │   deploy_config["terraform_import"] = True                                       │
│    70 │   │   │   deploy_config["state_imports"] = state_imports                                 │
│    71 │   │                                                                                      │
│ ❱  72 │   │   self.set_outputs(stage_outputs, terraform.deploy(**deploy_config))                 │
│    73 │   │   self.post_deploy(stage_outputs, disable_prompt)                                    │
│    74 │   │   yield                                                                              │
│    75                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/provider/terraform.py:59 │
│ in deploy                                                                                        │
│                                                                                                  │
│    56 │   │   mode="w", encoding="utf-8", suffix=".tfvars.json"                                  │
│    57 │   ) as f:                                                                                │
│    58 │   │   print("INPUT_VARS:", input_vars)                                                   │
│ ❱  59 │   │   json.dump(input_vars, f.file)                                                      │
│    60 │   │   f.file.flush()                                                                     │
│    61 │   │                                                                                      │
│    62 │   │   if terraform_init:                                                                 │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/__init__.py:179 in dump                   │
│                                                                                                  │
│   176 │   │   │   default=default, sort_keys=sort_keys, **kw).iterencode(obj)                    │
│   177 │   # could accelerate with writelines in some versions of Python, at                      │
│   178 │   # a debuggability cost                                                                 │
│ ❱ 179 │   for chunk in iterable:                                                                 │
│   180 │   │   fp.write(chunk)                                                                    │
│   181                                                                                            │
│   182                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:432 in _iterencode             │
│                                                                                                  │
│   429 │   │   elif isinstance(o, (list, tuple)):                                                 │
│   430 │   │   │   yield from _iterencode_list(o, _current_indent_level)                          │
│   431 │   │   elif isinstance(o, dict):                                                          │
│ ❱ 432 │   │   │   yield from _iterencode_dict(o, _current_indent_level)                          │
│   433 │   │   else:                                                                              │
│   434 │   │   │   if markers is not None:                                                        │
│   435 │   │   │   │   markerid = id(o)                                                           │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:406 in _iterencode_dict        │
│                                                                                                  │
│   403 │   │   │   │   │   chunks = _iterencode_dict(value, _current_indent_level)                │
│   404 │   │   │   │   else:                                                                      │
│   405 │   │   │   │   │   chunks = _iterencode(value, _current_indent_level)                     │
│ ❱ 406 │   │   │   │   yield from chunks                                                          │
│   407 │   │   if newline_indent is not None:                                                     │
│   408 │   │   │   _current_indent_level -= 1                                                     │
│   409 │   │   │   yield '\n' + _indent * _current_indent_level                                   │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:406 in _iterencode_dict        │
│                                                                                                  │
│   403 │   │   │   │   │   chunks = _iterencode_dict(value, _current_indent_level)                │
│   404 │   │   │   │   else:                                                                      │
│   405 │   │   │   │   │   chunks = _iterencode(value, _current_indent_level)                     │
│ ❱ 406 │   │   │   │   yield from chunks                                                          │
│   407 │   │   if newline_indent is not None:                                                     │
│   408 │   │   │   _current_indent_level -= 1                                                     │
│   409 │   │   │   yield '\n' + _indent * _current_indent_level                                   │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:439 in _iterencode             │
│                                                                                                  │
│   436 │   │   │   │   if markerid in markers:                                                    │
│   437 │   │   │   │   │   raise ValueError("Circular reference detected")                        │
│   438 │   │   │   │   markers[markerid] = o                                                      │
│ ❱ 439 │   │   │   o = _default(o)                                                                │
│   440 │   │   │   yield from _iterencode(o, _current_indent_level)                               │
│   441 │   │   │   if markers is not None:                                                        │
│   442 │   │   │   │   del markers[markerid]                                                      │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:180 in default                 │
│                                                                                                  │
│   177 │   │   │   │   return super().default(o)                                                  │
│   178 │   │                                                                                      │
│   179 │   │   """                                                                                │
│ ❱ 180 │   │   raise TypeError(f'Object of type {o.__class__.__name__} '                          │
│   181 │   │   │   │   │   │   f'is not JSON serializable')                                       │
│   182 │                                                                                          │
│   183 │   def encode(self, o):                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

I hacked the terraform.py module to see what JSON was struggling with; it is this dictionary, with the KeyValueDict objects.

{'traefik-image': {'image': 'traefik', 'tag': '2.9.1'}, 'name': 'k3s', 'environment': 'nebari', 'node_groups': {'general': KeyValueDict(key='kubernetes.io/os', value='linux'), 'user': KeyValueDict(key='kubernetes.io/os', value='linux'), 'worker': KeyValueDict(key='kubernetes.io/os', value='linux')}, 'certificate-service': <CertificateEnum.selfsigned: 'self-signed'>}

Those were generated by nebari init though! Here is the existing section of the config yaml:

existing:
  kube_context: default
  node_selectors:
    general:
      key: kubernetes.io/os
      value: linux
    user:
      key: kubernetes.io/os
      value: linux
    worker:
      key: kubernetes.io/os
      value: linux
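
The TypeError in the traceback can be reproduced outside Nebari. The sketch below is a minimal illustration, not Nebari's code: KeyValueDict here is a plain dataclass standing in for the Pydantic model of the same name (an assumption), and the dictionary is a trimmed copy of the one printed above. The standard json encoder only handles built-in types, so any other object falls through to JSONEncoder.default, which raises.

```python
import json
from dataclasses import dataclass


# Stand-in for Nebari's model (assumption: the real class is a Pydantic
# BaseModel; a dataclass triggers the same json encoder behaviour).
@dataclass
class KeyValueDict:
    key: str
    value: str


# Trimmed copy of the input_vars dictionary shown in the report.
input_vars = {
    "name": "k3s",
    "environment": "nebari",
    "node_groups": {
        "general": KeyValueDict(key="kubernetes.io/os", value="linux"),
    },
}

try:
    json.dumps(input_vars)
    error_message = None
except TypeError as exc:
    # The encoder has no rule for KeyValueDict, so it raises TypeError.
    error_message = str(exc)

print(error_message)
```

The printed message names the offending class, which is how the traceback points at KeyValueDict rather than at the enclosing dictionary.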

Expected behavior

It should make it through this stage without this error.

OS and architecture in which you are running Nebari

CentOS Stream 8

How to Reproduce the problem?

I installed a stock version of k3s. To get to this stage, I had to remove some of the Traefik CRDs that k3s installs by default, because they conflict with CRDs that Terraform tries to install. Once I let Terraform handle those, I was able to get to this point.

Command output

nebari deploy -c nebari-config.yaml


Versions and dependencies used

conda 24.9.2
Client Version: v1.30.3+k3s1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.3+k3s1
Nebari version 2024.7.1

Compute environment

None

Integrations

No response

Anything else?

No response
@mcg1969 mcg1969 added needs: triage 🚦 Someone needs to have a look at this issue and triage type: bug 🐛 Something isn't working labels Nov 1, 2024
Author

mcg1969 commented Nov 2, 2024

I hacked around that particular issue by editing _nebari/provider/terraform.py and creating a simple function to convert Pydantic objects to a dict:

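def _to_dict(sd):
    """Recursively replace Pydantic models with plain dicts so json can encode them."""
    if isinstance(sd, dict):
        return {k: _to_dict(v) for k, v in sd.items()}
    elif isinstance(sd, (list, tuple)):
        return [_to_dict(v) for v in sd]
    elif hasattr(sd, 'model_dump'):
        # Pydantic v2 models expose model_dump(), which returns a dict.
        return sd.model_dump()
    else:
        # Anything else (str, int, enums, ...) is passed through unchanged.
        return sd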

Then used that to wrap the input to json.dump:

    with tempfile.NamedTemporaryFile(
        mode="w", encoding="utf-8", suffix=".tfvars.json"
    ) as f:
        json.dump(_to_dict(input_vars), f.file)
        f.file.flush()
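
An alternative to walking the structure beforehand is the default= hook of json.dump / json.dumps, which the encoder calls only for objects it cannot handle itself. The following is a hedged sketch, not the change made in Nebari: KeyValueDict and CertificateEnum are hypothetical stand-ins (the first only imitates a Pydantic v2 model by exposing model_dump()), and to_jsonable is a name invented for this example.

```python
import json
from enum import Enum


class KeyValueDict:
    """Hypothetical stand-in exposing a Pydantic-v2-style model_dump()."""

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def model_dump(self):
        return {"key": self.key, "value": self.value}


class CertificateEnum(Enum):
    # Stand-in defined as a plain Enum so the hook has to handle it.
    selfsigned = "self-signed"


def to_jsonable(obj):
    """Fallback for json: called only for objects the encoder cannot handle."""
    if hasattr(obj, "model_dump"):
        return obj.model_dump()
    if isinstance(obj, Enum):
        return obj.value
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


input_vars = {
    "node_groups": {"general": KeyValueDict("kubernetes.io/os", "linux")},
    "certificate-service": CertificateEnum.selfsigned,
}

# The value returned by the hook is encoded recursively, so nested models work too.
encoded = json.dumps(input_vars, default=to_jsonable)
print(encoded)
```

The hook keeps the conversion logic next to the serialization call and leaves the original input_vars object untouched, at the cost of only applying when the data is actually written as JSON.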

That got me farther. However, it ended up failing later in the deployment process with a very similar issue:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/subcommands/deploy.py:92 │
│ in deploy                                                                                        │
│                                                                                                  │
│   89 │   │   │   msg = "Digital Ocean support is currently being deprecated and will be remov    │
│   90 │   │   │   typer.confirm(msg)                                                              │
│   91 │   │                                                                                       │
│ ❱ 92 │   │   deploy_configuration(                                                               │
│   93 │   │   │   config,                                                                         │
│   94 │   │   │   stages,                                                                         │
│   95 │   │   │   disable_prompt=disable_prompt,                                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/deploy.py:53 in          │
│ deploy_configuration                                                                             │
│                                                                                                  │
│   50 │   │   with contextlib.ExitStack() as stack:                                               │
│   51 │   │   │   for stage in stages:                                                            │
│   52 │   │   │   │   s = stage(output_directory=pathlib.Path.cwd(), config=config)               │
│ ❱ 53 │   │   │   │   stack.enter_context(s.deploy(stage_outputs, disable_prompt))                │
│   54 │   │   │   │                                                                               │
│   55 │   │   │   │   if not disable_checks:                                                      │
│   56 │   │   │   │   │   s.check(stage_outputs, disable_prompt)                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:526 in enter_context             │
│                                                                                                  │
│   523 │   │   except AttributeError:                                                             │
│   524 │   │   │   raise TypeError(f"'{cls.__module__}.{cls.__qualname__}' object does "          │
│   525 │   │   │   │   │   │   │   f"not support the context manager protocol") from None         │
│ ❱ 526 │   │   result = _enter(cm)                                                                │
│   527 │   │   self._push_cm_exit(cm, _exit)                                                      │
│   528 │   │   return result                                                                      │
│   529                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__                 │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/kubernetes_keyclo │
│ ak/__init__.py:302 in deploy                                                                     │
│                                                                                                  │
│   299 │   def deploy(                                                                            │
│   300 │   │   self, stage_outputs: Dict[str, Dict[str, Any]], disable_prompt: bool = False       │
│   301 │   ):                                                                                     │
│ ❱ 302 │   │   with super().deploy(stage_outputs, disable_prompt):                                │
│   303 │   │   │   with keycloak_provider_context(                                                │
│   304 │   │   │   │   stage_outputs["stages/" + self.name]["keycloak_credentials"]["value"]      │
│   305 │   │   │   ):                                                                             │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__                 │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/base.py:65 in     │
│ deploy                                                                                           │
│                                                                                                  │
│    62 │   ):                                                                                     │
│    63 │   │   deploy_config = dict(                                                              │
│    64 │   │   │   directory=str(self.output_directory / self.stage_prefix),                      │
│ ❱  65 │   │   │   input_vars=self.input_vars(stage_outputs),                                     │
│    66 │   │   )                                                                                  │
│    67 │   │   state_imports = self.state_imports()                                               │
│    68 │   │   if state_imports:                                                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/kubernetes_keyclo │
│ ak/__init__.py:227 in input_vars                                                                 │
│                                                                                                  │
│   224 │   │   ]                                                                                  │
│   225 │                                                                                          │
│   226 │   def input_vars(self, stage_outputs: Dict[str, Dict[str, Any]]):                        │
│ ❱ 227 │   │   return InputVars(                                                                  │
│   228 │   │   │   name=self.config.project_name,                                                 │
│   229 │   │   │   environment=self.config.namespace,                                             │
│   230 │   │   │   endpoint=stage_outputs["stages/04-kubernetes-ingress"]["domain"],              │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/pydantic/main.py:164 in __init__ │
│                                                                                                  │
│    161 │   │   """                                                                               │
│    162 │   │   # `__tracebackhide__` tells pytest and some other tools to omit this function fr  │
│    163 │   │   __tracebackhide__ = True                                                          │
│ ❱  164 │   │   __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__p  │
│    165 │                                                                                         │
│    166 │   # The following line sets a flag that we use to determine when `__init__` gets overr  │
│    167 │   __init__.__pydantic_base_init__ = True                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValidationError: 1 validation error for InputVars
node_group
  Input should be a valid dictionary [type=dict_type, input_value=KeyValueDict(key='kuberne...s.io/os', value='linux'), input_type=KeyValueDict]
    For further information visit https://errors.pydantic.dev/2.4/v/dict_type
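
This second failure has the same shape as the first: a KeyValueDict instance reaches code that expects a plain dictionary. The sketch below is an illustration only. InputVars and KeyValueDict are reduced dataclass stand-ins, not Nebari's Pydantic models, and the isinstance check merely imitates the dict_type error reported above. With Pydantic v2, model_dump() would play the role that dataclasses.asdict plays here.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class KeyValueDict:
    # Stand-in for the config model holding one node selector.
    key: str
    value: str


@dataclass
class InputVars:
    # Hypothetical, reduced version of a stage's input model.
    node_group: Dict[str, str]

    def __post_init__(self):
        # Imitates a strict "must be a dict" validation rule.
        if not isinstance(self.node_group, dict):
            raise TypeError(
                f"node_group: expected dict, got {type(self.node_group).__name__}"
            )


selector = KeyValueDict(key="kubernetes.io/os", value="linux")

try:
    InputVars(node_group=selector)
    rejected = False
except TypeError:
    # Passing the model object directly fails the dict check.
    rejected = True

# Converting to a plain dict at the boundary satisfies the check.
accepted = InputVars(node_group=asdict(selector))
print(rejected, accepted.node_group)
```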

@marcelovilla marcelovilla added area: nebari-cli area: schema and removed needs: triage 🚦 Someone needs to have a look at this issue and triage labels Nov 4, 2024
Contributor

dcmcand commented Nov 4, 2024

Hi @mcg1969, thanks for reporting this.

For local deploys, we use Kind and test with it. Using K3s would be essentially the same as using an existing cluster, which is the least tested and documented part of Nebari.

Could you validate that the local deploy with kind does work for you? That would let us narrow this down to the existing provider.

Thanks!

Author

mcg1969 commented Nov 4, 2024

I was indeed using the existing approach, not the local approach. And that choice is deliberate—kind is not an option for the use case being considered here. This isn't actually intended to be a local deployment.

@dcmcand dcmcand added the needs: investigation 🔍 Someone in the team needs to find the root cause and replicate this bug label Nov 4, 2024
Author

mcg1969 commented Nov 4, 2024

@dcmcand Confirming: I do not encounter this with the AWS target.

Contributor

dcmcand commented Nov 5, 2024

Thanks @mcg1969, that is helpful.

Contributor

dcmcand commented Nov 5, 2024

@mcg1969 I was able to reproduce this issue when deploying to k3s from 2024.7.1, but not from the current main branch. There may be other issues, but this error is not occurring.

I believe this issue is related to #2767 and was likely fixed by #2797. We will have a new release here within a couple of days. Once the new release is out, can you retry?

The Traefik CRDs are still an issue, but that is essentially a new feature request, whereas this is a bug.

Author

mcg1969 commented Nov 5, 2024

Yes, happy to test. I totally understand about the other issue.

3 participants