Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metadata] If EnableValueDescriptorManagment=true, error is returned on first device profile creation #2580

Closed
chr1shung opened this issue Jun 5, 2020 · 22 comments
Assignees
Labels
bug Something isn't working hanoi Hanoi release

Comments

@chr1shung
Copy link

chr1shung commented Jun 5, 2020

🐞 Bug Report

Affected Services

The issue is located in: Metadata

Is this a regression?

Yes, the previous version in which this bug was not present was: Fuji

Description and Minimal Reproduction

EnableValueDescriptorManagement = false

If we set this to true, there will be error when device service trying to update the very first device profile to metadata during bootstrap. (However the device profile did successfully be updated on metadata, while device service getting the EOF error)

🔥 Exception or Error




...
level=INFO ts=2020-06-05T01:33:55.2163551Z app=device-virtual source=registry.go:79 msg="Using Registry (consul) from http://edgex-core-consul:8500"
level=INFO ts=2020-06-05T01:33:55.2247826Z app=device-virtual source=httpserver.go:89 msg="Web server starting (edgex-device-virtual:49990)"
level=INFO ts=2020-06-05T01:33:55.2249661Z app=device-virtual source=init.go:134 msg="Check Metadata service's status via Registry..."
level=INFO ts=2020-06-05T01:33:55.2249966Z app=device-virtual source=init.go:134 msg="Check Data service's status via Registry..."
level=ERROR ts=2020-06-05T01:33:55.2294863Z app=device-virtual source=init.go:149 msg=" edgex-core-data service not healthy..."
level=INFO ts=2020-06-05T01:34:00.2304191Z app=device-virtual source=init.go:134 msg="Check Data service's status via Registry..."
level=INFO ts=2020-06-05T01:34:00.2350735Z app=device-virtual source=init.go:42 msg="Service clients initialize successful."
level=INFO ts=2020-06-05T01:34:00.256736Z app=device-virtual source=service.go:83 msg="Device Service device-virtual doesn't exist, creating a new one"
level=INFO ts=2020-06-05T01:34:00.2575472Z app=device-virtual source=service.go:140 msg="Addressable device-virtual doesn't exist, creating a new one"
level=ERROR ts=2020-06-05T01:34:15.2670623Z app=device-virtual source=profiles.go:88 msg="Add Device Profile: /res/device.virtual.binary.yaml to Core Metadata failed: Post http://edgex-core-metadata:48081/api/v1/deviceprofile: EOF"
level=INFO ts=2020-06-05T01:34:15.3230042Z app=device-virtual source=device.go:58 msg="Added device: Random-Boolean-Device"
level=INFO ts=2020-06-05T01:34:15.325888Z app=device-virtual source=device.go:58 msg="Added device: Random-Integer-Device"
level=ERROR ts=2020-06-05T01:34:15.3262665Z app=device-virtual source=devices.go:43 msg="Device Profile Random-Binary-Device doesn't exist for Device Random-Binary-Device"
level=ERROR ts=2020-06-05T01:34:15.3263025Z app=device-virtual source=devices.go:31 msg="creating Device from config failed: Random-Binary-Device"
level=INFO ts=2020-06-05T01:34:15.3266977Z app=device-virtual source=httpserver.go:106 msg="Web server shutting down"
level=INFO ts=2020-06-05T01:34:15.32685Z app=device-virtual source=httpserver.go:97 msg="Web server stopped"
level=INFO ts=2020-06-05T01:34:15.8274455Z app=device-virtual source=httpserver.go:108 msg="Web server shut down"
level=INFO ts=2020-06-05T01:34:15.8275351Z app=device-virtual source=bootstrap.go:130 msg="Un-Registering service from the Registry"
level=INFO ts=2020-06-05T01:34:15.8337617Z app=device-virtual source=virtualdriver.go:156 msg="VirtualDriver.Stop: device-virtual driver is stopping..."

🌍 Your Environment

Deployment Environment: macOS

EdgeX Version:
Geneva release non-security redis

Anything else relevant?
after further investigation, this issue is related to #2526

@chr1shung chr1shung added the bug Something isn't working label Jun 5, 2020
@tsconn23
Copy link
Member

tsconn23 commented Jun 5, 2020

I've tried three different ways to reproduce this and I cannot. I am confused by the issue's content as the author refers to updating a device profile when the error message shows an add is occurring. The error above is being thrown from the device service.

** As indicated in the issue, I have set EnableValueDescriptorManagement = true **
1.) Using Postman with clean Redis database, I POST'ed the device.virtual.binary.yaml file to the http://localhost:48081/api/v1/deviceprofile/uploadfile endpoint to create the device profile. I then re-posted the same profile in an attempt to recreate the update described above and received the correct response from core-metadata
409 Confict Duplicate name for device profile

2.) I then built the example device service in the device-sdk-go project. With a clean database I started it to create the device profile. I then stopped it and re-started it. The device-simple service correctly identified that it already existed.

level=INFO ts=2020-06-05T19:26:33.139476Z app=device-simple source=service.go:90 msg="Device Service device-simple exists"

3.) I then built the device-virtual-go service and attempted to start it after cleaning the Redis database. I received the following error when the binary profile YAML was being uploaded for the first time. This is NOT the same error as the one above.

level=INFO ts=2020-06-05T19:18:34.914168Z app=device-virtual source=virtualdriver.go:249 msg="Insert one row into db failed: EncodeScalars: cannot encode string of length 33554432 (limit 65536)"
level=ERROR ts=2020-06-05T19:18:34.914206Z app=device-virtual source=device.go:70 msg="Invoked driver.AddDevice callback failed for Random-Binary-Device: EncodeScalars: cannot encode string of length 33554432 (limit 65536)"

Please provide additional instructions for how to duplicate the error reported in the issue above assuming it's not something I've already tried here.

@tsconn23
Copy link
Member

tsconn23 commented Jun 5, 2020

In addition I just tried the following:

1.)

  • With clean Redis database start device-simple (example in device-sdk-go). Profile is created in core-metadata.
  • Stop device-simple and edit profile YAML to change the name from Simple-Device to Simple-Device1.
  • Restart service. New profile is created however value descriptor is not because it already exists.
level=ERROR ts=2020-06-05T19:48:44.173096Z app=device-simple source=profiles.go:88 msg="Add Device Profile: /device-sdk-go/example/cmd/device-simple/res/Simple-Driver.yaml to Core Metadata failed: 409 - 409 - duplicate name for value descriptor: SwitchButton\n\n"

2.)

  • Following from above, stop device-simple service. Edit YAML to restore the original profile name of Simple-Device. Now edit one of the value descriptors such as SwitchButton to SwitchButton1.
  • Restart service. Service correctly identifies that profile already exists and does not proceed to create the new SwitchButton1 value descriptor

@tsconn23
Copy link
Member

tsconn23 commented Jun 5, 2020

@hahattan In the error you have an absolute path to your device profile as /res/device.virtual.binary.yaml If that file doesn't exist on disk, that's the most likely cause of the EOF.

@cloudxxx8
Copy link
Member

@tsconn23 we've investigated this issue, and found out the root cause is the as #2526
The Consul discovery between core-metadata and core-data cause this issue.
I am writing the clear re-create steps and will post later.

@cloudxxx8
Copy link
Member

cloudxxx8 commented Jun 5, 2020

reproduce steps:

  1. Retrieve the Geneva docker-compose file and edit it, for example https://github.com/edgexfoundry/developer-scripts/blob/master/releases/geneva/compose-files/docker-compose-geneva-redis-no-secty.yml
  2. Add an environment variable override in core-metadata Writable_EnableValueDescriptorManagement: "true"
  metadata:
    image: edgexfoundry/docker-core-metadata-go:1.2.0
    ports:
      - "127.0.0.1:48081:48081"
    container_name: edgex-core-metadata
    hostname: edgex-core-metadata
    networks:
      - edgex-network
    environment:
      <<: *common-variables
      Service_Host: edgex-core-metadata
      Notifications_Sender: edgex-core-metadata
      Writable_EnableValueDescriptorManagement: "true"
    depends_on:
      - consul
#      - logging  # uncomment if re-enabled remote logging
      - redis
      - notifications
  1. docker-compose -f docker-compose-geneva-redis-no-secty.yml up -d
  2. see the log in device-virtual
    docker logs edgex-device-virtual

It happens whenever create a new Device Profile at first time only, because core-metadata takes too much time to get URL to create valuedescriptors in core-data

@lenny-goodell
Copy link
Member

I was able to recreated the issue. Device Vitual failed to start with the following error in the logs:

level=ERROR ts=2020-06-05T23:34:00.327837442Z app=device-virtual source=profiles.go:88 msg="Add Device Profile: /res/device.virtual.binary.yaml to Core Metadata failed: Post http://edgex-core-metadata:48081/api/v1/deviceprofile: EOF"

Interestingly, core metadata has this error:

level=ERROR ts=2020-06-05T23:34:00.449605063Z app=edgex-core-metadata source=request.go:63 msg="Post http://edgex-device-virtual:49990/api/v1/callback: dial tcp 172.21.0.9:49990: connect: connection refused"

@jpwhitemn
Copy link
Member

jpwhitemn commented Jun 6, 2020 via email

@lenny-goodell
Copy link
Member

lenny-goodell commented Jun 8, 2020

Adding Service_Timeout: "20000" to Device Virtual seems to resolve this.

@tonyespy tonyespy changed the title Post device profile failed: EOF [metadata] If EnableValueDescriptorManagment=true, EOF error is returned on first device profile creation Jun 8, 2020
@tonyespy tonyespy changed the title [metadata] If EnableValueDescriptorManagment=true, EOF error is returned on first device profile creation [metadata] If EnableValueDescriptorManagment=true, error is returned on first device profile creation Jun 8, 2020
@lenny-goodell
Copy link
Member

I have spent some time root causing this issue & #2526. The root cause for both is the refactoring of the Client Monitoring now has the client URL un-initialized until the first request to Registry which is done at the start of the go func, but the go func has to be spun up before other thread tries to access the client. If the URL is un-initialized when the client URL is accessed that thread is made to wait for the monitor interval which is set to 15secs .

The quick fix and @tonyespy will like it is to disable the monitoring by always returning the local.New(url) from this factory function.
https://github.com/edgexfoundry/edgex-go/blob/master/internal/pkg/urlclient/factory.go#L55

@tonyespy
Copy link
Member

tonyespy commented Jun 9, 2020

My reaction aside, I thought the plan in the short term was to increase the timeout in device-sdk-go per @cloudxxx8's suggestion? Disabling the monitoring (which I'd really like seen done) right now seems like a big hammer without some more discussion.

In the long run, I think we should consider two approaches:

  • add config support to disable monitoring (although if we make the change in Hanoi, is that a breaking change?)

OR

@lenny-goodell
Copy link
Member

I thought the plan in the short term was to increase the timeout

Yes, for the point release we discussed increasing the timeout using override in the compose file for Device Virtual. All other Device services will also need this as we don't know which will trigger the issue first...

@iain-anderson
Copy link
Member

Interestingly, core metadata has this error: level=ERROR ts=2020-06-05T23:34:00.449605063Z app=edgex-core-metadata source=request.go:63 msg="Post http://edgex-device-virtual:49990/api/v1/callback: dial tcp 172.21.0.9:49990: connect: connection refused"

Does this indicate a startup sequencing issue in the sdk - the http server should be started before posting new profiles to metadata?

@lenny-goodell
Copy link
Member

@iain-anderson , that error is a side effect of the Device Service exiting during startup due to timeout on call to Core Metadata. It is Core Metadata's Client for CoreData that is stuck in a sleep waiting for the URL that is causing the Device Service's call to timeout.

@chr1shung
Copy link
Author

Adding Service_Timeout: "20000" to Device Virtual seems to resolve this.

still have the same error with this ENV override

@lenny-goodell
Copy link
Member

still have the same error with this ENV override

@hahattan , can you try increasing the timeout to see if your system just needs more time?

@chr1shung
Copy link
Author

still have the same error with this ENV override

@hahattan , can you try increasing the timeout to see if your system just needs more time?

I've tried increase to 60000 but seems the same (it sometimes passed, but most of the case the error still shows up)

@weichou1229
Copy link
Member

@hahattan @lenny-intel @cloudxxx8
Adding Service_Timeout: "20000" to Device Virtual core-metadata can resolve this.
Because core-metadata's HTTP server timeout causes the device-virtual's HTTP client EOF error.

https://github.com/edgexfoundry/go-mod-bootstrap/blob/master/bootstrap/handlers/httpserver/httpserver.go#L86-L91

@lenny-goodell
Copy link
Member

Thanks @hahattan , I will add this to the Geneva 1.2.1 compose files.

@jpwhitemn
Copy link
Member

Should be cleared up once we remove client monitoring; relook before Hanoi release.

@jpwhitemn
Copy link
Member

Issue being worked as part of larger client monitoring removal. See #2595 . @hahattan - please retest when you can.

@jpwhitemn jpwhitemn added the hanoi Hanoi release label Jun 25, 2020
@chr1shung
Copy link
Author

tested with latest nightly-build image and the error no longer seen, should be good to close this issue.

@cloudxxx8
Copy link
Member

thanks, @hahattan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hanoi Hanoi release
Projects
None yet
Development

No branches or pull requests

8 participants