[metadata] If EnableValueDescriptorManagment=true, error is returned on first device profile creation #2580

chr1shung · 2020-06-05T02:02:00Z

🐞 Bug Report

Affected Services

The issue is located in: Metadata

Is this a regression?

Yes, the previous version in which this bug was not present was: Fuji

Description and Minimal Reproduction

edgex-go/cmd/core-metadata/res/configuration.toml

Line 3 in 525c654

EnableValueDescriptorManagement = false

If we set this to true, there will be error when device service trying to update the very first device profile to metadata during bootstrap. (However the device profile did successfully be updated on metadata, while device service getting the EOF error)

🔥 Exception or Error




...
level=INFO ts=2020-06-05T01:33:55.2163551Z app=device-virtual source=registry.go:79 msg="Using Registry (consul) from http://edgex-core-consul:8500"
level=INFO ts=2020-06-05T01:33:55.2247826Z app=device-virtual source=httpserver.go:89 msg="Web server starting (edgex-device-virtual:49990)"
level=INFO ts=2020-06-05T01:33:55.2249661Z app=device-virtual source=init.go:134 msg="Check Metadata service's status via Registry..."
level=INFO ts=2020-06-05T01:33:55.2249966Z app=device-virtual source=init.go:134 msg="Check Data service's status via Registry..."
level=ERROR ts=2020-06-05T01:33:55.2294863Z app=device-virtual source=init.go:149 msg=" edgex-core-data service not healthy..."
level=INFO ts=2020-06-05T01:34:00.2304191Z app=device-virtual source=init.go:134 msg="Check Data service's status via Registry..."
level=INFO ts=2020-06-05T01:34:00.2350735Z app=device-virtual source=init.go:42 msg="Service clients initialize successful."
level=INFO ts=2020-06-05T01:34:00.256736Z app=device-virtual source=service.go:83 msg="Device Service device-virtual doesn't exist, creating a new one"
level=INFO ts=2020-06-05T01:34:00.2575472Z app=device-virtual source=service.go:140 msg="Addressable device-virtual doesn't exist, creating a new one"
level=ERROR ts=2020-06-05T01:34:15.2670623Z app=device-virtual source=profiles.go:88 msg="Add Device Profile: /res/device.virtual.binary.yaml to Core Metadata failed: Post http://edgex-core-metadata:48081/api/v1/deviceprofile: EOF"
level=INFO ts=2020-06-05T01:34:15.3230042Z app=device-virtual source=device.go:58 msg="Added device: Random-Boolean-Device"
level=INFO ts=2020-06-05T01:34:15.325888Z app=device-virtual source=device.go:58 msg="Added device: Random-Integer-Device"
level=ERROR ts=2020-06-05T01:34:15.3262665Z app=device-virtual source=devices.go:43 msg="Device Profile Random-Binary-Device doesn't exist for Device Random-Binary-Device"
level=ERROR ts=2020-06-05T01:34:15.3263025Z app=device-virtual source=devices.go:31 msg="creating Device from config failed: Random-Binary-Device"
level=INFO ts=2020-06-05T01:34:15.3266977Z app=device-virtual source=httpserver.go:106 msg="Web server shutting down"
level=INFO ts=2020-06-05T01:34:15.32685Z app=device-virtual source=httpserver.go:97 msg="Web server stopped"
level=INFO ts=2020-06-05T01:34:15.8274455Z app=device-virtual source=httpserver.go:108 msg="Web server shut down"
level=INFO ts=2020-06-05T01:34:15.8275351Z app=device-virtual source=bootstrap.go:130 msg="Un-Registering service from the Registry"
level=INFO ts=2020-06-05T01:34:15.8337617Z app=device-virtual source=virtualdriver.go:156 msg="VirtualDriver.Stop: device-virtual driver is stopping..."

🌍 Your Environment

Deployment Environment: macOS

EdgeX Version:
Geneva release non-security redis

Anything else relevant?
after further investigation, this issue is related to #2526

The text was updated successfully, but these errors were encountered:

tsconn23 · 2020-06-05T19:29:46Z

I've tried three different ways to reproduce this and I cannot. I am confused by the issue's content as the author refers to updating a device profile when the error message shows an add is occurring. The error above is being thrown from the device service.

** As indicated in the issue, I have set EnableValueDescriptorManagement = true **
1.) Using Postman with clean Redis database, I POST'ed the device.virtual.binary.yaml file to the http://localhost:48081/api/v1/deviceprofile/uploadfile endpoint to create the device profile. I then re-posted the same profile in an attempt to recreate the update described above and received the correct response from core-metadata
409 Confict Duplicate name for device profile

2.) I then built the example device service in the device-sdk-go project. With a clean database I started it to create the device profile. I then stopped it and re-started it. The device-simple service correctly identified that it already existed.

level=INFO ts=2020-06-05T19:26:33.139476Z app=device-simple source=service.go:90 msg="Device Service device-simple exists"

3.) I then built the device-virtual-go service and attempted to start it after cleaning the Redis database. I received the following error when the binary profile YAML was being uploaded for the first time. This is NOT the same error as the one above.

level=INFO ts=2020-06-05T19:18:34.914168Z app=device-virtual source=virtualdriver.go:249 msg="Insert one row into db failed: EncodeScalars: cannot encode string of length 33554432 (limit 65536)"
level=ERROR ts=2020-06-05T19:18:34.914206Z app=device-virtual source=device.go:70 msg="Invoked driver.AddDevice callback failed for Random-Binary-Device: EncodeScalars: cannot encode string of length 33554432 (limit 65536)"

Please provide additional instructions for how to duplicate the error reported in the issue above assuming it's not something I've already tried here.

tsconn23 · 2020-06-05T19:58:45Z

In addition I just tried the following:

1.)

With clean Redis database start device-simple (example in device-sdk-go). Profile is created in core-metadata.
Stop device-simple and edit profile YAML to change the name from Simple-Device to Simple-Device1.
Restart service. New profile is created however value descriptor is not because it already exists.

level=ERROR ts=2020-06-05T19:48:44.173096Z app=device-simple source=profiles.go:88 msg="Add Device Profile: /device-sdk-go/example/cmd/device-simple/res/Simple-Driver.yaml to Core Metadata failed: 409 - 409 - duplicate name for value descriptor: SwitchButton\n\n"

2.)

Following from above, stop device-simple service. Edit YAML to restore the original profile name of Simple-Device. Now edit one of the value descriptors such as SwitchButton to SwitchButton1.
Restart service. Service correctly identifies that profile already exists and does not proceed to create the new SwitchButton1 value descriptor

tsconn23 · 2020-06-05T20:06:11Z

@hahattan In the error you have an absolute path to your device profile as /res/device.virtual.binary.yaml If that file doesn't exist on disk, that's the most likely cause of the EOF.

cloudxxx8 · 2020-06-05T23:04:46Z

@tsconn23 we've investigated this issue, and found out the root cause is the as #2526
The Consul discovery between core-metadata and core-data cause this issue.
I am writing the clear re-create steps and will post later.

cloudxxx8 · 2020-06-05T23:22:18Z

reproduce steps:

Retrieve the Geneva docker-compose file and edit it, for example https://github.com/edgexfoundry/developer-scripts/blob/master/releases/geneva/compose-files/docker-compose-geneva-redis-no-secty.yml
Add an environment variable override in core-metadata Writable_EnableValueDescriptorManagement: "true"

  metadata:
    image: edgexfoundry/docker-core-metadata-go:1.2.0
    ports:
      - "127.0.0.1:48081:48081"
    container_name: edgex-core-metadata
    hostname: edgex-core-metadata
    networks:
      - edgex-network
    environment:
      <<: *common-variables
      Service_Host: edgex-core-metadata
      Notifications_Sender: edgex-core-metadata
      Writable_EnableValueDescriptorManagement: "true"
    depends_on:
      - consul
#      - logging  # uncomment if re-enabled remote logging
      - redis
      - notifications

docker-compose -f docker-compose-geneva-redis-no-secty.yml up -d
see the log in device-virtual
docker logs edgex-device-virtual

It happens whenever create a new Device Profile at first time only, because core-metadata takes too much time to get URL to create valuedescriptors in core-data

lenny-goodell · 2020-06-05T23:37:46Z

I was able to recreated the issue. Device Vitual failed to start with the following error in the logs:

level=ERROR ts=2020-06-05T23:34:00.327837442Z app=device-virtual source=profiles.go:88 msg="Add Device Profile: /res/device.virtual.binary.yaml to Core Metadata failed: Post http://edgex-core-metadata:48081/api/v1/deviceprofile: EOF"

Interestingly, core metadata has this error:

level=ERROR ts=2020-06-05T23:34:00.449605063Z app=edgex-core-metadata source=request.go:63 msg="Post http://edgex-device-virtual:49990/api/v1/callback: dial tcp 172.21.0.9:49990: connect: connection refused"

jpwhitemn · 2020-06-06T01:01:55Z

Thanks Cloud (and Taipei team) and Lenny. Good work guys. Sent from Yahoo Mail on Android On Fri, Jun 5, 2020 at 6:38 PM, Lenny Goodell<[email protected]> wrote: I was able to recreated the issue. Device Vitual failed to start with the following error in the logs: level=ERROR ts=2020-06-05T23:34:00.327837442Z app=device-virtual source=profiles.go:88 msg="Add Device Profile: /res/device.virtual.binary.yaml to Core Metadata failed: Post http://edgex-core-metadata:48081/api/v1/deviceprofile: EOF" Interestingly, core metadata has this error: level=ERROR ts=2020-06-05T23:34:00.449605063Z app=edgex-core-metadata source=request.go:63 msg="Post http://edgex-device-virtual:49990/api/v1/callback: dial tcp 172.21.0.9:49990: connect: connection refused" — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

lenny-goodell · 2020-06-08T15:19:07Z

Adding Service_Timeout: "20000" to Device Virtual seems to resolve this.

lenny-goodell · 2020-06-09T22:12:56Z

I have spent some time root causing this issue & #2526. The root cause for both is the refactoring of the Client Monitoring now has the client URL un-initialized until the first request to Registry which is done at the start of the go func, but the go func has to be spun up before other thread tries to access the client. If the URL is un-initialized when the client URL is accessed that thread is made to wait for the monitor interval which is set to 15secs .

The quick fix and @tonyespy will like it is to disable the monitoring by always returning the local.New(url) from this factory function.
https://github.com/edgexfoundry/edgex-go/blob/master/internal/pkg/urlclient/factory.go#L55

tonyespy · 2020-06-09T23:39:13Z

My reaction aside, I thought the plan in the short term was to increase the timeout in device-sdk-go per @cloudxxx8's suggestion? Disabling the monitoring (which I'd really like seen done) right now seems like a big hammer without some more discussion.

In the long run, I think we should consider two approaches:

add config support to disable monitoring (although if we make the change in Hanoi, is that a breaking change?)

OR

re-work the monitoring such that instead of polling the registry, we use leverage some sort of notification from Consul (e.g. https://www.consul.io/docs/agent/watches.html)

lenny-goodell · 2020-06-09T23:50:34Z

I thought the plan in the short term was to increase the timeout

Yes, for the point release we discussed increasing the timeout using override in the compose file for Device Virtual. All other Device services will also need this as we don't know which will trigger the issue first...

iain-anderson · 2020-06-10T09:35:55Z

Interestingly, core metadata has this error: level=ERROR ts=2020-06-05T23:34:00.449605063Z app=edgex-core-metadata source=request.go:63 msg="Post http://edgex-device-virtual:49990/api/v1/callback: dial tcp 172.21.0.9:49990: connect: connection refused"

Does this indicate a startup sequencing issue in the sdk - the http server should be started before posting new profiles to metadata?

lenny-goodell · 2020-06-10T15:58:09Z

@iain-anderson , that error is a side effect of the Device Service exiting during startup due to timeout on call to Core Metadata. It is Core Metadata's Client for CoreData that is stuck in a sleep waiting for the URL that is causing the Device Service's call to timeout.

chr1shung · 2020-06-11T11:07:47Z

Adding Service_Timeout: "20000" to Device Virtual seems to resolve this.

still have the same error with this ENV override

lenny-goodell · 2020-06-11T21:18:44Z

still have the same error with this ENV override

@hahattan , can you try increasing the timeout to see if your system just needs more time?

chr1shung · 2020-06-12T05:24:15Z

still have the same error with this ENV override

@hahattan , can you try increasing the timeout to see if your system just needs more time?

I've tried increase to 60000 but seems the same (it sometimes passed, but most of the case the error still shows up)

weichou1229 · 2020-06-15T12:42:31Z

@hahattan @lenny-intel @cloudxxx8
Adding Service_Timeout: "20000" to ~~Device Virtual~~ core-metadata can resolve this.
Because core-metadata's HTTP server timeout causes the device-virtual's HTTP client EOF error.

https://github.com/edgexfoundry/go-mod-bootstrap/blob/master/bootstrap/handlers/httpserver/httpserver.go#L86-L91

lenny-goodell · 2020-06-15T15:34:36Z

Thanks @hahattan , I will add this to the Geneva 1.2.1 compose files.

jpwhitemn · 2020-06-25T15:47:21Z

Should be cleared up once we remove client monitoring; relook before Hanoi release.

jpwhitemn · 2020-06-25T22:19:21Z

Issue being worked as part of larger client monitoring removal. See #2595 . @hahattan - please retest when you can.

chr1shung · 2020-06-29T17:07:51Z

tested with latest nightly-build image and the error no longer seen, should be good to close this issue.

cloudxxx8 · 2020-06-30T03:43:56Z

thanks, @hahattan

chr1shung added the bug Something isn't working label Jun 5, 2020

tonyespy changed the title ~~Post device profile failed: EOF~~ [metadata] If EnableValueDescriptorManagment=true, EOF error is returned on first device profile creation Jun 8, 2020

tonyespy changed the title ~~[metadata] If EnableValueDescriptorManagment=true, EOF error is returned on first device profile creation~~ [metadata] If EnableValueDescriptorManagment=true, error is returned on first device profile creation Jun 8, 2020

jpwhitemn added the hanoi Hanoi release label Jun 25, 2020

cloudxxx8 assigned chr1shung Jun 29, 2020

cloudxxx8 closed this as completed Jun 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[metadata] If EnableValueDescriptorManagment=true, error is returned on first device profile creation #2580

[metadata] If EnableValueDescriptorManagment=true, error is returned on first device profile creation #2580

chr1shung commented Jun 5, 2020 •

edited

Loading

tsconn23 commented Jun 5, 2020 •

edited

Loading

tsconn23 commented Jun 5, 2020

tsconn23 commented Jun 5, 2020

cloudxxx8 commented Jun 5, 2020

cloudxxx8 commented Jun 5, 2020 •

edited

Loading

lenny-goodell commented Jun 5, 2020

jpwhitemn commented Jun 6, 2020 via email

lenny-goodell commented Jun 8, 2020 •

edited

Loading

lenny-goodell commented Jun 9, 2020

tonyespy commented Jun 9, 2020

lenny-goodell commented Jun 9, 2020

iain-anderson commented Jun 10, 2020

lenny-goodell commented Jun 10, 2020

chr1shung commented Jun 11, 2020

lenny-goodell commented Jun 11, 2020

chr1shung commented Jun 12, 2020

weichou1229 commented Jun 15, 2020

lenny-goodell commented Jun 15, 2020

jpwhitemn commented Jun 25, 2020

jpwhitemn commented Jun 25, 2020

chr1shung commented Jun 29, 2020

cloudxxx8 commented Jun 30, 2020

[metadata] If EnableValueDescriptorManagment=true, error is returned on first device profile creation #2580

[metadata] If EnableValueDescriptorManagment=true, error is returned on first device profile creation #2580

Comments

chr1shung commented Jun 5, 2020 • edited Loading

🐞 Bug Report

Affected Services

Is this a regression?

Description and Minimal Reproduction

🔥 Exception or Error

🌍 Your Environment

tsconn23 commented Jun 5, 2020 • edited Loading

tsconn23 commented Jun 5, 2020

tsconn23 commented Jun 5, 2020

cloudxxx8 commented Jun 5, 2020

cloudxxx8 commented Jun 5, 2020 • edited Loading

lenny-goodell commented Jun 5, 2020

jpwhitemn commented Jun 6, 2020 via email

lenny-goodell commented Jun 8, 2020 • edited Loading

lenny-goodell commented Jun 9, 2020

tonyespy commented Jun 9, 2020

lenny-goodell commented Jun 9, 2020

iain-anderson commented Jun 10, 2020

lenny-goodell commented Jun 10, 2020

chr1shung commented Jun 11, 2020

lenny-goodell commented Jun 11, 2020

chr1shung commented Jun 12, 2020

weichou1229 commented Jun 15, 2020

lenny-goodell commented Jun 15, 2020

jpwhitemn commented Jun 25, 2020

jpwhitemn commented Jun 25, 2020

chr1shung commented Jun 29, 2020

cloudxxx8 commented Jun 30, 2020

chr1shung commented Jun 5, 2020 •

edited

Loading

tsconn23 commented Jun 5, 2020 •

edited

Loading

cloudxxx8 commented Jun 5, 2020 •

edited

Loading

lenny-goodell commented Jun 8, 2020 •

edited

Loading