feat: add aspects to VALUE model of datasets #1940

Merged
merged 1 commit into from
Oct 23, 2020

Conversation

jywadhwani
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@mars-lan mars-lan added the hacktoberfest-accepted Acceptance for hacktoberfest https://hacktoberfest.com/participation/ label Oct 14, 2020
jywadhwani commented Oct 16, 2020

Tested this end to end by inspecting the search response. With this change, the underlying metadata aspects are returned as part of the search response, as shown below.

{
  "start": 0,
  "count": 10,
  "total": 3,
  "elements": [{
    "urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
    "ownership": {
      "owners": [{
        "type": "DATAOWNER",
        "owner": "urn:li:corpuser:jdoe"
      }, {
        "type": "DATAOWNER",
        "owner": "urn:li:corpuser:datahub"
      }],
      "lastModified": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      }
    },
    "origin": "PROD",
    "name": "SampleHdfsDataset",
    "institutionalMemory": {
      "elements": [{
        "createStamp": {
          "actor": "urn:li:corpuser:jdoe",
          "time": 1581407189000
        },
        "description": "Sample doc",
        "url": "https://www.linkedin.com"
      }]
    },
    "upstreamLineage": {
      "upstreams": [{
        "type": "TRANSFORMED",
        "auditStamp": {
          "actor": "urn:li:corpuser:jdoe",
          "time": 1581407189000
        },
        "dataset": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)"
      }]
    },
    "schemaMetadata": {
      "platformSchema": {
        "com.linkedin.schema.KafkaSchema": {
          "documentSchema": "{\"type\":\"record\",\"name\":\"SampleHdfsSchema\",\"namespace\":\"com.linkedin.dataset\",\"doc\":\"Sample HDFS dataset\",\"fields\":[{\"name\":\"field_foo\",\"type\":[\"string\"]},{\"name\":\"field_bar\",\"type\":[\"boolean\"]}]}"
        }
      },
      "created": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      },
      "lastModified": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      },
      "fields": [{
        "fieldPath": "field_foo",
        "description": "Foo field description",
        "type": {
          "type": {
            "com.linkedin.schema.StringType": {}
          }
        },
        "nullable": false,
        "recursive": false,
        "nativeDataType": "string"
      }, {
        "fieldPath": "field_bar",
        "description": "Bar field description",
        "type": {
          "type": {
            "com.linkedin.schema.BooleanType": {}
          }
        },
        "nullable": false,
        "recursive": false,
        "nativeDataType": "boolean"
      }],
      "schemaName": "SampleHdfsSchema",
      "version": 0,
      "platform": "urn:li:dataPlatform:hdfs",
      "hash": ""
    },
    "platform": "urn:li:dataPlatform:hdfs"
  }, {
    "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
    "ownership": {
      "owners": [{
        "type": "DATAOWNER",
        "owner": "urn:li:corpuser:jdoe"
      }, {
        "type": "DATAOWNER",
        "owner": "urn:li:corpuser:datahub"
      }],
      "lastModified": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      }
    },
    "origin": "PROD",
    "name": "SampleHiveDataset",
    "institutionalMemory": {
      "elements": [{
        "createStamp": {
          "actor": "urn:li:corpuser:jdoe",
          "time": 1581407189000
        },
        "description": "Sample doc",
        "url": "https://www.linkedin.com"
      }]
    },
    "upstreamLineage": {
      "upstreams": [{
        "type": "TRANSFORMED",
        "auditStamp": {
          "actor": "urn:li:corpuser:jdoe",
          "time": 1581407189000
        },
        "dataset": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
      }]
    },
    "schemaMetadata": {
      "platformSchema": {
        "com.linkedin.schema.KafkaSchema": {
          "documentSchema": "{\"type\":\"record\",\"name\":\"SampleHiveSchema\",\"namespace\":\"com.linkedin.dataset\",\"doc\":\"Sample Hive dataset\",\"fields\":[{\"name\":\"field_foo\",\"type\":[\"string\"]},{\"name\":\"field_bar\",\"type\":[\"boolean\"]}]}"
        }
      },
      "created": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      },
      "lastModified": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      },
      "fields": [{
        "fieldPath": "field_foo",
        "description": "Foo field description",
        "type": {
          "type": {
            "com.linkedin.schema.StringType": {}
          }
        },
        "nullable": false,
        "recursive": false,
        "nativeDataType": "string"
      }, {
        "fieldPath": "field_bar",
        "description": "Bar field description",
        "type": {
          "type": {
            "com.linkedin.schema.BooleanType": {}
          }
        },
        "nullable": false,
        "recursive": false,
        "nativeDataType": "boolean"
      }],
      "schemaName": "SampleHiveSchema",
      "version": 0,
      "platform": "urn:li:dataPlatform:hive",
      "hash": ""
    },
    "platform": "urn:li:dataPlatform:hive"
  }, {
    "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
    "ownership": {
      "owners": [{
        "type": "DATAOWNER",
        "owner": "urn:li:corpuser:jdoe"
      }, {
        "type": "DATAOWNER",
        "owner": "urn:li:corpuser:datahub"
      }],
      "lastModified": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      }
    },
    "origin": "PROD",
    "name": "SampleKafkaDataset",
    "institutionalMemory": {
      "elements": [{
        "createStamp": {
          "actor": "urn:li:corpuser:jdoe",
          "time": 1581407189000
        },
        "description": "Sample doc",
        "url": "https://www.linkedin.com"
      }]
    },
    "schemaMetadata": {
      "platformSchema": {
        "com.linkedin.schema.KafkaSchema": {
          "documentSchema": "{\"type\":\"record\",\"name\":\"SampleKafkaSchema\",\"namespace\":\"com.linkedin.dataset\",\"doc\":\"Sample Kafka dataset\",\"fields\":[{\"name\":\"field_foo\",\"type\":[\"string\"]},{\"name\":\"field_bar\",\"type\":[\"boolean\"]}]}"
        }
      },
      "created": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      },
      "lastModified": {
        "actor": "urn:li:corpuser:jdoe",
        "time": 1581407189000
      },
      "fields": [{
        "fieldPath": "field_foo",
        "description": "Foo field description",
        "type": {
          "type": {
            "com.linkedin.schema.StringType": {}
          }
        },
        "nullable": false,
        "recursive": false,
        "nativeDataType": "string"
      }, {
        "fieldPath": "field_bar",
        "description": "Bar field description",
        "type": {
          "type": {
            "com.linkedin.schema.BooleanType": {}
          }
        },
        "nullable": false,
        "recursive": false,
        "nativeDataType": "boolean"
      }],
      "schemaName": "SampleKafkaSchema",
      "version": 0,
      "platform": "urn:li:dataPlatform:kafka",
      "hash": ""
    },
    "platform": "urn:li:dataPlatform:kafka"
  }],
  "searchResultMetadatas": [{
    "name": "platform",
    "aggregations": {
      "hive": 1,
      "hdfs": 1,
      "kafka": 1
    }
  }, {
    "name": "origin",
    "aggregations": {
      "prod": 3
    }
  }]
}
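
A quick way to confirm which aspects each search hit carries is to scan the `elements` array for the aspect keys. The following is a minimal sketch, not part of the PR: the helper name `aspects_by_urn` and the `ASPECT_KEYS` set are assumptions inferred from the sample response above, and the sample input is a trimmed copy of that response.

```python
import json

# Assumption: these four keys are the aspects of interest, taken from the
# sample search response; the real VALUE model may expose others.
ASPECT_KEYS = {"ownership", "institutionalMemory", "upstreamLineage", "schemaMetadata"}


def aspects_by_urn(search_response: dict) -> dict:
    """Map each element's urn to the sorted list of aspect keys it contains."""
    return {
        element["urn"]: sorted(ASPECT_KEYS.intersection(element))
        for element in search_response.get("elements", [])
    }


# Trimmed copy of the response above (aspect bodies emptied for brevity).
sample = json.loads("""
{
  "elements": [
    {"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
     "ownership": {}, "institutionalMemory": {}, "upstreamLineage": {}, "schemaMetadata": {}},
    {"urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
     "ownership": {}, "institutionalMemory": {}, "schemaMetadata": {}}
  ]
}
""")

for urn, aspects in aspects_by_urn(sample).items():
    print(urn, aspects)
```

In the sample data the Kafka dataset has no `upstreamLineage`, so the helper reports three aspects for it and four for the HDFS dataset.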

@jywadhwani
Contributor Author

Tested the mid-tier change, particularly the DatasetUtil change, by ingesting a Status aspect with removed=true for a dataset and verifying that both the search response and the response for api/v2/datasets/<urn> return the correct removed field.

Ingestion
curl 'http://localhost:8080/datasets?action=ingest' -X POST -H 'X-RestLi-Protocol-Version:2.0.0' --data '{ "snapshot": { "aspects": [ { "com.linkedin.common.Status": { "removed": true } } ], "urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)" } }'
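
The same ingest request can be assembled from a script, which makes it easier to flip `removed` or swap the urn. This is a rough sketch using only the Python standard library; the helper names `build_ingest_payload` and `build_ingest_request` are hypothetical, and the URL, header, and payload shape are copied from the curl command above. The request is only built here, not sent.

```python
import json
import urllib.request


def build_ingest_payload(urn: str, removed: bool) -> dict:
    """Build the snapshot body carrying a single Status aspect."""
    return {
        "snapshot": {
            "aspects": [{"com.linkedin.common.Status": {"removed": removed}}],
            "urn": urn,
        }
    }


def build_ingest_request(base_url: str, urn: str, removed: bool) -> urllib.request.Request:
    """Build (but do not send) the POST request for the ingest action."""
    body = json.dumps(build_ingest_payload(urn, removed)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/datasets?action=ingest",
        data=body,
        headers={"X-RestLi-Protocol-Version": "2.0.0"},
        method="POST",
    )


request = build_ingest_request(
    "http://localhost:8080",
    "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
    removed=True,
)
# To actually send it against a running local instance:
# urllib.request.urlopen(request)
```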

The response for the request /api/v2/datasets/urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD) reflects the correct removed field:
{"platform":"hdfs","nativeName":"SampleHdfsDataset","fabric":"PROD","uri":"urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)","description":"","nativeType":null,"properties":null,"tags":[],"removed":true,"deprecated":null,"deprecationNote":null,"decommissionTime":null,"createdTime":null,"modifiedTime":null,"customProperties":null}
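
Checking the flag by eye gets tedious across several datasets, so the check can be scripted. A minimal sketch, assuming the response body is the JSON object shown above; the helper name `removed_flag` is hypothetical and the sample body is a trimmed copy of that response.

```python
import json


def removed_flag(response_body: str):
    """Return the value of 'removed' from a dataset response, or None if the key is absent."""
    return json.loads(response_body).get("removed")


# Trimmed copy of the response above.
body = '{"platform":"hdfs","nativeName":"SampleHdfsDataset","fabric":"PROD","removed":true,"deprecated":null}'
assert removed_flag(body) is True
```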

@mars-lan mars-lan merged commit 4bfcb4b into datahub-project:master Oct 23, 2020