
Merge from RC into stable for DataX release v1.2 #155

Merged
kjcho-msft merged 79 commits into stable from RC on Oct 8, 2019

Conversation

kjcho-msft
Contributor

No description provided.

rohit489 and others added 30 commits April 15, 2019 23:57
Data Accelerator whitepaper
Adding Introductory Whitepaper to complement the Architecture Whitepaper
Add GitHub links to each web package
* remove eval when fetching the environment variable

* add check to see if localServices are defined in webComposition
s-tuli and others added 24 commits May 15, 2019 17:49
* rev maven package version

* update few more files

* remove filter to include only jar
… Service. This code also adds the required DockerFile and finalrun.sh for creating the docker container for the Kubernetes scenario
* NPOT: add Kafka support

* NPOT: add Kafka support

* update to support the native Kafka

* change AutoOffsetReset to Latest for kafka sampling

* Move the hard-coded cacert source to the keyvault

* update not to create consumer groups for EventHub Kafka
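
For context, switching AutoOffsetReset to Latest means the sampling consumer only sees events produced after it subscribes, instead of replaying the topic from the beginning. A minimal sketch of such a sampling consumer using the plain Kafka client; the broker, group, and topic names are placeholders:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import scala.collection.JavaConverters._

object SampleLatest {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "mybroker:9092") // placeholder broker
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "datax-sampling")         // placeholder group
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    // The change described above: start from the latest offset so sampling
    // reflects live traffic rather than the historical backlog.
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("sample-topic"))       // placeholder topic
    try consumer.poll(Duration.ofSeconds(10)).asScala.foreach(r => println(r.value()))
    finally consumer.close()
  }
}
```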
* Enable http post functions and keyvault retrieval fix.

* Take function baseUrl instead of host name in config. Also updated pom files.

* Add null checks for function params
…nd update labels for Azure Function (#64)

* remove subscription field for kafka input and update azure function labels

* remove subscription field for kafka input and update azure function labels

* remove subscription field for kafka input and update azure function labels

* update the pipeline version in the website package.json
* Support for EventHub for Kafka as input

* EventHub for Kafka input changes

* Support for native kafka

* Support for native kafka

* kafka producer

* Updated pom file and few minor updates to comments.

* Add license header to new files.

* pom file updates to revert spark version.

* Adding more headers

* Add comments to new files.

* Revert Spark 2.4 signature change for UDF.

* ignore codesign

* Define constants and convert return null to throwing exception.
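
The EventHub-for-Kafka input in the commits above amounts to pointing Spark's Kafka source at the Event Hubs Kafka endpoint (port 9093, SASL_SSL, with the literal user name $ConnectionString). A hedged sketch of that wiring; the namespace, hub, and secret source are placeholders, and the real configuration lives in the datax-host config:

```scala
import org.apache.spark.sql.SparkSession

// Requires the spark-sql-kafka-0-10 package on the classpath.
object EventHubKafkaInput {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("datax-sample").getOrCreate()

    val connectionString = sys.env("EVENTHUB_CONNECTION_STRING") // placeholder secret source
    // Event Hubs speaks the Kafka protocol on port 9093 over SASL_SSL/PLAIN,
    // with "$ConnectionString" as the literal user name.
    val jaas = "org.apache.kafka.common.security.plain.PlainLoginModule required " +
      s"""username="$$ConnectionString" password="$connectionString";"""

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093") // placeholder
      .option("subscribe", "myeventhub")                                            // placeholder
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.sasl.jaas.config", jaas)
      .option("startingOffsets", "latest")
      .load()

    stream.selectExpr("CAST(value AS STRING)")
      .writeStream.format("console").start().awaitTermination()
  }
}
```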
S tuli/stateless: Converting the DataX.Metrics stateful ServiceFabric app to be a stateless ServiceFabric app. Tested by deploying to a test cluster. Unit tests are passing.
…to include them (#69)

* Add missing dependencies and update nuspec to include them

* Add missing dependencies and update nuspec to include them
* add rel="noopener noreferrer" to links to prevent Reverse Tabnabbing

* rev versions of all packages
* ARM deployment: add kafka support

* update based on feedback

* update proton version for kafkaDataXDirect template which is missing from a previous commit
Remove snapshots from Spark.nuspec
* Adding a configuration source for use in ASP

* Adding dependencies

* Fixing name

* Modifying for new settings and optional SF

* Added Configuration parameter to StartUpUtil, not being used yet

* Moved service fabric configuration to servicehost; added settings constants; modified startuputil for DataXSettings

* Reducing surface area of SF config calls

* Adding SF to config builder

* Correcting erroneous default value

* Runs and apis are hittable

* Removing dev strings

* Adding gitignore entry

* Modifying dev default auth connection string

* Modifying launch api to simple get call

* Switching connection string

* Adding onebox

* Updating dev settings

* Modified the sf configuration source to better conform to customizable settings; began work on local scenario

* Using constant instead of literal

* Enabling more of the local scenario

* Modified MEF extension to generate ILoggers for classes automatically

* Updating settings for onebox scenario

* Adding authentication

* Begin adding auth attribute; Not working yet

* Adding auth scenario; needs cleaning

* Disable RequireAuthenticatedUser if in OneBox mode

* Revert "Disable RequireAuthenticatedUser if in OneBox mode"

This reverts commit 547f49b.

* Removing authentication requirement from OneBox scenario

* Temporary: modified for rerouteService scenario

* Minor code improvements

* Removing unnecessary project dependency

* Added a gateway policy for compatibility in SF auth scenario; Moved some auth classes to be better accessed; RoleCheck changed to consider SF environment

* Cleanup

* Reverting auto changes

* Renaming appinsights setting to name in SF

* Modified gateway policy for fix in SF

* Updating SF xmls with EnableOneBox

* Added varying documentation by request

* Refactored settings into contract; added todos for consolidation

* Some extensibility improvements

* Reconfigured to be IStartupFilter instead of a simple startup

* Tylake/configuration (#51)

* Adding onebox to settings

* Renaming based on feedback

* Improved startup experience

* Added documentation; removed original startup for Flow

* Converting startup and settings

* Updating appsettings

* Modifying startups

* Adding auth attributes

* Fixing usings

* Deleting old settings classes

* Simplifying call

* Adding EnableOneBox setting for SF

* Added comments by request

* Removing inaccurate comment

* Missed override

* Fixing the backend services after the refactoring of the various startups in DataX.Flow: updating appsettings.json and appsettings.Development.json, adding app.UseAuthentication() in DataXServiceStartup.cs, and tweaking the way we get the settings in StartUpUtil.cs

* Removed the environment variable value from the Services/DataX.Flow/Flow.SchemaInferenceService/Properties/launchSettings.json

* reverting the typo from /Flow.InteractiveQueryService/appsettings.Development.json

* 1. Adding IConfiguration in each of the controllers so that the appSettings properties/parameters are available for the EngineEnvironment call, and passing this IConfiguration to other classes as needed. 2. Fixing DataX.Flow.sln, which previously gave an error because one of the projects was missing from the solution

* Remove the auth connection string

* Adding support for an optional secret holding a JSON object that maps each service name to the IP address where that service can be reached.

* Enable batch processing from blob input

* Batch processing from blob input

* Adding the json parsing logic in securedSettings.js for the optional secret for kubernetes services

* Updating readme.md so that customers know how to reach services deployed on the AKS cluster.

* Adding a few checks for when the Kubernetes secret may not be present altogether

* Updating the comments and readme along with some of the checks

* Making metrics.sln compile by adding a missing project. Also fixing a merge conflict.

* Adding the missing project DataX.ServiceHost to DataX.Gateway.sln

* Nuget Restore on DataX.Gateway.sln fixed

* Adding the licensing header to some of the newly added files where it was missing. Making a few tweaks based on PR feedback

* Adding EnableOneBox parameter under ConfigOverrides for DataX.Flow. This is needed for ServiceFabric.

* Updating FinalRun.sh and appsettings.json for Flow.ManagementService to enable the OneBox Scenario

* Removing the local values for LocalRoot and SparkHome because these will vary depending on the environment.

* Fixing the signing issue related to DataX.ServiceHost project and removing the folders DataX.ServiceHost and SolutionItems from DataX.Flow solution

* Adding the Microbuild NuGet package to DataX.ServiceHost project

* Add sql output

* - Refactor output manager to enable sync'ing non-json data
- Add SQL server output

* Kjcho/revert (#77)

* Revert "- Refactor output manager to enable sync'ing non-json data"

This reverts commit 3d3c546.

* Revert "Add sql output"

This reverts commit d143bef.

* Rev spark version to 2.4 and add support for secret scope (#79)

* Rev spark version to 2.4 and add support for secret scope

* Synchronize calls to dbutils

* change artifact id to 2.4

* rev dependencies version
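
"Synchronize calls to dbutils" guards secret-scope lookups against concurrent access to the Databricks utilities. A minimal sketch of that pattern; the scope and key names in the usage comment are placeholders:

```scala
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

object SecretResolver {
  // Serialize all dbutils access through a single lock, since concurrent
  // calls are the problem the commit above addresses.
  private val lock = new Object

  def getSecret(scope: String, key: String): String =
    lock.synchronized {
      dbutils.secrets.get(scope, key)
    }
}

// Usage (placeholder names):
// val storageKey = SecretResolver.getSecret("datax-scope", "storage-account-key")
```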

* ARM: remove snapshot from the templates to unblock ARM deployment from master branch (#82)

* Enable databricks support on Web (#80)

* Updating the website code to extract out the Query package, named datax-query. This package's content used to be part of the datax-pipeline package; it is being extracted so that it can be used by other customers who do not want a dependency on datax-pipeline.

* Updating the versions for all the packages

* cleaning the code and removing the comments

* Updating the package.json for datax-common since we don't require jsoneditor and monaco editor packages

* Refactored the Dockerfile and finalrun.sh so that CICD can be enabled easily by passing in the service name as a parameter. Also adding the yaml files for each service, which need the parameters to be passed in prior to deploying the service to the Kubernetes cluster (just as we do when deploying to the service fabric cluster).

* Making a few tweaks: updating all files to be LF instead of CRLF. Adding quotes for servicedllname and adding a new line for each of the Dockerfile and finalrun.sh

* Adding a Helper function ConvertFlowToQueryMetadata. This creates the object that contains all the parameters as needed by datax-query package. Also cleaning up the code a bit and addressing PR feedback.

* Adding comment header for the new function: ConvertFlowToQueryMetadata

* Removing the dupe style under datax-pipeline.

* Refactor OutputManager to configure outputting non-json data easily (#78)

Add SqlServer output
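
At its core, a SqlServer output for Spark is a JDBC write of the flattened DataFrame. A hedged sketch of that write; the server, database, and table names are placeholders, and in DataX the credentials would come from KeyVault and the call would go through the refactored OutputManager:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object SqlServerOutput {
  def write(df: DataFrame, user: String, password: String): Unit = {
    // Placeholder connection details; the production path resolves these from KeyVault.
    val url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
    val props = new java.util.Properties()
    props.put("user", user)
    props.put("password", password)
    props.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    df.write.mode(SaveMode.Append).jdbc(url, "dbo.Events", props) // placeholder table
  }
}
```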

* SQL Output: UI and config gen service updates (#86)

* SQL Output: UI and config gen service updates

* Minor updates to UI based on review feedback.

* Flattener template update and minor UI tweaks

* Update package version.

* Fix a typo

* Fixing a few bugs found while testing: the query was not getting updated when calling codegen in the UI, and the deploy button was not getting enabled when the query was dirty

* Fixing a few bugs found while testing: the query was not getting updated when calling codegen in the UI, and the deploy button was not getting enabled when the query was dirty

* Removing the redundant code from datax-pipeline. Removing the term flow for datax-query.

* Rev'ing the package versions for each of the packages

* Fix pom for datax-host (#88)

* Removing some redundant code and calling into the QueryActions initQuery function

* Updating the version of packages in package.json for datax-home

* Removing the style.css import from website to datax-pipeline

* Fixing the memory heap issue caused by the Monaco Editor being imported twice for the datax-pipeline package. The solution is to create a common control, MonacoEditorControl, that can be consumed by both the datax-query and datax-pipeline packages. This commit also removes the need to include monaco editor and jsoneditor in other packages.

* Rev'ing the version of all website packages

* Adding react-monaco-editor and the Monaco Editor plugin to peerDependencies in package.json for datax-query and datax-pipeline

* Enable batch processing of blob input (#90)

* Updating the package versions. Updating the code to use MonacoEditorControl as defined in datax-query package. Removing the dependency on react-monaco-editor and the plugin in datax-pipeline

* pass in databricks token for live query (#92)

* Databricks support in services (#87)

* Databricks support in services

* Fix autoscale and use flow specific token to send requests to databricks

* Fix live query

* refactor uriPrefix to be handled by keyVaultClient

* Codesign DatabricksClient (#93)

* Fixing and adding a check for Databricks vs HDInsight when saving and resolving the spark job params

* Migrate to latest jackson.core databind artifact.

* Updating all projects to use .Net Core 2.2 to resolve the component governance issue

* Updating a few more references and NuGets to 2.2

* Updating LivyClient.Test project as well

* Updating Gatewaycloud.xml

* reverting the signing unintentional change

* Update resource files to have unit tests pass again.

* Adding support for reading the whitelisted ClientID AAD app from KeyVault

* Adding the Whitelisting logic to RolesCheck for the ScenarioTester

* Renaming the helper and tweaking the logic

* more tweaks to the code

* Removing the redundant project dependencies in DataX.Utilities.Web

* Adding the parameter in appsettings.json as well; this will be useful when we add support for Kubernetes. Addressing some PR feedback: adding a header and renaming the helper method that adds the whitelisted client user id for testing purposes.

* Updating the whitelisted clientId value and the code for handling a list of whitelisted clientIds, which are of the format {objectIdentifier}.{tenantId} so that each is unique

* Updating the SimulatedData service to .net core 2.2

* provide a scenario tester to run through actions on a host sequentially and in parallel. Enables creating simulated test loads.

* fixed namespaces and nuspec as per PR feedback

* Adding dependency for NewtonSoft.Json in the nuspec

* Adding the signing requirements and updating .nuspec for the ScenarioTester such that it is packaged with its own NuGet dependencies.

* bug fixes: spark nuspec, iothub sku, simulator service num events (#102)

* Fix unit tests (#104)

* Fix unit tests

* make sparkType property optional, add end to end test for databricks, fix config.local test

* Flow service: Add Blob support (#89)

* The flow service: Add Blob support

* The flow service: Add Blob support

* move the kv secret resolution from the FlattenJobConfig to GenerateJobConfigBatching

* add tests for batch and update based on feedback

* enable sign for the new projects

* merge w/ master

* updated based on feedback

* update based on feedback

* fixing tests

* fixing tests

* fixing tests

* clean up code

* For databricks access azure storage account using account key from keyvault (#105)

* For databricks access azure storage account using account key from keyvault

* added comment

* remove fileUrl as global variable
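
Accessing a storage account by key from Databricks reduces to resolving the key (from KeyVault, per the commit above) and setting the fs.azure account-key property on the Hadoop configuration. A sketch under assumed names:

```scala
import org.apache.spark.sql.SparkSession

object StorageAccountAccess {
  def configure(spark: SparkSession, account: String, accountKey: String): Unit = {
    // accountKey is resolved from KeyVault in DataX; passed in here for shape only.
    spark.sparkContext.hadoopConfiguration.set(
      s"fs.azure.account.key.$account.blob.core.windows.net", accountKey)
    // After this, wasbs:// paths on the account resolve without further credentials, e.g.
    // spark.read.json(s"wasbs://mycontainer@$account.blob.core.windows.net/data/")
  }
}
```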

* Refactor DataX.Flow to support batch scenarios better (#106)

* refactor DataX.Flow to support batch better

* refactor DataX.Flow to support batch better

* Updated based on feedback

* update based on feedback

* Web: add batching support (#91)

* initial commit for Blob input support

* initial commit for Blob input support

* Web: add batch support

* update the logic for the save button and the label for the schedule tab

* use a datetime picker control for starttime and endtime

* update based on feedback

* merge with master and clean up code

* update the version for all packages

* Update the microbuild signing cert to sha2 (#107)

* Update the microbuild signing cert to sha2

* Update the microbuild signing cert to sha2

* ARM: assign the writer role to the service AAD app (#109)

* ARM: assign the writer role to the service AAD app

* Refactor Set-AzureAADApiPermission function not to pass the roleId as a param

* Update to use .netcore 2.2 and aspnetcore to 2.2.6 (#111)

* Update .netcore 2.2 and aspnetcore to 2.2.6

* update netcoreapp version in nuspec

* Migrate to latest databind component. (#113)

* Fix the way to read the blob for GetSchema feature (#112)

* Fix the way to read the blob for GetSchema feature

* Fix the way to read the blob for GetSchema feature

* update based on feedback

* add header comments and update based on feedback

* add one more test for this pattern: {yyyy/MM/dd}

* update based on feedback

* update based on feedback
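
The {yyyy/MM/dd} pattern in the tests above is a date macro embedded in the blob path, so reading a batch window means expanding it into one concrete prefix per day. A small illustrative sketch of that expansion; the actual implementation lives in the Flow service and these names are hypothetical:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object BlobPathExpander {
  private val Macro = """\{([^}]+)\}""".r

  /** Expand e.g. "wasbs://c@acct/input/{yyyy/MM/dd}/" into one path per day in [start, end]. */
  def expand(pathPattern: String, start: LocalDate, end: LocalDate): Seq[String] =
    Iterator.iterate(start)(_.plusDays(1))
      .takeWhile(!_.isAfter(end))
      .map { day =>
        Macro.replaceAllIn(pathPattern, m =>
          day.format(DateTimeFormatter.ofPattern(m.group(1))))
      }
      .toSeq
}

// BlobPathExpander.expand("wasbs://c@acct/input/{yyyy/MM/dd}/",
//   LocalDate.of(2019, 10, 6), LocalDate.of(2019, 10, 8))
// -> Seq(".../2019/10/06/", ".../2019/10/07/", ".../2019/10/08/")
```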

* update datax spark jar version (#114)

* ARM support for databricks (#108)

* ARM support for databricks

* PR feedback

* create databricks in existing vnet, update hdinsight kafka zookeeper vmsize to standard_a4_v2, add port 443 rule for hdinsight

* Update GetSampleEvents for Kafka to run asynchronously (#115)

* Add null checks for guiConfig() as some custom configs don't have a gui section (#116)

* add null checks for guiConfig() before we use it as not all configs have a gui section

* add null checks for guiConfig() before we use it as not all configs have a gui section

* clean up code

* updated the sample data for EndToEndGenerationCustom test
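
The guiConfig() null checks exist because hand-written configs may omit the gui section entirely; wrapping the accessor in Option keeps downstream code total. A sketch assuming simplified config shapes (the real DataX classes differ):

```scala
// Hypothetical shapes standing in for the actual DataX config classes.
case class GuiConfig(inputMode: String)
case class FlowConfig(name: String, gui: GuiConfig) // gui may be null in custom configs

object GuiConfigAccess {
  /** Null-safe accessor: configs without a gui section yield None instead of an NPE. */
  def guiOf(config: FlowConfig): Option[GuiConfig] = Option(config.gui)

  def inputModeOrDefault(config: FlowConfig): String =
    guiOf(config).map(_.inputMode).getOrElse("streaming")
}
```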

* For databricks livequery mount storage account container (#118)

* For databricks livequery mount storage account container

* PR feedback

* add license header and move methods to helper
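
Mounting a storage container for live query is a dbutils.fs.mount call with the account key passed through extraConfigs. A hedged sketch with placeholder names; mounting an already-mounted point throws, hence the guard:

```scala
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

object LiveQueryMount {
  def mount(container: String, account: String, accountKey: String): Unit = {
    val source = s"wasbs://$container@$account.blob.core.windows.net"
    val mountPoint = s"/mnt/$container" // placeholder convention
    // Mounting twice throws, so skip if the mount point already exists.
    if (!dbutils.fs.mounts().exists(_.mountPoint == mountPoint)) {
      dbutils.fs.mount(
        source = source,
        mountPoint = mountPoint,
        extraConfigs = Map(s"fs.azure.account.key.$account.blob.core.windows.net" -> accountKey))
    }
  }
}
```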

* Add code coverage options for gathering CC results in validation (#119)

* Add stop and get jobs unit tests

* update to test

* add coverage settings

* fix path in cc

* add livy test project

* merge solution

* Feedback and revert .sln file

* Change parameters for MountStorage method to only use the required properties instead of complete flowConfigObject (#123)

* Update proj to force pdb and exclude other tests binaries (#121)

* avoid creation of datax-host-with-dependency jar (#125)

* reset query each time a new flow is opened (#129)

* Enable deploy button even after saving the flow (#130)

* Remove restriction to deploy if Flow has been saved.

Remove restriction to deploy if file has already been saved.

* SNAPSHOT-

* Rev package and dependencies

* Update the settings name for sql output and fail fast if null. (#131)

* Update the settings name and fail fast if null.

* Update to use the right get function

* fix live query involving udf in databricks and add dependency jars for various outputs  (#133)

* fix live query involving udf in databricks and add dependency jars for various outputs

* PR feedback

* Adding JobRunner Service and the first DataX mainline job that calls … (#127)

* Adding JobRunner Service and the first DataX mainline job that calls into the ScenarioTester. All sensitive info is in the KeyVault

* Refactoring the storage utility files to be part of DataX.Utility.Blob project and updating the gitignore file to include the appsettings.Development.json

* Removing duplicate code for InstanceExportDescriptorProvider. Creating a new utility project: DataX.Utility.Composition.

* Updating the method GetExportDescriptors. Also updating the namespace for the storage utility classes.

* Adding signing for DataX.Utilities.Composition project (#137)

* return job status after job has been stopped (#136)

* return job status after job has been stopped

* add unit tests and add max retries to fetch job state when job is in process of termination

* fix bugs-metrics dashboard and switching mode, and also enable scro… (#132)

* fix bugs in the metrics dashboard and mode switching, and also enable scrolling for the job page

* updated the package versions

* update based on feedback

* added some unit tests for the helper functions in ConfigDeleter API

* Enable scrollbar for Input panel

* Databricks fix output to blobs (#138)

* Databricks fix output to blobs

* PR feedback

* create function to create broadcast variable and change return type of resolveStorageAccount method to Option[String]

* create a method to set Storage Account Key On Hadoop Conf
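
Broadcasting the storage keys ships the account-to-key map to executors once instead of re-resolving it per task, and returning Option[String] makes the unknown-account case explicit. A condensed sketch of both changes; the names mirror the commit messages but the bodies are assumptions:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object StorageAccountKeys {
  /** Ship the account -> key map to executors once via a broadcast variable. */
  def createBlobStorageKeyBroadcastVariable(
      sc: SparkContext, keys: Map[String, String]): Broadcast[Map[String, String]] =
    sc.broadcast(keys)

  /** Option instead of null: callers must handle the unknown-account case. */
  def resolveStorageAccount(
      broadcastKeys: Broadcast[Map[String, String]], account: String): Option[String] =
    broadcastKeys.value.get(account)
}
```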

* Fix batch job for databricks (#141)

* Fix batch job for databricks

* move createBlobStorageKeyBroadcastVariable to its own class, refactor resolve storage account key retrieval, add new default storage account environment variable

* rename file

* PR feedback

* ARM: for the sample deployment script, pass the servicefabric cluster name to the utility module (#142)

* Enable rerunning jobs that were previously in error state (#143)

* Enable rerunning jobs that were previously in error state

* Add unit test

* - Do a better job of handling the case where there is no batch job to deploy (#144)

- Clean up job names for the samples, since the name of a job which hasn't been deployed should be null

* Change bulkInsert UI flag data type to bool (#147)

* Change bulkInsert UI flag data type to bool

* Change UseBulkInsert data type in the sql output model

* Make the bool nullable since it's optional.

* Set internal transaction to false by default for bulk insert; the API we are using doesn't accept this being set to true. (#146)

* Adding steps for a new JobRunner job calling into ScenarioTester.  (#145)

* Adding steps for a new JobRunner job calling into ScenarioTester. These steps essentially call into and test the APIs within InteractiveQueryService, LiveDataService and SchemaGeneratorService. Also adding support for running the JobRunner on both Databricks and HDInsight clusters.

* Updating the code per PR feedback. Essentially updating one of the parameters' names. Removing some redundant code

* Refactoring the code a little to create a helper class and a helper method for constructing the Initialization Kernel json object

* Remove databricks token from APIs (#149)

* Remove databricks token from APIs. Create new save button for databricks token. Fix delete kernel API

* PR feedback

* Enable databricks/HDInsight env validation check for save button

* remove isDatabricksSparkType state from flowDefinitionPanel

* create secretScopePrefix constant

* Extract status code and error message from response string

* add try catch

* update jackson bit (#150)

* Fetch value from promise of isDatabricksSparkType (#151)

* Fetch value from promise of isDatabricksSparkType

* update package version

* PR feedback

* For databricks by default disable autoscale (#152)

* updated the version of datax packages
Contributor

@carlbrochu left a comment


thanks!

@kjcho-msft merged commit c692787 into stable Oct 8, 2019