Export with history and soft deletes #3519

Merged
merged 67 commits into from Nov 10, 2023

Commits (67)
43b972a
Added initial logic for Cosmos history/delete in $export
mikaelweave Sep 7, 2023
9468cf1
Fixed build issue with SQL export orchestrator
mikaelweave Sep 7, 2023
cba2aa1
Fixed cosmos export bugs found via manual testing
mikaelweave Sep 7, 2023
6f6e1d2
Added soft delete extension to export
mikaelweave Sep 8, 2023
2f2bd6b
Add first pass on Export E2E Tests
mikaelweave Sep 8, 2023
0e3f95e
Updated export E2E tests to use test data from fixture.
mikaelweave Sep 11, 2023
6809ccf
Updated long export tests for individual history/delete scenarios
mikaelweave Sep 11, 2023
2762608
Updated gitignore to ignore azurite temp files
mikaelweave Sep 11, 2023
ab207e4
Small azure storage explorer / azurite comment update for export tests
mikaelweave Sep 11, 2023
3c1970c
Fixed ExportJobTaskTests to not rely on position of queryParameterList
mikaelweave Sep 11, 2023
c713863
Added ExportJobTask unit test for history/delete
mikaelweave Sep 11, 2023
a2b6c44
Removing changed settings.json file
mikaelweave Sep 12, 2023
d139419
Added historical and soft delete export to rest file
mikaelweave Sep 12, 2023
bc04057
Added import of soft deleted resources
mikaelweave Sep 15, 2023
ac395ed
Updates per PR review
mikaelweave Sep 20, 2023
eba2976
Removed import changes - going to another PR
mikaelweave Sep 20, 2023
59b0d05
Removed launch.json changes accidentally committed
mikaelweave Sep 20, 2023
60b4fbc
Added initial SQL export with soft delete / history
mikaelweave Sep 20, 2023
e40516d
oopsies undo change to importresourceprocessor breaking tests
mikaelweave Sep 21, 2023
f6c0538
Fixed logical error in new SQL historical search
mikaelweave Sep 21, 2023
5d4b5fd
Added searchoptionsfactory test for include history/deleted
mikaelweave Sep 21, 2023
ba97d88
Fixed failing export test failure.
mikaelweave Sep 27, 2023
87d7dc6
Removed long running export flag to see if it runs in pipeline
mikaelweave Sep 27, 2023
8391445
restructured export test location for readability
mikaelweave Sep 28, 2023
ab46e87
Fixed export issues found in testing
mikaelweave Sep 29, 2023
b81d871
Changed SQL script version for merge
mikaelweave Sep 29, 2023
2403097
Merge branch 'main' into feature/export/include-history-soft-delete
mikaelweave Sep 29, 2023
da05a52
Updated SQL schema version
mikaelweave Sep 29, 2023
4708e4b
Code style cleanup
mikaelweave Sep 29, 2023
ccc95cb
Updated export history/deleted query params
mikaelweave Oct 5, 2023
224d153
Fixed export included data test
mikaelweave Oct 5, 2023
8cb91ac
Changed SQL exporter to use export configuration vs magic numbers in …
mikaelweave Oct 6, 2023
b156374
Rolling back unneeded SQL changes
mikaelweave Oct 12, 2023
09c8d2f
Merge branch 'feature/export/include-history-soft-delete' of github.c…
mikaelweave Oct 12, 2023
7233c58
Fixed parallel export with history/soft delete
mikaelweave Oct 12, 2023
db0b66f
Updated export E2E tests for parallel export multi-job
mikaelweave Oct 12, 2023
51ad4aa
Merge branch 'main' into feature/export/include-history-soft-delete
mikaelweave Oct 12, 2023
d8c9d60
fixed merge regression
mikaelweave Oct 13, 2023
b53813e
merged with main
mikaelweave Oct 16, 2023
4246319
Removed unnecessary usings
mikaelweave Oct 16, 2023
7ec234a
Fixed tx issue in export data tests
mikaelweave Oct 17, 2023
848bc12
testing central export perf
mikaelweave Oct 17, 2023
1d66c1b
Rolled back central execution of export tests
mikaelweave Oct 17, 2023
7c060f7
Merge branch 'main' into feature/export/include-history-soft-delete
mikaelweave Oct 18, 2023
c26393b
Removing exportlongrunning for pipeline perf test
mikaelweave Oct 18, 2023
8726ebc
Fixed build
mikaelweave Oct 18, 2023
6f3550a
Optimized test structure
mikaelweave Oct 18, 2023
d9e512e
Merge branch 'main' into feature/export/include-history-soft-delete
mikaelweave Oct 30, 2023
515e49b
merged from main
mikaelweave Oct 31, 2023
4a2b539
Updated schema version to iterate after merge
mikaelweave Oct 31, 2023
61e7efa
Merge branch 'main' into feature/export/include-history-soft-delete
mikaelweave Oct 31, 2023
17fe214
decoupled search options from export
mikaelweave Nov 6, 2023
17fcf96
Merge branch 'feature/export/include-history-soft-delete' of github.c…
mikaelweave Nov 6, 2023
f5be1d5
Fixed test issue
mikaelweave Nov 6, 2023
24412eb
Removing NA SQL comment
mikaelweave Nov 7, 2023
c6401d4
PR comments on tests fix
mikaelweave Nov 7, 2023
f0fb0df
"un-fancified" ResourceTypeVersion enum
mikaelweave Nov 7, 2023
b43b447
Merge branch 'main' into feature/export/include-history-soft-delete
mikaelweave Nov 7, 2023
846660c
Fix STU3 build error
mikaelweave Nov 7, 2023
f9709c6
Fixed unit tests that no longer use export/history query params
mikaelweave Nov 7, 2023
de16f08
fixed test mistake for resourcetypeversion
mikaelweave Nov 7, 2023
6adfe70
Fixed cosmos export search function
mikaelweave Nov 8, 2023
228a425
Fixed export includeAssociatedData error message
mikaelweave Nov 8, 2023
b295ca5
Default ResourceVersionType to latest
mikaelweave Nov 8, 2023
7731f6f
Fixed history search - added deleted
mikaelweave Nov 8, 2023
8fb1a5f
fix export count issue?
mikaelweave Nov 9, 2023
c44bdda
small code cleanups
mikaelweave Nov 9, 2023
7 changes: 7 additions & 0 deletions docs/rest/ExportRequests.http
@@ -71,6 +71,13 @@ Accept: application/fhir+json
Prefer: respond-async
Authorization: Bearer {{bearer.response.body.access_token}}

### Export with history and soft deleted records
# @name export
GET https://{{hostname}}/$export?includeAssociatedData=_history,_deleted
Accept: application/fhir+json
Prefer: respond-async
Authorization: Bearer {{bearer.response.body.access_token}}

### Get Export request
GET {{exportLocation}}
Authorization: Bearer {{bearer.response.body.access_token}}
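The requests above follow the standard FHIR bulk-export flow: the server answers the kickoff with 202 Accepted and a Content-Location header pointing at a status endpoint, which the client polls until the job completes. A minimal client sketch of that flow (not part of this PR; the host name, token handling, and polling interval are placeholders):

// Minimal, hedged sketch of a $export client that requests history and soft-deleted
// resources and then polls the status URL. Host name and token are placeholders.
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class ExportClientSketch
{
    public static async Task<string> RunExportAsync(string bearerToken)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", bearerToken);

        var kickoff = new HttpRequestMessage(
            HttpMethod.Get,
            "https://example-fhir-server/$export?includeAssociatedData=_history,_deleted");
        kickoff.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/fhir+json"));
        kickoff.Headers.Add("Prefer", "respond-async");

        HttpResponseMessage kickoffResponse = await client.SendAsync(kickoff);
        kickoffResponse.EnsureSuccessStatusCode();

        // The kickoff response exposes the job status URL in the Content-Location header.
        Uri statusUrl = kickoffResponse.Content.Headers.ContentLocation;

        while (true)
        {
            HttpResponseMessage status = await client.GetAsync(statusUrl);
            if (status.StatusCode == HttpStatusCode.OK)
            {
                // Completed: the body lists the exported NDJSON file locations.
                return await status.Content.ReadAsStringAsync();
            }

            // 202 Accepted means the export job is still running.
            await Task.Delay(TimeSpan.FromSeconds(30));
        }
    }
}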
@@ -45,6 +45,7 @@ public ValidateExportRequestFilterAttribute()
KnownQueryParameterNames.Format,
KnownQueryParameterNames.TypeFilter,
KnownQueryParameterNames.IsParallel,
KnownQueryParameterNames.IncludeAssociatedData,
KnownQueryParameterNames.MaxCount,
KnownQueryParameterNames.AnonymizationConfigurationCollectionReference,
KnownQueryParameterNames.AnonymizationConfigurationLocation,
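The attribute above whitelists the query parameters $export accepts, so the new IncludeAssociatedData constant must be registered here for requests to pass validation. A simplified sketch of that style of check (names and the exception type are stand-ins, not the real attribute):

// Simplified sketch only (not the attribute's actual implementation): requests whose
// query string contains a parameter outside the supported set are rejected before the
// export job is created, which is why new parameters must be registered in the list above.
using System;
using System.Collections.Generic;

public static class ExportParameterValidationSketch
{
    public static void ValidateQueryParameters(
        IEnumerable<KeyValuePair<string, string>> queryParameters,
        ISet<string> supportedParameterNames)
    {
        foreach (KeyValuePair<string, string> parameter in queryParameters)
        {
            if (!supportedParameterNames.Contains(parameter.Key))
            {
                // The server throws its own request-not-valid exception; ArgumentException is a stand-in.
                throw new ArgumentException($"Query parameter '{parameter.Key}' is not supported by $export.");
            }
        }
    }
}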
18 changes: 18 additions & 0 deletions src/Microsoft.Health.Fhir.Api/Resources.Designer.cs

Some generated files are not rendered by default.

8 changes: 7 additions & 1 deletion src/Microsoft.Health.Fhir.Api/Resources.resx
@@ -408,4 +408,10 @@
<value>Invalid combination of processing logic and bundle type: {0} and {1}.</value>
<comment>Error message when there is a invalid/unknown combination of a bundle type and a processing logic.</comment>
</data>
</root>
<data name="TypeFilterNotSupportedWithHistoryOrDeletedExport" xml:space="preserve">
<value>The request "_typeFilter" cannot be used with an export request with historical or soft deleted resources.</value>
</data>
<data name="InvalidExportAssociatedDataParameter" xml:space="preserve">
<value>The export parameter "includeAssociatedData" contains an invalid value. Supported values are: {0}. </value>
</data>
</root>
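These two strings back the validation of the new parameter: _typeFilter is rejected when history or soft-deleted resources are requested, and an unknown includeAssociatedData token produces the second message. A rough sketch of how the comma-separated value could map onto flags (the type and method names are illustrative, not the PR's actual parser):

// Illustrative parser for includeAssociatedData=_history,_deleted.
// Unknown tokens would surface the InvalidExportAssociatedDataParameter message.
using System;

[Flags]
public enum ExportAssociatedData
{
    None = 0,
    History = 1,
    SoftDeleted = 2,
}

public static class IncludeAssociatedDataParserSketch
{
    public static ExportAssociatedData Parse(string rawValue)
    {
        var result = ExportAssociatedData.None;

        foreach (string token in rawValue.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
        {
            result |= token switch
            {
                "_history" => ExportAssociatedData.History,
                "_deleted" => ExportAssociatedData.SoftDeleted,
                _ => throw new ArgumentException(
                    "The export parameter \"includeAssociatedData\" contains an invalid value. Supported values are: _history, _deleted."),
            };
        }

        return result;
    }
}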
@@ -6,6 +6,7 @@
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Linq.Expressions;
using System.Net;
using System.Threading;
@@ -20,6 +21,7 @@
using Microsoft.Health.Extensions.DependencyInjection;
using Microsoft.Health.Fhir.Core.Configs;
using Microsoft.Health.Fhir.Core.Exceptions;
using Microsoft.Health.Fhir.Core.Features;
using Microsoft.Health.Fhir.Core.Features.Context;
using Microsoft.Health.Fhir.Core.Features.Operations;
using Microsoft.Health.Fhir.Core.Features.Operations.Export;
@@ -147,7 +149,8 @@ public async Task GivenThereAreTwoPagesOfSearchResults_WhenExecuted_ThenCorrectS
null,
Arg.Is(CreateQueryParametersExpression(KnownResourceTypes.Patient)),
_cancellationToken,
true)
true,
ResourceVersionType.Latest)
.Returns(CreateSearchResult(continuationToken: continuationToken));

bool capturedSearch = false;
@@ -157,7 +160,8 @@ public async Task GivenThereAreTwoPagesOfSearchResults_WhenExecuted_ThenCorrectS
null,
Arg.Is(CreateQueryParametersExpressionWithContinuationToken(ContinuationTokenConverter.Encode(continuationToken), KnownResourceTypes.Patient)),
_cancellationToken,
true)
true,
ResourceVersionType.Latest)
.Returns(x =>
{
capturedSearch = true;
@@ -321,40 +325,70 @@ public async Task GivenThereAreMultiplePagesOfSearchResultsWithSinceParameter_Wh
Assert.True(secondCapturedSearch);
}

private Expression<Predicate<IReadOnlyList<Tuple<string, string>>>> CreateQueryParametersExpression(string resourceType)
[Fact]
public async Task GivenAnExportJobWithHistoryAndSoftDeletes_WhenExecuted_ThenAllResourcesAreExportedToTheProperLocation()
{
bool capturedSearch = false;

var exportJobRecordIncludeHistory = CreateExportJobRecord(
exportJobType: ExportJobType.Patient,
includeHistory: true,
includeDeleted: true,
maximumNumberOfResourcesPerQuery: 1);
SetupExportJobRecordAndOperationDataStore(exportJobRecordIncludeHistory);

_searchService.SearchAsync(
null,
Arg.Is(CreateQueryParametersExpression(KnownResourceTypes.Patient, includeHistory: true, includeDeleted: true)),
_cancellationToken,
true,
ResourceVersionType.Latest | ResourceVersionType.History | ResourceVersionType.SoftDeleted)
.Returns(x =>
{
capturedSearch = true;

return CreateSearchResult();
});

await _exportJobTask.ExecuteAsync(_exportJobRecord, _weakETag, _cancellationToken);

Assert.True(capturedSearch);
}
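
// Editor's note (sketch, not part of the diff): the bitwise OR above implies that
// ResourceVersionType is a [Flags] enum along these lines, so a single search can
// request any combination of latest, historical, and soft-deleted versions.
// Member values here are assumptions inferred from usage in this file.
[System.Flags]
internal enum ResourceVersionTypeSketch
{
    Latest = 1,
    History = 2,
    SoftDeleted = 4,
}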

private Expression<Predicate<IReadOnlyList<Tuple<string, string>>>> CreateQueryParametersExpression(string resourceType, bool includeHistory = false, bool includeDeleted = false)
{
return arg => arg != null &&
Tuple.Create("_count", "1").Equals(arg[0]) &&
Tuple.Create("_lastUpdated", $"le{_exportJobRecord.Till}").Equals(arg[1]) &&
Tuple.Create("_type", resourceType).Equals(arg[2]);
arg.Any(x => x.Item1 == "_count" && x.Item2 == "1") &&
arg.Any(x => x.Item1 == "_lastUpdated" && x.Item2 == $"le{_exportJobRecord.Till}") &&
arg.Any(x => x.Item1 == "_type" && x.Item2 == resourceType);
}

private Expression<Predicate<IReadOnlyList<Tuple<string, string>>>> CreateQueryParametersExpression(PartialDateTime since, string resourceType)
{
return arg => arg != null &&
Tuple.Create("_count", "1").Equals(arg[0]) &&
Tuple.Create("_lastUpdated", $"le{_exportJobRecord.Till}").Equals(arg[1]) &&
Tuple.Create("_lastUpdated", $"ge{since}").Equals(arg[2]) &&
Tuple.Create("_type", resourceType).Equals(arg[3]);
arg.Any(x => x.Item1 == "_count" && x.Item2 == "1") &&
arg.Any(x => x.Item1 == "_lastUpdated" && x.Item2 == $"le{_exportJobRecord.Till}") &&
arg.Any(x => x.Item1 == "_lastUpdated" && x.Item2 == $"ge{since}") &&
arg.Any(x => x.Item1 == "_type" && x.Item2 == resourceType);
}

private Expression<Predicate<IReadOnlyList<Tuple<string, string>>>> CreateQueryParametersExpressionWithContinuationToken(string continuationToken, string resourceType)
{
return arg => arg != null &&
Tuple.Create("_count", "1").Equals(arg[0]) &&
Tuple.Create("_lastUpdated", $"le{_exportJobRecord.Till}").Equals(arg[1]) &&
Tuple.Create("_type", resourceType).Equals(arg[2]) &&
Tuple.Create("ct", continuationToken).Equals(arg[3]);
arg.Any(x => x.Item1 == "_count" && x.Item2 == "1") &&
arg.Any(x => x.Item1 == "_lastUpdated" && x.Item2 == $"le{_exportJobRecord.Till}") &&
arg.Any(x => x.Item1 == "_type" && x.Item2 == resourceType) &&
arg.Any(x => x.Item1 == "ct" && x.Item2 == continuationToken);
}

private Expression<Predicate<IReadOnlyList<Tuple<string, string>>>> CreateQueryParametersExpressionWithContinuationToken(string continuationToken, PartialDateTime since, string resourceType)
{
return arg => arg != null &&
Tuple.Create("_count", "1").Equals(arg[0]) &&
Tuple.Create("_lastUpdated", $"le{_exportJobRecord.Till}").Equals(arg[1]) &&
Tuple.Create("_lastUpdated", $"ge{since}").Equals(arg[2]) &&
Tuple.Create("_type", resourceType).Equals(arg[3]) &&
Tuple.Create("ct", continuationToken).Equals(arg[4]);
arg.Any(x => x.Item1 == "_count" && x.Item2 == "1") &&
arg.Any(x => x.Item1 == "_lastUpdated" && x.Item2 == $"le{_exportJobRecord.Till}") &&
arg.Any(x => x.Item1 == "_lastUpdated" && x.Item2 == $"ge{since}") &&
arg.Any(x => x.Item1 == "_type" && x.Item2 == resourceType) &&
arg.Any(x => x.Item1 == "ct" && x.Item2 == continuationToken);
}

[Fact]
@@ -834,7 +868,9 @@ public async Task GivenAnExportJobWithTheTypeParameter_WhenExecuted_ThenOnlyReso
true)
.Returns(x =>
{
string[] types = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)[3].Item2.Split(',');
string[] types = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)
.Where(t => t.Item1 == KnownQueryParameterNames.Type)
.Select(t => t.Item2).First().Split(',');
SearchResultEntry[] entries = new SearchResultEntry[types.Length];

for (int index = 0; index < types.Length; index++)
@@ -889,7 +925,9 @@ public async Task GivenAPatientExportJobWithTheTypeParameter_WhenExecuted_ThenOn
true)
.Returns(x =>
{
string[] types = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(3)[3].Item2.Split(',');
string[] types = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(3)
.Where(t => t.Item1 == KnownQueryParameterNames.Type)
.Select(t => t.Item2).First().Split(',');
SearchResultEntry[] entries = new SearchResultEntry[types.Length];

for (int index = 0; index < types.Length; index++)
@@ -1107,7 +1145,9 @@ public async Task GivenAGroupExportJob_WhenExecuted_ThenAllPatientResourcesInThe
true)
.Returns(x =>
{
string[] ids = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)[2].Item2.Split(',');
string[] ids = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)
.Where(t => t.Item1 == Core.Features.KnownQueryParameterNames.Id)
.Select(t => t.Item2).First().Split(',');
SearchResultEntry[] entries = new SearchResultEntry[ids.Length];

for (int index = 0; index < ids.Length; index++)
@@ -1175,7 +1215,9 @@ public async Task GivenAGroupExportJobWithMultiplePagesOfPatients_WhenExecuted_T
true)
.Returns(x =>
{
string[] ids = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)[2].Item2.Split(',');
string[] ids = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)
.Where(t => t.Item1 == KnownQueryParameterNames.Id)
.Select(t => t.Item2).First().Split(',');

countOfSearches++;

@@ -1250,7 +1292,9 @@ public async Task GivenAGroupExportJobToResume_WhenExecuted_ThenAllPatientResour

if (countOfSearches == 1)
{
ids = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)[2].Item2.Split(',');
ids = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)
.Where(t => t.Item1 == Core.Features.KnownQueryParameterNames.Id)
.Select(t => t.Item2).First().Split(',');
continuationTokenIndex = 0;
}
else if (countOfSearches == 2)
@@ -1261,7 +1305,10 @@ public async Task GivenAGroupExportJobToResume_WhenExecuted_ThenAllPatientResour
{
// The ids aren't in the query parameters because of the reset
ids = new string[] { "1", "2", "3" };
continuationTokenIndex = int.Parse(ContinuationTokenConverter.Decode(x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)[2].Item2).Substring(2));
continuationTokenIndex = int.Parse(ContinuationTokenConverter.Decode(
x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)
.Where(t => t.Item1 == Core.Features.KnownQueryParameterNames.ContinuationToken)
.Select(t => t.Item2).First())[2..]);
}

return CreateSearchResult(
@@ -1342,7 +1389,10 @@ public async Task GivenAGroupExportJobWithTheTypeParameter_WhenExecuted_ThenAllP
true)
.Returns(x =>
{
string[] ids = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)[2].Item2.Split(',');
string[] ids = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(1)
.Where(t => t.Item1 == KnownQueryParameterNames.Id)
.Select(t => t.Item2).First().Split(',');

SearchResultEntry[] entries = new SearchResultEntry[ids.Length];

for (int index = 0; index < ids.Length; index++)
@@ -1363,7 +1413,9 @@ public async Task GivenAGroupExportJobWithTheTypeParameter_WhenExecuted_ThenAllP
.Returns(x =>
{
string parentId = x.ArgAt<string>(1);
string[] resourceTypes = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(3)[2].Item2.Split(',');
string[] resourceTypes = x.ArgAt<IReadOnlyList<Tuple<string, string>>>(3)
.Where(t => t.Item1 == KnownQueryParameterNames.Type)
.Select(t => t.Item2).First().Split(',');

SearchResultEntry[] entries = new SearchResultEntry[resourceTypes.Length];

@@ -2076,7 +2128,9 @@ private ExportJobRecord CreateExportJobRecord(
uint numberOfPagesPerCommit = 0,
string containerName = null,
string anonymizationConfigurationLocation = null,
string anonymizationConfigurationFileEtag = null)
string anonymizationConfigurationFileEtag = null,
bool includeHistory = false,
bool includeDeleted = false)
{
return new ExportJobRecord(
new Uri(requestEndpoint),
@@ -2094,7 +2148,9 @@ private ExportJobRecord CreateExportJobRecord(
numberOfPagesPerCommit: numberOfPagesPerCommit == 0 ? _exportJobConfiguration.NumberOfPagesPerCommit : numberOfPagesPerCommit,
storageAccountContainerName: containerName,
anonymizationConfigurationLocation: anonymizationConfigurationLocation,
anonymizationConfigurationFileETag: anonymizationConfigurationFileEtag);
anonymizationConfigurationFileETag: anonymizationConfigurationFileEtag,
includeHistory: includeHistory,
includeDeleted: includeDeleted);
}

private ExportJobTask CreateExportJobTask(
@@ -42,6 +42,16 @@ public class ExportJobConfiguration
/// </summary>
public uint MaximumNumberOfResourcesPerQuery { get; set; } = 10000;

/// <summary>
/// For SQL export, controls the number of parallel id ranges gathered for parallel export.
/// </summary>
public int NumberOfParallelRecordRanges { get; set; } = 100;

/// <summary>
/// For SQL export, controls the degree of parallelism (DOP) used by the coordinator to build sub-jobs.
/// </summary>
public int CoordinatorMaxDegreeOfParallelization { get; set; } = 4;

/// <summary>
/// Number of pages to be iterated before committing the export progress.
/// </summary>
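The two new settings are SQL-export coordinator knobs: NumberOfParallelRecordRanges sizes how many id ranges are gathered up front, and CoordinatorMaxDegreeOfParallelization caps how many sub-jobs the coordinator builds at once. A hedged sketch of how the DOP value might be applied (the wiring and type names are assumptions, not the PR's implementation):

// Sketch only: cap the number of record ranges the coordinator processes at once
// using CoordinatorMaxDegreeOfParallelization from ExportJobConfiguration.
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class ExportCoordinatorSketch
{
    private readonly int _maxDegreeOfParallelization;

    public ExportCoordinatorSketch(int coordinatorMaxDegreeOfParallelization)
    {
        _maxDegreeOfParallelization = coordinatorMaxDegreeOfParallelization;
    }

    public async Task BuildSubJobsAsync(IReadOnlyList<(long StartId, long EndId)> recordRanges, CancellationToken cancellationToken)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = _maxDegreeOfParallelization,
            CancellationToken = cancellationToken,
        };

        await Parallel.ForEachAsync(recordRanges, options, async (range, token) =>
        {
            // Placeholder: enqueue one export sub-job covering range.StartId..range.EndId.
            await Task.CompletedTask;
        });
    }
}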
@@ -28,6 +28,8 @@ public static async Task<CreateExportResponse> ExportAsync(
string containerName,
string formatName,
bool isParallel,
bool includeDeleted,
bool includeHistory,
uint maxCount,
string anonymizationConfigurationCollectionReference,
string anonymizationConfigLocation,
@@ -37,7 +39,23 @@
EnsureArg.IsNotNull(mediator, nameof(mediator));
EnsureArg.IsNotNull(requestUri, nameof(requestUri));

var request = new CreateExportRequest(requestUri, requestType, resourceType, since, till, filters, groupId, containerName, formatName, isParallel, maxCount, anonymizationConfigurationCollectionReference, anonymizationConfigLocation, anonymizationConfigFileETag);
var request = new CreateExportRequest(
requestUri: requestUri,
requestType: requestType,
resourceType: resourceType,
since: since,
till: till,
filters: filters,
groupId: groupId,
containerName: containerName,
formatName: formatName,
isParallel: isParallel,
maxCount: maxCount,
includeDeleted: includeDeleted,
includeHistory: includeHistory,
anonymizationConfigurationCollectionReference: anonymizationConfigurationCollectionReference,
anonymizationConfigurationLocation: anonymizationConfigLocation,
anonymizationConfigurationFileETag: anonymizationConfigFileETag);

CreateExportResponse response = await mediator.Send(request, cancellationToken);
return response;
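Downstream, the includeHistory and includeDeleted flags carried on the request presumably translate into the ResourceVersionType flags handed to the search service, as the updated unit tests suggest. A sketch of that assumed mapping (not the PR's exact code):

// Sketch (assumed mapping): combine the two export request flags into the
// ResourceVersionType flags used when searching resources to export.
private static ResourceVersionType ToResourceVersionTypes(bool includeHistory, bool includeDeleted)
{
    ResourceVersionType versionTypes = ResourceVersionType.Latest;

    if (includeHistory)
    {
        versionTypes |= ResourceVersionType.History;
    }

    if (includeDeleted)
    {
        versionTypes |= ResourceVersionType.SoftDeleted;
    }

    return versionTypes;
}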
@@ -99,6 +99,14 @@ public static class KnownQueryParameterNames

public const string PurgeHistory = "_purgeHistory";

/// <summary>
/// Used by $export as a comma-separated list of parameters instructing which initial data should be included.
/// </summary>
public const string IncludeAssociatedData = "includeAssociatedData";

/// <summary>
/// Used by export to specify the number of resources to be processed by the search engine.
/// </summary>
public const string MaxCount = "_maxCount";
}
}