Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ees 5660 replace versioned draft folder with static folder #5403

Conversation

duncan-at-hiveit
Copy link
Collaborator

@duncan-at-hiveit duncan-at-hiveit commented Nov 19, 2024

UI test results

image

Overview

This PR:

  • fixes a bug around draft DataSetVersions whereby a draft (unpublished) DataSetVersion that was being deleted left its fileshare folder behind if its version changed to a major version, due to the delete code not expecting to look for a folder with a minor version name.

This is the second implementation to fix this issue, the first being an attempt whereby the delete code would attempt to look for both a minor AND major version-named folder and delete either if they exist, which felt untidy.

More importantly, it introduced complexity to an area of the codebase that has already generated 2 bugs, and the crux of the issue seemed to be that it was difficult to rely on a folder name that is version-based during a time where a DataSetVersion's version can fluctuate between minor and major at any time.

The implementation

High-level design

To overcome this, the idea here was to move away from a version-based name entirely for a draft DataSetVersion's folder, and instead have a single folder named "draft" which will contain the working files for the current draft version of a DataSet. This "draft" folder would remain up until the version is finalised and can no longer be changed. In this case, it is when the DataSetVersion is published.

The lifecycle of a DataSetVersion and its folder would be:

  1. A new DataSet is created from Admin.
  2. The Data Processor is called to create a brand new DataSet, and its very first v1.0 DataSetVersion.
  3. The v1.0 draft DataSetVersion is imported in its entirety. The files go into a draft folder within the DataSet's fileshare folder.
  4. When the Release that the DataSetVersion's linked Subject belongs to is published, the Publisher renames the draft folder to v1.0.0.
  5. A new Release is created with a new Subject, which is then chosen to be the next version of the DataSet.
  6. The Data Processor is called to partially import the next DataSetVersion, prior to carrying out mapping. The files go into a newly created draft folder again.
  7. During mapping, the DataSetVersion's version can fluctuate between v1.1 and v2.0 depending on the mapping choices, but the fileshare folder will remain the same - draft.
  8. When the new Release is published, the Publisher renames the draft folder to the appropriate value that is based upon the final version of the new DataSetVersion, to either v1.1.0 or v2.0.0.

Implementation details

To facilitate the above flow, the following changes have been put in place:

  1. The DataSetVersionPathResolver code has been updated to resolve to the draft folder path if the DataSetVersion in question is in one of the private statuses (the non-public ones), or the standard version-based folder path if the DataSetVersion is public.
  2. For a subsequent DataSetVersion import, the need to update the folder name after completing the DataSetVersion mapping activities is no longer necessary, so that code has been removed.
  3. In order to allow the Publisher to carry out the folder rename during its publishing of DataSetVersions, the Publisher project now has access to the Public.Data.Services project that holds the DataSetVersionPathResolver.
  4. Additionally, the Publisher is also provided the PublicData:BasePath configuration from the ARM template.
  5. Additionally, the Public API fileshare is also mounted into the Publisher Function App in the ARM template. This is conditional on the Public API existing in the target environment during deploy.
  6. A migration is supplied that will run on the Public API startup, that finds all draft DataSetVersions and renames their current version-based folder names to the new draft folder name.

Miscellaneous changes

  1. ICustomMigration has been moved to Common, to allow the Public.Data.Api project to leverage it as well as Admin.
  2. Custom migration support is now added to the Public.Data.Api startup, and both this and the Admin's custom migration code will now look up any ICustomMigration implementations that are registered in DI and run them on startup.
  3. A number of areas of the code are now annotated with // TODO EES-5660 to highlight areas of the code that can be removed once this code has been deployed once to each environment that currently has the Public API enabled.

@duncan-at-hiveit duncan-at-hiveit force-pushed the EES-5660-replace-versioned-draft-folder-with-static-folder branch from 5157854 to e36dcbf Compare November 19, 2024 17:06
…ion of a data set prior to its publishing rather than including version in the folder name, as the version can change multiple times prior to publishing.
…ers to be "draft" rather than version-based names.
… if running integration tests, as otherwise they fire on every individual integration test method run.
@duncan-at-hiveit duncan-at-hiveit force-pushed the EES-5660-replace-versioned-draft-folder-with-static-folder branch from 4eb268f to fb0426b Compare November 19, 2024 17:54
@duncan-at-hiveit duncan-at-hiveit marked this pull request as ready for review November 19, 2024 17:54
Comment on lines +1201 to +1202
"publicDataFileshareMountPath": "/data/public-api-data",
"publicDataFileshareName": "[concat(parameters('subscription'), '-ees-papi-fs-data')]",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a nitpick, but why have we contracted the term 'fileshare'?

Azure refers to it as 'file share' across the board so it doesn't really make sense to contract it into one word.

If it's not too much trouble, would suggest we re-case this as two words i.e. publicDataFileShareMountPath and publicDataFileShareName

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea, but I'd prefer not to muddy the waters of this PR with it. Instead, I'll raise a separate PR and find all occurrences that we can change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #5427 which targets this PR.

If you're happy with that PR and this one, I'll firstly merge that one into this one and then this one into dev.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, makes sense. However, would suggest that we merge this in first and then get the other one merged in separately to keep things cleaner

Comment on lines 479 to 482
new Tuple<int, string>(1, "3"),
new Tuple<int, string>(1, "4"),
new Tuple<int, string>(2, "3"),
new Tuple<int, string>(2, "4")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to specify the type - can just be new(...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done along with the change of tuple types ta.

Comment on lines 512 to 519
new Tuple<int, string, char>(1, "3", '5'),
new Tuple<int, string, char>(1, "3", '6'),
new Tuple<int, string, char>(1, "4", '5'),
new Tuple<int, string, char>(1, "4", '6'),
new Tuple<int, string, char>(2, "3", '5'),
new Tuple<int, string, char>(2, "3", '6'),
new Tuple<int, string, char>(2, "4", '5'),
new Tuple<int, string, char>(2, "4", '6'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to specify the type - can just be new(...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done along with the change of tuple types ta.

@@ -319,5 +319,30 @@ public static IOrderedEnumerable<T> NaturalThenBy<T>(
{
return source.ThenBy(keySelector, comparison.WithNaturalSort());
}

public static List<Tuple<T1, T2>> Cartesian<T1, T2>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason we're using the Tuple class over value tuples like List<(T1, T2)>?

There's a bit of a distinction between the two, but it seems like value tuples are newer and seem to be recommended for use over the Tuple class. Value tuples importantly are structs and allocate memory more efficiently.

Could switch to using value tuples?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea! Done.

.ToList();
}

public static List<Tuple<T1, T2, T3>> Cartesian<T1, T2, T3>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above - could use value tuples instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea! Done.

Comment on lines +61 to +62
// Assert that the original folder has moved to use the new special "draft" folder.
var expectedDraftFolder = pathResolver.DirectoryPath(dataSetVersion);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess it's unlikely, but this test seems a little fragile as we're not actually asserting that it's a drafts directory and are just assuming that the output from pathResolver.DirectoryPath is correct.

A similar problem arises with the other test cases below as well.

Copy link
Collaborator Author

@duncan-at-hiveit duncan-at-hiveit Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this test was gonna hang around for a while I'd be inclined to make a change here, but given we'll be deleting this migration after it gets deployed to Dev and Test, I don't think it's worth the effort in changing now if you're happy with that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also, I have tested manually!)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, yeh fine with me!

GetMigration().Apply();

// Assert that the existing draft folder is left untouched.
Assert.True(Directory.Exists(draftFolder));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess it might be overkill - but we might want to assert that the directory with the version number doesn't exist as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be inclined not to bother, as this pre-existing folder would need to be set up manually in the test setup by us. Given this test'll go as soon as this is deployed to Dev and Test, I'd be inclined to leave as is if that's OK.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, yeh fine with me!

Comment on lines +30 to +31
public static bool IsPrivateStatus(this DataSetVersion dataSetVersion)
=> IsPrivateStatus().Compile()(dataSetVersion);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting - does this get around cases where you can't normally use a static method within a LINQ call or something similar?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's to allow code to be shareable between normal service code and within LINQ queries I believe (@jack-hive having created the original example for the IsPublicStatus() method). While the Linq / EF bridge code might otherwise complain when given a DataSetVersion.IsPrivateStatus() method that it can't compile the expression down to SQL, this provides a way to do that, and additionally allows you to use this method as an extension method on the domain model so you can say dataSetVersion.IsPrivateStatus().

This way we're much more assured that the logic for determining public / private statuses in queries also matches the same logic as we use in normal code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I'll bear this one in mind, might come in handy in the future!

…ces with named tuples.
@duncan-at-hiveit duncan-at-hiveit merged commit 4c5bea1 into dev Dec 3, 2024
8 checks passed
@duncan-at-hiveit duncan-at-hiveit deleted the EES-5660-replace-versioned-draft-folder-with-static-folder branch December 3, 2024 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants