Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist development dependency information in the DB #5666

Closed
joelverhagen opened this issue Mar 22, 2018 · 14 comments
Closed

Persist development dependency information in the DB #5666

joelverhagen opened this issue Mar 22, 2018 · 14 comments

Comments

@joelverhagen
Copy link
Member

This is useful for the <PackageReference> tab on the package details since the code snippet is has some additional properties when the package is a development dependency.

See: #5656

To roll this out we would need to turn it on for existing packages then perform some kind of data fix for the existing development dependencies.

Note that this data is in the catalog leaves.

@khellang
Copy link
Contributor

I've picked this up again after #5656 was merged. I think I should be able to get the metadata into the DB for new packages, but I might need some assistance or at least some pointers on how to "fill in" the metadata for existing packages. I don't think both issues necessarily have to be solved at the same time, though.

The work is at dev...khellang:developement-dependency-metadata

@joelverhagen
Copy link
Member Author

I don't think both issues necessarily have to be solved at the same time, though.

Agreed. We are already in the inconsistent state so we are free to separate the "new package" case and "existing package" case.

The "reflow" behavior should be able to fill this in if you put the metadata population in the right place (haven't looked at your branch).

We should consider doing just a database fix-up though since reflow also produces a catalog entry which is unnecessary in this case.

Consider an approach like this, which was used for repository URL info.
https://github.com/NuGet/NuGetGallery/blob/master/src/GalleryTools/Commands/BackfillRepositoryMetadataCommand.cs

@joelverhagen
Copy link
Member Author

Yes, your change will make reflow fill in the property. See in the ReflowPackageService.

package = _packageService.EnrichPackageFromNuGetPackage(
package,
packageArchive,
packageMetadata,
packageStreamMetadata,
package.User);

You will need a migration and ensure the assoicated SQL generates a default value on the column of false.

@joelverhagen
Copy link
Member Author

Oh, and I think there is a bit more metadata needed for the .csproj. This is produced by VS:

    <PackageReference Include="Nerdbank.GitVersioning" Version="3.0.6-beta">
      <PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers</IncludeAssets>
    </PackageReference>

@khellang
Copy link
Contributor

khellang commented Apr 26, 2019

Thanks for the pointers @joelverhagen!

I think this would solve #2836 as well.

a default value on the column of false

Ah, had it as nullable bool and just coalesced it to false, that way you could see which entries actually have developmentDependency set and which entries are missing. Do you think it's OK to just default to false?

The "reflow" behavior should be able to fill this in if you put the metadata population in the right place

This reflow-flow, is that triggered manually? Per package? 🤔

@joelverhagen
Copy link
Member Author

Do you think it's OK to just default to false?

Yes, after we have completed the data fix, it makes no sense to have a null value for the bool. In the generic mental model for a NuGet package, it is either a dev dependency or it is not. No null state exists. Representing this state doesn't give us anything extra except for perhaps a marker that the package needs to be reflowed... but this can be determined in a different method such as comparing the bool determined by the .nuspec with the bool in the DB.

For an example of what the migration should roughly look like:

AddColumn("dbo.Packages", "PackageStatusKey", c => c.Int(
nullable: false,
defaultValue: 0));

Of course we want a bool instead of an int but you get the idea.

This reflow-flow, is that triggered manually? Per package? 🤔

I think the tool should be written to have a "discover" phase that finds all dev dependency packages (perhaps via the catalog) and a "update" phase which sets the true bool on the entity. The "update" phase is very easy. Just EF update. You should consider whatever batching approach used in the tool I linked above.

The "discover" phase is a bit more interesting. The example I gave you above for the repository URL backfill checks all of the .nuspec to determine the Truth. You could go this route or you could consider a forward scan of the catalog. You can look for the developmentDependency bool, e.g.
https://api.nuget.org/v3/catalog0/data/2019.02.04.00.03.43/nerdbank.gitversioning.3.0.6-beta.json

Personally I would just copy and adapt the BackfillRepositoryMetadataCommand which has these two phases and gets the developmentDependency value by downloading and reading the .nuspec for all packages, via the .nuspec endpoint on the package content protocol.

@khellang
Copy link
Contributor

Yes, after we have completed the data fix, it makes no sense to have a null value for the bool.

Agreed. I was just thinking if it makes sense to do it in a three-step process: add nullable bool column, backfill dev dependency data, then make the column non-nullable. But if we don't need the nullable-ness for not-yet-backfilled entries, it makes sense to just have false as a default value 👍

@khellang
Copy link
Contributor

khellang commented Apr 30, 2019

Should DevelopmentDependency be added to/used in the following types?

  • PackageHistory - I'm thinking yes, but it should probably be nullable, as I don't know how to back-fill those values.
  • AuditedPackage - This is just a matter of copying over the value from Package. Probably wouldn't be used anywhere (yet), but it's nice for consistency.
  • V1FeedPackage/V2FeedPackage - Is there any point?
  • PackageIndexEntity (and stored/retrieved in LuceneIndexingService/LuceneSearchService/ExternalSearchService) - It's possible to round-trip the value in Lucene, but I'm not sure how to handle missing values in the index. I'm not really sure how the ExternalSearchService works and where the data comes from.

@joelverhagen
Copy link
Member Author

PackageHistory - I'm thinking yes, but it should probably be nullable, as I don't know how to back-fill those values.

I would leave that one alone. It was used for package edit and has not been cleaned up yet.

AuditedPackage - This is just a matter of copying over the value from Package. Probably wouldn't be used anywhere (yet), but it's nice for consistency.

Yes, I would add it here. Some auditing implementations just JSON serialize the package so it will automatically pick up more properties.

V1FeedPackage/V2FeedPackage - Is there any point?

No, we are trying to leave the V1 and V2 protocols as-is from now on.

PackageIndexEntity (and stored/retrieved in LuceneIndexingService/LuceneSearchService/ExternalSearchService) - It's possible to round-trip the value in Lucene, but I'm not sure how to handle missing values in the index. I'm not really sure how the ExternalSearchService works and where the data comes from.

I wouldn't worry about this one. The search results do not use this value.

@joelverhagen joelverhagen self-assigned this May 6, 2019
@joelverhagen joelverhagen added this to the S152 - 2019.04.22 milestone May 6, 2019
joelverhagen pushed a commit that referenced this issue May 7, 2019
* Add developmentDependency metadata to package
* Add filldevdeps tool command
* Change VerifyPackageRequest.DevelopmentDependency to bool
* Tweaked install command display
* Added unit tests for true/false/missing developmentDependency metadata
* Updated GalleryTools readme

Address #5666
@joelverhagen
Copy link
Member Author

This is deployed to PROD but I haven't run the backfill tool yet. This means new packages have show the proper thing in the PackageReference tab but old package don't yet.
Congratulations to DotNet.Build.EazfuscatorNet 2.1.0-preview.3.0.19274.7 for being the first to get this feature 🥇.

I'll close this issue when I've completed running the backfill tool.

@khellang
Copy link
Contributor

Awesome! Let's hope the backfill command works as it should 😅

I guess the logical next question is; should we show this metadata anywhere else? It it interesting information?

@joelverhagen
Copy link
Member Author

The tool worked great on our DEV and INT environments 👍.

Perhaps we could plumb this to the VS UI. This would mean enhancing the package metadata endpoint and surfacing it in the client.
https://docs.microsoft.com/en-us/nuget/api/registration-base-url-resource

Perhaps we should come up with some concrete ideas and make new issues.

@loic-sharma loic-sharma removed this from the S153 - 2019.05.13 milestone May 29, 2019
@loic-sharma loic-sharma added this to the S154 - 2019.06.03 milestone May 29, 2019
@joelverhagen
Copy link
Member Author

The backfill on PROD is complete.

@khellang
Copy link
Contributor

khellang commented Jun 1, 2019

W00t! 🎉 Thanks for walking me through this 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants