Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Final Refactoring - Syw UID2-4158 token gen code refactoring user identity #1123

Open
wants to merge 69 commits into
base: main
Choose a base branch
from

Conversation

sunnywu
Copy link
Contributor

@sunnywu sunnywu commented Oct 28, 2024

  1. Read https://thetradedesk.atlassian.net/browse/UID2-4159
  2. Refactor EncryptedTokenEncoder#encodeIntoIdentityResponse
  3. Moved toJson methods into IdentityResponse from UIDOperatorVerticle class
  4. Use EncryptedTokenEncoder directly inside UIDOperatorService class instead of ITokenEncoder interfact so it's easier to jump to references and we haven't really created any other ITokenEncoder varantsi
  5. Removed the UserIdentity's refreshedAt field as that's redundant and currently set to the same value as the AdvertisingTokenInput/RefreshTokenInput#createdAt timestamp.
  6. Renamed some token variable names to adTokenInput and refreshTokenInput to make it clearer
  7. Removed privacyBits and establishedAt from UserIdentity/HashedDiidentity/RawUidIdentity. EstbalishedAt is kept inside FirstLevelHashIdentity as the time the first level hash was created..
  8. Also, privacyBits and establishedAt are generated from either a. in a brand new token generation call (by default should be 1 and the timestamp at the time of the call) b. or, during token refresh logic, these fields will be inherited from previous refresh token.
  9. RefreshTokenRequest, AdvertisingTokenRequest and IdentityRequest will now require to store privacyBits and establishedAt itself as a compromise.
  10. generateIdentity will now take in a privacyBits param.
  11. Created verifyFirstLevelHashIdentityAndEstablishedAt method for verifying FirstLevelHashIdentity and added more checks for FirstLevelHash and check the correctness of raw Uid/First level hash in various unit tests
  12. Added PrivacyBitsTest/IdentityResponseTest/RawUidResponseTest classes and added some extra check for firstLevelHash in existing tests
  13. Replace int privacy bits type use to proper PrivacyBits class instance
  14. Refactored UIDOperatorVerticleTest unit tests to standardise the advertisingTokenRequest/refreshTokenRequest verification(2edcb05)

Changes from 2024-12-01
14. Renamed TokenUtils.java's functions to be blatently clear on the input value type
15. Fixed the old testIdentityMapForOptOutUser test that was using raw DII as a hash
16. Renamed all uses of IdentityString/IdentityHash variables to rawDii and hashedDii
17. Renamed IdentityType to DiiType
18. In InputUtil, renamed Identity* member variable/methods to Dii* to indicate it's Dii
19. Change TokenUtils class getIdentityHash to getDiiHash and same for getDiiHashString
20. Renamed FirstLevelHashIdentity/HashedDiiIdentity/RawUidIdentity to FirstLevelHash/HashedDii/RawUid
21. Renamed RawUidResponse to IdentityMapResponseItem, MapRequest to IdentityMapRequestItem
22. Renamed IUIDOperatorService.mapHashedDiiIdentity to mapHashedDii
23. Renamed IdentityRequest/IdentityResponse to TokenGenerateRequest/Response
24. Renamed RefreshTokenRequest/RefreshResponse to  TokenRefreshRequest/TokenRefreshResponse
25. Renamed AdvertisingTOkenInput/RefreshTokenInput to AdvertisingTokenRequest/RefreshTOkenRequest

…LevelHashIdentity and RawUidIdentity as UserIdentity is a very vague term and one needs to read function calls to really understand whether the UserIdentity class has hashed DII, first level hash or raw Uid.
2. Rename RefreshToken class to RefreshTokenInput
3. Rename createAdvertisingToken/createRefreshToken methods to createAdvertisingTokenInput/createRefreshTokenInput
…ii/FirstLevelHash/RawUidIdentity sub-classes
2. Moved toJson methods into IdentityResponse from UIDOperatorVerticle class
…stead of ITokenEncoder interfact so it's easier to jump to references and we haven't really created any other ITokenEncoder variants
2. Renamed AdvertisingId to RawUid in most operator codes (except optout store related)
2. Renamed RawUidResult to RawUidResponse
3. rename some id variable to rawUid to make it clearer
4. more comments
…xed the UserIdentity/FirstLevelHashIdentity class uses to HashedDiiIdentity instead in IdentityMapBenchmark and TokenEndecBenchmark
2. Renamed RawUidResult to RawUidResponse
3. rename some id variable to rawUid to make it clearer
4. more comments
new IdentityRequest(
new PublisherIdentity(siteId, 0, 0),
input.toUserIdentity(this.identityScope, 1, Instant.now()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1, Instant.now() became the default param for IdentityRequest constructor now

new IdentityRequest(
new PublisherIdentity(siteId, 0, 0),
input.toUserIdentity(this.identityScope, 1, Instant.now()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1, Instant.now() became the default param for IdentityRequest constructor now

@@ -1235,11 +1237,11 @@ private void handleIdentityMapV1(RoutingContext rc) {
}
try {
final Instant now = Instant.now();
final MappedIdentity mappedIdentity = this.idService.map(input.toUserIdentity(this.identityScope, 0, now), now);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

identity/map call doesn't use the 0 privacy bits and established at timestamp

@@ -48,13 +55,13 @@ public class UIDOperatorServiceTest {
ExtendedUIDOperatorService uid2Service;
ExtendedUIDOperatorService euidService;
Instant now;

RotatingSaltProvider saltProvider;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved here so we could use it to verify generated first level hash/uid result

@sunnywu sunnywu marked this pull request as ready for review October 30, 2024 07:32
Copy link
Contributor

@asloobq asloobq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really makes it easy to understand the core raw UID and token generation. Thanks for doing this.

OptoutCheckPolicy optoutCheckPolicy,
Instant asOf)
{
this.userIdentity = userIdentity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is an existing styling issue here - { should be on previous line. Just a minor suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


MappedIdentity mapIdentity(MapRequest request);
RawUidResponse mapIdentity(MapRequest request);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can rename this method to mapHashedDii to make it more obvious that it accepts Hashed DII

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to mapHashedDiiIdentity to stay consistent with generateIdentity/refreshIdentity


import java.time.Instant;

//base class for all other HshedDii/FirstLevelHash/RawUIDIdentity class and define the basic common fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Typo in HshedDii

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

import com.uid2.operator.model.IdentityScope;
import com.uid2.operator.model.IdentityType;

import java.time.Instant;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

}

private UserIdentity getFirstLevelHashIdentity(IdentityScope identityScope, IdentityType identityType, byte[] identityHash, Instant asOf) {
private FirstLevelHashIdentity getFirstLevelHashIdentity(IdentityScope identityScope, IdentityType identityType, byte[] identityHash, Instant asOf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not continue calling it hashedDii instead of byte[] identityHash ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a common functions used by opt out identity creation like UIDOperatorService#testOptOutIdentityForEmail

this.testOptOutIdentityForEmail = getFirstLevelHashIdentity(identityScope, IdentityType.Email,

@@ -1,11 +1,12 @@
package com.uid2.operator.model;

public class PublisherIdentity {
// The original publisher that requests to generate a UID token
public class SourcePublisher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering why some file renames don't show up as such in the changes ? such as RefreshToken to RefreshTokenInput.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea :)

@@ -989,13 +991,13 @@ private void handleTokenGenerateV2(RoutingContext rc) {
return;
}

final IdentityTokens t = this.idService.generateIdentity(
final IdentityResponse t = this.idService.generateIdentity(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please rename the variable from t to identityResponse or just response

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

public final int siteId;
public final int clientKeyId;
public final long publisherId;

public PublisherIdentity(int siteId, int clientKeyId, long publisherId) {
public SourcePublisher(int siteId, int clientKeyId, long publisherId) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is always constructed as SourcePublisher(siteId, 0, 0) perhaps make another constructor with default arguments ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great suggestion - done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because it is a half- (well, 5%) baked feature. There is nothing special about values 0. Maybe worth putting some comment about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have added this comment in the class:

// these 2 values are added into adverting/UID token and refresh token payload but
// are not really used for any real purposes currently so sometimes are set to 0
// see the constructor below

@@ -1045,14 +1047,14 @@ else if (!input.isValid()) {

try {
siteId = AuthMiddleware.getAuthClient(rc).getSiteId();
final IdentityTokens t = this.idService.generateIdentity(
final IdentityResponse t = this.idService.generateIdentity(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rename variable t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to response

new PublisherIdentity(siteId, 0, 0),
new UserIdentity(IdentityScope.UID2, IdentityType.Email, identity, privacyBits, Instant.ofEpochMilli(establishedMillis), null));
new SourcePublisher(siteId, 0, 0),
new FirstLevelHashIdentity(IdentityScope.UID2, IdentityType.Email, identity,
Copy link
Contributor

@asloobq asloobq Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we assume that the scope is UID2 and not EUID ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacy - this method is for refresh token v2 decoding but we only use v3 refresh token now see

this.refreshTokenVersion = TokenVersion.V3;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. We can clean up the old code in the near future.

public static RefreshResponse Expired = new RefreshResponse(Status.Expired, IdentityTokens.LogoutToken);
public static RefreshResponse Deprecated = new RefreshResponse(Status.Deprecated, IdentityTokens.LogoutToken);
public static RefreshResponse NoActiveKey = new RefreshResponse(Status.NoActiveKey, IdentityTokens.LogoutToken);
public static RefreshResponse Invalid = new RefreshResponse(Status.Invalid, IdentityResponse.OptOutIdentityResponse);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these ever get mutated right? I think we can set these to public static final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (remoteApiHost == null) {
handler.handle(Future.failedFuture("remote api not set"));
return;
}

this.webClient.get(remoteApiPort, remoteApiHost, remoteApiPath).
addQueryParam("identity_hash", EncodingUtils.toBase64String(firstLevelHashIdentity.id))
.addQueryParam("advertising_id", EncodingUtils.toBase64String(advertisingId))
addQueryParam("identity_hash", EncodingUtils.toBase64String(firstLevelHashIdentity.firstLevelHash))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point - can we move the . from the previous line to this one, i.e. .addQueryParam instead of just addQueryParam

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

…put to AdvertisingTokenRequest/RefreshTOkenRequest
Copy link
Contributor

@asloobq asloobq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

new PublisherIdentity(siteId, 0, 0),
new UserIdentity(IdentityScope.UID2, IdentityType.Email, identity, privacyBits, Instant.ofEpochMilli(establishedMillis), null));
new SourcePublisher(siteId, 0, 0),
new FirstLevelHashIdentity(IdentityScope.UID2, IdentityType.Email, identity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. We can clean up the old code in the near future.

public static final String ValidateIdentityForEmail = "[email protected]";
public static final String ValidateIdentityForPhone = "+12345678901";
public static final byte[] ValidateIdentityForEmailHash = EncodingUtils.getSha256Bytes(IdentityConst.ValidateIdentityForEmail);
public static final byte[] ValidateIdentityForPhoneHash = EncodingUtils.getSha256Bytes(IdentityConst.ValidateIdentityForPhone);

// DIIs to use when you want to generate a optout response in token generation or identity map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: should be an optout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

import com.uid2.operator.util.PrivacyBits;
import com.uid2.shared.model.TokenVersion;

// class containing enough data to create a new uid or advertising token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean "new uid token" and "new advertising token"? If so, what is the difference between the two?

Or does it mean "new uid" and "new advertising token"? If so, why is the advertising token request used to create a new uid? What is a uid in this case anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should re-phrase better to
class containing enough information to create a new uid token (aka advertising token)

I use the term uid because this is about UID2/EUID. I think we have an understanding previously that from now on we should try to call things UID if it's used for both UID2 & EUID.

import com.uid2.shared.model.TokenVersion;

// class containing enough data to create a new uid or advertising token
public class AdvertisingTokenRequest extends VersionedToken {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this class inherit from VersionedToken? Should the base classed be renamed to something like VersionedTokenRequest for consistency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VersionedToken is a generic store for both AdvertisingTokenRequest and RefreshTokenRequest. So yeh i can rename to VersionedTokenRequest to be consistent.

Fixed

@@ -0,0 +1,30 @@
package com.uid2.operator.model.userIdentity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this, package names are meant to be all-lowercase. To avoid the two-word, you could try naming the package like com.uid2.operator.model.user or com.uid2.operator.model.identity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And some more questions/suggestions:

  • Should other identity related types be moved into this package structure?
  • Should IdentityType be renamed to DiiType and moved under same package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have renamed the package to com.uid2.operator.model.identity.
Renamed IdentityType/IdentityInputType to renamed to DiiType/DiiInputType and added some comments to explain DII and refer to the DII definition on unifiedid.com glossary page


// this defines all the fields for the response of the /token/generate and /client/generate endpoints before they are
// jsonified
public class IdentityResponse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is used for token generation / refresh responses and that we have identity map API, it can be confusing what this refers to. Should this be something like TokenGenerateResponse instead?

import com.uid2.operator.model.IdentityType;

//base class for all other HashedDii/FirstLevelHash/RawUIDIdentity class and define the basic common fields
public abstract class UserIdentity {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this abstraction add value?


// Contains a hash DII,
// This hash can either be computed from a raw email/phone number DII input or provided by the UID Participant directly
public class HashedDiiIdentity extends UserIdentity {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call this just HashedDii?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to keep the subclass name consistent so i prefer to add the term Identity here. But having a second thought - 1 problem i had previously with this code base is too many things are called Identity so ok i'll rename and just call it HashedDii

import java.util.Arrays;

// A raw UID is stored inside
public class RawUidIdentity extends UserIdentity {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would just RawUid make this easier to grasp? It could have a toString() override which would produce an encoded string.

import java.util.Arrays;

// Contains a first level salted hash computed from Hashed DII (email/phone number)
public class FirstLevelHashIdentity extends UserIdentity {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just FirstLevelHash?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I still have HashedDiiIdentity and RawUidIdentity but will keep them as is for now

}

// explicitly not checking establishedAt - this is only for making sure the first level hash matches a new input
public boolean matches(FirstLevelHashIdentity that) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an override of the equals() method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, i tend to think creating a equals override means we are making sure all the fields in a class is equal.
But the use case of this method in getGlobalOptOutResult explicitly do not check all fields, i.e. skip establishedAt.

Mainly renaming RefreshTokenRequest/RefreshResponse to
TokenRefreshRequest/TokenRefreshResponse
…entityMapRequestItem

- Renamed IUIDOperatorService.mapHashedDiiIdentity to mapHashedDii
… FirstLevelHash/HashedDii/RawUid

2. Removed the abstract class UserIdentity
3. fixed a unit test as removing UserIdentity broke it
2. In InputUtil, renamed Identity* member variable/methods to Dii* to indicate it's Dii
3. Change TokenUtils class getIdentityHash to getDiiHash and same for getDiiHashString
@sunnywu sunnywu force-pushed the syw-UID2-4159-token-gen-code-refactoring-UserIdentity branch from 78a70d4 to a8f0915 Compare December 10, 2024 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants