Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ID Library #5863

Merged
merged 9 commits into from
Nov 12, 2020
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
238 changes: 238 additions & 0 deletions modules/idLibrary.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,238 @@
import {getGlobal} from '../src/prebidGlobal.js';
import {ajax} from '../src/ajax.js';
import {config} from '../src/config.js';
import * as utils from '../src/utils.js';
import MD5 from 'crypto-js/md5.js';

let email;
let conf;
const LOG_PRE_FIX = 'ID-Library: ';
const CONF_DEFAULT_OBSERVER_DEBOUNCE_MS = 250;
const CONF_DEFAULT_FULL_BODY_SCAN = true;
const OBSERVER_CONFIG = {
subtree: true,
attributes: true,
attributeOldValue: false,
childList: true,
attirbuteFilter: ['value'],
characterData: true,
characterDataOldValue: false
};
const logInfo = createLogInfo(LOG_PRE_FIX);
const logError = createLogError(LOG_PRE_FIX);

function createLogInfo(prefix) {
return function (...strings) {
utils.logInfo(prefix + ' ', ...strings);
}
}

function createLogError(prefix) {
return function (...strings) {
utils.logError(prefix + ' ', ...strings);
}
}

function getEmail(value) {
const matched = value.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);
if (!matched) {
return null;
}
logInfo('Email found' + matched[0]);
return matched[0];
}

function bodyAction(mutations, observer) {
logInfo('BODY observer on debounce called');
// If the email is found in the input element, disconnect the observer
if (email) {
observer.disconnect();
logInfo('Email is found, body observer disconnected');
return;
}

const body = document.body.innerHTML;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are scanning the entire page with regex for an email?

Can you explain the use case here? This doesn't seem very prebid-y.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the publisher has configured the element from which the email needs to be fetched, we will have to scan the entire page to get email.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't the publisher just pass the hashed email directly to the library when they initiate it? i agree that scanning an entire page is not ideal

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend we make this scanning of entire document a configurable option via the setConfig call to this module.

We will leave it defaulting to true but I think it is in our best interest to document the behavior of the module thoroughly, and make sure the publisher has the control necessary.

@jdwieland8282 Can we make sure the associated docs PR contains the steps outlined and the fallbacks?

Something that outlines the flow:

How the target is found:

  1. Uses publisher defined target element
  2. Searches for HTML input (text/email) element
  3. Searches entire document for email using regex.

And outline that the latter (3) can be configured off by something like:

pbjs.setConfig({
   idLibrary: {
      fallbackSearch: false // defaults to true (I am bad at naming vars)
   }
});

@SKOCHERI

Copy link
Member

@jdwieland8282 jdwieland8282 Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robertrmartinez the existing Docs PR describes the following:

A persistent identifier can be extracted in the following ways:

1. From a generic `<div>` element
2. From a publisher configured HTML element id
3. From an `<input>` element of type text/email

Are you suggesting your 1-3 above replace the existing 1-3?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well just something that outlines exactly the flow of the module.

Could be another flow chart or something.

email = getEmail(body);
if (email != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!==

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

logInfo(`Email obtained from the body ${email}`);
observer.disconnect();
logInfo('Post data on email found in body');
postData();
}
}

function targetAction(mutations, observer) {
logInfo('Target observer called');
for (const mutation of mutations) {
for (const node of mutation.addedNodes) {
email = node.textContent;

if (email) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get an email do we need to keep going through all other mutations?

Should there only ever be one email on page?

Should we break here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once email is found, data is posted and returned now

logInfo('Email obtained from the target ' + email);
observer.disconnect();
logInfo(' Post data on email found in target');
postData();
}
}
}
}

function addInputElementsElementListner(conf) {
logInfo('Adding input element listeners');
const inputs = document.querySelectorAll('input[type=text], input[type=email]');

for (var i = 0; i < inputs.length; i++) {
logInfo(` Original Value in Input = ${inputs[i].value}`);
inputs[i].addEventListener('change', event => processInputChange(event));
inputs[i].addEventListener('blur', event => processInputChange(event));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a forEach instead of a for loop since you do not need to exit early or anything.

Just a personal preference, can leave it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using forEach LGTM analysis: JavaScript test failed so used for loop

}

function removeInputElementsElementListner() {
logInfo('Removing input element listeners');
const inputs = document.querySelectorAll('input[type=text], input[type=email]');

for (var i = 0; i < inputs.length; i++) {
inputs[i].removeEventListener('change', event => processInputChange(event));
inputs[i].removeEventListener('blur', event => processInputChange(event));
}
}

function processInputChange(event) {
const value = event.target.value;
logInfo(`Modified Value of input ${event.target.value}`);
email = getEmail(value);
if (email != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, strict check since getEmail explicitly will return null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

logInfo('Email found in input ' + email);
postData();
removeInputElementsElementListner();
}
}

function debounce(func, wait, immediate) {
var timeout;
return function () {
var context = this;
var args = arguments;
var later = function () {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually we use let / or const now instead of actual var declaration.

No big deal though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

timeout = null;
if (!immediate) func.apply(context, args);
};
var callNow = immediate && !timeout;
clearTimeout(timeout);
logInfo('Debounce wait time ' + wait);
timeout = setTimeout(later, wait);
if (callNow) func.apply(context, args);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make more sense to instead just wrap this logic in a if else

if (callNow)

call the func

else

setTimeout to call the func.

Unnecessarily executing a setTimeout when !immediate seems like a waste.

};
};

function handleTargetElement() {
const targetObserver = new MutationObserver(debounce(targetAction, conf.debounce, false));

const targetElement = document.getElementById(conf.target);
if (targetElement) {
email = targetElement.innerText;

if (!email) {
logInfo('Finding the email with observer');
targetObserver.observe(targetElement, OBSERVER_CONFIG);
} else {
logInfo(' Target found with target ' + email);
logInfo(' Post data on email found in target with target');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the space is already included in the logInfo function you created, are these needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

postData();
}
}
}

function handleBodyElements() {
if (doesInputElementsHaveEmail()) {
logInfo('Email found in input elements ' + email);
logInfo('Post data on email found in target without');
postData();
return;
}
email = getEmail(document.body.innerHTML);
if (email != null) {
logInfo('Email found in body ' + email);
logInfo(' Post data on email found in the body without observer');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary space?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

postData();
return;
}
addInputElementsElementListner();
if (conf.fullscan === true) {
const bodyObserver = new MutationObserver(debounce(bodyAction, conf.debounce, false));
bodyObserver.observe(document.body, OBSERVER_CONFIG);
}
}

function doesInputElementsHaveEmail() {
const inputs = document.getElementsByTagName('input');

for (let index = 0; index < inputs.length; ++index) {
const curInput = inputs[index];
email = getEmail(curInput.value);
if (email != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have coded getEmail to return null explicitly,

So you can change this to !==

return true;
}
}
return false;
}

function syncCallback() {
return {
success: function (responseBody) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not using responseBody, can remove I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

logInfo(' Data synced successfully.');
},
error: function () {
logInfo(' Data sync failed.');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces still

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
}
}

function postData() {
(getGlobal()).refreshUserIds();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we refreshing userIds every single call?

Not sure that was the intent when pbjs.refreshUserIds was implemented.

@jdwieland8282

CC @bretg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refreshUserIds is called prior to posting data which happens only when a email is found.

const userIds = (getGlobal()).getUserIds();
if (!userIds || userIds.length === 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it may be better to use

!Array.isArray(userIds) instead of !userIds

If for some reason userIds comes back with a prop that does not have .length this will throw an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

userIds is an object, so updated the check to Object.keys(userIds).length === 0

return;
}
logInfo(' Users' + JSON.stringify(userIds));
const syncPayload = {};
syncPayload.hid = MD5(email).toString();
syncPayload.uids = JSON.stringify(userIds);
const payloadString = JSON.stringify(syncPayload);
logInfo(payloadString);
ajax(conf.url, syncCallback(), payloadString, {method: 'POST', withCredentials: true});
}

function associateIds() {
if (window.MutationObserver || window.WebKitMutationObserver) {
if (conf.target) {
handleTargetElement();
} else {
handleBodyElements();
}
}
}

export function setConfig(config) {
if (!config) {
logError('Required confirguration not provided');
return;
}
if (!config.url) {
logError('The required url is not configured');
return;
}
if (!config.debounce) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is debounce === 0 not a valid use case?

Maybe this should be:

if (typeof config.debounce !== 'number')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

config.debounce = CONF_DEFAULT_OBSERVER_DEBOUNCE_MS;
logInfo('Set default observer debounce to ' + CONF_DEFAULT_OBSERVER_DEBOUNCE_MS);
}
if (config.fullscan == undefined) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would reccomend just checking if it is type of boolean here to make sure it is always one.

if (typeof config.fullscan !== 'boolean')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also when possible, it is best to use strict ===.

Sometimes you want to catch null and undefined so == will do that,

but we know we want it to be a number for sure, so I think it is best to explicitly check if it is a number.

Same with fullscan (boolean).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

config.fullscan = CONF_DEFAULT_FULL_BODY_SCAN;
logInfo('Set default fullscan ' + CONF_DEFAULT_FULL_BODY_SCAN);
}
conf = config;
associateIds();
}

config.getConfig('idLibrary', config => setConfig(config.idLibrary));
24 changes: 24 additions & 0 deletions modules/idLibrary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
## ID Library Configuration Example


|Param |Required |Description |
|----------------|-------------------------------|-----------------------------|
|url |Yes | The url endpoint is used to post the hashed email and user ids. |
|target |No |It should contain the element id from which the email can be read. |
|debounce |No | Time in milliseconds before the email and ids are fetched |
|fullscan |No | Option to enable/disable full body scan to get email. By default the full body scan is enabled. |

### Example
```
pbjs.setConfig({
idLibrary:{
url: <url>,
debounce: 250,
target: 'username',
fullscan: false
},
});
```


```
61 changes: 61 additions & 0 deletions test/spec/modules/idLibrary_spec.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
import * as utils from 'src/utils.js';
import * as idlibrary from 'modules/idLibrary.js';

var expect = require('chai').expect;

describe('currency', function () {
let fakeCurrencyFileServer;
let sandbox;
let clock;

let fn = sinon.spy();

beforeEach(function () {
fakeCurrencyFileServer = sinon.fakeServer.create();
sinon.stub(utils, 'logInfo');
sinon.stub(utils, 'logError');
});

afterEach(function () {
utils.logInfo.restore();
utils.logError.restore();
fakeCurrencyFileServer.restore();
idlibrary.setConfig({});
});

describe('setConfig', function () {
beforeEach(function() {
sandbox = sinon.sandbox.create();
clock = sinon.useFakeTimers(1046952000000); // 2003-03-06T12:00:00Z
});

afterEach(function () {
sandbox.restore();
clock.restore();
});

it('results when no config available', function () {
idlibrary.setConfig({});
sinon.assert.called(utils.logError);
});
it('results with config available', function () {
idlibrary.setConfig({ 'url': 'URL' });
sinon.assert.called(utils.logInfo);
});
it('results with config default debounce ', function () {
let config = { 'url': 'URL' }
idlibrary.setConfig(config);
expect(config.debounce).to.be.equal(250);
});
it('results with config default fullscan ', function () {
let config = { 'url': 'URL' }
idlibrary.setConfig(config);
expect(config.fullscan).to.be.equal(true);
});
it('results with config fullscan ', function () {
let config = { 'url': 'URL', 'fullscan': false }
idlibrary.setConfig(config);
expect(config.fullscan).to.be.equal(false);
});
});
});