Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ID Library #5863

Merged
merged 9 commits into from
Nov 12, 2020
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
247 changes: 247 additions & 0 deletions modules/idLibrary.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,247 @@
import {getGlobal} from '../src/prebidGlobal.js';
import {ajax} from '../src/ajax.js';
import {config} from '../src/config.js';
import * as utils from '../src/utils.js';
import MD5 from 'crypto-js/md5.js';

let email;
const LOG_PRE_FIX = 'ID-Library: ';
const CONF_DEFAULT_OBSERVER_DEBOUNCE_MS = 250;
const OBSERVER_CONFIG = {
subtree: true,
attributes: true,
attributeOldValue: false,
childList: true,
attirbuteFilter: ['value'],
characterData: true,
characterDataOldValue: false
};
const logInfo = createLogInfo(LOG_PRE_FIX);
const logError = createLogError(LOG_PRE_FIX);

function createLogInfo(prefix) {
return function (...strings) {
utils.logInfo(prefix + ' ', ...strings);
}
}

function createLogError(prefix) {
return function (...strings) {
utils.logError(prefix + ' ', ...strings);
}
}

function getEmail(value) {
const matched = value.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi);
if (!matched) {
return null;
}
return matched[0];
}

function hasEmail(value) {
const email = getEmail(value);
return !!email;
}

function bodyAction(conf, mutations, observer) {
logInfo('BODY observer on debounce called');

if (email) {
observer.disconnect();
logInfo('Email is found, body observer disconnected');
}

const body = document.body.innerHTML;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are scanning the entire page with regex for an email?

Can you explain the use case here? This doesn't seem very prebid-y.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the publisher has configured the element from which the email needs to be fetched, we will have to scan the entire page to get email.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't the publisher just pass the hashed email directly to the library when they initiate it? i agree that scanning an entire page is not ideal

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend we make this scanning of entire document a configurable option via the setConfig call to this module.

We will leave it defaulting to true but I think it is in our best interest to document the behavior of the module thoroughly, and make sure the publisher has the control necessary.

@jdwieland8282 Can we make sure the associated docs PR contains the steps outlined and the fallbacks?

Something that outlines the flow:

How the target is found:

  1. Uses publisher defined target element
  2. Searches for HTML input (text/email) element
  3. Searches entire document for email using regex.

And outline that the latter (3) can be configured off by something like:

pbjs.setConfig({
   idLibrary: {
      fallbackSearch: false // defaults to true (I am bad at naming vars)
   }
});

@SKOCHERI

Copy link
Member

@jdwieland8282 jdwieland8282 Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robertrmartinez the existing Docs PR describes the following:

A persistent identifier can be extracted in the following ways:

1. From a generic `<div>` element
2. From a publisher configured HTML element id
3. From an `<input>` element of type text/email

Are you suggesting your 1-3 above replace the existing 1-3?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well just something that outlines exactly the flow of the module.

Could be another flow chart or something.


if (hasEmail(body)) {
email = getEmail(body);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is a good idea to call hasEmail which calls getEmail and does a full regex lookup on the innerHtml of the whole page, then call getEmail again. Seems very inefficient.

If you are gonna run getEmail anyways, we should just check the response instead of wrapping it in hasEmail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been fixed


logInfo(`Email obtained from the body ${email}`);
observer.disconnect();
logInfo('Post data on email found in body');
postData(conf.url);
}
}

function targetAction(conf, mutations, observer) {
logInfo('Target observer called');
for (const mutation of mutations) {
for (const node of mutation.addedNodes) {
email = node.textContent;

if (email) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get an email do we need to keep going through all other mutations?

Should there only ever be one email on page?

Should we break here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once email is found, data is posted and returned now

logInfo('Email obtained from the target ' + email);
observer.disconnect();
logInfo(' Post data on email found in target');
postData(conf.url);
}
}
}
}

function addInputElementsElementListner(conf) {
logInfo('Adding input element listeners');
const inputs = document.querySelectorAll('input[type=text], input[type=email]');

for (var i = 0; i < inputs.length; i++) {
logInfo(` Original Value in Input = ${inputs[i].value}`);
inputs[i].addEventListener('change', event => processInputChange(event, conf));
inputs[i].addEventListener('blur', event => processInputChange(event, conf));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a forEach instead of a for loop since you do not need to exit early or anything.

Just a personal preference, can leave it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using forEach LGTM analysis: JavaScript test failed so used for loop

}

function removeInputElementsElementListner(conf) {
logInfo('Removing input element listeners');
const inputs = document.querySelectorAll('input[type=text], input[type=email]');

for (var i = 0; i < inputs.length; i++) {
inputs[i].removeEventListener('change', event => processInputChange(event, conf));
inputs[i].removeEventListener('blur', event => processInputChange(event, conf));
}
}

function processInputChange(event, conf) {
const value = event.target.value;
logInfo(`Modified Value of input ${event.target.value}`);
if (hasEmail(value)) {
email = getEmail(value);

logInfo('Email found in input ' + email);
postData(conf.url);
removeInputElementsElementListner(conf);
}
}

function debounce(func, wait, immediate) {
let timeout;

return function (...args) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So debounce is just returning a function to be called later?

However, anywhere debounce is called, the return is not used.

So this is effectively not doing anything.

I think the real fix here is to add the method bindings as requested for the other methods (first param anytime debounce is called).

And to make this function a regular function that just executes, not one which returns a function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, did you mean to use the keyword arguments instead of args ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robertrmartinez The debounce issue is fixed and full body scan config is provided.

const context = this;
const later = function () {
timeout = null;
if (!immediate) {
func.apply(context, args);
}
};
const callNow = immediate && !timeout;
clearTimeout(timeout);
timeout = setTimeout(later, wait);
if (callNow) {
func.apply(context, args);
}
};
}

function handleTargetElement(conf) {
const targetObserver = new MutationObserver(function (mutations, observer) {
logInfo('target observer called');
debounce(targetAction(conf, mutations, observer), conf.debounce, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SKOCHERI

What is the intent here? Please elaborate on the expectation of things to happen when this line is hit.

debounce takes in a func param that looks like the intent is for it to be called later.

But you are executing the targetAction here with those params, which returns undefined. So really func is always undefined when debounce is run.

Is the intent to not call targetAction here but just pass its reference to debounce?

If so, recommend you use bind.

debounce(targetAction.bind(null, conf, mutations, observer), conf.debounce, false);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debounce issue is fixed

});

const targetElement = document.getElementById(conf.target);
if (targetElement) {
email = targetElement.innerText;

if (!email) {
logInfo('Finding the email with observer');
targetObserver.observe(targetElement, OBSERVER_CONFIG);
} else {
logInfo(' Target found with target ' + email);
logInfo(' Post data on email found in target with target');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the space is already included in the logInfo function you created, are these needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

postData(conf.url);
}
}
}

function handleBodyElements(conf) {
if (doesInputElementsHaveEmail()) {
logInfo('Email found in input elements ' + email);
logInfo('Post data on email found in target without');
postData(conf.url);
return;
}
if (hasEmail(document.body.innerHTML)) {
email = getEmail(document.body.innerHTML);

logInfo('Email found in body ' + email);
logInfo(' Post data on email found in the body without observer');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary space?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

postData(conf.url);
return;
}
addInputElementsElementListner(conf);
const bodyObserver = new MutationObserver(function (mutations, observer) {
logInfo('Body observer called');
debounce(bodyAction(conf, mutations, observer), conf.debounce, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as debounce change needed here.

});
bodyObserver.observe(document.body, OBSERVER_CONFIG);
}

function doesInputElementsHaveEmail() {
const inputs = document.getElementsByTagName('input');

for (let index = 0; index < inputs.length; ++index) {
const curInput = inputs[index];

if (hasEmail(curInput.value)) {
email = getEmail(curInput.value);
return true;
}
}
return false;
}

function syncCallback() {
return {
success: function (responseBody) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not using responseBody, can remove I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

logInfo(' Data synced successfully.');
},
error: function () {
logInfo(' Data sync failed.');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces still

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
}
}

function postData(url) {
(getGlobal()).refreshUserIds();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we refreshing userIds every single call?

Not sure that was the intent when pbjs.refreshUserIds was implemented.

@jdwieland8282

CC @bretg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refreshUserIds is called prior to posting data which happens only when a email is found.

const userIds = (getGlobal()).getUserIds();
if (!userIds || userIds.length === 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it may be better to use

!Array.isArray(userIds) instead of !userIds

If for some reason userIds comes back with a prop that does not have .length this will throw an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

userIds is an object, so updated the check to Object.keys(userIds).length === 0

return;
}
logInfo(' Users' + JSON.stringify(userIds));
const syncPayload = {};
syncPayload.hid = MD5(email).toString();
syncPayload.uids = JSON.stringify(userIds);
const payloadString = JSON.stringify(syncPayload);
logInfo(payloadString);
ajax(url, syncCallback(), payloadString, {method: 'POST', withCredentials: true});
}

function associateIds(conf) {
if (window.MutationObserver || window.WebKitMutationObserver) {
if (conf.target) {
handleTargetElement(conf);
} else {
handleBodyElements(conf);
}
}
}

export function setConfig(config) {
if (!config) {
logError('Required confirguration not provided');
return;
}
if (!config.url) {
logError('The required url is not configured');
return;
}
if (!config.debounce) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is debounce === 0 not a valid use case?

Maybe this should be:

if (typeof config.debounce !== 'number')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

config.debounce = CONF_DEFAULT_OBSERVER_DEBOUNCE_MS;
logInfo('Set default observer debounce to ' + CONF_DEFAULT_OBSERVER_DEBOUNCE_MS);
}

associateIds(config);
}

config.getConfig('idLibrary', config => setConfig(config.idLibrary));
22 changes: 22 additions & 0 deletions modules/idLibrary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
## ID Library Configuration Example


|Param |Required |Description |
|----------------|-------------------------------|-----------------------------|
|url |Yes | The url endpoint is used to post the hashed email and user ids. |
|target |No |It should contain the element id from which the email can be read. |
|debounce |No | Time in milliseconds before the email and ids are fetched |

### Example
```
pbjs.setConfig({
idLibrary:{
url: <url>,
debounce: 250,
target: 'username'
},
});
```


```
51 changes: 51 additions & 0 deletions test/spec/modules/idLibrary_spec.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
import * as utils from 'src/utils.js';
import * as idlibrary from 'modules/idLibrary.js';

var expect = require('chai').expect;

describe('currency', function () {
let fakeCurrencyFileServer;
let sandbox;
let clock;

let fn = sinon.spy();

beforeEach(function () {
fakeCurrencyFileServer = sinon.fakeServer.create();
sinon.stub(utils, 'logInfo');
sinon.stub(utils, 'logError');
});

afterEach(function () {
utils.logInfo.restore();
utils.logError.restore();
fakeCurrencyFileServer.restore();
idlibrary.setConfig({});
});

describe('setConfig', function () {
beforeEach(function() {
sandbox = sinon.sandbox.create();
clock = sinon.useFakeTimers(1046952000000); // 2003-03-06T12:00:00Z
});

afterEach(function () {
sandbox.restore();
clock.restore();
});

it('results when no config available', function () {
idlibrary.setConfig({});
sinon.assert.called(utils.logError);
});
it('results with config available', function () {
idlibrary.setConfig({ 'url': 'URL' });
sinon.assert.called(utils.logInfo);
});
it('results with config default debounce ', function () {
let config = { 'url': 'URL' }
idlibrary.setConfig(config);
expect(config.debounce).to.be.equal(250);
});
});
});