Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[amp story shopping] JSON schema #37746

Closed
processprocess opened this issue Feb 22, 2022 · 17 comments
Closed

[amp story shopping] JSON schema #37746

processprocess opened this issue Feb 22, 2022 · 17 comments

Comments

@processprocess
Copy link
Contributor

Description

A JSON schema to spec the API for the shopping config and use for validation within tooling.

This file could be hosted on GitHub as part of the project but we do not want to include it in our bundle.

We will need to keep it in-sync with any config changes.

cc @calebcordry

Alternatives Considered

We could point a link to the validation javascript file, but that would require people to read through the component code.
JSON schema is scalable and standardized which will help ease of integration into tooling.

Additional Context

More context within discussion on shopping validation I2I.

@alanorozco
Copy link
Member

I propose that instead of keeping the JSON schema in sync with the implementation code, we generate the code from schema instead.

We can use ajv to generate the code. See an implementation on #37794. Output size is small enough from what I've seen.

@newmuis
Copy link
Contributor

newmuis commented Mar 1, 2022

@jshamble when you have bandwidth, can you try writing a JSON schema for the config JSON? I think it should be fairly straightforward, given that the structure isn't very verbose.

Once we have a test schema based on #36460, it should be pretty easy to see the bundle size of the output concretely, which I know is a concern that @gmajoulet brought up.

I'd love to do this if we can keep bundle size small; thanks @alanorozco for investigating!

@newmuis
Copy link
Contributor

newmuis commented Mar 1, 2022

Actually, I forgot I'd experimented with writing a JSON schema spec to represent an entire story (doesn't quite have feature parity, but it's a reasonable representation): https://newmuis.github.io/story-schema/

I have a compile-schema script that does:

make-dir dist && ajv compile -s 'root.schema.json' -r 'schemas/**/*.schema.json' -o 'dist/story-validator.js' --strict --all-errors=true --messages=false --inline-refs=false --code-optimize=10

With the full story schema, it gets:

$ yarn run compile-schema && gzip ./dist/story-validator.js && wc -c < ./dist/story-validator.js.gz
16723

Then I tried simplifying the schema to:

{
  "$id": "https://github.com/newmuis/story-schema/root.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object"
}

This is one of the simplest schemas you can have AFAIK, and it yields:

$ yarn run compile-schema && gzip ./dist/story-validator.js && wc -c < ./dist/story-validator.js.gz
495

I think this bounds the gzipped, minified output between [495, 16723] bytes. I'm reasonably certain that the shopping JSON is (currently) closer to the simple schema than the story schema that I've shared (which has many references between subschemas) so hopefully it's viable. I think it's still worth @jshamble looking into writing out the schema, and testing that!

@newmuis
Copy link
Contributor

newmuis commented Mar 1, 2022

tl;dr

  • JSON schema validator compiled via ajv has a minified, gzipped bundle size output of 3.3kB. It has tooling support and an ecosystem built around it, and could set a precedent for arbitrary JSON validation, but it forces a larger bundle size.
  • Custom validation by @jshamble has a minified, gzipped bundle size output of 1.2kB. It will mean shipping a separate JSON schema spec (for tools to integrate with) and keeping it in sync with the custom validation, and likely has less robust tooling support, but it's a smaller bundle size impact.

@alanorozco @calebcordry @gmajoulet for your thoughts on the bundle size


Actually, while procrastinating from... other things... I just threw together a quick schema. @jshamble and @processprocess's outline in #36460 made it pretty easy, and I followed it pretty closely:

{
  "$id": "https://github.com/newmuis/story-schema/test.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "additionalProperties": false,
  "required": [
    "items"
  ],
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": [
          "productUrl",
          "productId",
          "productBrand",
          "productTitle",
          "productPrice",
          "productPriceCurrency",
          "productImages",
          "productDetails",
          "aggregateRating"
        ],
        "properties": {
          "productUrl": { "type": "string" },
          "productId": { "type": "string" },
          "productBrand": { "type": "string" },
          "productIcon": { "type": "string" },
          "productTitle": { "type": "string" },
          "productDetails": { "type": "string" },
          "productPrice": { "type": "number", "minimum": 0 },
          "productPriceCurrency": {
            "enum": [
              "AED",
              "AFN",
              /* like, 150+ more currencies */
              "ZMW",
              "ZWL"
            ]
          },
          "productImages": {
            "type": "array",
            "items": {
              "type": "object",
              "additionalProperties": false,
              "required": [ "url", "altText" ],
              "properties": {
                "url": { "type": "string" },
                "altText": { "type": "string" }
              }
            }
          },
          "aggregateRating": {
            "type": "object",
            "additionalProperties": false,
            "required": [ "ratingValue", "ratingCount", "ratingUrl" ],
            "properties": {
              "ratingValue": { "type": "number", "minimum": 0 },
              "ratingCount": { "type": "number", "minimum": 0 },
              "ratingUrl": { "type": "string" }
            }
          }
        }
      }
    }
  }
}

side note 1: we should not literally use the schema above, since we should probably e.g. verify that the URLs are URLs (if we care about that), verify that the IDs are HTML-ID-like, etc. But I'm not sure those will substantially change the bundle size.

Then I re-ran the same methodology as above:

$ yarn run compile-schema && gzip ./dist/story-validator.js && wc -c < ./dist/story-validator.js.gz
3464

So, this is 3.3kB gzipped. Of course, as the schema itself grows, as will the bundle size. Again, I should caveat that I haveno idea whether I'm compiling this with the most efficient options.

I then also took just the validation piece of @jshamble's #37474, and added to it the currency validation (the big list of ~180 currencies), to give me this:

const productImagesValidation = {
  'url': [validateRequired, validateURLs],
  'alt': [validateRequired, validateString],
};

const aggregateRatingValidation = {
  'ratingValue': [validateRequired, validateNumber],
  'reviewCount': [validateRequired, validateNumber],
  'reviewUrl': [validateRequired, validateURLs],
};

export const productValidationConfig = {
  /* Required Attrs */
  'productUrl': [validateRequired, validateURLs],
  'productId': [validateRequired, validateString],
  'productTitle': [validateRequired, validateString],
  'productBrand': [validateRequired, validateString],
  'productPrice': [validateRequired, validateNumber],
  'productImages': [
    validateRequired,
    createValidateConfigArray(productImagesValidation),
  ],
  'productPriceCurrency': [validateRequired, validateString, validateCurrency],
  'aggregateRating': [
    validateRequired,
    createValidateConfig(aggregateRatingValidation),
  ],
  /* Optional Attrs */
  'productIcon': [validateURLs],
  'productTagText': [validateString],
  'ctaText': [validateNumber],
  'shippingText': [validateNumber],
};

/**
 * @typedef {{
 *  items: !Array<!ShoppingConfigDataDef>,
 * }}
 */
let ShoppingConfigResponseDef;

/**
 * Used for keeping track of intermediary invalid config results within Objects or Arrays of Objects.
 * @private {boolean}
 */
let isValidConfigSection_ = true;

/**
 * Validates an Object using the validateConfig function.
 * @param {?Object=} validation
 * @return {boolean}
 */
function createValidateConfig(validation) {
  return (field, value) => {
    if (!isValidConfigSection_) return;
    isValidConfigSection_ = validateConfig(value, validation, field + ' ');
  };
}

/**
 * Validates an Array of Objects using the validateConfig function.
 * @param {?Object=} validation
 * @return {boolean}
 */
function createValidateConfigArray(validation) {
  return (field, value) => {
    for (const item of value) {
      if (!isValidConfigSection_) return;
      isValidConfigSection_ = validateConfig(item, validation, field + ' ');
    }
  };
}

/**
 * Validates if a required field exists for shopping config attributes
 * @param {string} field
 * @param {?string=} value
 */
export function validateRequired(field, value) {
  if (value === undefined) {
    throw Error(`Field ${field} is required.`);
  }
}

/**
 * Validates if string type for shopping config attributes
 * @param {string} field
 * @param {?string=} str
 */
export function validateString(field, str) {
  if (typeof str !== 'string') {
    throw Error(`${field} ${str} is not a string.`);
  }
}

/**
 * Validates number in shopping config attributes
 * @param {string} field
 * @param {?number=} number
 */
export function validateNumber(field, number) {
  if (
    (typeof number === 'string' && !/^[0-9.,]+$/.test(number)) ||
    (typeof number !== 'string' && typeof number !== 'number')
  ) {
    throw Error(`Value ${number} for field ${field} is not a number`);
  }
}



export function validateCurrency(field, str) {
  return [
    "AED",
    "AFN",
    /* like, 150+ more currencies */
    "ZMW",
    "ZWL"
  ].includes(str);
}

/**
 * Validates url of shopping config attributes
 * @param {string} field
 * @param {?Array<string>=} url
 */
export function validateURLs(field, url) {
  if (url === undefined) {
    return;
  }

  const urls = Array.isArray(url) ? url : [url];

  urls.forEach((url) => {
    assertHttpsUrl(url, `amp-story-shopping-config ${field}`);
  });
}

/**
 * Validates the shopping config of a single product.
 * @param {!ShoppingConfigDataDef} shoppingConfig
 * @param {!Object<string, !Array<function>>} validationObject
 * @param {?string} optParentFieldName
 * @return {boolean}
 */
export function validateConfig(
  shoppingConfig,
  validationObject = productValidationConfig,
  optParentFieldName = ''
) {
  let isValidConfig = true;

  Object.keys(validationObject).forEach((configKey) => {
    const validationFunctions = validationObject[configKey];
    validationFunctions.forEach((fn) => {
      try {
        /* This check skips optional attribute validation */
        if (
          shoppingConfig[configKey] !== undefined ||
          validationFunctions.includes(validateRequired)
        ) {
          fn(configKey, shoppingConfig[configKey]);
        }
      } catch (err) {
        isValidConfig = false;
        user().warn('AMP-STORY-SHOPPING-CONFIG', `${optParentFieldName}${err}`);
      }
    });
  });

  return isValidConfig;
}

When I minify and gzip this (excluding the core AMP dependencies like assertHttpsUrl), it's ~1.2kB.

side note 2: If we want to verify many different types of things at runtime, we could also consider the method of shipping a minified json schema validator (ajv is ~35kB min+gzip, but jsonschema is ~7kB min+gzip), then run all the schemas against it at runtime, rather than compiling the separate validation functions.

@calebcordry
Copy link
Member

Thanks for the awesome analysis @newmuis !

It seems like the main question is if the extra 2.1 kB is worth the maintenance cost of manually keeping the schema in sync? I really like the idea of JSON Schema but unfortunately I think the 2.1kB might be too large of a price to pay (unless, as Jon mentioned, we might be missing some compiler optimizations). I believe @gmajoulet has data showing real impact on users making it to "story load" for a magnitude of ~3kB.

@gmajoulet
Copy link
Contributor

About bundle size, as long as it is NOT in the critical loading path (e.g. v0/amp-story/amp-analytics...) it's acceptable.
Do we know for a fact that creation tools will use it? Have we heard complaints from creation tools about other components validation?

@newmuis
Copy link
Contributor

newmuis commented Mar 1, 2022

Other components use regular AMP validation, do I think they get coverage that we won't here. The only story features I can think of that use JSON thus far (ads, analytics, consent) do not change the narrative and can likely have a single setup per-tool, just plugging in a value or two. This has many dimensions of configurability across tool, shopping vendor, and creator. It probably benefits us to have a clear spec and a canonical validation flow so folks can make sure their getting it right

@gmajoulet
Copy link
Contributor

My main point is that this has a strict negative impact on end users, just so we don't have to maintain and keep in sync custom validation + schema. DevX is great, but users experience should go first.
I'd rather solve this with a few unit tests running the configuration against both the JSON schema and the custom validation, than shipping extra kbs to our bundles.

Making the shopping bundle slightly heavier is fine as it is lazy loaded after the story is rendered, but that means we couldn't use it for anything part of the main bundles (as @newmuis mentioned, Ads, analytics, consent).

However I'm not going to be resistant to this change, as long as it doesn't impact the main bundles.

@newmuis
Copy link
Contributor

newmuis commented Mar 2, 2022

Got it. I think all I care about is that we have a distributable means of validation, but how we actually validate against it doesn't matter as much to me. But it's actually quite important that we're sure that the validation used by tools and the validation used at runtime work the same way (down to the little nuances) -- as long as we can guarantee that, we're good 🙂

@jshamble
Copy link
Contributor

jshamble commented Mar 3, 2022

@newmuis For the validate currency function, I am in favor of using javascript's built in Intl.NumberFormat over the array of currency codes listed above, it will simply throw an error if a currency is unrecognized (which is what we want given the way our validation is setup). The complete list of currency codes I got from here minified and gzipped is around 7-8kb - not ideal compared to the one-liner below.

/**
 * Validates currency in shopping config attributes
 * @param {string} field
 * @param {?string=} curr
 */
export function validateCurrency(field, curr) {
  // This will throw an error on invalid currency codes.
  Intl.NumberFormat('en-EN', {currency: curr}).format(0);
}

@jshamble
Copy link
Contributor

jshamble commented Mar 3, 2022

I ran some tests compiling my JSON validator with this minifier
and am also getting 1.2 KB after gzip compression.

Screen Shot 2022-03-03 at 12 51 31 AM

For ajv, I used ajv-cli to complie Jon's custom schema (listed above).

Getting 3.4 KB after gzip compression.

Screen Shot 2022-03-03 at 9 14 01 AM

As I mentioned in the comment right above this one, using built in javascript Intl.NumberFormat for validation will save us a ton of work from including a 300+ currency block in the schema file. One could argue that we can instead use a hybrid approach, utilizing Intl.NumberFormat with ajv, but I'm thinking that this double-dipping would only hurt us in the end - we would essentially be maintaining two libraries - ajv and our validation modifications to ajv - at this point, we may as well run our own validator.

I cannot predict what fields we will add in the future, but what I can say is that there would potentially be more edge cases (like the validate currency field) that we can easily handle with built in JS functions.

One good example I can think of is the
Date time Format, I can easily imagine a publisher wanting to keep track of dates / times / timezones, of when they first published the product to the market (or when the product was created, as in the case of photography, there is often a timestamp for the photo). This method would provide us with the same robust error handling as Intl.NumberFormat, i.e. would throw an error on invalid timezone formatting.

I would make the same argument as above, rather than listing the valid timezones in a schema (i.e. UTC,PST, etc.) and adding on more kb, why not just use the already built-in js functions and rely on their error handling?

@newmuis
Copy link
Contributor

newmuis commented Mar 4, 2022

It's a tradeoff, I think. This is undoubtedly easier and lighter when we only concern ourselves with JavaScript stacks.

But, if you imagine a stories creation tool written in anything other than JS (of which there are several.... It might actually be the majority of tools), they would then need to understand how to emulate the behavior of these JavaScript functions.

The nice thing about JSON schema is that it is guaranteed to be self-contained and has tooling support across many languages. It seems reasonably painful for tools to need to, on a field-by-field basis, understand JavaScript currency handling, date parsing, etc, especially given that the differences are often quite nuanced.

@processprocess
Copy link
Contributor Author

processprocess commented Mar 4, 2022

To parse the core goals from this discussion to help guide our decision:
1. Distributable / scalable means of validation that works the same between runtime and tools (down to the nuances).
2. Does not impact critical loading path.

To achieve 1:
Following industry standards ensures as much consistency as possible. Maintaining custom validation that works the same down to the nuances may not be realistic. Technical debt can potentially compound as fields are added and edited.

To achieve 2:
Communicate to publishers to not use the shopping experience on the first page of stories as a best practice.
Communicate to tools to disable the ability to create the shopping feature on the first page of stories.

Unless we believe there are strong use-cases for shopping on the first page (like single page stories), JSON schema + ajv may be the most scalable and maintainable approach.

@jshamble
Copy link
Contributor

After a conversation with @newmuis, for the shopping launch, we have decided to move forward with both solutions, i.e. JSON schema + ajv and JS client-side validation.

The main reason is that: What if validation on the tooling level fails?
Especially for fields like URLs, any invalid / malformed URL could potentially fall though, posing a hazard to publishers, posing a security threat. Another level of validation would mitigate that risk.

@alanorozco
Copy link
Member

We might not need to take the the size tradeoff of using enum for currency code vs. validating with Intl.

ajv allows extending syntax with addKeyword, which allows us to define the output code.

With some remapping, we can generate the ideal Intl code that avoids enum, while providing the enum for other systems that ingest the schema directly.

@jshamble jshamble closed this as completed Apr 6, 2022
@swissspidy
Copy link
Contributor

swissspidy commented Apr 6, 2022

Is the new schema at https://github.com/ampproject/amphtml/blob/main/examples/amp-story/shopping/product.schema.json also accessible via some URL at amp.dev or so? Or are tools expected to grab it from GitHub for now?

Also, does this schema adhere to a specific version of JSON schema? There's no $id or $schema property.

@jshamble
Copy link
Contributor

jshamble commented Apr 7, 2022

@swissspidy

I don't think there currently is a URL at amp.dev which hosts the product.schema.json, but 'm happy to add a patch and host it there if that is convenient for tooling - let me know.

$id at the top level doesn't have a purpose if it's not referenced recursively

without a specified $schema ajv defaults to the most recent supported schema. All supported schemas are here: https://ajv.js.org/guide/schema-language.html

VSCode does not support anything after draft-2020-12 yet, but we can still specify the $schema if needed, let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants