Making custom token for use in a custom plugin #167

nrakic90 · 2016-09-29T15:09:08Z

Hello.

First I want to say good job on this plugin.
I am making a plugin that will detect a custom format, something along the lines of "keyword://test/test1/test2".
I managed to make a plugin based on what I saw in hashtag.js and mention.js. I am having trouble making a token out of "keyword". Can you explain this process a bit? I've attached a "sketch" of my plugin; would you kindly tell me what I am doing wrong? I would be grateful. All the best!

nfrasser · 2016-09-30T18:48:47Z

Hey @nrakic90, the first thing I wanted to mention is that the plugin API is largely undocumented and is subject to change in the future. Given that, kudos to you for figuring this out.

The big roadblock you'll run into next is due to a fundamental problem with the plugin API: There's no easy way to integrate new text tokens with the rest of the link-parsing state machine. I'm going to try my best to help you out here, but this is going to get complicated.

The first thing you need is to generate intermediate CharacterStates for the keyword text token. This will involve a call to the stateify function after you've defined KEYWORD_TOKEN. That should look like this:

let intermediateKeywordStates = stateify('keyword', S_START, KEYWORD_TOKEN, linkify.scanner.tokens.DOMAIN);

Then you'll need a loop like this for the intermediate states, since those could have jumps to domains (e.g., key is an intermediate state that could itself be a domain, and keys is a domain that, even though it starts with key, will never resolve to keyword). ALPHANUM should be defined like this.

See how the localhost text token is handled for a real example of this.
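
To illustrate why the intermediate states need their own jumps to domains, here is a minimal, self-contained sketch. It is a toy state machine, not linkify's stateify or its scanner: makeState, stateifyToy, classify and the simplified DOMAIN rule (any run of letters or digits) are all hypothetical names and assumptions made for illustration.

```javascript
// Toy scanner, NOT the linkify API. All names below are hypothetical.
const KEYWORD = 'KEYWORD';
const DOMAIN = 'DOMAIN';

function makeState(token = null) {
  return { token, jumps: new Map(), fallback: null };
}

// Generic state: any run of letters or digits counts as a DOMAIN-like token.
const S_DOMAIN = makeState(DOMAIN);
S_DOMAIN.fallback = (ch) => (/[a-z0-9]/i.test(ch) ? S_DOMAIN : null);

// Build one state per character of `word`. Every intermediate state accepts
// DOMAIN and can fall back to S_DOMAIN; only the last state accepts `token`.
function stateifyToy(word, start, token) {
  let state = start;
  for (let i = 0; i < word.length; i++) {
    const isLast = i === word.length - 1;
    const next = makeState(isLast ? token : DOMAIN);
    next.fallback = S_DOMAIN.fallback;
    state.jumps.set(word[i], next);
    state = next;
  }
  return state;
}

const S_START = makeState();
S_START.fallback = S_DOMAIN.fallback;
stateifyToy('keyword', S_START, KEYWORD);

// Feed the whole input through the machine and report the final token.
function classify(text) {
  let state = S_START;
  for (const ch of text) {
    state = state.jumps.get(ch) || (state.fallback && state.fallback(ch));
    if (!state) return null;
  }
  return state.token;
}
```

In this sketch, classify('key') and classify('keys') both end in a DOMAIN state, while only the exact text keyword reaches the KEYWORD state. Without the fallback on each intermediate state, keys would have nowhere to go after key.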

In your example, seeing the text token keyword jumps you into the S_KEYWORD state from the S_START state. But what happens if instead of // you see .com? Then you'd expect keyword.com to be of type url. But text tokens currently are not polymorphic, so you'd have to manually define jumps to and from S_KEYWORD. Basically, you'll need to duplicate all lines in parser.js that contain S_DOMAIN and replace S_DOMAIN with S_KEYWORD.

TL;DR, this is doable but not pretty. There are definitely plans to improve this interface and abstract away all this complexity, but for now that's all the help I can offer.

nrakic90 · 2016-10-03T08:23:24Z

Thank you so much for the in-depth explanation, I really appreciate it! I was experimenting with stateify at one point but then gave it up because I apparently didn't have all the pieces of the puzzle.
Thanks again!

toger5 · 2022-01-19T11:51:58Z

Has this gotten any easier? I would really like to use a custom token!

toger5 · 2022-01-19T16:39:53Z

According to the docs, it should be possible to call S_START.tt("a", acceptedState) to transition on an 'a'.
From the documentation:
https://github.com/Hypercontext/linkifyjs/blob/a38611393a35b922b34632a30a79fb709c745b2e/packages/linkifyjs/src/core/fsm.js#L52

This does not seem to work. How is the word "character" meant in the docs?

nfrasser · 2022-01-21T16:33:07Z

@toger5 I'm working on some additional examples/docs for this in an upcoming release. For now, check out the hashtag plugin for reference

Notes:

- Linkify has two state machines for tokenizing strings: the scanner and the parser.
- The scanner groups string characters into small, self-contained tokens such as NUM (a number) or TLD (any top-level domain name like "com").
  - The starting state (S_START) is scanner.start.
- The parser (used in the hashtag plugin example) groups text tokens from the scanner into "multi-tokens" such as URL, EmailAddress or Hashtag.
  - The starting state is parser.start.
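
The two stages described in these notes can be pictured with a small standalone sketch. This is not how linkify implements its scanner or parser (both are state machines, while this toy uses a regular expression and a loop). The scan and parse functions and the token names WORD, POUND, WS and SYM are assumptions made for illustration; only NUM is a name taken from the notes.

```javascript
// Toy two-stage tokenizer, not linkify's implementation.

// Stage 1 ("scanner"): characters -> small text tokens.
function scan(text) {
  const tokens = [];
  const pattern = /(\d+)|([a-z]+)|(#)|(\s+)|(.)/gi;
  for (const m of text.matchAll(pattern)) {
    const type = m[1] ? 'NUM'
      : m[2] ? 'WORD'
      : m[3] ? 'POUND'
      : m[4] ? 'WS'
      : 'SYM';
    tokens.push({ type, value: m[0] });
  }
  return tokens;
}

// Stage 2 ("parser"): text tokens -> multi-tokens.
// Here the only multi-token is a hashtag: POUND followed by WORD.
function parse(tokens) {
  const out = [];
  for (let i = 0; i < tokens.length; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    if (current.type === 'POUND' && next && next.type === 'WORD') {
      out.push({ type: 'hashtag', value: current.value + next.value });
      i++; // the WORD token was consumed by the hashtag
    } else {
      out.push({ type: 'text', value: current.value });
    }
  }
  return out;
}

const result = parse(scan('see #linkify 42'));
```

The point of the split is that the second stage never looks at characters, only at token types, which is why a new scanner token has to exist before the parser can transition on it.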

Similarly to how adding the hashtag multi-token works in the example plugin, you can add a new scanner token. For example:

const GreetingState = scanner.start
  .tt('h')
  .tt('e')
  .tt('l')
  .tt('l')
  .tt('o', 'GREETING') // create accepting state

The scanner will recognize the word "hello" as a GREETING token. You can capture the states and branch off to recognize additional GREETINGs:

const HState = scanner.start.tt('h')
HState.tt('i', GreetingState)  // don't create a new accepting state, use the existing one

Now both "hi" and "hello" are recognized as GREETING tokens. You can similarly use the GREETING token with the parser:

const GreetingMultiToken = utils.createTokenClass('greeting', {
  isLink: true,
  toHref() {
    return `javascript:alert("${this.toString()}!")`
  }
})
parser.start.tt('GREETING', GreetingMultiToken)

There is no way to create tokens from arbitrary regular expressions right now with the tt method
- You can, however, emulate anything that's possible with a regular expression by capturing the states and transitioning between them multiple times (the second argument to tt is either an accepting token or any previously-captured state).
- This may improve in a future release.
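
As a rough sketch of what "capturing the states and transitioning between them multiple times" could look like, the toy below emulates the regular expression hello+ with a self-loop. ToyState and run are hypothetical stand-ins written for this example. They only model the tt behaviour described in this comment (second argument is either a token name or a previously captured state) and are not linkify's code.

```javascript
// Toy model of the tt behaviour described above. Hypothetical, not linkify.
class ToyState {
  constructor(token = null) {
    this.token = token;     // token accepted at this state, if any
    this.jumps = new Map(); // input character -> next ToyState
  }

  // tokenOrState: a token name (marks the next state as accepting),
  // an existing ToyState (reuses it), or nothing (plain intermediate state).
  tt(input, tokenOrState) {
    if (tokenOrState instanceof ToyState) {
      this.jumps.set(input, tokenOrState);
      return tokenOrState;
    }
    const next = this.jumps.get(input) || new ToyState();
    if (tokenOrState) next.token = tokenOrState;
    this.jumps.set(input, next);
    return next;
  }
}

// Walk the machine over the whole text and return the final token, or null.
function run(start, text) {
  let state = start;
  for (const ch of text) {
    state = state.jumps.get(ch);
    if (!state) return null;
  }
  return state.token;
}

const start = new ToyState();
const Greeting = start.tt('h').tt('e').tt('l').tt('l').tt('o', 'GREETING');

// Self-loop on the captured accepting state: this plays the role of "+".
Greeting.tt('o', Greeting);
```

With the loop in place, run(start, 'hello') and run(start, 'hellooo') both end in the GREETING state, while run(start, 'hell') stops in a non-accepting state.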

toger5 · 2022-01-21T21:05:12Z

This is super helpful, thank you very much for the detailed comment!
I was trying something like this:

const acceptingState = createTokenClass("something")
scanner.start
  .tt('h')
  .tt('e')
  .tt('l')
  .tt('l')
  .tt('o', acceptingState)

but that did not seem to work.
To me, PARAM1 and PARAM2 in const PARAM1 = state.tt('TOKEN') and state.tt('TOKEN', PARAM2) looked basically the same, except that in the second case PARAM2 needs to be created beforehand.
In your example they seem to differ, in that PARAM2 can also be used to add a new token called GREETING.
That seems to indicate that there is another difference between PARAM1 and PARAM2.

// (A)
const GreetingState = HState
  .tt('i', GreetingState)  // don't create a new accepting state, use the existing one
// VS
// (B)
const GreetingState = HState
  .tt('i')

What I tried is (A) but that does not seem to work. (B) however does. What exactly is the difference between those two?
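
One difference between (A) and (B) is independent of linkify: as written, (A) reads GreetingState on the right-hand side of its own const declaration, which plain JavaScript rejects. The sketch below demonstrates that with a hypothetical makeState helper standing in for a scanner state. It is not linkify's implementation, and the thread does not confirm that this was the cause of the behaviour reported above.

```javascript
// Hypothetical stand-in for a state with a tt method. Not linkify's code.
function makeState(token = null) {
  const state = {
    token,
    jumps: new Map(),
    // With a target, reuse it. Without one, create a new non-accepting state.
    tt(input, target) {
      const next = target || makeState();
      state.jumps.set(input, next);
      return next;
    },
  };
  return state;
}

const HState = makeState();
const GreetingState = makeState('GREETING');

// (B): no second argument, so a new non-accepting state is created.
const created = HState.tt('i');

// Reusing an existing state: pass it in without redeclaring it.
const reused = HState.tt('i', GreetingState);

// (A) as written: the declaration reads its own binding before it exists.
let errorName = null;
try {
  const SelfReferencing = HState.tt('i', SelfReferencing);
} catch (err) {
  errorName = err.name; // ReferenceError (temporal dead zone)
}
```

In other words, the second argument has to be a state that already exists under a different, already initialized binding. The value returned by tt in that case is the same state that was passed in.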

nfrasser added the custom protocol label Mar 3, 2018

nfrasser added this to the 3.0 milestone Mar 11, 2021

nfrasser mentioned this issue Mar 11, 2021

v3.0 #318

Merged

nfrasser modified the milestones: 3.0, 4.0 Oct 14, 2021

nfrasser self-assigned this Oct 14, 2021

toger5 mentioned this issue Jan 18, 2022

Parse matrix-schemed URIs matrix-org/matrix-react-sdk#7453

Merged

nfrasser mentioned this issue Jan 21, 2022

An easy way to compatible special characters in mention and hashtag #312

Closed

WIStudent mentioned this issue Jun 6, 2023

Trying to add & detect custom prefixes #368

Open
