[RFC] Magic URL Payload Format for Twitter (v1.0.0.0) #198
I think this approach is too hacky. Considering that we will support images / files sooner or later, let's rather use IPFS or some other storage solution directly. It's important to keep our solutions elegant and free of any single point of failure: imagine if Twitter decides to modify its URL policy; all of our users' posts would be lost permanently.
I'm strongly against this approach. It's worse than BaseCJK/Emoji or storing the payload in IPFS and putting hash pointers in the tweet.
On Fri, Sep 20, 2019 at 02:26 Neruthes 0x5200DF38 wrote:
> Background
>
> Twitter has a strict length restriction for tweets, which we must find ways to bypass in order to publish armored payloads within one tweet.
>
> Basic Idea
>
> URLs on Twitter are converted to t.co short links, and only the length of the t.co short link counts toward the tweet's character limit. We add a prefix before the actual payload to make it look like a real URL to Twitter, in order to take advantage of this *feature*.
>
> Prefix
>
> We maintain a list of popular websites (like the Alexa top 100) and use a random prefix to avoid basic pattern detection.
>
> Encoding
>
> Since IETF and W3C specifications restrict which characters may be used in a URI, we use a subtly different payload encoding method.
>
> Refer to the specification of the location object in the DOM API for JavaScript.
>
> Prefix
>
> The prefix part always looks like https://www.amazon.com/item/233.html.
>
> It has protocol, host, and path.
>
> It does not have user, password, search, or hash.
>
> It may or may not have port.
>
> Prefix Randomization (PR)
>
> Prefix Randomization is not mandatory but recommended. For now, it is OK to maintain a simple list of prefixes.
>
> The prefix has a static part and a dynamic part.
>
> Static part includes host.
>
> Dynamic part includes:
>
> Part Name   Value Range                        Examples
> protocol    ...                                http, https
> host        domain names, IPv4 addresses       twitter.com
> port        implied, 8000, 8080, 9527-12315
> path        \/[\w.-]{0,24}\/                   /_233_/
>
> Base-64 Alternative Characters
>
> In Base-64 encoding, we need +, /, and =. These characters need to be replaced in order to ensure the reliability of URI detection on Twitter.
>
> From   To
> +      -
> /      /
> =      _
>
> Header Token
>
> We use %20 to mark the start of the payload sequence.
>
> Footer Token
>
> We use %40 to mark the end of the payload sequence.
>
> Garbage Bytes (GB)
>
> Random garbage bytes may be added after the footer token.
>
> Garbage bytes are not mandatory but recommended.
>
> Separation Token
>
> We use . to mark the separation between two adjacent fields.
>
> Discussion wanted.
Overall, we're building both a protocol and a product. This hacky solution may help the product in the short term but will greatly harm the protocol.
(In reply to Suji Yan, Fri, Sep 20, 2019 at 02:35.)
For the insider preview, it will be implemented as follows:
So, no other feedback?
If you want to delay to February.
We may initiate a project to examine the availability of these solutions and arrange implementations accordingly. Also, I prefer not to put backward compatibility at risk, unless we revert the banner-removal commit and pretend it has always been an early-access beta version.
These will require a large amount of work, and I prefer to put these resources into features with greater priority, including the new dashboard, automatic recipient amending, and Misakanet.
@neruthes 3->5 need to be relatively fast, since we don't want to get banned, etc.
IPFS can be an option for fallback, with regard to our Principle of Saturation. I have no idea how much time it would take to build IPFS compatibility. Base-Emoji may have difficulties. Keep watching #139. And explain
Instead of encoding it as a path, it is also very viable to encode it as a #-fragment. This has the benefit of allowing native base64. For example:
Another thing we can use as an alternative base64 is RFC 4648 base64url. The equal signs can be discarded.
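The fragment suggestion can be checked against the RFC 3986 grammar: a fragment admits +, /, and = directly, so standard Base64 needs no character substitution there. A minimal sketch with hypothetical names (isValidFragment, buildFragmentUrl); it only validates URI syntax and assumes nothing about how Twitter's link detection or t.co treat fragments.

```typescript
// RFC 3986, section 3.5: fragment = *( pchar / "/" / "?" )
// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
const FRAGMENT_CHAR = /^[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]$/;

/** True if `text` may appear after "#" in a URI without further escaping. */
function isValidFragment(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === "%") {
      // pct-encoded: "%" must be followed by exactly two hex digits
      if (!/^[0-9A-Fa-f]{2}$/.test(text.slice(i + 1, i + 3))) return false;
      i += 2;
    } else if (!FRAGMENT_CHAR.test(ch)) {
      return false;
    }
  }
  return true;
}

/** Hypothetical helper: append standard Base64 (with "+", "/", "=") as the fragment. */
function buildFragmentUrl(prefix: string, base64Payload: string): string {
  if (!isValidFragment(base64Payload)) {
    throw new Error("payload is not valid fragment text");
  }
  return `${prefix}#${base64Payload}`;
}

console.log(buildFragmentUrl("https://twitter.com/_233_/", "+/8=")); // https://twitter.com/_233_/#+/8=
```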
@Artoria2e5 Good idea.
Updated with a link to RFC 4648. I would still recommend not to rely on
It appears that this RFC has been open long enough and a lot of improvements have been merged. I will move this RFC to become a current technical specification on Tuesday. After that, any suggestions for modification will be difficult to accommodate.
@neruthes Similar functionality, like 'all Maskbook users can see', etc., as we discussed before in the chat. This is mainly for growth: imagine some KOL posts something encrypted.
@neruthes I will talk to our friends 👬 who share a similar vision and are using IPFS now. I will try to talk to Textile as well.
For this matter, we may amend the UserGroup Abstraction Model (#12). It is not within the scope of RFC 198.
ACK
Good to hear that. But I recommend moving the IPFS middleware to the next milestone, as we all see the risk of introducing extra delay.
The internal payload structure may be subject to refactoring as we design #329.
Ratified.
Background
Twitter has a strict length restriction for tweets which we must find ways to bypass in order to publish armored payloads within one tweet.
Basic Idea
URLs on Twitter are converted to t.co short links, and only the length of the t.co short link counts toward the tweet's character limit. We add a prefix before the actual payload to make it look like a real URL to Twitter, in order to take advantage of this feature.
Prefix
We maintain a list of popular websites (like the Alexa top 100) and use a random prefix to avoid basic pattern detection.
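A minimal sketch of the random choice, assuming the list stores complete prefix strings. PREFIX_LIST, its two entries (assembled from examples elsewhere in this RFC), and pickPrefix are hypothetical; the optional random parameter exists only so the choice can be made deterministic in tests.

```typescript
// Hypothetical prefix list; the RFC only asks for "a list of popular websites".
// Both entries are assembled from examples that appear in this RFC.
const PREFIX_LIST: readonly string[] = [
  "https://www.amazon.com/item/233.html",
  "https://twitter.com/_233_/",
];

/**
 * Pick one prefix uniformly at random, so posts do not share one fixed pattern.
 * `random` defaults to Math.random and is expected to return a number in [0, 1).
 */
function pickPrefix(
  list: readonly string[] = PREFIX_LIST,
  random: () => number = Math.random,
): string {
  if (list.length === 0) {
    throw new Error("prefix list is empty");
  }
  // Clamp in case a custom `random` returns exactly 1.
  const index = Math.min(Math.floor(random() * list.length), list.length - 1);
  return list[index];
}

console.log(pickPrefix()); // one of the entries in PREFIX_LIST
```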
Encoding
Since IETF and W3C specifications restrict which characters may be used in a URI, we use a subtly different payload encoding method.
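This RFC does not spell out which characters are unsafe. One way to make the constraint concrete, assuming the conservative RFC 3986 "unreserved" set (letters, digits, -, ., _, ~) as the target, is to list the characters of an alphabet that fall outside it. The helper nonUnreservedChars is hypothetical.

```typescript
// RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
const UNRESERVED_CHAR = /^[A-Za-z0-9\-._~]$/;

/** Return the distinct characters of `text` that are not RFC 3986 unreserved characters. */
function nonUnreservedChars(text: string): string[] {
  const found: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (!UNRESERVED_CHAR.test(ch) && found.indexOf(ch) < 0) {
      found.push(ch);
    }
  }
  return found;
}

const LETTERS_AND_DIGITS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
// Standard Base64 alphabet plus its "=" padding character.
const STANDARD_BASE64 = LETTERS_AND_DIGITS + "+/=";
// RFC 4648 base64url alphabet (padding omitted).
const BASE64URL = LETTERS_AND_DIGITS + "-_";

console.log(nonUnreservedChars(STANDARD_BASE64)); // [ '+', '/', '=' ]
console.log(nonUnreservedChars(BASE64URL)); // []
```

Under this assumption, +, /, and = are exactly the standard Base64 characters that need replacing, while . is unreserved yet absent from both alphabets, so it remains free for use as a separator.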
Refer to the specification of the location object in the DOM API for JavaScript.
Prefix
The prefix part always looks like https://www.amazon.com/item/233.html.
It has protocol, host, and path.
It does not have user, password, search, or hash.
It may or may not have port.
Prefix Randomization (PR)
Prefix Randomization is not mandatory but recommended. For now, it is ok to maintain a simple list of prefixes.
The prefix has a static part and a dynamic part.
Static part includes host.
Dynamic part includes:
Part Name   Value Range        Examples
protocol                       http, https
host        IPv4 addresses     twitter.com
port
path        \/[\w.-]{0,24}\/   /_233_/
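A sketch of assembling a prefix from the dynamic parts, with hypothetical names (PrefixParts, buildPrefix). Two readings are assumptions: the path pattern is taken as a slash, up to 24 characters from [A-Za-z0-9_.-], and a closing slash (matching the /_233_/ example), and IPv4 hosts are rejected on the strength of the update note that follows the table, even though the table lists them.

```typescript
/** URL prefix parts, named after the "Dynamic part" table. */
interface PrefixParts {
  protocol: "http" | "https";
  host: string;
  port?: number; // the prefix "may or may not" have a port
  path: string;
}

// Assumed reading of the path pattern: "/", 0-24 of [A-Za-z0-9_.-], then "/".
const PATH_PATTERN = /^\/[\w.-]{0,24}\/$/;
// Dotted-quad check; per the update note, IPv4 hosts are not shortened by t.co.
const IPV4_PATTERN = /^\d{1,3}(\.\d{1,3}){3}$/;

/** Assemble and validate a prefix such as "https://twitter.com/_233_/". */
function buildPrefix(parts: PrefixParts): string {
  if (parts.protocol !== "http" && parts.protocol !== "https") {
    throw new Error("unsupported protocol"); // guards untyped JavaScript callers
  }
  if (parts.host.length === 0 || IPV4_PATTERN.test(parts.host)) {
    throw new Error("host must be a domain name (IPv4 hosts are not shortened)");
  }
  if (
    parts.port !== undefined &&
    (Math.floor(parts.port) !== parts.port || parts.port < 1 || parts.port > 65535)
  ) {
    throw new Error("port must be an integer in 1-65535");
  }
  if (!PATH_PATTERN.test(parts.path)) {
    throw new Error("path does not match the prefix path pattern");
  }
  const port = parts.port === undefined ? "" : `:${parts.port}`;
  return `${parts.protocol}://${parts.host}${port}${parts.path}`;
}

console.log(buildPrefix({ protocol: "http", host: "twitter.com", port: 8080, path: "/_233_/" }));
// http://twitter.com:8080/_233_/
```

The example prefix https://www.amazon.com/item/233.html would fail this path check (inner slash, no trailing slash), so the sketch treats the table, not that example, as normative.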
Update: URIs whose host is an IPv4 address will not be converted to short links on Twitter.
Base64 Alternative Characters
In Base64 encoding, we need +, /, and =. These characters need to be replaced to ensure reliable URI detection on Twitter. We should follow RFC 4648 for a URI-safe Base64 codec. Padding characters may optionally be discarded.
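A dependency-free sketch of the RFC 4648 base64url codec (section 5 of that RFC: - and _ take the places of + and /), with padding optional as this section allows. The function names are hypothetical; an existing Base64 utility followed by the same character substitution would serve equally well.

```typescript
// RFC 4648, section 5 ("base64url"): same as standard Base64 except
// index 62 is "-" (instead of "+") and index 63 is "_" (instead of "/").
const BASE64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/** Encode bytes as base64url. "=" padding is only emitted when `pad` is true. */
function encodeBase64Url(bytes: Uint8Array, pad = false): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // bytes left, including this group
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2; // 24 bits -> four 6-bit indices
    out += BASE64URL_ALPHABET.charAt((group >> 18) & 63);
    out += BASE64URL_ALPHABET.charAt((group >> 12) & 63);
    out += remaining > 1 ? BASE64URL_ALPHABET.charAt((group >> 6) & 63) : pad ? "=" : "";
    out += remaining > 2 ? BASE64URL_ALPHABET.charAt(group & 63) : pad ? "=" : "";
  }
  return out;
}

/** Decode base64url text; trailing "=" padding is accepted but not required. */
function decodeBase64Url(text: string): Uint8Array {
  const body = text.replace(/=+$/, "");
  if (body.length % 4 === 1) {
    throw new Error("invalid base64url length");
  }
  const out: number[] = [];
  let buffer = 0; // not-yet-emitted low bits
  let bits = 0; // number of valid bits in `buffer`
  for (let i = 0; i < body.length; i++) {
    const value = BASE64URL_ALPHABET.indexOf(body.charAt(i));
    if (value < 0) {
      throw new Error(`invalid base64url character at index ${i}`);
    }
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // drop the bits just emitted
    }
  }
  return new Uint8Array(out);
}

const sample = new Uint8Array([0xfb, 0xff]); // standard Base64: "+/8="
console.log(encodeBase64Url(sample)); // -_8
console.log(encodeBase64Url(sample, true)); // -_8=
```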
Header Token
We use %20 to mark the start of the payload sequence.
Footer Token
We use %40 to mark the end of the payload sequence.
Garbage Bytes (GB)
Random garbage bytes may be added after the footer token.
Garbage bytes are not mandatory but recommended.
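Putting the tokens together: a minimal sketch of assembling and parsing a payload URL, assuming the prefix comes from the prefix list, the fields are already base64url text, and the fields are joined with the separation token "." described in the Separation Token section. The function names, the garbage alphabet, and the field check are hypothetical choices, not part of this RFC.

```typescript
const HEADER_TOKEN = "%20"; // marks the start of the payload sequence
const FOOTER_TOKEN = "%40"; // marks the end of the payload sequence
const SEPARATION_TOKEN = "."; // separates two adjacent fields
// Assumption: fields are base64url text, so they never contain "%" or ".".
const FIELD_PATTERN = /^[A-Za-z0-9_-]*$/;
// Assumption: garbage bytes are drawn from the base64url alphabet.
const GARBAGE_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/** prefix + header + fields joined by "." + footer + optional garbage. */
function buildMagicUrl(
  prefix: string,
  fields: readonly string[],
  garbageLength = 0,
  random: () => number = Math.random,
): string {
  for (let i = 0; i < fields.length; i++) {
    if (!FIELD_PATTERN.test(fields[i])) {
      throw new Error(`field ${i} is not base64url text`);
    }
  }
  let garbage = "";
  for (let i = 0; i < garbageLength; i++) {
    const index = Math.min(
      Math.floor(random() * GARBAGE_ALPHABET.length),
      GARBAGE_ALPHABET.length - 1,
    );
    garbage += GARBAGE_ALPHABET.charAt(index);
  }
  return prefix + HEADER_TOKEN + fields.join(SEPARATION_TOKEN) + FOOTER_TOKEN + garbage;
}

/** Extract the fields between the first header token and the next footer token. */
function parseMagicUrl(url: string): string[] | null {
  const start = url.indexOf(HEADER_TOKEN);
  if (start < 0) return null;
  const bodyStart = start + HEADER_TOKEN.length;
  const end = url.indexOf(FOOTER_TOKEN, bodyStart);
  if (end < 0) return null; // no footer token: not a complete payload
  return url.slice(bodyStart, end).split(SEPARATION_TOKEN);
}

const url = buildMagicUrl("https://twitter.com/_233_/", ["aGVsbG8", "-_8"]);
console.log(url); // https://twitter.com/_233_/%20aGVsbG8.-_8%40
console.log(parseMagicUrl(url + "xyz")); // [ 'aGVsbG8', '-_8' ]
```

Parsing from the first %20 only works because neither the prefix path pattern nor base64url text contains "%", and base64url never emits ".", so the separation token cannot collide with field content; trailing garbage is simply ignored.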
Separation Token
We use . to mark the separation between two adjacent fields.
Discussion wanted.