Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle image pasted/dropped from various sources (external websites, base64-encoded, etc.) #5152

Open
mjadobson opened this issue Jun 20, 2018 · 14 comments · May be fixed by #15052
Open

Handle image pasted/dropped from various sources (external websites, base64-encoded, etc.) #5152

mjadobson opened this issue Jun 20, 2018 · 14 comments · May be fixed by #15052
Labels
package:image squad:core Issue to be handled by the Core team. support:2 An issue reported by a commercially licensed client. type:feature This issue reports a feature request (an idea for a new functionality or a missing option).

Comments

@mjadobson
Copy link

When copy-pasting a local image into the editor, it will be uploaded as expected.

However, if you copy an image from another website and paste it into the editor, it will use the previous url as the source, rather than uploading it.

Is this the intended behaviour? This can sometimes cause issues with regards to hotlinking, dead links, security, etc.

mjadobson referenced this issue in mjadobson/ckeditor5-image Jun 22, 2018
Previous check ignored copy pasting images from other websites, as these drag events include html data in addition to file data.
@Reinmar
Copy link
Member

Reinmar commented Jun 22, 2018

There are multiple possible scenarios what we may get when someone pastes/drops an image from another website or application. It will depend on how someone copied that content and on the browser. We may get:

  • HTML with base64-encoded images,
  • HTML with images which src attributes point to an external website,
  • HTML with images which src are references to files in the clipboard's dataTransfer.files,
  • just images (as dataTransfer.files),
  • and I'm sure I missed something.

I haven't checked what browsers really do, so these are all theoretical scenarios. I expect to see most of them in real life :D

Actually, you even described this in the PR:

However, they will still not be triggered when drag + dropping an image from a website. This is because the drag event doesn't include file data, but text/uri-list data, which includes the image url(s). Similarly, images won't be uploaded if selecting and copying+dragging/pasting a bunch of html.

In these instances, due to potential CORS issues, you would need to have a server endpoint that downloads and saves images directly from these external urls.

Additionally, it may also be worthwhile having a config option to validate img src, to only insert them when the url matches certain patterns. This could be used to prevent re-uploading previously uploaded content, and restrict hotlinking from other websites.

So, this gets complex. We need configuration (this needs to be opt-in behaviour and we need src validation) and we'll need an endpoint to retrieve external images. Plus, we need to check what we really get from the browsers.

The answer, therefore, is that we skipped those cases for now. We planned to work on this in some, undefined future.

For now, I'm also unsure whether all these scenarios should be handled by one plugin or whether we should split this. There's a risk that this plugin will grow significantly. We should review which potential input options can be handled in similar way to what we do so far. We also need to define configuration in a way which forsee the other input options which may be implemented later.

@Reinmar Reinmar changed the title Inconsistent image pasting Handle image pasted/dropped from various sources (external websites, base64-encoded, etc.) Jun 22, 2018
@mjadobson
Copy link
Author

mjadobson commented Jun 22, 2018

Interestingly, for issue https://github.com/ckeditor/ckeditor5-upload/issues/5 , the upload works as expected with current version of ckeditor5 (at least on Windows Chrome/Edge from MSPaint).

For my use case at least, I wish to have all images uploaded because:

  • They will need to be synced to other devices and accessible offline
  • Images from free hosting providers often go dead, hit usage limits, prevent hotlinking, etc
  • Including images from non-secure websites shows warnings
  • Data-uri images slow down API calls, and sometimes fail to save if HTML output is too big to fit in database.

Most cases are handled by the current functionality (+ my PR). I will probably filter out any other cases with userland code.

For configuration, maybe it would be useful to let users provide a function, as it could get fairly complex with flags for every use case. For example:

import upload // promise, takes fileData, resolves to upload dest
import uploadFromUrl // promise, takes url resolves to upload dest
import convertDataURI // takes datauri, converts to blob data

// config - upload everything
function handleImage(type, data, url) {
  switch (type) {
    case 'file':
      return upload(data);
    case 'data-img':
      return data ? upload(data) : upload(convertDataURI(url));
    case 'url':
      if (url.match(/^https:\/\/mydomain.com/) return url;
      else return uploadFromUrl(url);
    case 'webkit-fake-image':
      throw new Error('Image upload not supported in Safari.')
  }
}

// config - no uploads, only include small data-img
function handleImage(type, data, url) {
  switch (type) {
    case 'file':
      return null;
    case 'data-img':
      if (url.length < 10000) return url;
      else return null;
    case 'url':
      return url;
  }
}

@Reinmar
Copy link
Member

Reinmar commented Jun 25, 2018

cc @pjasiun @wwalc

@Reinmar
Copy link
Member

Reinmar commented Jun 25, 2018

For configuration, maybe it would be useful to let users provide a function, as it could get fairly complex with flags for every use case. For example:

It'd actually need to be simpler because we can only expose one upload() function and should work with File instances, URLs and base64. That's because all these uploads need to go through our FileRepository.

So, a shouldUpload( type, data, url ) -> {Boolean} function should be enough. However, we may also need to extend UploadAdapter so it has a method for uploading files from external websites.


What worries me still is handling all of these in the clipboardInput's listener which you tried to patch. Without analysing all the possible input scenarios I'm not able to tell now how to handle them and whether we don't need something more.

@Reinmar
Copy link
Member

Reinmar commented Jun 25, 2018

Since a complete solution to this ticket is a longer-term project, I don't want to block ckeditor/ckeditor5-image#212 with these discussions. However, it still requires some work to be merged.

@pjasiun
Copy link

pjasiun commented Jun 26, 2018

For sure what we have now is an MVP for the image upload, there is much more to be handled.

4 years ago I did a research for CKEditor 4 and I learn that that paste and drop handling is very complex. You can:

  • drop file,
  • paste file,
  • paste image (for instance from MS Paint),
  • drop HTML with image,
  • paste HTML with image,
  • drop HTML image (only image without anything else),
  • paste content from MS Word with an image.

Each browser has (had) some quirks, and you can get a very strange content. It was very often that there were not common patterns, for instance, the same case, on the same browser may work differently depending on the action you did (drop gives you different results than paste), on the OS (different format on Mac and on Windows), etc.

As @Reinmar said, you can get:

  • HTML with base64-encoded images,
  • HTML with images which src attributes point to an external website,
  • file in dataTransfer.files,
  • file in dataTransfer.items,
  • might be more.

For sure sometimes you get mixed content: <img> in HTML and file in dataTransfer.

I am not sure if it should be split between multiple plugins since cases might be mixed. I am not sure about the PR above, because it covers only a small part of cases and I do not know if it will not break other. I am not sure how extensible API we will be able to provide. What I am sure is that we need to do a deep research to learn all cases we need to handle, and then we will be able to design a solution.

@wwalc
Copy link
Member

wwalc commented Jun 26, 2018

Just a note that adding one particular host/domain to the whitelist is not enough as many times companies have/use multiple domains/hosts.

Anyone heavily interested in this feature: please add 👍 to the first post.

@wwalc
Copy link
Member

wwalc commented Aug 6, 2018

It'd actually need to be simpler because we can only expose one upload() function and should work with File instances, URLs and base64. That's because all these uploads need to go through our FileRepository.

So, a shouldUpload( type, data, url ) -> {Boolean} function should be enough. However, we may also need to extend UploadAdapter so it has a method for uploading files from external websites.

In general to correctly handle the use case mentioned in the original report there would need to be the possibility to use at least two different upload functions depending on the context. The first upload function already handles uploading local images (also base64-encoded etc.).

The second function, to handle <img> elements with src pointing to external URLs, would have to work differently. Such upload function would need to connect to the server side endpoint and ask it to upload the image (providing just the URL of it) and return back the correct url. It has to be done this way due to CORS restrictions.

In theory, uploading images coming from other domains could happen at a different stage, for example when saving the document in the CMS database, together with other post-processing operations like XSS sanitization. This way the system would be more bulletproof as always someone may intercept the POST request and inject an image coming from external URL anyway.

On the other side, the sooner the user receives a feedback that an image could not be uploaded from external URL the better. There may be situations in which the server side script to fetch images will not work, for example in case of copying an image from a website where only logged in users can see resources or from some intranet domain. Taking this into consideration, making an attempt to make "a local copy" of an image as soon as it gets pasted into the editor makes more sense.

@tomaszhlawiczka
Copy link

Does it make sens for you to make first step with sth simple like bool shouldUpload(type, data, url) and then eventually extend it?

(I just don't trust persistence of any external resources so prefer to have all images on my side)

@pjasiun
Copy link

pjasiun commented Aug 7, 2018

I am afraid that simple bool shouldUpload(type, data, url) is not enough. To make it works we need to be able to handle upload in at least 3 format:

  • File,
  • Base64 data,
  • URL to the external image.

We need to recognize and handle all cases properly to make this work, what makes this ticket quite complicated.

@taocz
Copy link

taocz commented Sep 4, 2018

Lazy upload image:

  1. Query all image tags
  2. Check the "src"
  3. Upload images if not uploaded

@mlewand mlewand transferred this issue from ckeditor/ckeditor5-image Oct 9, 2019
@mlewand mlewand added this to the backlog milestone Oct 9, 2019
@mlewand mlewand added status:confirmed type:feature This issue reports a feature request (an idea for a new functionality or a missing option). package:image labels Oct 9, 2019
@dirkjf
Copy link

dirkjf commented Nov 16, 2019

@taocz, I didexactly what you suggested:

(async() => {
    let files = [];
    for (let img of images) {
        let response = await fetch(img.src);
        let blob = await response.blob();
        files.push(new File([blob], 'image', response));
        console.log(response);
    }
    editor.execute('imageUpload', { file: files } );
})();

@canveed
Copy link

canveed commented Jul 11, 2021

Any updates/solutions? Trying to upload a pasted image, searching for any solution

@astro2049
Copy link

astro2049 commented Sep 27, 2021

I implemented the lazy upload mentioned by @taocz with Axios, here are the additional logics to go through for a newly created imageElement:

  1. Check if the image is on server, return if it is
  2. Download the image from internet
  3. Upload the image to server
const additionalUploadImage = (imageElement) => {
    let url = imageElement.getAttribute('src')

    // return if the image is already on server
    if (url.includes(process.env.SERVER_ADDRESS)) {
    return
    }

    ;(async () => {
    let filename = url.split('/').pop().split('#')[0].split('?')[0] // extracts image's filename, see https://stackoverflow.com/a/36756650/15630163

    let response = await axios.get(url, {
        responseType: 'blob',
    })
    let blob = response.data

    let formData = new FormData()
    formData.append('upload', blob, filename) // form name and file name can be customized

    let uploadResult = await axios.post(
        process.env.SERVER_ADDRESS + '/files/upload', // route can be customized
        formData,
    )

    // replace the image's url if it's successfully uploaded
    if (uploadResult.status === 200) {
        imageElement._attrs.set('src', uploadResult.data.url)
    }
    })()
}

So, since these are lazy upload logics to run through, we'll execute these logics in the for loop that iterates through changed image elements in the doc.on( 'change', ... ) listener.

for (const imageElement of getImagesFromChangeItem(editor, item)) {
    // Check if the image element still has upload id.
    const uploadId = imageElement.getAttribute('uploadId')

    // avoid executing our custom upload logic when:
    // 1. the picture is being handled by original paste/click-to-upload logic, or
    // 2. the image element is in graveyard, aka. just been deleted
    if (!uploadId && !isInsertedInGraveyard) {
        additionalUploadImage(imageElement)
    }
  
  	...           

Overall view here.

This solves dropping images from websites, however, I think the solution is not perfect at all, for

  1. As mentioned by @wwalc above, CORS issues.

    The second function, to handle <img> elements with src pointing to external URLs, would have to work differently. Such upload function would need to connect to the server side endpoint and ask it to upload the image (providing just the URL of it) and return back the correct url. It has to be done this way due to CORS restrictions.

    In fact, I also get HTTP 403 sometimes when trying to request pictures from some websites, which means some images cannot be normally downloaded in the first place.

  2. On the other hand, lazy loading here kind of disrupts the normal workflow of uploading images, for the original concept is that uploading of a image are only triggered by paste / drop events so that the uploading process will only be triggered once in a image's life cycle, while listening to changed image elements in the document is not 'image-oriented' and will trigger every time the editor refreshes.

So at the end, I'm moving this task to server side. Basically what I'm going to do is on every editor upload, let the server examine the document's raw HTML content, and

  1. Find out all <img> elements
  2. Download the pictures from the internet and replace the src of these <img> elements

@pomek pomek removed this from the backlog milestone Feb 21, 2022
@martynawierzbicka martynawierzbicka added the support:2 An issue reported by a commercially licensed client. label Oct 3, 2022
@Witoso Witoso added the squad:core Issue to be handled by the Core team. label Oct 6, 2023
@CKEditorBot CKEditorBot added status:planned Set automatically when an issue lands in the "Sprint backlog" column. We will be working on it soon. and removed status:planned Set automatically when an issue lands in the "Sprint backlog" column. We will be working on it soon. labels Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:image squad:core Issue to be handled by the Core team. support:2 An issue reported by a commercially licensed client. type:feature This issue reports a feature request (an idea for a new functionality or a missing option).
Projects
None yet