bcy.net giving errors for new posts #613

Closed
pxssy opened this issue Feb 18, 2020 · 5 comments

pxssy commented Feb 18, 2020

Thank you so much for implementing and supporting the site! It was a real hassle to use that site honestly and your downloader really improved the experience.

That being said, I've been getting quite a few errors, and I believe it's because they changed the format some time ago. The old posts still use the old format I mentioned in the other post, but it seems they have changed it for the new ones.

A recent example https://bcy.net/item/detail/6780546160802143236

The display "thumbnail":
https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-w650.image

The "original" with watermark:
https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image?sig=XOCQEWBAelmBFHEPfxA8dD5dX2g%3D

It seems the string "~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image" gives the original image, but with a watermark added.
Surprisingly, "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug" is actually base64 for the Chinese characters on the watermark itself. The watermark includes the poster's name, which makes me believe this is NOT a coincidence. There is a frustrating catch, though.

The characters on the watermark
"©露兒大魔王_
半次元 - ACE爱好者社区"

Actually maps to (in base64, UTF-8)
"wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDReeIseWlveiAheekvuWMugo="
while what's used above in the link is
"wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug=="
Almost exactly the same, except that a repeated "e" is replaced with a "-" and the ending differs ("go=" instead of "g=="). Very strange indeed.
Replacing the original with the "correct" string "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDReeIseWlveiAheekvuWMugo=" doesn't work. Seems like it's some kind of obfuscation or human error.

I tried replacing the whole string with the base64 encoding of a space, i.e. "ICAg==" or "IA==". Doesn't work. At the moment I'm stuck.
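A possible explanation for the mismatch, sketched in Python and not confirmed anywhere in this thread: the "-" is consistent with the URL-safe base64 alphabet, in which "-" replaces "+" and "_" replaces "/". Decoding both strings quoted above suggests the token from the link encodes "ACG" with no trailing newline, while the hand-encoded string encodes "ACE" followed by a newline. This only explains why the two strings differ; it says nothing about how the `sig` parameter is derived.

```python
import base64

# Token copied from the watermarked URL quoted in this post.
url_token = "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug=="
# String obtained by hand-encoding the watermark text, also quoted in this post.
manual_token = "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDReeIseWlveiAheekvuWMugo="

# The URL token uses "-", so it needs the URL-safe decoder.
from_url = base64.urlsafe_b64decode(url_token).decode("utf-8")
from_manual = base64.b64decode(manual_token).decode("utf-8")

print(repr(from_url))     # text actually carried by the link
print(repr(from_manual))  # text that was encoded by hand

# Re-encoding the decoded text with the URL-safe alphabet reproduces the URL token.
roundtrip = base64.urlsafe_b64encode(from_url.encode("utf-8")).decode("ascii")
print(roundtrip == url_token)
```

If this reading is right, the token is not obfuscated, and swapping in a different string fails for another reason, presumably because the `sig` query parameter no longer matches.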

The only other template I managed to find is
"https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~noop.image". It's unwatermarked, but it seems to be compressed quite a bit; it's not exactly the original.
I think we just need to get the template right: the correct "~xxxx" tag for the original, unwatermarked image.

The downloader gives this output when trying to download said profile.

downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281'
download: Failed to download 6780546160802143236 35432115.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/481a06423e3e4969bf129319541c4ab5'
download: Failed to download 6780546160802143236 35432116.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/bc46a12d7d5b4f838506c63cdc5a126f'
download: Failed to download 6780546160802143236 35432117.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/51936a46c02c49a09dfee28d495eea1c'
download: Failed to download 6780546160802143236 35432118.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/a6a61bce98b448abbb1e12e9deb6cb6b'
download: Failed to download 6780546160802143236 35432119.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/14dbc38e5bff48688716119d17639520'
download: Failed to download 6780546160802143236 35432120.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/a19a6e8fc59c49d28e04b753fb5cb102'
download: Failed to download 6780546160802143236 35432121.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/1f52f033ebb74293a244067f975e095c'
download: Failed to download 6780546160802143236 35432122.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/a337da495119443aad11145aa1db7d90'
download: Failed to download 6778693005793565699 35037961.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/1eb566a4a3854a19beb4cff899cd00a1'
download: Failed to download 6778693005793565699 35037962.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/bdb5dae63fa6477fa478b927cbac3236'
download: Failed to download 6778693005793565699 35037963.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/2170770b8fcb4308b3367e31d441e62b'
download: Failed to download 6778693005793565699 35037964.part

These most likely come from the downloader applying the old technique to the new links, which, as I've tried, does not work.

The current roundabout way to handle this, IMHO, is to check whether the link has an image extension (.jpg/.png) and, if it does, use the old method.
If it doesn't, just grab the watermarked originals as well as the "~noop" version mentioned above (until we find a method to remove the watermark from originals), perhaps placing them in separate folders until a final solution can be found. In the meantime, I'll manually use the noop version to crop out the watermark from the original.
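The check proposed here could look roughly like the Python below. It is only a sketch of the idea, not gallery-dl code: the function names `uses_old_format` and `noop_variant` and the extension list are made up for illustration, and it assumes that everything after the first "~" in a new-style link is the filter part.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of "real" image extensions; the post only mentions .jpg/.png.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def uses_old_format(url):
    """Return True if the URL path ends in a regular image extension."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower() in IMAGE_EXTENSIONS


def noop_variant(url):
    """Swap whatever "~filter" a new-style URL carries for "~noop.image"."""
    base = url.split("?", 1)[0].split("~", 1)[0]
    return base + "~noop.image"


thumbnail = ("https://p1-bcy.byteimg.com/img/banciyuan/"
             "3ccdff22479c4060aadc86718209b281~tplv-banciyuan-w650.image")

if uses_old_format(thumbnail):
    print("old-style link, use the old method")
else:
    print("new-style link, noop version:", noop_variant(thumbnail))
```

The watermarked URL cannot be rebuilt this way because it carries a `sig` parameter, so it would still have to be taken from the page data as-is.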

Owner

mikf commented Feb 22, 2020

Well, I also haven't managed to find anything yet,
except another issue when downloading from that user's timeline:

$ gallery-dl https://bcy.net/u/109282764041
[downloader.http][warning] '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281'
[download][error] Failed to download 6780546160802143236 35432115.part
^C
KeyboardInterrupt

That's not exactly the same problem as mentioned before, but the image URLs and metadata from the API endpoint are different than the embedded ones in /item/detail/ web pages, and rather incomplete:

https://bcy.net/apiv3/user/selfPosts?uid=109282764041

{
    "h": 4032,
    "mid": 35432115,
    "origin": "",
    "original_path": "",
    "path": "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281",
    "ratio": 0.6666666666666666,
    "type": "image",
    "visible_level": "",
    "w": 2688
}

https://bcy.net/item/detail/6780546160802143236

{
    "h": 4032,
    "mid": 35432115,
    "origin": "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image?sig=XOCQEWBAelmBFHEPfxA8dD5dX2g%3D",
    "original_path": "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~noop.image",
    "path": "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-w650.image",
    "ratio": 0,
    "type": "image",
    "visible_level": "",
    "w": 2688
}

This can probably be solved by adding the watermark or "noop" filter to the path value, but it just feels bad doing that.
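As a rough illustration of what "adding a filter to the path value" would mean, the Python sketch below rebuilds a byteimg URL from the API's `path`. The helper name `byteimg_url` is hypothetical, and the host and `/img` prefix are inferred only from the single pair of records shown above, so other posts may not follow the same pattern.

```python
from urllib.parse import urlsplit


def byteimg_url(api_path, image_filter="noop", host="p1-bcy.byteimg.com"):
    """Build '<host>/img/<path>~<filter>.image' from an API 'path' value.

    The host and the '/img' prefix are assumptions taken from one example.
    """
    path = urlsplit(api_path).path  # e.g. "/banciyuan/<image id>"
    return f"https://{host}/img{path}~{image_filter}.image"


api_path = "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281"

print(byteimg_url(api_path))                         # noop version
print(byteimg_url(api_path, "tplv-banciyuan-w650"))  # 650px thumbnail
```

For this one image, the two results match the `original_path` and `path` values embedded in the /item/detail/ page. The watermarked `origin` URL is signed and cannot be derived like this.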

mikf added the bug label Feb 22, 2020
mikf added a commit that referenced this issue Feb 28, 2020
Images from new posts can have incomplete/partial URLs (1)
without any filename extension when fetching their data from
'/apiv3/user/selfPosts', so now all data gets taken from
'/item/detail/ID' pages.

It is currently unknown how to get the non-watermarked original version
of these images, or if that is possible at all. (2)
Images with a watermark will have their 'filter' metadata field set to
"watermark". For original images this field is an empty string "".

Enabling the 'noop' option will, in addition to the watermarked version,
yield the '~noop.image' filter version (3),
where 'filter' is set to "noop".

(1) "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281"
(2) "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image"
(3) "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~noop.image"
Owner

mikf commented Feb 28, 2020

The '404 Not Found' errors should be fixed with 8fbbaa5, but it's still only capable of downloading watermarked images for these kinds of posts - or the noop version when enabling the noop option.

This commit also adds a filter metadata field which is either empty "" for original images, "watermark", or "noop", depending on the filter used by bcy.net. You can't use it for directory names, but adding a -watermark or -noop to filenames is possible with {filter:?-//}.
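To show what the `{filter:?-//}` field is meant to produce, here is a small Python imitation of the conditional `?<start>/<end>/` format spec: the value is wrapped in `<start>` and `<end>` when it is non-empty, and the whole field disappears otherwise. This is not gallery-dl's implementation; `optional_affix` is a made-up helper, and the `.jpg` extension in the sample names is an assumption.

```python
def optional_affix(value, start="-", end=""):
    """Imitate a '?<start>/<end>/' spec: wrap non-empty values, drop empty ones."""
    return f"{start}{value}{end}" if value else ""


# Base name taken from the log output earlier in this thread.
base = "6780546160802143236 35432115"

for image_filter in ("", "watermark", "noop"):
    print(base + optional_affix(image_filter) + ".jpg")
```

With the three possible `filter` values this gives a plain name for originals and names ending in `-watermark` or `-noop` for the filtered versions, so the variants of one image do not overwrite each other.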

mikf added a commit that referenced this issue Mar 4, 2020
The former implementation would try to use the embedded data from
'/item/detail/' pages for every post, even if that wasn't really
necessary.

This commit also fixes some issues with posts only visible to
logged in users.

xion2 commented Sep 6, 2021

> The '404 Not Found' errors should be fixed with 8fbbaa5, but it's still only capable of downloading watermarked images for these kinds of posts - or the noop version when enabling the noop option.
>
> This commit also adds a filter metadata field which is either empty "" for original images, "watermark", or "noop", depending on the filter used by bcy.net. You can't use it for directory names, but adding a -watermark or -noop to filenames is possible with {filter:?-//}.

Some posts aren't downloading properly because their URLs are different. As a result, gallery-dl downloads neither the "original" watermarked version nor the "noop" version, which is higher quality than what it grabbed. Here's an example:

https://bcy.net/item/detail/6721286314647355660

When I use '-g', it shows this URL, which doesn't even work when opened in a browser:
"https://img.bcy-qn.pstatp.com/user/1381845/item/c0rbo/809d946f10eb471989edf8cafc3bb9ea.jpg"

It should point to this for the "noop" version (which I compared: it is a higher-quality image that is 200 KB larger, with fewer compression artifacts):
"https://p3-bcy.byteimg.com/img/banciyuan/user/1381845/item/c0rbo/809d946f10eb471989edf8cafc3bb9ea.jpg~noop.image"

And here is the "original" that isn't being detected at all by gallery-dl right now:
"https://p3-bcy.byteimg.com/img/banciyuan/user/1381845/item/c0rbo/809d946f10eb471989edf8cafc3bb9ea.jpg~tplv-banciyuan-logo-v3:wqnnjovlrZBzYW1l5piv6ICB5aS0ZXIK5Y2K5qyh5YWDIC0gQUNH54ix5aW96ICF56S-5Yy6.image?sig=tB44v3eJxdb-dY9Jl9Ge8A6xIjo%3D"

A few others have "c0qxx" or "c0r67" instead of "c0rbo". The first 3 of this user's posts download normally, with both "noop" and "watermarked" detected. The last 4 posts do not.
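Judging only from the two URLs quoted above, the working "noop" link looks like the broken one with a different host, an `/img/banciyuan` prefix, and a `~noop.image` suffix. A minimal Python sketch of that rewrite follows; `legacy_to_noop` is a hypothetical name, and the `p3-bcy.byteimg.com` host is copied from this single example rather than known to be general.

```python
from urllib.parse import urlsplit


def legacy_to_noop(url, host="p3-bcy.byteimg.com"):
    """Rewrite an old 'user/<uid>/item/...' image URL to its '~noop' form.

    Host and '/img/banciyuan' prefix are assumptions based on one example.
    """
    path = urlsplit(url).path  # "/user/<uid>/item/<code>/<name>.jpg"
    return f"https://{host}/img/banciyuan{path}~noop.image"


broken = ("https://img.bcy-qn.pstatp.com/user/1381845/item/c0rbo/"
          "809d946f10eb471989edf8cafc3bb9ea.jpg")
print(legacy_to_noop(broken))
```

Since the rewrite does not look at the "c0rbo" segment, the same function would apply unchanged to the "c0qxx" and "c0r67" posts, although that has not been checked here.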

Author

pxssy commented May 1, 2022

Someone appears to have found a working solution. I've tested it a bit myself. I don't code, but it seems to generate a signature that matches the unwatermarked, original images.

Unfortunately, I don't actually understand what's being done, but it'd be great if you could take a look and see whether it can be integrated into gallery-dl.

https://greasyfork.org/en/scripts/434023-%E5%8D%8A%E6%AC%A1%E5%85%83%E5%8E%9F%E5%9B%BE-%E6%97%A0%E6%B0%B4%E5%8D%B0

https://github.com/SWZ128/BCY

Owner

mikf commented Dec 3, 2022

Should be fixed with 46b6425 (v1.23.4)

@pxssy This script only links to the low-quality noop versions of images (original_path)

mikf closed this as completed Dec 3, 2022