Parsing incorrectly produces duplicate content, while jsoup handles the same page correctly #20

Closed
Matcha-xiaobin opened this issue Dec 12, 2023 · 20 comments
Labels
bug Something isn't working

Matcha-xiaobin commented Dec 12, 2023

Describe the bug
ksoup 0.1.0
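After parsing the HTML, extra, incomplete content appears. When I fetch the page with a GET request and inspect the response, it contains no erroneously duplicated content, but after calling Ksoup.parse(htmlString) the result contains incorrect duplicated content.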

[Screenshot]
In the screenshot above, the html content is correct, but the body content contains duplicates.

Device (please complete the following information):
android 13
ios 17
jvm

Additional context
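The page structure looks like this:
[Screenshot]

My code looks like this:
[Screenshot]
The parsed result looks like this:
[Screenshot]

The correct output I expect is: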
TAG: mainContents: 1, titles: 6, contents: 6

itboy87 (Collaborator) commented Dec 12, 2023

Hi @Matcha-xiaobin, could you please provide me with a sample code and the HTML code or link necessary for reproducing this issue?

Matcha-xiaobin (Author) commented:

@itboy87 Hello!
Here is the code to reproduce the issue:
```kotlin
Ksoup.parseGetRequest("https://www.dm530w.org/").apply {
    body().select("div[class=firs l]")
        .firstOrNull()?.let { element ->
            val titles = element.select("div[class=dtit]")
            val contents = element.select("div[class=img]")
            logD("titles: ${titles.size}, contents: ${contents.size}")
        }
}
```

The expected correct result is:
titles: 6, contents: 6

itboy87 (Collaborator) commented Dec 13, 2023

@Matcha-xiaobin the issue is related to the charset. We will fix it as soon as possible. Thanks
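
The thread does not say what the 0.1.1 fix changed. As general background only: HTML has to be decoded from bytes using the page's actual character encoding before it is parsed, otherwise the text comes out garbled. The plain Kotlin/JVM sketch below illustrates that step; it is not ksoup's API, and `charsetFromContentType` and `decodeHtml` are hypothetical helper names. It reads the `charset` parameter of a `Content-Type` header value and falls back to UTF-8 when the parameter is missing or names an unsupported encoding.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: extracts the charset parameter from a Content-Type value,
// e.g. "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption).
fun charsetFromContentType(contentType: String?): Charset {
    val name = contentType
        ?.split(';')
        ?.map { it.trim() }
        ?.firstOrNull { it.startsWith("charset=", ignoreCase = true) }
        ?.substringAfter('=')
        ?.trim('"', '\'', ' ')
    if (name.isNullOrEmpty()) return Charsets.UTF_8
    return try {
        Charset.forName(name)
    } catch (e: IllegalArgumentException) {
        // Covers both IllegalCharsetNameException and UnsupportedCharsetException
        Charsets.UTF_8
    }
}

// Hypothetical helper: decodes raw response bytes with the declared charset,
// producing the String that would then be handed to an HTML parser.
fun decodeHtml(body: ByteArray, contentType: String?): String =
    String(body, charsetFromContentType(contentType))
```

A fuller implementation would also honor a byte-order mark or a `<meta charset>` declaration inside the document when the header has none; the UTF-8 fallback here is a simplifying assumption.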

Matcha-xiaobin (Author) commented:

OK.

itboy87 added the bug (Something isn't working) label Dec 14, 2023
itboy87 mentioned this issue Dec 18, 2023
itboy87 (Collaborator) commented Dec 19, 2023

@Matcha-xiaobin, I have fixed this issue in the latest version (0.1.1). Please give it a try. Thanks for bringing this issue to our attention.

Matcha-xiaobin (Author) commented:

@itboy87 Hi, I checked, but version 0.1.1 doesn't seem to be available in the Maven repository.
[Screenshot]

itboy87 (Collaborator) commented Dec 20, 2023

@Matcha-xiaobin It takes some time for new versions to show up there, but 0.1.1 is already released. You can use it now; just set the dependency to com.fleeksoft.ksoup:ksoup:0.1.1.
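
A minimal sketch of the corresponding dependency declarations in a Gradle Kotlin DSL build script, assuming a Kotlin Multiplatform module with a `commonMain` source set. The `ksoup` coordinate is the one given above; the `ksoup-network` coordinate assumes the same `com.fleeksoft.ksoup` group and is kept at 0.1.0 because, as the thread later notes, `ksoup-network:0.1.1` was not published.

```kotlin
// build.gradle.kts of the shared module (sketch; assumes the Kotlin Multiplatform plugin is applied)
kotlin {
    sourceSets {
        val commonMain by getting {
            dependencies {
                // Parser release that contains the charset fix
                implementation("com.fleeksoft.ksoup:ksoup:0.1.1")
                // Network helpers such as Ksoup.parseGetRequest; assumed coordinate, kept at 0.1.0
                implementation("com.fleeksoft.ksoup:ksoup-network:0.1.0")
            }
        }
    }
}
```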

Matcha-xiaobin (Author) commented:

Android Studio can't download it for me yet. I'll wait a bit longer.
[Screenshot]

Matcha-xiaobin (Author) commented:

@itboy87 I think I misunderstood: only ksoup needs to be upgraded to 0.1.1, and ksoup-network doesn't need upgrading and can stay at 0.1.0, right?

Matcha-xiaobin (Author) commented:

I've tested it, and the incorrect content is indeed gone. Great, thanks!

Matcha-xiaobin (Author) commented Dec 20, 2023

Hi, sorry to bother you again, but this issue still seems to occur on iOS! Android and JVM seem fine.

itboy87 (Collaborator) commented Dec 20, 2023

@Matcha-xiaobin It appears there is an issue with network requests to your site, https://www.dm530w.org/. It seems to have an SSL problem that makes the request fail on iOS. Currently, ksoup-network doesn't provide an option to skip SSL verification, so you'll need to fetch the HTML from https://www.dm530w.org/ yourself and then parse it with ksoup instead of using ksoup-network. In a future version, I may add a flag to skip SSL validation.

Matcha-xiaobin (Author) commented Dec 21, 2023

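OK. I just checked my code, and the problem seems to be on the Ktor side: I fetch the HTML with Ktor and then pass it to ksoup for parsing, but that now fails...
My Ktor client uses the CIO engine, which works fine on the Android and JVM targets; maybe I should use a different engine for the iOS target.

For now I've switched to calling Ksoup.parseGetRequest(url) directly, and everything seems to work.
Regarding the SSL issue, I've added the following to info.plist in iosApp:
[Screenshot]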

itboy87 (Collaborator) commented Dec 21, 2023

@Matcha-xiaobin, you're correct; the issue is related to Ktor's Darwin engine, as described in KTOR-5158. However, that problem still appears to be unresolved. I'm considering switching to CIO, which might work better. I'll do some testing and then decide.

Could you please confirm if it's working fine after adding the above-mentioned code in the info.plist?

itboy87 (Collaborator) commented Dec 21, 2023

@Matcha-xiaobin I just tried CIO, and it appears that CIO still doesn't support TLS sessions on the native platform. It is throwing the following error:
kotlin.IllegalStateException: TLS sessions are not supported on Native platform.

Matcha-xiaobin (Author) commented:

Yes, I'm getting the same error. I hadn't noticed it before because everything worked on JVM and Android, so I overlooked the Ktor issue. That was my oversight.

Matcha-xiaobin (Author) commented:

Yes, I can confirm that it currently works.

Matcha-xiaobin (Author) commented:

I may have been unclear, so let me explain again:
[Screenshot]
After adding the code above, Ksoup.parseGetRequest(url) works correctly.

Matcha-xiaobin (Author) commented:

@itboy87 Adding 0.1.1 seems to produce this error:

[Screenshot]

But it isn't affecting me for now.

itboy87 (Collaborator) commented Dec 21, 2023

For some reason, the last release was published without this file, and ksoup-network:0.1.1 was missed as well. I will fix that in the next version.
