A question #1

Open
zoe531 opened this issue Oct 9, 2015 · 3 comments

Comments

@zoe531

zoe531 commented Oct 9, 2015

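Don't you need a proxy to get around the firewall when crawling Google Play? Google Play loads its content dynamically, so can this really be done with Scrapy alone?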

@oa414
Owner

oa414 commented Oct 10, 2015

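I use a PAC file to get around the firewall by default, so my requests to Play already go through a proxy. I also deployed the crawler on a VPS outside China.

If you do need to get around the firewall, either a VPN or Scrapy's HTTP proxy configuration will work.

If the content is loaded by JavaScript, Scrapy has to be paired with a webview (browser rendering) engine to parse the final result. The fields I crawl are all present in the HTML, though, so I never ran into this.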
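For the Scrapy route, one possible sketch is a small downloader middleware that sets `request.meta["proxy"]`, the key Scrapy's built-in `HttpProxyMiddleware` honors. The class name, the `PROXY_URL` setting, the module path in the comment, and the proxy address are all assumptions for illustration; none of them come from this project.

```python
class FixedProxyMiddleware:
    """Route every request through a single HTTP proxy (illustrative sketch)."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL is a hypothetical custom setting, not a Scrapy built-in.
        return cls(crawler.settings.get("PROXY_URL", "http://127.0.0.1:8118"))

    def process_request(self, request, spider):
        # Keep any proxy a spider already chose for this request.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # let Scrapy continue processing the request


# In settings.py (the module path is an assumption):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.FixedProxyMiddleware": 350,
# }
# PROXY_URL = "http://127.0.0.1:8118"
```

Scrapy's `HttpProxyMiddleware` also picks up the standard `http_proxy` and `https_proxy` environment variables, so exporting those before running the crawl is a simpler alternative when one proxy is enough.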

@popoaichuiniu

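This project's code doesn't look complete to me; a lot seems to be missing. Does the Google Play part really work? Wasn't the Google Play store already loaded dynamically in 2015? Doesn't it need a simulated login?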

@oa414
Owner

oa414 commented Feb 13, 2017

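It definitely ran at the time... I'm not sure about now.

The information the script collected back then came from the HTML tags returned by the first GET request to the URL. Other content may well be loaded dynamically via AJAX.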
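A quick way to tell which case a field falls into is to look for it in the raw HTML of the first response, before any JavaScript runs. Below is a minimal sketch using only the standard library. The class names `app-title` and `review-body` and the sample markup are hypothetical, not Google Play's actual markup.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # True once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def field_in_static_html(html, css_class):
    """Return the field's text if present in the raw HTML, else None."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None


# Hypothetical first-response HTML: the title is present, the reviews
# container is empty and would be filled in later by AJAX.
SAMPLE = (
    '<html><body><div class="app-title">Example App</div>'
    '<div id="reviews"></div></body></html>'
)
print(field_in_static_html(SAMPLE, "app-title"))    # Example App
print(field_in_static_html(SAMPLE, "review-body"))  # None
```

If the function returns `None` for a field that is visible in the browser, that field is probably filled in by a later request, and plain Scrapy would need either that request's URL or a rendering engine to reach it.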
