This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Add Malicious Webpage Detection Example #976

Open
wants to merge 3 commits into
base: develop

@edencfc edencfc commented May 13, 2021

Add Malicious Webpage Detection Example by PaddleNLP

Contributor

@TCChenlong TCChenlong left a comment

There are a few issues; I've left comments on them. Please fix them when you get a chance. Thanks!

"source": [
"# Malicious webpage detection with LSTM\n",
"\n",
"**Author:** [PaddlePaddle](https://github.com/PaddlePaddle) <br>\n",
Contributor

For the author field, please use your own GitHub username and link. Thanks, everyone, for contributing!

"source": [
"## 3. Building the network\n",
"\n",
"### 3.1 Constructing the dataloder\n",
Contributor

dataloder -> DataLoader

"import paddlenlp\n",
"import paddle.nn as nn\n",
"import paddle.nn.functional as F\n",
"import paddlenlp as ppnlp\n",
Contributor

This alias isn't recommended; just use `paddlenlp` directly.

Author

So I should just delete line 72?

Contributor

@TCChenlong TCChenlong left a comment

LGTM

},
"outputs": [],
"source": [
"!pip install lxml -i https://mirror.baidu.com/pypi/simple/\r\n",

If lxml and html5lib are not used later on, they should be removed.

},
"outputs": [],
"source": [
"class SelfDefinedDataset(paddle.io.Dataset):\n",

PaddleNLP supports several ways of defining a custom dataset; see: https://paddlenlp.readthedocs.io/zh/latest/data_prepare/dataset_self_defined.html
That said, the custom dataset here is fine as it is.

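"Then add a linear transformation layer to complete the binary classification task.\n",
"\n",
"- `paddle.nn.Embedding` builds the word-embedding layer\n",
"- `ppnlp.seq2vec.LSTMEncoder` builds the sentence modeling layer\n",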
Contributor

This needs to be changed here as well: ppnlp -> paddlenlp

" padding_idx=padding_idx)\n",
"\n",
"        # Transform the word embeddings into the text semantic representation space via LSTMEncoder\n",
" self.lstm_encoder = ppnlp.seq2vec.LSTMEncoder(\n",
Contributor

This needs to be changed here as well: ppnlp -> paddlenlp

"# Extract all hacked-page samples\r\n",
"d_page = tempdf[tempdf['flag']=='d']\r\n",
"# Merge the samples\r\n",
"train_page = pd.concat([n_page,d_page],axis=0)\r\n",
Contributor

The samples are merged twice here. Wouldn't a single merge be enough?
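
For illustration, a minimal sketch of what a single merge could look like. Only `tempdf`, the `flag` column, the `'d'` label, `n_page`, `d_page`, and `train_page` appear in the quoted diff; the toy data, the `text` column, and the `'n'` and `'p'` labels are assumptions made up for this example and may not match the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `tempdf`; the real data is not shown in the diff.
tempdf = pd.DataFrame({
    "text": ["page-a", "page-b", "page-c", "page-d", "page-e"],
    "flag": ["n", "d", "n", "d", "p"],  # 'n' and 'p' are assumed labels
})

# Select each class once.
n_page = tempdf[tempdf["flag"] == "n"]  # assumed label for normal pages
d_page = tempdf[tempdf["flag"] == "d"]  # hacked pages, as in the diff

# Merge the two subsets a single time; a second concat is not needed.
train_page = pd.concat([n_page, d_page], axis=0, ignore_index=True)
print(train_page)
```

`ignore_index=True` is optional; it only renumbers the rows of the merged frame so the index has no gaps.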

3 participants