Add Malicious Webpage Detection Example #976
base: develop
Add Malicious Webpage Detection Example by PaddleNLP
There are a few issues; I've left comments on them. Please fix them when you can. Thanks!
"source": [
"# Malicious Webpage Detection with LSTM\n",
"\n",
"**Author:** [PaddlePaddle](https://github.com/PaddlePaddle) <br>\n",
Please put your own GitHub username and link in the author field here. Thanks, everyone, for contributing!
"source": [
"## 3. Building the Network\n",
"\n",
"### 3.1 Constructing the dataloder\n",
dataloder -> DataLoader
"import paddlenlp\n",
"import paddle.nn as nn\n",
"import paddle.nn.functional as F\n",
"import paddlenlp as ppnlp\n",
This alias isn't recommended; just use paddlenlp.
So, just delete line 72?
LGTM
},
"outputs": [],
"source": [
"!pip install lxml -i https://mirror.baidu.com/pypi/simple/\r\n",
If lxml and html5lib aren't used later in the notebook, they should be removed.
},
"outputs": [],
"source": [
"class SelfDefinedDataset(paddle.io.Dataset):\n",
PaddleNLP supports several ways to define a custom dataset; see https://paddlenlp.readthedocs.io/zh/latest/data_prepare/dataset_self_defined.html for reference.
That said, the custom dataset defined here is fine as well.
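For readers following along, here is a minimal sketch of the map-style pattern that a `paddle.io.Dataset` subclass such as `SelfDefinedDataset` implements: `__getitem__` plus `__len__`. It is written as a plain Python class so it runs without Paddle installed; in the notebook the base class would be `paddle.io.Dataset`. The sample layout (`token_ids`, `label`) and the toy data are assumptions for illustration, not the notebook's actual fields.

```python
class SelfDefinedDataset:
    """Map-style dataset sketch.

    Assumption: in the notebook this would subclass paddle.io.Dataset;
    it is a plain class here so the example has no dependencies.
    """

    def __init__(self, samples):
        # samples: list of (token_ids, label) pairs (hypothetical layout)
        self.samples = list(samples)

    def __getitem__(self, idx):
        # Return a single example by integer index
        token_ids, label = self.samples[idx]
        return token_ids, label

    def __len__(self):
        # Number of examples available to the loader
        return len(self.samples)


# Hypothetical toy data: two tokenized pages with binary labels
toy_samples = [([3, 7, 1], 0), ([5, 2], 1)]
dataset = SelfDefinedDataset(toy_samples)
```

A loader only needs these two methods to index and batch the data, which is why the hand-written class in the PR works alongside the PaddleNLP helpers described at the link above.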
"Then a linear transformation layer follows to complete the binary classification task.\n",
"\n",
"- `paddle.nn.Embedding` builds the word-embedding layer\n",
"- `ppnlp.seq2vec.LSTMEncoder` builds the sentence-modeling layer\n",
This needs to be changed too: ppnlp -> paddlenlp
"                                      padding_idx=padding_idx)\n",
"\n",
"        # Transform the word embeddings into the text semantic representation space with LSTMEncoder\n",
"        self.lstm_encoder = ppnlp.seq2vec.LSTMEncoder(\n",
This needs to be changed too: ppnlp -> paddlenlp
"# Extract all hacked page samples\r\n",
"d_page = tempdf[tempdf['flag']=='d']\r\n",
"# Merge the samples\r\n",
"train_page = pd.concat([n_page,d_page],axis=0)\r\n",
The merge is done twice here. Wouldn't merging once be enough?
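A minimal sketch of the single-merge version, assuming the notebook's `tempdf` has a `flag` column with `'n'` for normal pages and `'d'` for hacked pages. The toy rows and the `url` column are hypothetical, and `ignore_index=True` is an optional addition that the diff does not use.

```python
import pandas as pd

# Hypothetical toy data; the notebook's real `tempdf` is not shown in this thread.
tempdf = pd.DataFrame({
    "url": ["a.example", "b.example", "c.example", "d.example"],
    "flag": ["n", "d", "n", "d"],
})

# Select each class once
n_page = tempdf[tempdf["flag"] == "n"]
d_page = tempdf[tempdf["flag"] == "d"]

# Merge once; ignore_index=True rebuilds a clean 0..N-1 index (optional)
train_page = pd.concat([n_page, d_page], axis=0, ignore_index=True)
```

Running the same `pd.concat` a second time on an already merged frame would append the rows again, so one call per split should be sufficient.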