<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>Reading TXT Data in C++, Generating an Image, and Detecting Circles</title>
<url>/2022/05/06/C++%E8%AF%BB%E5%8F%96txt%E6%95%B0%E6%8D%AE%E5%B9%B6%E7%94%9F%E6%88%90%E5%9B%BE%E5%83%8F%EF%BC%8C%E8%AF%86%E5%88%AB%E5%9C%86/</url>
<content><![CDATA[<p>Setting up the OpenCV C++ environment in Visual Studio 2019 is fairly tedious and error-prone and needs several passes; I will cover it in a separate article. This post only records how to turn raw data into an image and detect the circles in it.</p>
<p>With the environment ready, let's start coding.</p>
<h1 id="准备工作"><a href="#准备工作" class="headerlink" title="准备工作"></a>Preparation</h1><h2 id="include相关的包"><a href="#include相关的包" class="headerlink" title="#include相关的包"></a>#include the required headers</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">#include <iostream></span><br><span class="line">#include <opencv2/core/core.hpp></span><br><span class="line">#include <opencv2/highgui/highgui.hpp></span><br><span class="line">#include <opencv2/imgproc.hpp></span><br><span class="line">#include <vector></span><br><span class="line">#include <fstream></span><br><span class="line">#include <typeinfo></span><br></pre></td></tr></table></figure>
<h2 id="添加环境变量"><a href="#添加环境变量" class="headerlink" title="添加环境变量"></a>Namespace declarations</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">using namespace cv;</span><br><span class="line">using namespace std;</span><br></pre></td></tr></table></figure>
<h1 id="读取数据"><a href="#读取数据" class="headerlink" title="读取数据"></a>Reading the Data</h1><h2 id="读取txt中的数据"><a href="#读取txt中的数据" class="headerlink" title="读取txt中的数据"></a>Reading data from the txt file</h2><p>Note: the arrays and global variables used below are declared elsewhere; allocate memory for them as needed.</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">int len_data = 256*256;</span><br><span class="line">int read_txt()</span><br><span class="line">{</span><br><span class="line">    ifstream infile; //input file stream (data flows in, relative to the program)</span><br><span class="line">    infile.open("E:\\data.txt"); //open the file</span><br><span class="line">    for (int i = 0; i < len_data; i++) //loop over all values</span><br><span class="line">    {</span><br><span class="line">        //each value (separated by spaces, tabs or newlines) is read into the array in turn</span><br><span class="line">        infile >> data_all[i]; </span><br><span class="line">    }</span><br><span class="line">    infile.close(); //close the file when done</span><br><span class="line">    return 0;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h1 id="数据处理"><a href="#数据处理" class="headerlink" title="数据处理"></a>Data Processing</h1><h2 id="数据归一化以及转换成二维矩阵"><a href="#数据归一化以及转换成二维矩阵" class="headerlink" title="数据归一化以及转换成二维矩阵"></a>Normalizing the data and reshaping it into a 2-D matrix</h2><p>Note that C++ reads the txt file as a stream: it does not distinguish spaces from newlines, and both act as separators.<br>Since image intensities range over 0-255, the data must be mapped into the 0-255 range before generating a color image, so min-max normalization is applied first. Feel free to substitute your own normalization routine.</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">normalize_MaxMin(data_all, 65536, 255); //min-max normalization to the 0~255 range</span><br><span class="line">int max_position(float* a, int len_a)</span><br><span class="line">{</span><br><span class="line">    //find the index of the maximum value</span><br><span class="line">    int position = 0;</span><br><span class="line">    float max_y = -65535; //initialize with a very small value</span><br><span class="line">    for (int i = 0; i < len_a; i++)</span><br><span class="line">    {</span><br><span class="line">        if (a[i] > max_y)</span><br><span class="line">        {</span><br><span class="line">            max_y = a[i];</span><br><span class="line">            position = i;</span><br><span class="line">        }</span><br><span class="line">    }</span><br><span class="line">    return position;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">int min_position(float* a, int len_a)</span><br><span class="line">{</span><br><span class="line">    //find the index of the minimum value</span><br><span class="line">    int position = 0;</span><br><span class="line">    float min_y = 65535; //initialize with a very large value</span><br><span class="line">    for (int i = 0; i < len_a; i++)</span><br><span class="line">    {</span><br><span class="line">        if (a[i] < min_y)</span><br><span class="line">        {</span><br><span class="line">            min_y = a[i];</span><br><span class="line">            position = i;</span><br><span class="line">        }</span><br><span class="line">    }</span><br><span class="line">    return position;</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line">void normalize_MaxMin(float* x, int len_x, int multiple)</span><br><span class="line">{</span><br><span class="line">    //min-max normalization to the range 0~multiple</span><br><span class="line">    float max_x = x[max_position(x, len_x)];</span><br><span class="line">    float min_x = x[min_position(x, len_x)];</span><br><span class="line">    for (int i = 0; i < len_x; i++)</span><br><span class="line">    {</span><br><span class="line">        x[i] = ((float)(x[i] - min_x) / (float)(max_x - min_x)) * multiple;</span><br><span class="line">    }</span><br><span class="line">}</span><br><span class="line">//reshape the 1-D array into a 2-D matrix</span><br><span class="line">for (int p = 0; p < 256; p++)</span><br><span class="line">{</span><br><span class="line">    for (int q = 0; q < 256; q++)</span><br><span class="line">    {</span><br><span class="line">        data_C2[q][p] = data_C[p * 256 + q];</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h2 id="数据格式转Mat格式"><a href="#数据格式转Mat格式" class="headerlink" title="数据格式转Mat格式"></a>Converting the data to Mat format</h2><p>Next, the 2-D matrix is converted to Mat format, OpenCV's own image container. Besides the pixel data it carries header information describing the image itself, such as color and grayscale characteristics. The code below turns the data into a 2-D image.</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">//generate a 256*256 image from the data</span><br><span class="line">Mat srcImage = Mat(Size(256, 256), CV_32F, data_C2);</span><br><span class="line">Mat im_color;</span><br><span class="line">srcImage.convertTo(im_color, CV_8UC1, 255.0 / 255); //map from CV_32F to the 0-255 range of CV_8U</span><br></pre></td></tr></table></figure>
<h1 id="识别圆"><a href="#识别圆" class="headerlink" title="识别圆"></a>Circle Detection</h1><p>Note: this is the core of this post. The approach below came out of a lot of reading and experimenting with different methods and parameters; apart from the array sizes I do not recommend changing it, but if you have a better approach, comments are welcome.</p>
<p>Above, the 2-D array was wrapped in a Mat of 32-bit floats. CV_32F stores each pixel as an arbitrary value in 0-1.0, which is useful for computation on some datasets, but it must be converted to 8-bit (by multiplying each pixel by 255) before it can be saved or displayed.</p>
<p>srcImage.convertTo performs that type conversion: it maps srcImage from CV_32F to the 0-255 range of CV_8U and stores the result in im_color.</p>
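<p>A minimal pure-Python sketch of that float-to-8-bit mapping (illustration only, not OpenCV's implementation): scale by 255, round, and clamp into the uint8 range.</p>

```python
def float_to_uint8(values, scale=255.0):
    """Map floats in [0, 1] to ints in [0, 255], rounding and
    saturating the way an 8-bit conversion does (sketch, not cv2)."""
    out = []
    for v in values:
        x = int(round(v * scale))
        out.append(max(0, min(255, x)))  # clamp into the uint8 range
    return out

pixels = [0.0, 0.5, 1.0, 1.2]   # 1.2 is out of range and gets clamped
print(float_to_uint8(pixels))   # → [0, 128, 255, 255]
```

Values outside [0, 1] saturate rather than wrap, which matches convertTo's behavior.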
<h2 id="检测圆并显示结果"><a href="#检测圆并显示结果" class="headerlink" title="检测圆并显示结果"></a>Detecting circles and showing the result</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">vector<Vec3f> circles;</span><br><span class="line"></span><br><span class="line">double dp = 2;</span><br><span class="line">double minDist = 100; //minimum distance between circle centers</span><br><span class="line">double param1 = 10; //higher threshold of the Canny edge detector</span><br><span class="line">double param2 = 100; //accumulator threshold</span><br><span class="line">int min_radius = 0; //minimum circle radius</span><br><span class="line">int max_radius = 1000; //maximum circle radius</span><br><span class="line">//detect the circles</span><br><span class="line">HoughCircles(im_color, circles, HOUGH_GRADIENT, dp, minDist, param1, param2, min_radius, max_radius);</span><br><span class="line">//mark the detected circles on the image</span><br><span class="line">for (size_t i = 0; i < circles.size(); i++)</span><br><span class="line">{</span><br><span class="line">    //circle center</span><br><span class="line">    Point center(cvRound(circles[i][0]), cvRound(circles[i][1]));</span><br><span class="line">    //radius</span><br><span class="line">    int radius = cvRound(circles[i][2]);</span><br><span class="line">    //draw the center</span><br><span class="line">    circle(im_color,center,3,Scalar(0,255,0),-1,8,0);</span><br><span class="line">    //draw the circumference</span><br><span class="line">    circle(im_color, center, radius, Scalar(0, 0, 255), 3, 8, 0);</span><br><span class="line">    //cout << radius << endl;</span><br><span class="line">}</span><br><span class="line">//show the result</span><br><span class="line">imshow("Circle in picture", im_color);</span><br><span class="line">waitKey(0);</span><br><span class="line">return 0;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The comments in the block above cover the details, so no further explanation is needed; you can tune or experiment with the parameters yourself. The result draws both the center and the circumference of each detected circle, as shown below.</p>
<p><img src="https://user-images.githubusercontent.com/47737324/163091725-248a5173-5ee9-4e9c-b2e6-df246b6e2cb9.jpg" alt="Result"></p>
]]></content>
<tags>
<tag>C++</tag>
</tags>
</entry>
<entry>
<title>Deep Learning — Data</title>
<url>/2022/09/20/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0-%E6%95%B0%E6%8D%AE%E7%AF%87/</url>
<content><![CDATA[<h1 id="介绍"><a href="#介绍" class="headerlink" title="介绍"></a>Introduction</h1><p>The following works on public datasets, applying some image transformations and conversions to them: the necessary processing before building a model.</p>
<p>Everything data-related is collected in this chapter.</p>
<p>Note: some open datasets can be loaded directly through PyTorch's packages and used as-is, with no need to convert everything into image formats like .jpg. Others ship as raw images plus labels and have to be processed by hand; this is shared here only as experience. If you plan to start directly from images, you can skip the first part.</p>
<h1 id="1-数据集"><a href="#1-数据集" class="headerlink" title="1.数据集"></a>1. Dataset</h1><h2 id="下载开源数据集"><a href="#下载开源数据集" class="headerlink" title="下载开源数据集"></a>Downloading an open dataset</h2><p>Datasets can be found on their official sites; here is one example. The official release can be downloaded from the link below; I used the CIFAR-100 Python version.<br><a class="link" href="https://www.cs.toronto.edu/~kriz/cifar.html" >https://www.cs.toronto.edu/~kriz/cifar.html<i class="fas fa-external-link-alt"></i></a></p>
<h1 id="2-数据集转图片"><a href="#2-数据集转图片" class="headerlink" title="2.数据集转图片"></a>2. Converting the dataset to images</h1><p>After downloading and unpacking, the dataset can be converted into image files.</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import os</span><br><span class="line">import numpy as np</span><br><span class="line">import cv2</span><br><span class="line"></span><br><span class="line">source_path = os.getcwd()</span><br><span class="line">#official python3 helper: unpickle a data file and return a dict</span><br><span class="line">def unpickle(file):</span><br><span class="line">    import pickle</span><br><span class="line">    with open(file, 'rb') as fo:</span><br><span class="line">        dict = pickle.load(fo, encoding='bytes')</span><br><span class="line">    return dict</span><br><span class="line"></span><br><span class="line">loc_1 = './data/cifar-10/train_cifar10/'</span><br><span class="line">loc_2 = './data/cifar-10/test_cifar10/'</span><br><span class="line"></span><br><span class="line">#create the output folders if they do not exist (makedirs also creates missing parents)</span><br><span class="line">os.makedirs(loc_1, exist_ok=True)</span><br><span class="line">os.makedirs(loc_2, exist_ok=True)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">#the training set has five batches of 10000 images each; the test set has 10000 images</span><br><span class="line">def cifar10_img(file_dir):</span><br><span class="line">    for i in range(1,6):</span><br><span class="line">        data_name = file_dir + '/'+'data_batch_'+ str(i)</span><br><span class="line">        data_dict = unpickle(data_name)</span><br><span class="line">        print(data_name + ' is processing')</span><br><span class="line"></span><br><span class="line">        for j in range(10000):</span><br><span class="line">            img = np.reshape(data_dict[b'data'][j],(3,32,32))</span><br><span class="line">            img = np.transpose(img,(1,2,0))</span><br><span class="line">            #channel order is RGB</span><br><span class="line">            img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)</span><br><span class="line">            #to save in a different format, just change the file extension</span><br><span class="line">            img_name = loc_1 + str(data_dict[b'labels'][j]) + str((i)*10000 + j) + '.jpg'</span><br><span class="line">            cv2.imwrite(img_name,img)</span><br><span class="line"></span><br><span class="line">        print(data_name + ' is done')</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">    test_data_name = file_dir + '/test_batch'</span><br><span class="line">    print(test_data_name + ' is processing')</span><br><span class="line">    test_dict = unpickle(test_data_name)</span><br><span class="line"></span><br><span class="line">    for m in range(10000):</span><br><span class="line">        img = np.reshape(test_dict[b'data'][m], (3, 32, 32))</span><br><span class="line">        img = np.transpose(img, (1, 2, 0))</span><br><span class="line">        # channel order is RGB</span><br><span class="line">        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)</span><br><span class="line">        # to save in a different format, just change the file extension</span><br><span class="line">        img_name = loc_2 + str(test_dict[b'labels'][m]) + str(10000 + m) + '.jpg'</span><br><span class="line">        cv2.imwrite(img_name, img)</span><br><span class="line">    print(test_data_name + ' is done')</span><br><span class="line">    print('Finish transforming to image')</span><br><span class="line">if __name__ == '__main__':</span><br><span class="line">    file_dir = os.path.join(source_path,'data/cifar-10-python/cifar-10-batches-py')</span><br><span class="line">    cifar10_img(file_dir)</span><br></pre></td></tr></table></figure>
<h1 id="3-提取标签"><a href="#3-提取标签" class="headerlink" title="3.提取标签"></a>3. Extracting the labels</h1><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import os</span><br><span class="line">import numpy as np</span><br><span class="line">import cv2</span><br><span class="line">import json</span><br><span class="line"></span><br><span class="line">source_path = os.getcwd()</span><br><span class="line">anno_loc = os.path.join(source_path,'data/annotations/')</span><br><span class="line"></span><br><span class="line">#create the folder if it does not exist</span><br><span class="line">if os.path.exists(anno_loc) == False:</span><br><span class="line">    os.mkdir(anno_loc)</span><br><span class="line"></span><br><span class="line">#holds image file names and their annotations</span><br><span class="line">train_filenames = []</span><br><span class="line">train_annotations = []</span><br><span class="line"></span><br><span class="line">test_filenames = []</span><br><span class="line">test_annotations = []</span><br><span class="line"></span><br><span class="line">#the training set has five batches of 10000 images each; the test set has 10000 images</span><br><span class="line">#unpickle() is the helper defined in the previous script</span><br><span class="line">def cifar10_annotations(file_dir):</span><br><span class="line">    print('create train_img annotations')</span><br><span class="line">    for i in range(1,6):</span><br><span class="line">        data_name = file_dir + '/' + 'data_batch_' + str(i)</span><br><span class="line">        data_dict = unpickle(data_name)</span><br><span class="line">        print(data_name + ' is processing')</span><br><span class="line">        for j in range(10000):</span><br><span class="line">            img_name = str(data_dict[b'labels'][j]) + str((i) * 10000 + j) + '.jpg'</span><br><span class="line">            img_annotations = data_dict[b'labels'][j]</span><br><span class="line">            train_filenames.append(img_name)</span><br><span class="line">            train_annotations.append(img_annotations)</span><br><span class="line">        print(data_name + ' is done')</span><br><span class="line"></span><br><span class="line">    test_data_name = file_dir + '/test_batch'</span><br><span class="line">    print(test_data_name + ' is processing')</span><br><span class="line">    test_dict = unpickle(test_data_name)</span><br><span class="line"></span><br><span class="line">    for m in range(10000):</span><br><span class="line">        testimg_name = str(test_dict[b'labels'][m]) + str(10000 + m) + '.jpg'</span><br><span class="line">        testimg_annotations = test_dict[b'labels'][m]</span><br><span class="line">        test_filenames.append(testimg_name)</span><br><span class="line">        test_annotations.append(testimg_annotations)</span><br><span class="line"></span><br><span class="line">    print(test_data_name + ' is done')</span><br><span class="line">    print('Finish file processing')</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">if __name__ == '__main__':</span><br><span class="line"></span><br><span class="line">    file_dir = os.path.join(source_path,'data/cifar-10-python/cifar-10-batches-py')</span><br><span class="line">    cifar10_annotations(file_dir)</span><br><span class="line"></span><br><span class="line">    train_annot_dict = {</span><br><span class="line">        'images': train_filenames,</span><br><span class="line">        'categories': train_annotations</span><br><span class="line">    }</span><br><span class="line">    test_annot_dict = {</span><br><span class="line">        'images': test_filenames,</span><br><span class="line">        'categories': test_annotations</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    train_json = json.dumps(train_annot_dict)</span><br><span class="line">    train_file = open(os.path.join(source_path,'data/annotations/cifar10_train.json'), 'w')</span><br><span class="line">    train_file.write(train_json)</span><br><span class="line">    train_file.close()</span><br><span class="line"></span><br><span class="line">    test_json = json.dumps(test_annot_dict)</span><br><span class="line">    test_file = open(os.path.join(source_path,'data/annotations/cifar10_test.json'),'w')</span><br><span class="line">    test_file.write(test_json)</span><br><span class="line">    test_file.close()</span><br><span class="line">    print('annotations have been written to json file')</span><br></pre></td></tr></table></figure>
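<p>The annotation files written above are just two parallel lists serialized with json.dumps. A minimal stand-alone sketch of the same structure using only the standard library (the filenames here are made up):</p>

```python
import json

# parallel lists, as in the script above (hypothetical filenames)
annot = {
    "images": ["61.jpg", "32.jpg"],
    "categories": [6, 3],
}
text = json.dumps(annot)    # what gets written to cifar10_train.json
loaded = json.loads(text)   # what a reader gets back
print(loaded["categories"])  # → [6, 3]
```

The label of images[i] is categories[i]; keeping the two lists the same length is exactly the consistency check mentioned in the loader later on.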
<h1 id="4-生成标注文件"><a href="#4-生成标注文件" class="headerlink" title="4.生成标注文件"></a>4. Generating the annotation file</h1> <figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">#generate the annotation file</span><br><span class="line">import json</span><br><span class="line">save_path_json_train = r'.\jupyter\data\train.json'</span><br><span class="line">save_path_json_test = r'.\data\test.json'</span><br><span class="line"></span><br><span class="line">label = {}</span><br><span class="line">#labels are 0 and 1; name_type_df and name_type_df2 are DataFrames prepared earlier</span><br><span class="line">for i in name_type_df["name"]:</span><br><span class="line">    label.update({i:1})</span><br><span class="line">for i in name_type_df2["name"]:</span><br><span class="line">    label.update({i:0})</span><br></pre></td></tr></table></figure>
<h1 id="5-自己分割训练集和测试集"><a href="#5-自己分割训练集和测试集" class="headerlink" title="5.自己分割训练集和测试集"></a>5. Splitting the training and test sets yourself</h1> <figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import os</span><br><span class="line">import json</span><br><span class="line">import shutil</span><br><span class="line">import torch</span><br><span class="line">from torch.utils.data import random_split</span><br><span class="line">dataset = list(label)</span><br><span class="line"># split at a 70% / 30% ratio</span><br><span class="line">train_dataset, test_dataset = random_split(</span><br><span class="line">    dataset=dataset,</span><br><span class="line">    lengths=[int(len(label)*0.7), len(label)-int(0.7*len(label))],</span><br><span class="line">    generator=torch.Generator().manual_seed(0)</span><br><span class="line">)</span><br><span class="line"></span><br><span class="line">train_lab = {}</span><br><span class="line">for i in train_dataset:</span><br><span class="line">    train_lab.update({i:label[i]})</span><br><span class="line">test_lab = {}</span><br><span class="line">for i in test_dataset:</span><br><span class="line">    test_lab.update({i:label[i]})</span><br><span class="line"></span><br><span class="line"># write the label data to json files</span><br><span class="line">a = json.dumps(train_lab)</span><br><span class="line">f1 = open(save_path_json_train, 'w')</span><br><span class="line">f1.write(a)</span><br><span class="line">f1.close()</span><br><span class="line">b = json.dumps(test_lab)</span><br><span class="line">f2 = open(save_path_json_test, 'w')</span><br><span class="line">f2.write(b)</span><br><span class="line">f2.close()</span><br><span class="line"></span><br><span class="line"># copy the images</span><br><span class="line">train_pic_path = r".\jupyter\data\train_pic"</span><br><span class="line">test_pic_path = r".\jupyter\data\test_pic"</span><br><span class="line"></span><br><span class="line">from torchvision import transforms</span><br><span class="line">from PIL import Image</span><br><span class="line"># crop size used for the random resized crop</span><br><span class="line">crop = transforms.RandomResizedCrop(256)</span><br><span class="line">list_img_size = []</span><br><span class="line"># picture_path (the folder holding the original images) is defined elsewhere</span><br><span class="line">for i in train_lab:</span><br><span class="line">    org_path = os.path.join(picture_path,i)</span><br><span class="line">    new_path = os.path.join(train_pic_path,i)</span><br><span class="line">    img = Image.open(org_path)</span><br><span class="line">    # record the original image size</span><br><span class="line">    list_img_size.append(img.size)</span><br><span class="line">    croped_img = crop(img) # note: the crop itself is not saved; the original file is copied below</span><br><span class="line">    shutil.copy2(os.path.join(picture_path,i),os.path.join(train_pic_path,i))</span><br><span class="line">for i in test_lab:</span><br><span class="line">    org_path = os.path.join(picture_path,i)</span><br><span class="line">    new_path = os.path.join(test_pic_path,i)</span><br><span class="line">    img = Image.open(org_path)</span><br><span class="line">    croped_img = crop(img)</span><br><span class="line">    shutil.copy2(os.path.join(picture_path,i),os.path.join(test_pic_path,i))</span><br></pre></td></tr></table></figure>
<h1 id="6-从切分后的图像中加载数据集"><a href="#6-从切分后的图像中加载数据集" class="headerlink" title="6.从切分后的图像中加载数据集"></a>6. Loading the dataset from the split images</h1><p>Loading the dataset involves many adjustable choices, including cropping, batch size, and so on.</p>
<h2 id="自定义图片数据集的加载"><a href="#自定义图片数据集的加载" class="headerlink" title="自定义图片数据集的加载"></a>Loading a custom image dataset</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import json</span><br><span class="line">import matplotlib.pyplot as plt</span><br><span class="line">import numpy as np</span><br><span class="line">from torch.utils.data import Dataset,DataLoader</span><br><span class="line">import torch</span><br><span class="line">import torch.nn as nn</span><br><span class="line">import torch.optim as optim</span><br><span class="line">import torch.nn.functional as F</span><br><span class="line">import time</span><br><span class="line">import os</span><br><span class="line">import argparse</span><br><span class="line">import torchvision</span><br><span class="line">from torchvision import datasets, transforms</span><br><span class="line">from PIL import Image</span><br><span class="line"></span><br><span class="line">class _IMG(Dataset):</span><br><span class="line">    def __init__(self, root, train=True, transform = None, target_transform=None):</span><br><span class="line">        super(_IMG, self).__init__()</span><br><span class="line">        self.train = train</span><br><span class="line">        # resize the images while loading the dataset</span><br><span class="line">        self.transform = transforms.Compose([transforms.Resize(225), transforms.CenterCrop(224), transforms.ToTensor()])</span><br><span class="line">        self.target_transform = target_transform</span><br><span class="line"></span><br><span class="line">        #load the training set if train=True, otherwise the test set</span><br><span class="line">        if self.train:</span><br><span class="line">            file_annotation = os.path.join(source_path,'data/train.json')</span><br><span class="line">            img_folder = os.path.join(source_path,'data/train_pic/')</span><br><span class="line">        else:</span><br><span class="line">            file_annotation = os.path.join(source_path,'data/test.json')</span><br><span class="line">            img_folder = os.path.join(source_path,'data/test_pic/')</span><br><span class="line">        fp = open(file_annotation,'r')</span><br><span class="line">        data_dict = json.load(fp)</span><br><span class="line"></span><br><span class="line">        #if the number of images does not match the number of labels, the annotation file is broken</span><br><span class="line">        num_data = len(data_dict)</span><br><span class="line"></span><br><span class="line">        self.filenames = []</span><br><span class="line">        self.labels = []</span><br><span class="line">        self.img_folder = img_folder</span><br><span class="line">        for i in range(num_data):</span><br><span class="line">            self.filenames.append(list(data_dict.keys())[i])</span><br><span class="line">            self.labels.append(list(data_dict.values())[i])</span><br><span class="line"></span><br><span class="line">    def __getitem__(self, index):</span><br><span class="line">        img_name = self.img_folder + self.filenames[index]</span><br><span class="line">        label = self.labels[index]</span><br><span class="line">        img = plt.imread(img_name)</span><br><span class="line">        PIL_image = Image.fromarray(img) #img is the original numpy-array input</span><br><span class="line"></span><br><span class="line">        PIL_image = self.transform(PIL_image) #apply the configured transforms to the sample</span><br><span class="line">        return PIL_image, label</span><br><span class="line">    def __len__(self):</span><br><span class="line">        return len(self.filenames)</span><br><span class="line"></span><br><span class="line">train_data = _IMG(os.path.join(source_path,'data/train.json'), train = True)</span><br><span class="line">test_data = _IMG(os.path.join(source_path,'data/test.json'), train = False)</span><br><span class="line"></span><br><span class="line">#batch_size and num_workers are defined elsewhere</span><br><span class="line">train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=num_workers, drop_last=True)</span><br><span class="line">test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=num_workers)</span><br></pre></td></tr></table></figure>
<h2 id="transforms-Compose的一些用法"><a href="#transforms-Compose的一些用法" class="headerlink" title="transforms.Compose的一些用法"></a>Some uses of transforms.Compose</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">self.transform = transforms.Compose([transforms.Resize([224,224]), transforms.CenterCrop([224,224]), transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])</span><br></pre></td></tr></table></figure>
<p>This part took me quite a while. transforms.Compose is a pipeline: inside it you can chain image resizing, cropping, normalization and other operations.</p>
<p>Note that when Resize gets a single integer, the short edge of the image is scaled to that size, and the subsequent CenterCrop then cuts a region measured from the image center.<br>ToTensor converts the image to a tensor and scales pixel values into [0, 1].<br>transforms.Normalize then applies the given per-channel mean and standard deviation.</p>
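<p>The single-integer Resize behavior can be sketched as plain arithmetic (an illustrative helper, not the torchvision implementation):</p>

```python
def resize_short_edge(size, target):
    """Scale (width, height) so the shorter edge equals `target`,
    keeping the aspect ratio (conceptually what Resize(int) does)."""
    w, h = size
    scale = target / min(w, h)
    return (round(w * scale), round(h * scale))

print(resize_short_edge((500, 333), 225))  # → (338, 225)
```

The longer edge usually stays larger than the crop size, which is why CenterCrop afterwards still has a full region to cut.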
<h2 id="解决数据不能被batch-size整除的问题,使用drop-last"><a href="#解决数据不能被batch-size整除的问题,使用drop-last" class="headerlink" title="解决数据不能被batch_size整除的问题,使用drop_last"></a>Using drop_last when the dataset size is not divisible by batch_size</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True, num_workers=num_workers,drop_last=True)</span><br></pre></td></tr></table></figure>
<p>With drop_last=True, the final incomplete batch is simply dropped.</p>
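<p>The effect on the number of batches can be sketched with simple arithmetic (an illustration of the loader's batch count, not PyTorch code):</p>

```python
import math

def num_batches(n, batch_size, drop_last):
    """How many batches a loader yields over n samples."""
    if drop_last:
        return n // batch_size          # incomplete tail batch is dropped
    return math.ceil(n / batch_size)    # tail batch is kept, just smaller

print(num_batches(105, 10, True), num_batches(105, 10, False))  # → 10 11
```

With 105 samples and batch_size=10, drop_last=True discards the last 5 samples in each epoch.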
<h1 id="7-显示单张图片"><a href="#7-显示单张图片" class="headerlink" title="7.显示单张图片"></a>7. Displaying a single image</h1><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">plt.imshow(train_loader.dataset[0][0][0], cmap='gray')</span><br></pre></td></tr></table></figure>
]]></content>
<tags>
<tag>Deep Learning</tag>
</tags>
</entry>
<entry>
<title>This Blog's Setup</title>
<url>/2022/05/07/%E6%9C%AC%E5%8D%9A%E5%AE%A2%E9%85%8D%E7%BD%AE%E6%96%B9%E6%A1%88/</url>
<content><![CDATA[<p>A record of how this blog is set up, for my own future reference; others are welcome to use my configuration as a template.</p>
<h1 id="方案概述"><a href="#方案概述" class="headerlink" title="方案概述"></a>Overview</h1><p>The blog is built on GitHub and Hexo with the keep theme, and relies on several components: GitHub, Hexo, the hexo-theme-keep theme, the PicX image-hosting tool, the gitalk comment widget, and so on.</p>
<h1 id="GitHub配置"><a href="#GitHub配置" class="headerlink" title="GitHub配置"></a>GitHub setup</h1><h2 id="建立一个GitHub仓库用于撰写博客"><a href="#建立一个GitHub仓库用于撰写博客" class="headerlink" title="建立一个GitHub仓库用于撰写博客"></a>Create a GitHub repository for the blog</h2><p>My GitHub repository: <a class="link" href="https://github.com/yangli-os/yangli-os.github.io" >https://github.com/yangli-os/yangli-os.github.io<i class="fas fa-external-link-alt"></i></a><br>The blog itself lives at: <a href="https://yangli-os.github.io/">https://yangli-os.github.io</a><br>Note: name the repository username + github.io to avoid problems.</p>
<h1 id="安装相关包"><a href="#安装相关包" class="headerlink" title="安装相关包"></a>Installing the required packages</h1><p>Installing Git, configuring git ssh, and installing Node.js and Hexo are covered in this article: <a class="link" href="https://zhuanlan.zhihu.com/p/26625249" >https://zhuanlan.zhihu.com/p/26625249<i class="fas fa-external-link-alt"></i></a> </p>
<h2 id="注意:"><a href="#注意:" class="headerlink" title="注意:"></a>Notes:</h2><p>1. All hexo commands must be run inside the blog folder, otherwise they do nothing.<br>2. After configuring Hexo and testing locally at localhost:4000, do not rush to push with hexo d; install the keep theme first, then push.<br>3. If hexo g reports errors, do not push with hexo d anyway: a successful push would delete the .github and index files in the repository and break the blog.<br>If that happens: a. restore the _config.yml file, run hexo new "test", then hexo g and hexo d to update GitHub and recover.</p>
<h1 id="下载安装keep主题"><a href="#下载安装keep主题" class="headerlink" title="下载安装keep主题"></a>Downloading and installing the keep theme</h1><p>Inside the blog folder, run:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">cd theme</span><br><span class="line">npm install hexo-theme-keep</span><br></pre></td></tr></table></figure>
<h1 id="修改配置文件"><a href="#修改配置文件" class="headerlink" title="修改配置文件"></a>Editing the configuration files</h1><p>Copy the _config.yml file under blog/theme/hexo-theme-keep, rename it _config.hexo-theme-keep.yml, and place it in the blog root folder.</p>
<h2 id="config-yml配置文件说明"><a href="#config-yml配置文件说明" class="headerlink" title="_config.yml配置文件说明"></a>Notes on the _config.yml configuration file</h2><p>Editing _config.yml is the heart of the whole setup, and it is time-consuming. A few caveats first: </p>
<p>1. The file is written in YAML; it can be opened and edited as plain text, but every colon after a key must be followed by an ASCII space, or parsing fails with an error.<br>2. After switching themes there are two main configuration files: Hexo's own _config.yml, and the hexo-theme-keep theme's configuration file in the theme folder, which is copied and renamed to _config.hexo-theme-keep.yml; the original _config.yml in the theme's folder stays unchanged.<br>3. Always check which file a given setting belongs to. </p>
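<p>Point 1 in a concrete (made-up) fragment: the first line parses as a key-value mapping, while the commented-out second form would not:</p>

```yml
title: My Blog    # correct: an ASCII space after the colon
#title:My Blog    # wrong: no space, YAML does not treat this as key: value
```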
<h2 id="针对hexo的-config-yml配置文件修改"><a href="#针对hexo的-config-yml配置文件修改" class="headerlink" title="针对hexo的_config.yml配置文件修改"></a>Changes to Hexo's _config.yml</h2><p>Find _config.yml in the blog folder, open it as plain text, and change the following:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># Site</span><br><span class="line">title: YangLi's Blog # blog title</span><br><span class="line"></span><br><span class="line"># URL</span><br><span class="line">url: https://yangli-os.github.io/ # blog URL</span><br><span class="line"></span><br><span class="line"># the blog's GitHub address</span><br><span class="line">deploy:</span><br><span class="line">  type: git</span><br><span class="line">  # format: git@github.com:username/repository.git</span><br><span class="line">  repo: git@github.com:yangli-os/yangli-os.github.io.git # note the trailing .git</span><br><span class="line">  branch: master</span><br></pre></td></tr></table></figure>
<p>Leave the rest alone; subtitle, keywords, author and the like do not matter much, and I advise against changing them, since doing so can easily cause errors.</p>
<h2 id="针对-config-hexo-theme-keep-yml的修改"><a href="#针对-config-hexo-theme-keep-yml的修改" class="headerlink" title="针对_config.hexo-theme-keep.yml的修改"></a>Changes to _config.hexo-theme-keep.yml</h2><p>These changes follow the official keep documentation: <a class="link" href="https://keep.xpoet.cn/2020/11/Keep-%E4%B8%BB%E9%A2%98%E9%85%8D%E7%BD%AE%E6%8C%87%E5%8D%97/" >https://keep.xpoet.cn/2020/11/Keep-%E4%B8%BB%E9%A2%98%E9%85%8D%E7%BD%AE%E6%8C%87%E5%8D%97/<i class="fas fa-external-link-alt"></i></a><br>Below are only my customizations and the pitfalls to watch out for.</p>
<h3 id="style"><a href="#style" class="headerlink" title="style"></a>style</h3><p>style covers the blog's logos and images; prefer svg files. Mine were downloaded from <a class="link" href="https://igoutu.cn/icons/set/svg" >https://igoutu.cn/icons/set/svg<i class="fas fa-external-link-alt"></i></a>.<br>The images live under blog\themes\hexo-theme-keep\source\images; put the images you want to use into that folder before configuring them, and hexo will sync them to the image folder on GitHub automatically.</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">style</span><br><span class="line"># Avatar (You can use local image or image external link)</span><br><span class="line">  avatar: /images/饼干.svg # a custom icon is used here</span><br></pre></td></tr></table></figure>
<h3 id="social-contact"><a href="#social-contact" class="headerlink" title="social_contact"></a>social_contact</h3><p>Social links only show up if the homepage's First screen is enabled; the official documentation covers this, so no more detail here.</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">social_contact:</span><br><span class="line"> enable: true</span><br><span class="line"> links:</span><br><span class="line"> weixin: https://raw.githubusercontent.com/yangli-os/image-hosting/master/20220506/xxx.7l2uxquuvwg0.webp</span><br></pre></td></tr></table></figure>
<p>I set this up with the PicX image-hosting tool, available online at <a class="link" href="https://picx.xpoet.cn/" >https://picx.xpoet.cn/<i class="fas fa-external-link-alt"></i></a><br>The site has a demo tutorial and is straightforward.<br>You need a GitHub repository to store the images; mine is <a class="link" href="https://github.com/yangli-os/image-hosting" >https://github.com/yangli-os/image-hosting<i class="fas fa-external-link-alt"></i></a><br>Each session you may need to configure the token again, then upload images and copy their GitHub external links.</p>
<h3 id="gitalk评论配置"><a href="#gitalk评论配置" class="headerlink" title="gitalk评论配置"></a>gitalk comment setup</h3><p>The docs say to create a separate repository, but that is not strictly necessary: you can reuse the image-hosting repository above, or any existing repository; open its issues tab, and the comments are stored there.<br>Follow the tutorial in the <a class="link" href="https://keep.xpoet.cn/2020/11/Keep-%E4%B8%BB%E9%A2%98%E9%85%8D%E7%BD%AE%E6%8C%87%E5%8D%97/" >keep official docs<i class="fas fa-external-link-alt"></i></a> and register the application at <a class="link" href="https://github.com/settings/applications/new" >applications<i class="fas fa-external-link-alt"></i></a><br>The corresponding settings in _config.hexo-theme-keep.yml are: </p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">comment:</span><br><span class="line"> enable: true</span><br><span class="line"> use: gitalk # values: valine | gitalk | twikoo</span><br><span class="line"></span><br><span class="line"> gitalk:</span><br><span class="line"> github_id: yangli-os</span><br><span class="line"> repository: image-hosting # Repository name to store issues</span><br><span class="line"> client_id: XXX # GitHub Application Client ID</span><br><span class="line"> client_secret: XXX # GitHub Application Client Secret</span><br></pre></td></tr></table></figure>
<p>配置完成后,需要执行hexo clean,清空界面文件才可以。<br>之后在博客底部会出现评论区,需要登陆github才可以正常使用。</p>
<h2 id="我的配置文件"><a href="#我的配置文件" class="headerlink" title="我的配置文件"></a>我的配置文件</h2><p>可以点击链接获取我的_config.hexo-theme-keep.yml配置文件:<a class="link" href="https://github.com/yangli-os/image-hosting/blob/master/_config.hexo-theme-keep.yml" >_config.hexo-theme-keep.yml<i class="fas fa-external-link-alt"></i></a></p>
<h1 id="补充"><a href="#补充" class="headerlink" title="补充"></a>补充</h1><p>使用这种方法搭建的博客,实际上用的是GitHub Pages功能,而这个功能对百度和Google的搜索引擎是反爬虫的,所以上面的文章默认搜索不到。<br>想让Google能搜到的方法:使用Google Search Console,提交一个index文件,再逐条提交每一篇博客的网址(每更新一篇文章,就要去提交一次)。<br>这样就能在Google上搜到博客的内容了。<br>但实测用同样的方法提交index文件等,仍然无法在百度上搜到(能用Google,为什么还要用百度呢?)。</p>
]]></content>
<tags>
<tag>Technology</tag>
</tags>
</entry>
<entry>
<title>深度学习——模型训练篇</title>
<url>/2022/08/24/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0-%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83/</url>
<content><![CDATA[<h1 id="介绍"><a href="#介绍" class="headerlink" title="介绍"></a>介绍</h1><p>这一章节主要介绍模型如何搭建起来的,以及模型上的一些训练方法。 </p>
<h1 id="1-模型搭建"><a href="#1-模型搭建" class="headerlink" title="1.模型搭建"></a>1.模型搭建</h1><h2 id="1-1-直接训练"><a href="#1-1-直接训练" class="headerlink" title="1.1 直接训练"></a>1.1 直接训练</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">loss_list = []</span><br><span class="line">acc_list = []</span><br><span class="line"></span><br><span class="line">class Network(nn.Module):</span><br><span class="line"> def __init__(self):</span><br><span class="line"> super(Network,self).__init__()</span><br><span class="line"> self.conv1 = nn.Conv2d(in_channels=3, out_channels=256, kernel_size=7,stride=2)</span><br><span class="line"> self.conv2 = nn.Conv2d(in_channels=256, out_channels=16, kernel_size=7,stride=2)</span><br><span class="line"> self.conv3 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=5)</span><br><span class="line"> self.conv4 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=[8,4])</span><br><span class="line"></span><br><span class="line"> self.fc1 = nn.Linear(in_features=16, out_features=16)</span><br><span class="line"> self.out = nn.Linear(in_features=16, out_features=2)</span><br><span class="line"> def forward(self, t):</span><br><span class="line"> #Layer 1</span><br><span class="line"> t = t</span><br><span class="line"> #Layer 2</span><br><span class="line"> t = self.conv1(t)</span><br><span class="line"> t = F.relu(t)</span><br><span class="line"> t = F.max_pool2d(t, kernel_size=2, stride=2)#output shape : (6,14,14)</span><br><span class="line"> #Layer 3</span><br><span class="line"> t = self.conv2(t)</span><br><span class="line"> t = F.relu(t)</span><br><span class="line"> t = F.max_pool2d(t, kernel_size=2, stride=2)#output shape : (6,14,14)</span><br><span class="line"> #Layer 4</span><br><span class="line"> t = self.conv3(t)</span><br><span class="line"> t = F.relu(t)</span><br><span class="line"> t = F.max_pool2d(t, kernel_size=2, stride=2)#output shape : (6,14,14)</span><br><span class="line"> #Layer 5 
</span><br><span class="line"> t = self.conv4(t)</span><br><span class="line"> t = F.relu(t)</span><br><span class="line"> </span><br><span class="line"> #Layer 5</span><br><span class="line"> t=t.flatten(start_dim=1)</span><br><span class="line"> t = self.fc1(t)</span><br><span class="line"> t = F.relu(t)#output shape : (1,120)</span><br><span class="line"> #Layer 5</span><br><span class="line"> t = self.out(t)</span><br><span class="line"></span><br><span class="line"> return t</span><br><span class="line"></span><br><span class="line">network = Network()</span><br><span class="line"></span><br><span class="line">print(network)</span><br><span class="line">optimizer = optim.Adam(network.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)</span><br></pre></td></tr></table></figure>
<p>我对batch_size的设置改动比较多,在调整模型参数时主要调整了卷积的层数、神经元个数、卷积核的尺寸以及stride等。这里对参数需要注意的地方逐一记录一下。</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">nn.Conv2d(in_channels=3, out_channels=256, kernel_size=7,stride=2)</span><br></pre></td></tr></table></figure>
<p>每一层卷积都跟它上一层的输入输出有关,第一层卷积有几个参数需要注意。<br>in_channels: 输入的维度,即图片的通道数<br>out_channels: 可以自行设置的本层卷积输出通道数<br>kernel_size: 卷积核的尺寸<br>stride: 卷积步长<br>卷积后图片尺寸的计算为:<br>(W - kernel_size)/stride + 1<br>因为我的图片较大,所以设置stride为2,可以快速地提取特征,经过池化后缩小尺寸。</p>
<p>注意:在此可以设置kernel_size为原始图像大小,但这样输出的特征图为X×X×1×1,就无法再进行池化运算了。一般卷积核大小kernel_size设置为3、5、7为宜。</p>
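正文中的尺寸公式可以写成一个小函数来验证(这里的 conv_out_size 是示意用的自定义函数,假设无 padding、向下取整):

```python
def conv_out_size(w, kernel_size, stride=1):
    # 按 (W - kernel_size) // stride + 1 计算卷积输出边长(无 padding)
    return (w - kernel_size) // stride + 1

# 224x224 输入,kernel_size=7,stride=2,输出边长为 109
print(conv_out_size(224, 7, 2))
# kernel_size 等于输入尺寸时输出为 1x1,无法再做池化
print(conv_out_size(28, 28))
```

可以据此提前推算每一层卷积、池化后的特征图尺寸,避免出现尺寸为 0 或负数的情况。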
<p>在卷积层之后,输入到线性层之前,使用了t.flatten(start_dim=1)。这是因为卷积输出的数据格式是[x×x×1×1],相当于一个四维数据,而输入linear的时候需要是[x×x]的二维形式,所以用flatten展平;start_dim=1表示从第1维开始展平,保留第0维(batch维)不变。</p>
<p>还有一点,linear层实际上是对输入数据做矩阵乘法,所以对于[x×y]维度的输入,需要形状为[y×k]的权重矩阵才能相乘(k为输出维度)。</p>
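flatten 和矩阵乘法的形状变化可以用 numpy 做个小实验(这里的形状只是示意,batch 取 16):

```python
import numpy as np

# 模拟卷积输出:[batch, channels, 1, 1]
t = np.zeros((16, 16, 1, 1))

# 从第 1 维开始展平,保留第 0 维(batch 维)-> [16, 16]
flat = t.reshape(t.shape[0], -1)
print(flat.shape)  # (16, 16)

# [x, y] 的输入需要 [y, k] 形状的权重矩阵才能相乘
w = np.zeros((16, 2))
out = flat @ w
print(out.shape)  # (16, 2)
```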
<h3 id="解决t-reshape-1-…-的问题"><a href="#解决t-reshape-1-…-的问题" class="headerlink" title="解决t.reshape(-1,…)的问题"></a>解决t.reshape(-1,…)的问题</h3><p>t.reshape(-1,…)中的第二个参数是t.size()除了第一个参数之外参数的乘积,也是接下来linear层的第一个输入。<br>所以想要省事可以写成</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">from functools import reduce</span><br><span class="line"></span><br><span class="line">mul_t = reduce(lambda x,y:x*y,list(t.size()))//batch_size</span><br><span class="line">t = t.reshape(-1, mul_t)</span><br></pre></td></tr></table></figure>
<p>reshape之后,下一个linear层的in_features设为mul_t即可,out_features可以自行设置。</p>
<h2 id="1-2-Finetune模型微调"><a href="#1-2-Finetune模型微调" class="headerlink" title="1.2 Finetune模型微调"></a>1.2 Finetune模型微调</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># 冻结模型参数的函数</span><br><span class="line">def set_parameter_requires_grad(model, feature_extracting):</span><br><span class="line"> if feature_extracting:</span><br><span class="line"> for param in model.parameters():</span><br><span class="line"> param.requires_grad = False</span><br><span class="line"> </span><br><span class="line"># 冻结参数的梯度</span><br><span class="line">feature_extract = True</span><br><span class="line"># pretrained=True 为使用原有的模型参数进行初始化训练,为False的话就只使用模型结构。</span><br><span class="line">model = models.resnet18(pretrained=True)</span><br><span class="line">set_parameter_requires_grad(model, feature_extract)</span><br><span class="line"></span><br><span class="line"># 修改模型</span><br><span class="line">num_ftrs = model.fc.in_features</span><br><span class="line">model.fc = nn.Linear(in_features=num_ftrs, out_features=128, bias=True)</span><br><span class="line">model.conv3 = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3,stride=2)</span><br><span class="line">model.bn3 = nn.BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)</span><br><span class="line">model.relu</span><br><span class="line">model.conv4 = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3,stride=2)</span><br></pre></td></tr></table></figure>
<h3 id="指修改部分层的参数"><a href="#指修改部分层的参数" class="headerlink" title="只修改部分层的参数"></a>只修改部分层的参数</h3><p>Fine-tune可以只保留部分层的参数,可以使用下面的代码提取:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># 去掉model的后5层</span><br><span class="line">self.resnet_layer = nn.Sequential(*list(model.children())[:-5])</span><br></pre></td></tr></table></figure>
<p>可以修改原始resnet模型当中的参数,使用resnet_layer[][]进行提取,print(network)的时候有一些会给出某一层或者某一sequence的标号。</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">self.resnet_layer[0] = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3,stride=1)</span><br><span class="line">self.resnet_layer[1] = nn.BatchNorm2d(8)</span><br><span class="line">self.resnet_layer[4][0] = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3,stride=1)</span><br></pre></td></tr></table></figure>
<p>修改模型时,可以直接通过model.层名的方式修改已有resnet18中的模型结构和模型参数。<br>一般图像类的模型修改后几层的参数,而语义类的模型修改前几层的参数,这与模型训练时先后提取的特征有关。</p>
<h1 id="2-自定义损失函数"><a href="#2-自定义损失函数" class="headerlink" title="2.自定义损失函数"></a>2.自定义损失函数</h1><p>针对样本不均衡的问题,使用focal_loss作为损失函数 </p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">class focal_loss(nn.Module):</span><br><span class="line"> def __init__(self, alpha=0.25, gamma=2, num_classes = 3, size_average=True):</span><br><span class="line"> """</span><br><span class="line"> focal_loss损失函数, -α(1-yi)**γ *ce_loss(xi,yi)</span><br><span class="line"> 步骤详细的实现了 focal_loss损失函数.</span><br><span class="line"> :param alpha: 阿尔法α,类别权重. 当α是列表时,为各类别权重,当α为常数时,类别权重为[α, 1-α, 1-α, ....],常用于 目标检测算法中抑制背景类 , retainnet中设置为0.25</span><br><span class="line"> :param gamma: 伽马γ,难易样本调节参数. retainnet中设置为2</span><br><span class="line"> :param num_classes: 类别数量</span><br><span class="line"> :param size_average: 损失计算方式,默认取均值</span><br><span class="line"> """</span><br><span class="line"> super(focal_loss,self).__init__()</span><br><span class="line"> self.size_average = size_average</span><br><span class="line"> if isinstance(alpha,list):</span><br><span class="line"> assert len(alpha)==num_classes # α可以以list方式输入,size:[num_classes] 用于对不同类别精细地赋予权重</span><br><span class="line"> #print(" --- Focal_loss alpha = {}, 将对每一类权重进行精细化赋值 --- ".format(alpha))</span><br><span class="line"> self.alpha = torch.Tensor(alpha)</span><br><span class="line"> else:</span><br><span class="line"> assert alpha<1 #如果α为一个常数,则降低第一类的影响,在目标检测中为第一类</span><br><span class="line"> #print(" --- Focal_loss alpha = {} ,将对背景类进行衰减,请在目标检测任务中使用 --- ".format(alpha))</span><br><span class="line"> self.alpha = torch.zeros(num_classes)</span><br><span class="line"> self.alpha[0] += alpha</span><br><span class="line"> self.alpha[1:] += (1-alpha) # α 最终为 [ α, 1-α, 1-α, 1-α, 1-α, ...] size:[num_classes]</span><br><span class="line"></span><br><span class="line"> self.gamma = gamma</span><br><span class="line"></span><br><span class="line"> def forward(self, preds, labels):</span><br><span class="line"> """</span><br><span class="line"> focal_loss损失计算</span><br><span class="line"> :param preds: 预测类别. 
size:[B,N,C] or [B,C] 分别对应与检测与分类任务, B 批次, N检测框数, C类别数</span><br><span class="line"> :param labels: 实际类别. size:[B,N] or [B]</span><br><span class="line"> :return:</span><br><span class="line"> """</span><br><span class="line"> # assert preds.dim()==2 and labels.dim()==1</span><br><span class="line"> preds = preds.view(-1,preds.size(-1))</span><br><span class="line"> self.alpha = self.alpha.to(preds.device)</span><br><span class="line"> preds_logsoft = F.log_softmax(preds, dim=1) # log_softmax</span><br><span class="line"> preds_softmax = torch.exp(preds_logsoft) # softmax</span><br><span class="line"></span><br><span class="line"> preds_softmax = preds_softmax.gather(1,labels.view(-1,1)) # 这部分实现nll_loss ( crossempty = log_softmax + nll )</span><br><span class="line"> preds_logsoft = preds_logsoft.gather(1,labels.view(-1,1))</span><br><span class="line"> self.alpha = self.alpha.gather(0,labels.view(-1))</span><br><span class="line"> loss = -torch.mul(torch.pow((1-preds_softmax), self.gamma), preds_logsoft) # torch.pow((1-preds_softmax), self.gamma) 为focal loss中 (1-pt)**γ</span><br><span class="line"></span><br><span class="line"> loss = torch.mul(self.alpha, loss.t())</span><br><span class="line"> if self.size_average:</span><br><span class="line"> loss = loss.mean()</span><br><span class="line"> else:</span><br><span class="line"> loss = loss.sum()</span><br><span class="line"> return loss</span><br></pre></td></tr></table></figure>
<p>注意,想要使用focal_loss函数,在训练的时候loss不能是F.的形式,要进行修改。<br>原始格式:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">preds = network(images) # Pass batch</span><br><span class="line">loss = F.cross_entropy(preds, labels) # Calculate Loss</span><br><span class="line"></span><br><span class="line">optimizer.zero_grad()</span><br><span class="line">loss.backward() # Calculate gradients</span><br><span class="line">optimizer.step() # Update weights</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<p>需要修改成:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">preds = network(images) # Pass batch</span><br><span class="line">criteria1 = nn.CrossEntropyLoss() # 这里不能用F.需要用nn.的形式</span><br><span class="line"># criteria1 = focal_loss() # 替换损失函数</span><br><span class="line">loss = criteria1(preds, labels)</span><br><span class="line">optimizer.zero_grad()</span><br><span class="line">loss.backward() # Calculate gradients</span><br><span class="line">optimizer.step() # Update weights</span><br></pre></td></tr></table></figure>
<h1 id="3-实际训练"><a href="#3-实际训练" class="headerlink" title="3.实际训练"></a>3.实际训练</h1><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">loss_list = []</span><br><span class="line">acc_list = []</span><br><span class="line"></span><br><span class="line">print(network)</span><br><span class="line">optimizer = optim.Adam(network.parameters(), lr=0.0001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)</span><br><span class="line">#schedule = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30,gamma=0.5)</span><br><span class="line"></span><br><span class="line">time_start = time.time()</span><br><span class="line">for epoch in range(epochs):</span><br><span class="line"> </span><br><span class="line"> total_correct = 0</span><br><span class="line"> total_loss = 0</span><br><span class="line"> for batch in train_loader: # Get batch</span><br><span class="line"> images, labels = batch # Unpack the batch into images and labels</span><br><span class="line"></span><br><span class="line"> preds = network(images) # Pass batch</span><br><span class="line"> loss = F.cross_entropy(preds, labels) # Calculate Loss</span><br><span class="line"></span><br><span class="line"> optimizer.zero_grad()</span><br><span class="line"> loss.backward() # Calculate gradients</span><br><span class="line"> optimizer.step() # Update weights</span><br><span class="line"></span><br><span class="line"> total_loss += loss.item()</span><br><span class="line"> total_correct += preds.argmax(dim=1).eq(labels).sum().item()</span><br><span class="line"> loss_list.append(total_loss)</span><br><span class="line"> acc_list.append(total_correct/(len(train_loader)*batch_size))</span><br><span class="line"> print('epoch:', epoch, "total_correct:", total_correct/(len(train_loader)*batch_size), "loss:", total_loss)</span><br><span class="line">time_end = time.time() - time_start</span><br><span class="line">print(time_end)</span><br><span class="line">print('>>> Training Complete >>>')</span><br></pre></td></tr></table></figure>
<p>训练过程就是把数据放到我们已经搭建好的模型里面跑一下,这里需要注意的是损失函数的设置,将会影响分类的准确度。</p>
<p>这里的total_correct是预测正确的样本总数,用它除以样本总数就是准确率了。</p>
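total_correct 的统计逻辑可以用 numpy 模拟一个 batch 来说明(下面的 preds、labels 都是虚构的示例数据):

```python
import numpy as np

# 假设网络对 4 个样本输出的 logits 和真实标签
preds = np.array([[0.2, 0.8],
                  [0.9, 0.1],
                  [0.4, 0.6],
                  [0.7, 0.3]])
labels = np.array([1, 0, 0, 0])

# argmax 取预测类别,与标签比较后求和即正确个数
total_correct = int((preds.argmax(axis=1) == labels).sum())
accuracy = total_correct / len(labels)
print(total_correct, accuracy)  # 3 0.75
```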
]]></content>
<tags>
<tag>Deep Learning</tag>
</tags>
</entry>
<entry>
<title>深度学习——模型评价篇</title>
<url>/2022/08/15/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0-%E6%A8%A1%E5%9E%8B%E8%AF%84%E4%BC%B0%E7%AF%87/</url>
<content><![CDATA[<h1 id="1-测试集准确率的计算"><a href="#1-测试集准确率的计算" class="headerlink" title="1.测试集准确率的计算"></a>1.测试集准确率的计算</h1><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># 保存模型</span><br><span class="line">PATH = './NIR.pth'</span><br><span class="line">torch.save(network.state_dict(), PATH)</span><br><span class="line"># 加载模型</span><br><span class="line">network = Network()</span><br><span class="line">network.load_state_dict(torch.load(PATH))</span><br><span class="line"></span><br><span class="line">@torch.no_grad()</span><br><span class="line">def get_all_preds(model, loader):</span><br><span class="line"> all_preds = torch.tensor([])</span><br><span class="line"> print(len(loader))</span><br><span class="line"> for batch in loader:</span><br><span class="line"> images, labels = batch</span><br><span class="line"></span><br><span class="line"> preds = model(images)</span><br><span class="line"> all_preds = torch.cat((all_preds, preds) ,dim=0)</span><br><span class="line"></span><br><span class="line"> return all_preds</span><br><span class="line"></span><br><span class="line">test_preds = get_all_preds(network, test_loader)</span><br><span class="line">actual_labels_test = torch.Tensor(test_data.labels)</span><br><span class="line">preds_correct_test = test_preds.argmax(dim=1).eq(actual_labels_test).sum().item()</span><br><span class="line"></span><br><span class="line">print('total correct:', preds_correct_test)</span><br><span class="line">print('accuracy_test:', preds_correct_test / len(test_data))</span><br></pre></td></tr></table></figure>
<p>这里注意一个问题:我想计算模型在训练集上的准确率,直接按照accuracy_test的计算方式进行计算,使用的是train_data.labels。但我发现,因为batch_size设置为16,而数据总数是415,经过分batch之后,train_loader只有25个batch,即400个样本,而train_data有415个,这就是batch_size无法整除数据量时造成的问题。</p>
<p>那么从另一个点上也就是说,我实际上只有400个数据参与了模型训练,而不是415个。</p>
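这个 400 对 415 的差异可以用简单的整除算一下(文中 train_loader 只有 25 个 batch,效果相当于 PyTorch DataLoader 的 drop_last=True,即丢弃不满一个 batch 的尾部数据):

```python
n_samples = 415
batch_size = 16

# 完整 batch 数,以及实际参与训练和被丢弃的样本数
n_full_batches = n_samples // batch_size
used = n_full_batches * batch_size
left = n_samples - used
print(n_full_batches, used, left)  # 25 400 15
```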
<h1 id="2-输出最后一层的预测概率"><a href="#2-输出最后一层的预测概率" class="headerlink" title="2.输出最后一层的预测概率"></a>2.输出最后一层的预测概率</h1><p>调整灵敏度和特异度的阈值时,需要用到模型输出的预测概率。在测试集上获取预测概率的方法如下:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">@torch.no_grad()</span><br><span class="line">def get_all_preds(model, loader):</span><br><span class="line"> all_preds = torch.tensor([])</span><br><span class="line"> for batch in loader:</span><br><span class="line"> images, labels = batch</span><br><span class="line"> </span><br><span class="line"> preds = model(images)</span><br><span class="line"> all_preds = torch.cat((all_preds, preds) ,dim=0)</span><br><span class="line"></span><br><span class="line"> return all_preds</span><br><span class="line"></span><br><span class="line">test_preds = get_all_preds(network, test_loader)</span><br><span class="line">actual_labels_test = torch.Tensor(test_data.labels)</span><br><span class="line">preds_correct_test_list = []</span><br><span class="line">#preds_correct_test = torch.softmax(test_preds,dim=0)[1][0].cpu().item()</span><br><span class="line">for i in range(len(test_preds)):</span><br><span class="line"># preds_correct_test_list.append(torch.softmax(test_preds,dim=1)[i][0].cpu().item())</span><br><span class="line"> preds_correct_test_list.append(torch.softmax(test_preds,dim=1)[i][1].cpu().item())</span><br></pre></td></tr></table></figure>
<h1 id="3-手动计算灵敏度,特异度,准确率等参数"><a href="#3-手动计算灵敏度,特异度,准确率等参数" class="headerlink" title="3.手动计算灵敏度,特异度,准确率等参数"></a>3.手动计算灵敏度,特异度,准确率等参数</h1><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># 手动计算敏感度特异度</span><br><span class="line">sum_number_0 = 0</span><br><span class="line">sum_number_1 = 0</span><br><span class="line">for i in range(len(preds_correct_test_list)):</span><br><span class="line"> # 输入调整的概率</span><br><span class="line"> if preds_correct_test_list[i] > 0.00000000005:</span><br><span class="line"> preds_correct_test_list[i] = 1 </span><br><span class="line"> else:</span><br><span class="line"> preds_correct_test_list[i] = 0</span><br><span class="line"> </span><br><span class="line">TP,TN,FP,FN = 0,0,0,0</span><br><span class="line">for i in range(len(test_loader.dataset.labels)):</span><br><span class="line"> if (test_loader.dataset.labels[i]==0) and (preds_correct_test_list[i]==0):</span><br><span class="line"> TP += 1 </span><br><span class="line"> if (test_loader.dataset.labels[i]==0) and (preds_correct_test_list[i]==1):</span><br><span class="line"> FN += 1 </span><br><span class="line"> if (test_loader.dataset.labels[i]==1) and (preds_correct_test_list[i]==0):</span><br><span class="line"> FP += 1 </span><br><span class="line"> if (test_loader.dataset.labels[i]==1) and (preds_correct_test_list[i]==1):</span><br><span class="line"> TN += 1 </span><br><span class="line"></span><br><span class="line">accuracy = (TP+TN)/(TP+TN+FP+FN)</span><br><span class="line">precision = TP/(TP+FP)</span><br><span class="line"># 敏感度特异度</span><br><span class="line">sensitivity = TP/(TP+FN)</span><br><span class="line">specificity = TN/(TN+FP)</span><br><span class="line">print(accuracy,precision,sensitivity,specificity)</span><br></pre></td></tr></table></figure>
<h1 id="4-画一个matrix图,展示正确时错误的数据"><a href="#4-画一个matrix图,展示正确时错误的数据" class="headerlink" title="4.画一个matrix图,展示正确时错误的数据"></a>4.画一个matrix图,展示正确时错误的数据</h1><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># 画一个matrix图展示正确和错误的数据</span><br><span class="line">import itertools</span><br><span class="line">import numpy as np</span><br><span class="line">import matplotlib.pyplot as plt</span><br><span class="line"></span><br><span class="line">def plot_confusion_matrix(cm, classes,</span><br><span class="line"> normalize=False,</span><br><span class="line"> title='Confusion matrix',</span><br><span class="line"> cmap=plt.cm.Blues):</span><br><span class="line"> """</span><br><span class="line"> This function prints and plots the confusion matrix.</span><br><span class="line"> Normalization can be applied by setting `normalize=True`.</span><br><span class="line"> """</span><br><span class="line"> if normalize:</span><br><span class="line"> cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]</span><br><span class="line"> print("Normalized confusion matrix")</span><br><span class="line"> else:</span><br><span class="line"> print('Confusion matrix, without normalization')</span><br><span class="line"></span><br><span class="line"> print(cm)</span><br><span class="line"></span><br><span class="line"> plt.imshow(cm, interpolation='nearest', cmap=cmap)</span><br><span class="line"> plt.title(title)</span><br><span class="line"> plt.colorbar()</span><br><span class="line"> tick_marks = np.arange(len(classes))</span><br><span class="line"> plt.xticks(tick_marks, classes, rotation=45)</span><br><span class="line"> plt.yticks(tick_marks, classes)</span><br><span class="line"></span><br><span class="line"> fmt = '.2f' if normalize else 'd'</span><br><span class="line"> thresh = cm.max() / 2.</span><br><span class="line"> for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):</span><br><span class="line"> plt.text(j, i, 
format(cm[i, j], fmt),</span><br><span class="line"> horizontalalignment="center",</span><br><span class="line"> color="white" if cm[i, j] > thresh else "black")</span><br><span class="line"></span><br><span class="line"> plt.tight_layout()</span><br><span class="line"> plt.ylabel('True label')</span><br><span class="line"> plt.xlabel('Predicted label')</span><br><span class="line"></span><br><span class="line">cm = confusion_matrix(test_data.labels, test_preds.argmax(dim=1))</span><br><span class="line">classes = ('0','1')</span><br><span class="line">plt.figure(figsize=(10,10))</span><br><span class="line">plot_confusion_matrix(cm, classes)</span><br></pre></td></tr></table></figure>
<h1 id="5-画LOSS曲线图"><a href="#5-画LOSS曲线图" class="headerlink" title="5.画LOSS曲线图"></a>5.画LOSS曲线图</h1><h2 id="最简单的一种方法"><a href="#最简单的一种方法" class="headerlink" title="最简单的一种方法"></a>最简单的一种方法</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">plt.plot(loss_list)</span><br><span class="line">plt.legend()</span><br><span class="line">plt.title('Compare loss for different models in training')</span><br></pre></td></tr></table></figure>
<h2 id="将LOSS和ACC画到一起的"><a href="#将LOSS和ACC画到一起的" class="headerlink" title="将LOSS和ACC画到一起的"></a>将LOSS和ACC画到一起的</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">from mpl_toolkits.axes_grid1 import host_subplot</span><br><span class="line">def plot_acc_loss(loss, acc):</span><br><span class="line"> host = host_subplot(111) # row=1 col=1 first pic</span><br><span class="line"> plt.subplots_adjust(right=0.8) # ajust the right boundary of the plot window</span><br><span class="line"> par1 = host.twinx() # 共享x轴</span><br><span class="line"> </span><br><span class="line"> # set labels</span><br><span class="line"> host.set_xlabel("steps")</span><br><span class="line"> host.set_ylabel("train-loss")</span><br><span class="line"> par1.set_ylabel("train-accuracy")</span><br><span class="line"> </span><br><span class="line"> # plot curves</span><br><span class="line"> p1, = host.plot(range(len(loss)), loss, label="loss")</span><br><span class="line"> #p2, = par1.plot(range(len(acc)), acc, label="accuracy")</span><br><span class="line"> </span><br><span class="line"> # set location of the legend,</span><br><span class="line"> # 1->rightup corner, 2->leftup corner, 3->leftdown corner</span><br><span class="line"> # 4->rightdown corner, 5->rightmid ...</span><br><span class="line"> host.legend(loc=5)</span><br><span class="line"> </span><br><span class="line"> # set label color</span><br><span class="line"> host.axis["left"].label.set_color(p1.get_color())</span><br><span class="line"> par1.axis["right"].label.set_color(p2.get_color())</span><br><span class="line"> </span><br><span class="line"> # set the range of x axis of host and y axis of par1</span><br><span class="line"> # host.set_xlim([-200, 5200])</span><br><span class="line"> # par1.set_ylim([-0.1, 1.1])</span><br><span class="line"> </span><br><span class="line"> plt.draw()</span><br><span class="line"> plt.show()</span><br><span class="line"></span><br><span 
class="line">plot_acc_loss(loss_list, acc_list)</span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag>Deep Learning</tag>
</tags>
</entry>
<entry>
<title>社交网络分析方法以及基本图例</title>
<url>/2022/05/06/%E7%A4%BE%E4%BA%A4%E7%BD%91%E7%BB%9C%E5%88%86%E6%9E%90%E6%96%B9%E6%B3%95%E4%BB%A5%E5%8F%8A%E5%9F%BA%E6%9C%AC%E5%9B%BE%E4%BE%8B/</url>
<content><![CDATA[<h1 id="加载包和数据"><a href="#加载包和数据" class="headerlink" title="加载包和数据"></a>加载包和数据</h1><h2 id="使用数据相关包"><a href="#使用数据相关包" class="headerlink" title="使用数据相关包"></a>使用数据相关包</h2><p>我一般会使用pandas和os的包读取我的数据。</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import pandas as pd</span><br><span class="line">import os</span><br><span class="line">import numpy as np</span><br><span class="line">from collections import Counter # 基本计数库</span><br></pre></td></tr></table></figure>
<h2 id="文件路径"><a href="#文件路径" class="headerlink" title="文件路径"></a>文件路径</h2><p>为方便数据和程序文件的移动,一般采用相对路径,可以使用“../”的形式,但建议使用os.getcwd获取当前程序路径,进行路径拼接</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">origin_path = os.getcwd()</span><br><span class="line"># 获取当前文件夹下所有的文件的列表</span><br><span class="line">File_list = os.listdir(origin_path)</span><br><span class="line">source_path = os.path.join(origin_path,"data.csv")</span><br></pre></td></tr></table></figure>
<h2 id="数据读取"><a href="#数据读取" class="headerlink" title="数据读取"></a>数据读取</h2><p>在数据读取时一般有几种数据格式:csv,xlsx,txt等。<br>数据也有几种不同的格式:GBK,UTF-8,ISO-8859-1等。<br>数据也有是否有表头,是否有行标签等。</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">df1 = pd.read_csv(source_path,header = None) # 无表头的读取方式</span><br><span class="line">df1 = pd.read_excel(source_path,encoding = 'utf-8') # 以utf-8格式存储的数据</span><br><span class="line"></span><br><span class="line"># 显示一些数据基本信息</span><br><span class="line">df1.columns # 显示数据列标签</span><br><span class="line">df1.iloc[i] # 以字典形式显示第i行</span><br></pre></td></tr></table></figure>
<h1 id="数据层面的操作"><a href="#数据层面的操作" class="headerlink" title="数据层面的操作"></a>数据层面的操作</h1><p>数据层面的操作更多依赖于pandas和numpy两个库,依赖的数据类型主要是DataFrame,array等。<br>原则:能批量处理的就批量处理,能使用库的就使用库,尽量避免使用for循环大量数据。</p>
<h2 id="分割矩阵"><a href="#分割矩阵" class="headerlink" title="分割矩阵"></a>分割矩阵</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># 数据,等分份数,分割方式axis=0/1 上下/左右</span><br><span class="line">dx,b = np.split(data,2,axis=0) # 上下均等分离矩阵</span><br><span class="line">dy = b.reset_index(drop=True) # 对第二个矩阵重新赋index,不然会出现index缺失导致逻辑错误</span><br></pre></td></tr></table></figure>
<h2 id="新建矩阵"><a href="#新建矩阵" class="headerlink" title="新建矩阵"></a>新建矩阵</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># 建立一个长度为Nob_dict的全0,int型矩阵</span><br><span class="line">Second_list = np.zeros((len(Nob_dict),len(Nob_dict)),dtype=np.int)</span><br><span class="line"># 为矩阵添加行列标签</span><br><span class="line">Frame_Second = pd.DataFrame(Second_list,columns = list(Nob_dict.values()),index = list(Nob_dict.values()))</span><br></pre></td></tr></table></figure>
<h2 id="获取数据标签的集合"><a href="#获取数据标签的集合" class="headerlink" title="获取数据标签的集合"></a>获取数据标签的集合</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line"># 获取ID的字典,若不在ID字典当中,则扩充字典。max_dict为最后一个ID的序号,后续序号需要进行+1</span><br><span class="line">Nob_dict = {}</span><br><span class="line">for i in range(len(df1["Gotchi"])): # Gotchi为计数列</span><br><span class="line"> Nob_dict.update({df1["Gotchi"][i]:i+1})</span><br><span class="line"> max_dict = i+1</span><br></pre></td></tr></table></figure>
<h2 id="DataFrame上的批量操作"><a href="#DataFrame上的批量操作" class="headerlink" title="DataFrame上的批量操作"></a>DataFrame上的批量操作</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">df2 = pd.DataFrame(columns=df1.columns)# columns可选,是否创建列名</span><br><span class="line">df2 = df2.append([dict(df1.iloc[i])],ignore_index=True) # 可以使用append的形式以行添加dict形式的一行数据ignore_index是以行形式添加</span><br><span class="line"></span><br><span class="line">boston = pd.concat([features,label],axis =1) # 合并按照列合并数据</span><br></pre></td></tr></table></figure>
<h2 id="数据排序"><a href="#数据排序" class="headerlink" title="数据排序"></a>数据排序</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">sorted(list(boston['feature']),reverse=True) # 排序,reverse=True表示倒序(降序)</span><br></pre></td></tr></table></figure>
<h2 id="计算特征数量"><a href="#计算特征数量" class="headerlink" title="计算特征数量"></a>计算特征数量</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">Counter(df1['feature']).most_common() # 按数量从多到少显示,可选参数为返回的个数</span><br></pre></td></tr></table></figure>
<h2 id="计算相关系数和对应的显著性"><a href="#计算相关系数和对应的显著性" class="headerlink" title="计算相关系数和对应的显著性"></a>计算相关系数和对应的显著性</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">r,p_value = stats.pearsonr(boston['feature1'],boston['feature2']) # 计算相关系数和对应的显著性</span><br><span class="line">print('feature1与feature2相关系数为{:.3f},p值为{:.5f}'.format(r,p_value)) # 相关系数保留3位小数,p值保留5位小数</span><br></pre></td></tr></table></figure>
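如果不想引入 scipy,只算相关系数本身也可以用 numpy 的 corrcoef(下面的 x、y 是虚构数据,这种方式不返回 p 值):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# corrcoef 返回相关系数矩阵,取 [0, 1] 即 x 与 y 的 Pearson 相关系数
r = np.corrcoef(x, y)[0, 1]
print('相关系数为{:.3f}'.format(r))
```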
<h2 id="透视表格重构"><a href="#透视表格重构" class="headerlink" title="透视表格重构"></a>透视表格重构</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">buyer_seller = buyer_seller.pivot_table(index='feature', columns='index', values='Value').fillna(0)</span><br></pre></td></tr></table></figure>
<h2 id="计算平均度"><a href="#计算平均度" class="headerlink" title="计算平均度"></a>计算平均度</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import networkx as nx</span><br><span class="line"></span><br><span class="line"># 计算平均度</span><br><span class="line">edgeNum = 5000 # 边数</span><br><span class="line">nodeNum = 1878 # 节点数</span><br><span class="line">average_degree=edgeNum*2.0/nodeNum</span><br><span class="line">print("平均度:"+str(average_degree))</span><br><span class="line">degree_distribute=nx.degree_histogram(buyer_seller_or) # buyer_seller_or关系列</span><br><span class="line">x=range(len(degree_distribute))</span><br><span class="line">y=[z/float(sum(degree_distribute))for z in degree_distribute]</span><br><span class="line">plt.loglog(x,y)</span><br><span class="line">plt.show()</span><br></pre></td></tr></table></figure>
<p>Reference: <a class="link" href="https://zhuanlan.zhihu.com/p/58594681" >Characteristics of the Facebook social network – based on small-world networks<i class="fas fa-external-link-alt"></i></a></p>
<h1 id="画图"><a href="#画图" class="headerlink" title="画图"></a>Plotting</h1><h2 id="绘制散点矩阵图"><a href="#绘制散点矩阵图" class="headerlink" title="绘制散点矩阵图"></a>Scatter-plot matrix</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import matplotlib.pyplot as plt</span><br><span class="line">import seaborn as seb</span><br><span class="line">seb.pairplot(data=boston, vars=[feature]) # feature is the list of selected columns; hue= would also go here, not in savefig</span><br><span class="line">plt.savefig('scatter fig.png', dpi=500) # save the figure locally</span><br></pre></td></tr></table></figure>
<p><img src="https://raw.githubusercontent.com/yangli-os/image-hosting/master/20220507/%E6%95%A3%E7%82%B9%E7%9F%A9%E9%98%B5%E5%9B%BE.3hotx4h0fvo0.webp" alt="scatter-plot matrix"></p>
<h2 id="绘制相关系数的热力图"><a href="#绘制相关系数的热力图" class="headerlink" title="绘制相关系数的热力图"></a>Correlation heatmap</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import seaborn as seb</span><br><span class="line">r_pearson = boston.corr()</span><br><span class="line">#seb.heatmap(data=r_pearson) # default colormap</span><br><span class="line">seb.heatmap(data=r_pearson, cmap="YlGnBu")</span><br></pre></td></tr></table></figure>
<p><img src="https://raw.githubusercontent.com/yangli-os/image-hosting/master/20220507/%E7%83%AD%E5%8A%9B%E5%9B%BE.563sw1l5neg0.webp" alt="heatmap"></p>
<h3 id="绘制散点图"><a href="#绘制散点图" class="headerlink" title="绘制散点图"></a>Scatter plot</h3><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">plt.scatter([x for x in range(len(boston['feature']))], boston['feature']) # plot the feature against its index</span><br><span class="line">plt.show()</span><br></pre></td></tr></table></figure>
<h3 id="绘制正太曲线图"><a href="#绘制正太曲线图" class="headerlink" title="绘制正太曲线图"></a>Normal-distribution curve</h3><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">import numpy as np</span><br><span class="line">import matplotlib.pyplot as plt</span><br><span class="line">import seaborn as sns</span><br><span class="line"></span><br><span class="line">import warnings</span><br><span class="line">warnings.filterwarnings('ignore') # suppress warnings</span><br><span class="line"></span><br><span class="line">sns.set(context='notebook',font='simhei',style='whitegrid') # set scale, style and a Chinese-capable font</span><br><span class="line">plt.rcParams['axes.unicode_minus']=False # render minus signs correctly</span><br><span class="line"> </span><br><span class="line"># Histogram with a maximum-likelihood Gaussian fit</span><br><span class="line">from scipy.stats import norm</span><br><span class="line"></span><br><span class="line">s=np.log(list(boston['feature'])) # take logs for power-law data</span><br><span class="line">s=boston['feature'] # or use the raw values for normally distributed data (keep only one of these two lines)</span><br><span class="line"></span><br><span class="line">mu =np.mean(s) # sample mean</span><br><span class="line">sigma =np.std(s) # sample standard deviation</span><br><span class="line">num_bins = len(s) # number of histogram bins</span><br><span class="line">n, bins, patches = plt.hist(s, num_bins, density=True, facecolor='blue', alpha=0.55, width=0.0050)</span><br><span class="line"># density=True normalises the histogram to a probability density; returns the counts, bin edges and patch objects</span><br><span class="line">y = norm.pdf(bins, mu, sigma) # best-fit normal curve</span><br><span class="line">plt.grid(False) # no grid</span><br><span class="line">plt.plot(bins, y, 'r--') # overlay the fitted curve</span><br><span class="line">plt.xlabel('x-label')</span><br><span class="line">plt.ylabel('y-label')</span><br><span class="line">plt.title(r'Histogram : $\mu={}$,$\sigma={}$'.format(mu,sigma)) # show mu and sigma in the title</span><br><span class="line"></span><br><span class="line">#plt.subplots_adjust(left=0.15) # left margin</span><br><span class="line">plt.savefig('Probability fig.png',dpi=1000,bbox_inches = 'tight') # save the figure locally</span><br><span class="line">plt.show()</span><br></pre></td></tr></table></figure>
<p><img src="https://raw.githubusercontent.com/yangli-os/image-hosting/master/20220507/%E6%AD%A3%E5%A4%AA%E6%9B%B2%E7%BA%BF%E5%9B%BE.7jimwt3v0jg0.webp" alt="normal-distribution curve"></p>
<h2 id="保存数据"><a href="#保存数据" class="headerlink" title="保存数据"></a>Saving data</h2><figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">pd.DataFrame(save_data).to_csv(save_path) # save_data may be a 2-D table in list form</span><br><span class="line">save_data.to_excel(save_path, index=False) # save_data is a DataFrame; index=False omits the row labels</span><br></pre></td></tr></table></figure>
<h2 id="生成关系网络"><a href="#生成关系网络" class="headerlink" title="生成关系网络"></a>Building the relationship network</h2><p>Either Gephi or networkx can build the relationship network: load the two relation columns as an edge list, then filter by k-core and extract the giant component.</p>
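<p>The networkx route can be sketched as follows, with a hypothetical four-edge buyer/seller table standing in for the real relation columns (here the graph is a 4-cycle, so the 2-core and the giant component are both the whole graph):</p>

```python
import networkx as nx
import pandas as pd

# Hypothetical two-column edge list (buyer -> seller), forming a 4-cycle
edges = pd.DataFrame({
    "buyer":  ["b1", "b1", "b2", "b2"],
    "seller": ["s1", "s2", "s1", "s2"],
})
G = nx.from_pandas_edgelist(edges, source="buyer", target="seller")
core = nx.k_core(G, k=2)  # subgraph in which every node keeps degree >= 2
giant = G.subgraph(max(nx.connected_components(G), key=len))  # giant component
print(core.number_of_nodes(), giant.number_of_nodes())
```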
]]></content>
<tags>
<tag>Python</tag>
</tags>
</entry>
</search>