Section 1: Overview
In this article, we use fully connected neural networks for simple classification and regression tasks.
Section 2: Regression
We use the California housing dataset from scikit-learn as the running example; it is a standard regression dataset:
```python
from sklearn.datasets import fetch_california_housing

# Load the features X and the target y as NumPy arrays
X, y = fetch_california_housing(return_X_y=True)
X.shape, y.shape
```
((20640, 8), (20640,))
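As a quick aside (a small sketch, not part of the original post), the dataset's metadata names the eight feature columns; the target is the median house value of a district, in units of $100,000:

```python
# Fetch the full Bunch object to read the feature names
housing = fetch_california_housing()
housing.feature_names
# ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
#  'Population', 'AveOccup', 'Latitude', 'Longitude']
```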
2.1 Data Preprocessing
First, split the data: 80% as the training set and 20% as the test set.
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
```
((16512, 8), (4128, 8), (16512,), (4128,))
Standardize both the training data and the test data so that all features share a common scale.
Because the training data is available to us (so its distribution can be estimated) while the test data must be treated as unseen, we:
- apply fit_transform to the training data
- apply transform to the test data
```python
from sklearn.preprocessing import StandardScaler

# Fit the feature scaler on the training set only, then reuse it on the test set
x_scaler = StandardScaler()
X_train_scaled = x_scaler.fit_transform(X_train)
X_test_scaled = x_scaler.transform(X_test)

# The target is standardized too; StandardScaler expects 2-D input,
# hence the reshape into a column vector
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1))
```
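As a quick check (a sketch, not in the original), the fitted scaler exposes the statistics it applies, and the same object can later map scaled predictions back to the original units:

```python
# Per-column mean and standard deviation learned from y_train
y_scaler.mean_, y_scaler.scale_
# Later: y_raw = y_scaler.inverse_transform(y_scaled)
```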
2.2 A Machine Learning Baseline
Among traditional machine learning models, a random forest is well suited to the California housing regression problem, so we use it as a baseline for the fully connected network. Note that it is trained on the unscaled target, so its MSE is in raw units.
```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Train on the scaled features and the unscaled target
regressor = RandomForestRegressor()
regressor.fit(X_train_scaled, y_train)
y_pred = regressor.predict(X_test_scaled)
loss = mean_squared_error(y_test, y_pred)   # (y_true, y_pred); symmetric for MSE
loss
```
0.26758457404360897
2.3 Building the Datasets
When training the fully connected network, we use an early stopping strategy to prevent overfitting.
To support it, the training data is split once more: 80% as the training set and 20% as a validation set.
```python
import torch

# Carve a validation set out of the (already scaled) training data
X_train_scaled, X_val_scaled, y_train_scaled, y_val_scaled = train_test_split(
    X_train_scaled, y_train_scaled, test_size=0.2, shuffle=True)

# Convert every split from a NumPy array to a float32 tensor
X_train_scaled, X_val_scaled, X_test_scaled, y_train_scaled, y_val_scaled, y_test_scaled = (
    torch.tensor(a, dtype=torch.float32)
    for a in (X_train_scaled, X_val_scaled, X_test_scaled,
              y_train_scaled, y_val_scaled, y_test_scaled))
```
Create DataLoaders to load the data in batches automatically. batch_size is set to 256.
- For the training and validation sets, shuffle is set to True so that the data is reshuffled every epoch; the test set needs no shuffling.
- Since all the data already resides in memory, num_workers is set to 0.
- pin_memory places batches in page-locked memory, which the OS will not swap out; the GPU can read it directly via DMA, so host-to-device copies are faster.
```python
from torch.utils.data import TensorDataset, DataLoader

train_dataset = TensorDataset(X_train_scaled, y_train_scaled)
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True, num_workers=0, pin_memory=True)

val_dataset = TensorDataset(X_val_scaled, y_val_scaled)
val_loader = DataLoader(val_dataset, batch_size=256, shuffle=True, num_workers=0, pin_memory=True)

test_dataset = TensorDataset(X_test_scaled, y_test_scaled)
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False, num_workers=0, pin_memory=True)
```
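To confirm the loaders deliver what the network expects, one batch can be drawn by hand (a small sketch, not in the original post):

```python
# One batch: features of shape (256, 8) and targets of shape (256, 1)
xb, yb = next(iter(train_loader))
xb.shape, yb.shape
```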
2.4 Building the Network
Create the fully connected network and move the model to the GPU (when one is available).
```python
from torch import nn
import torch.nn.init as init

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Two hidden layers (64 and 32 units) with ReLU, one linear output for regression
model = nn.Sequential(
    nn.Linear(8, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1))

# Kaiming (He) initialization for the ReLU layers, zeros for the biases
for layer in model:
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)
model.to(device)
```
Sequential(
(0): Linear(in_features=8, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=32, bias=True)
(3): ReLU()
(4): Linear(in_features=32, out_features=1, bias=True)
)
There are two common initialization schemes: Kaiming (He) initialization and Xavier initialization.
- Kaiming initialization: suited to ReLU activations
- Xavier initialization: suited to tanh / sigmoid activations
```python
# Kaiming: nonlinearity can be 'relu', 'leaky_relu', 'selu', etc.
init.kaiming_uniform_(layer.weight, nonlinearity='relu')
init.zeros_(layer.bias)

# Xavier: for tanh / sigmoid networks
init.xavier_uniform_(layer.weight)
init.zeros_(layer.bias)
```
Parameter containing:
tensor([0.], device='cuda:0', requires_grad=True)
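An equivalent and arguably more idiomatic pattern (a sketch on our part, not from the original post) is to wrap the logic in a function and let Module.apply walk the model recursively:

```python
# Hypothetical helper: initialize every Linear layer in place
def init_weights(m):
    if isinstance(m, nn.Linear):
        init.kaiming_uniform_(m.weight, nonlinearity='relu')
        init.zeros_(m.bias)

model.apply(init_weights)   # applies init_weights to every submodule
```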
2.5 Setting the Optimizer and the Loss Function
The optimizer is Adam with a learning rate of 10⁻³ and L2 regularization (via weight_decay) to curb overfitting; the loss function is mean squared error.
```python
import torch.optim as optim

# weight_decay adds L2 regularization to every parameter update
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.MSELoss()
```
2.6 Writing the Training Loop
The maximum number of epochs is 300, with early stopping: training halts if the validation loss has not improved for 20 consecutive epochs.
non_blocking=True makes the host-to-device copy asynchronous, so the GPU can keep computing while data is in transit; it requires pin_memory=True in the DataLoader.
```python
max_epochs = 300

# Early stopping bookkeeping
patience_counter = 0
patience = 20
best_loss = float('inf')

for epoch in range(max_epochs):
    # --- training ---
    model.train()
    train_loss = 0
    for x, y in train_loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        optimizer.zero_grad()
        y_hat = model(x)
        loss = criterion(y_hat, y)
        train_loss += loss.item()
        loss.backward()
        optimizer.step()
    train_loss = train_loss / len(train_loader)

    # --- validation ---
    model.eval()
    val_loss = 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
            y_hat = model(x)
            loss = criterion(y_hat, y)
            val_loss += loss.item()
    val_loss /= len(val_loader)

    # --- early stopping: keep the best checkpoint ---
    if val_loss < best_loss:
        best_loss = val_loss
        patience_counter = 0
        torch.save(model.state_dict(), 'best_model.pth')
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch + 1}")
            break
    if (epoch + 1) % 10 == 0:
        print(f"epoch {epoch + 1}: train_loss: {train_loss},val_loss: {val_loss}")

# --- final evaluation on the test set (model is still in eval mode) ---
test_loss = 0
with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        y_hat = model(x)
        loss = criterion(y_hat, y)
        test_loss += loss.item()
test_loss /= len(test_loader)
print(f"epoch {epoch + 1}: train_loss: {train_loss},val_loss: {val_loss},test_loss:{test_loss}")
```
epoch 10: train_loss: 0.29649700740208995,val_loss: 0.28805418656422543
epoch 20: train_loss: 0.2750496683785549,val_loss: 0.26191359758377075
epoch 30: train_loss: 0.2403819847565431,val_loss: 0.2530410255377109
epoch 40: train_loss: 0.2293749749660492,val_loss: 0.23420465336396143
epoch 50: train_loss: 0.2189344371167513,val_loss: 0.2368529702608402
epoch 60: train_loss: 0.21395021619705054,val_loss: 0.22983338511907137
epoch 70: train_loss: 0.21009109541773796,val_loss: 0.22714151327426618
epoch 80: train_loss: 0.20461482526018068,val_loss: 0.2212988195511011
epoch 90: train_loss: 0.2007397161080287,val_loss: 0.2203361988067627
epoch 100: train_loss: 0.20478695802963698,val_loss: 0.22049094621951765
epoch 110: train_loss: 0.19856098007697326,val_loss: 0.22147137041275317
epoch 120: train_loss: 0.1933052376485788,val_loss: 0.2151113244203421
epoch 130: train_loss: 0.19220517102915508,val_loss: 0.2174885834638889
epoch 140: train_loss: 0.1903463828449066,val_loss: 0.21797557977529672
epoch 150: train_loss: 0.18772307095619348,val_loss: 0.21591514692856714
epoch 160: train_loss: 0.18299155925902036,val_loss: 0.21724496896450335
epoch 170: train_loss: 0.19431308141121498,val_loss: 0.22148504165502694
Early stopping at epoch 175
epoch 175: train_loss: 0.18683575093746185,val_loss: 0.21517686889721796,test_loss:0.20920205905156977
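One caveat when comparing against the random forest from section 2.2: the network's test_loss is measured on the standardized target, while the forest's MSE was computed on the raw target, so the two numbers are not in the same units. Below is a minimal sketch (our addition, assuming the variables from the earlier cells are still in scope) that reloads the best checkpoint saved by early stopping and reports the test MSE in the original units:

```python
# The last epoch is not necessarily the best one, so restore the best checkpoint
model.load_state_dict(torch.load('best_model.pth'))
model.eval()
with torch.no_grad():
    y_hat_scaled = model(X_test_scaled.to(device)).cpu().numpy()

# Map the predictions back to raw units before computing the MSE
y_hat = y_scaler.inverse_transform(y_hat_scaled)
mean_squared_error(y_test, y_hat)
```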
Section 3: Classification
We use the Fashion-MNIST dataset as the example.
Fashion-MNIST contains 10 classes: t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.
```python
import torchvision
from torchvision.transforms import Compose, ToTensor, Normalize

trans = ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(root="../data", transform=trans, train=True, download=True)
mnist_test = torchvision.datasets.FashionMNIST(root="../data", transform=trans, train=False, download=True)
```
The training set contains 60,000 images and the test set contains 10,000 images.
```python
len(mnist_train), len(mnist_test)
```
(60000, 10000)
Each image is 28 pixels high and 28 pixels wide; the dataset consists of grayscale images, so there is a single channel.
```python
mnist_train[0][0].shape
```
torch.Size([1, 28, 28])
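The integer labels map to the class names in the order listed at the top of this section; the list below is our own sketch of that mapping, not code from the original post:

```python
# Class names in label order (0-9)
labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
          'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
img, target = mnist_train[0]
labels[target]
```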
3.1 Data Preprocessing & Building the Datasets
Write the data preprocessing pipeline:
- ToTensor: scales pixel values from [0, 255] to [0, 1] and converts each image to a [C, H, W] tensor (the DataLoader then stacks these into [N, C, H, W] batches)
- Normalize: standardizes the images; with mean 0.5 and std 0.5, values go from [0, 1] to [-1, 1]
The data is split into training, validation, and test sets.
```python
from torch.utils.data import random_split

def load_data_fashion_mnist(batch_size, val_ratio=0.2):
    trans = Compose([ToTensor(), Normalize((0.5,), (0.5,))])
    total_dataset = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    test_dataset = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)

    # Split the official training set into train / validation subsets
    total_size = len(total_dataset)
    val_size = int(total_size * val_ratio)
    train_size = total_size - val_size
    train_dataset, val_dataset = random_split(
        total_dataset, [train_size, val_size]
    )

    train_loader = DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True
    )
    val_loader = DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
    )
    test_loader = DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
    )
    return train_loader, val_loader, test_loader

batch_size = 256
train_loader, val_loader, test_loader = load_data_fashion_mnist(batch_size)
```
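A quick sanity check (a sketch, not part of the original) that the transforms did what the list above claims; after ToTensor maps pixels to [0, 1], Normalize((0.5,), (0.5,)) shifts them into [-1, 1]:

```python
# Pull one batch and inspect its shape and value range
x, y = next(iter(train_loader))
x.shape, x.min().item(), x.max().item()   # expect [256, 1, 28, 28], values in [-1, 1]
```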
3.2 Building the Network
The network is built from fully connected layers:
- Flatten first turns each 28×28 image into a 784-dimensional vector
- Dropout is used to prevent overfitting
- The network ends with 10 output values, and the largest one indicates the predicted class (see the inference sketch after the model printout below)
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 10))

# Kaiming initialization again, since the hidden activations are ReLU
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        nn.init.zeros_(layer.bias)
model.to(device)
Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): Linear(in_features=784, out_features=512, bias=True)
(2): ReLU()
(3): Dropout(p=0.2, inplace=False)
(4): Linear(in_features=512, out_features=256, bias=True)
(5): ReLU()
(6): Dropout(p=0.2, inplace=False)
(7): Linear(in_features=256, out_features=10, bias=True)
)
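To make the "largest output wins" rule concrete, here is a minimal inference sketch (our addition, not in the original; the model is still untrained at this point, so the prediction is essentially random, and img here was loaded without Normalize). Note that model.eval() also disables the Dropout layers:

```python
# Predict a class for the sample image; 'img' and 'labels' come from the
# earlier sketch in section 3 (our additions, not the original post)
model.eval()
with torch.no_grad():
    logits = model(img.unsqueeze(0).to(device))   # add a batch dimension -> [1, 1, 28, 28]
pred = logits.argmax(dim=1).item()                # index of the largest of the 10 logits
labels[pred]
```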
3.3 Setting the Optimizer and the Loss Function
The optimizer is again Adam with a learning rate of 10⁻³ and L2 regularization (via weight_decay) to curb overfitting; since this is a classification task, the loss function is cross-entropy loss.
```python
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
```
3.4 Writing the Training Loop
The maximum number of epochs is 50, with early stopping: training halts if the validation loss has not improved for 10 consecutive epochs.
```python
max_epochs = 50

# Early stopping bookkeeping
patience_counter = 0
patience = 10
best_loss = float('inf')

for epoch in range(max_epochs):
    # --- training ---
    model.train()
    train_samples = 0
    train_loss = 0
    train_acc = 0
    for x, y in train_loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        y_hat = model(x)
        loss = criterion(y_hat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_samples += len(x)
        train_loss += loss.item() * len(x)   # loss.item() is the batch mean
        pred = torch.argmax(y_hat, dim=1)
        train_acc += (pred == y).sum().item()
    train_loss = train_loss / train_samples
    train_acc = train_acc / train_samples

    # --- validation ---
    model.eval()
    val_samples = 0
    val_loss = 0
    val_acc = 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
            y_hat = model(x)
            loss = criterion(y_hat, y)
            val_samples += len(x)
            val_loss += loss.item() * len(x)
            pred = torch.argmax(y_hat, dim=1)
            val_acc += (pred == y).sum().item()
    val_loss = val_loss / val_samples
    val_acc = val_acc / val_samples

    # --- early stopping: keep the best checkpoint ---
    if val_loss < best_loss:
        best_loss = val_loss
        patience_counter = 0
        torch.save(model.state_dict(), 'best_model.pth')
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch + 1}")
            break
    if (epoch + 1) % 5 == 0:
        print(f"epoch {epoch + 1}: train_loss: {train_loss}, train_acc: {train_acc}, val_loss: {val_loss}, val_acc: {val_acc}")

# --- final evaluation on the test set (model is still in eval mode) ---
test_samples = 0
test_loss = 0
test_acc = 0
with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        y_hat = model(x)
        loss = criterion(y_hat, y)
        test_samples += len(x)
        test_loss += loss.item() * len(x)
        pred = torch.argmax(y_hat, dim=1)
        test_acc += (pred == y).sum().item()
test_loss = test_loss / test_samples
test_acc = test_acc / test_samples
print(f"epoch {epoch + 1}: train_loss: {train_loss}, train_acc: {train_acc}, val_loss: {val_loss}, val_acc: {val_acc}, test_loss: {test_loss}, test_acc: {test_acc}")
```
epoch 5: train_loss: 0.3285662808418274, train_acc: 0.878875, val_loss: 0.3343446226119995, val_acc: 0.8786666666666667
epoch 10: train_loss: 0.2763984892368317, train_acc: 0.8970625, val_loss: 0.3130743578275045, val_acc: 0.8841666666666667
epoch 15: train_loss: 0.24112992405891417, train_acc: 0.9097916666666667, val_loss: 0.31883913882573445, val_acc: 0.8863333333333333
epoch 20: train_loss: 0.22001066426436106, train_acc: 0.9165833333333333, val_loss: 0.3008507702350616, val_acc: 0.8915833333333333
epoch 25: train_loss: 0.19760130242506663, train_acc: 0.9248958333333334, val_loss: 0.31735219049453733, val_acc: 0.8919166666666667
epoch 30: train_loss: 0.1833056865533193, train_acc: 0.9304791666666666, val_loss: 0.2940208122730255, val_acc: 0.8993333333333333
epoch 35: train_loss: 0.17255648855368297, train_acc: 0.93425, val_loss: 0.30067479848861695, val_acc: 0.8950833333333333
Early stopping at epoch 40
epoch 40: train_loss: 0.15898222970962525, train_acc: 0.9389166666666666, val_loss: 0.3018113072713216, val_acc: 0.9005833333333333, test_loss: 0.3316642808914185, test_acc: 0.8966
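As in the regression section, the loop saves the best weights to best_model.pth but reports the test metrics of the final epoch's weights. A short sketch (our addition, not from the original post) to evaluate the best checkpoint instead:

```python
# Restore the checkpoint with the lowest validation loss before final testing
model.load_state_dict(torch.load('best_model.pth'))
model.eval()
```

The test loop from section 3.4 can then be rerun unchanged on the restored weights.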