PyTorch Fully Connected Neural Networks

Section 1 Overview

This article uses fully connected neural networks for simple classification and regression tasks.

Section 2 Regression

This section uses the California housing dataset from scikit-learn as an example; it is a dataset designed for regression:

  • there are 8 input features
  • the target is the California house price
from sklearn.datasets import fetch_california_housing
X, y = fetch_california_housing(return_X_y=True)
X.shape, y.shape
((20640, 8), (20640,))

2.1 Data Preprocessing

First, split the dataset: 80% of the data becomes the training set and 20% becomes the test set.

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,shuffle=True)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
((16512, 8), (4128, 8), (16512,), (4128,))

Standardize both the training and test data to put them on a common scale. The training data is known, so its distribution can be estimated from it; the test data must be treated as unseen, so its distribution must not be used. Therefore:

  • call fit_transform on the training data
  • call transform on the test data
from sklearn.preprocessing import StandardScaler
x_scaler = StandardScaler()
X_train_scaled = x_scaler.fit_transform(X_train)
X_test_scaled = x_scaler.transform(X_test)
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1,1))
y_test_scaled = y_scaler.transform(y_test.reshape(-1,1))
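Because the targets are standardized, a trained model's predictions come out in standardized units; y_scaler.inverse_transform maps them back to the original price units. A minimal sketch with toy values (the array here is illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

y = np.array([1.5, 2.0, 3.5, 5.0]).reshape(-1, 1)   # toy target values
scaler = StandardScaler()
y_scaled = scaler.fit_transform(y)            # standardized: zero mean, unit variance
y_back = scaler.inverse_transform(y_scaled)   # recover the original units
```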

2.2 Machine Learning Baseline

In traditional machine learning, a random forest is well suited to the California housing regression problem, so it is used here as a baseline for the fully connected network.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
regressor = RandomForestRegressor()
regressor.fit(X_train_scaled,y_train)
y_pred = regressor.predict(X_test_scaled)
loss = mean_squared_error(y_test,y_pred)
loss
0.26758457404360897

2.3 Building the Dataset

When training the fully connected network, an "early stopping" strategy is used to prevent overfitting.

The training data is split again: 80% becomes the training set and 20% becomes the validation set.

import torch
X_train_scaled, X_val_scaled, y_train_scaled, y_val_scaled = train_test_split(
    X_train_scaled, y_train_scaled, test_size=0.2, shuffle=True)
X_train_scaled = torch.tensor(X_train_scaled, dtype=torch.float32)
X_val_scaled = torch.tensor(X_val_scaled, dtype=torch.float32)
X_test_scaled = torch.tensor(X_test_scaled, dtype=torch.float32)
y_train_scaled = torch.tensor(y_train_scaled, dtype=torch.float32)
y_val_scaled = torch.tensor(y_val_scaled, dtype=torch.float32)
y_test_scaled = torch.tensor(y_test_scaled, dtype=torch.float32)

Create DataLoaders to load the dataset automatically in batches.

  • batch_size is set to 256.
  • shuffle is set to True for the training and validation sets, so they are reshuffled every epoch; the test set does not need shuffling.
  • Since all the data already lives in memory, num_workers is set to 0.
  • pin_memory allocates page-locked memory: the OS will not swap it out, and the GPU can read it directly via DMA, so copies are faster.
from torch.utils.data import TensorDataset, DataLoader
train_dataset = TensorDataset(X_train_scaled,y_train_scaled)
train_loader = DataLoader(train_dataset,batch_size=256,shuffle=True,num_workers=0,pin_memory=True)
val_dataset = TensorDataset(X_val_scaled,y_val_scaled)
val_loader = DataLoader(val_dataset,batch_size=256,shuffle=True,num_workers=0,pin_memory=True)
test_dataset = TensorDataset(X_test_scaled,y_test_scaled)
test_loader = DataLoader(test_dataset,batch_size=256,shuffle=False,num_workers=0,pin_memory=True)
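As a sanity check, iterating a loader yields batches of the configured size. A small sketch with random stand-in tensors (X_demo / y_demo are illustrative, not the scaled data above):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

X_demo = torch.randn(1000, 8)   # stand-in for the scaled features
y_demo = torch.randn(1000, 1)   # stand-in for the scaled targets
loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=256, shuffle=True)
xb, yb = next(iter(loader))     # first batch
```

1000 samples at batch_size 256 give 4 batches, the last one partial.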

2.4 Building the Network

Create the fully connected network and move the model to the GPU.

from torch import nn
import torch.nn.init as init
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Sequential(
    nn.Linear(8,64),
    nn.ReLU(),
    nn.Linear(64,32),
    nn.ReLU(),
    nn.Linear(32,1))
for layer in model:
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            nn.init.zeros_(layer.bias)
model.to(device)
Sequential(
  (0): Linear(in_features=8, out_features=64, bias=True)
  (1): ReLU()
  (2): Linear(in_features=64, out_features=32, bias=True)
  (3): ReLU()
  (4): Linear(in_features=32, out_features=1, bias=True)
)
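The printed model can be cross-checked by counting parameters: each Linear layer contributes in_features × out_features weights plus out_features biases.

```python
from torch import nn

model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1))
# 8*64 + 64  +  64*32 + 32  +  32*1 + 1  =  2689
n_params = sum(p.numel() for p in model.parameters())
```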

Two initialization schemes are common: Kaiming initialization and Xavier initialization.

  • Kaiming initialization: suited to ReLU activations
  • Xavier initialization: suited to tanh / sigmoid activations
init.kaiming_uniform_(layer.weight, nonlinearity='relu')  # nonlinearity can be relu, leaky_relu, selu, etc.
init.zeros_(layer.bias)

init.xavier_uniform_(layer.weight)
init.zeros_(layer.bias)
Parameter containing:
tensor([0.], device='cuda:0', requires_grad=True)
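For kaiming_uniform_ with nonlinearity='relu', weights are drawn from U(-b, b) with b = gain · sqrt(3 / fan_in) and gain = sqrt(2), i.e. b = sqrt(6 / fan_in). A quick check on a standalone layer:

```python
import math
from torch import nn
import torch.nn.init as init

layer = nn.Linear(64, 32)                  # weight shape [32, 64], fan_in = 64
init.kaiming_uniform_(layer.weight, nonlinearity='relu')
bound = math.sqrt(6.0 / 64)                # gain sqrt(2) * sqrt(3 / fan_in)
```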

2.5 Setting the Optimizer and Loss Function

The optimizer is Adam with a learning rate of 10^-3, plus L2 regularization (weight_decay) to curb overfitting; the loss function is mean squared error. Note that the network's losses are computed on the standardized targets, so they are not directly comparable to the random forest's MSE, which is in original units.

import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.MSELoss()
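nn.MSELoss with the default reduction='mean' computes the mean of the squared element-wise differences, as a small check shows:

```python
import torch
from torch import nn

criterion = nn.MSELoss()                 # default reduction='mean'
y_hat = torch.tensor([2.0, 0.0, 1.0])
y = torch.tensor([1.0, 0.0, 3.0])
loss = criterion(y_hat, y)               # (1^2 + 0^2 + 2^2) / 3 = 5/3
```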

2.6 Writing the Training Code

The maximum number of epochs is set to 300, with early stopping: if the validation loss does not improve for 20 epochs, training stops.

non_blocking enables asynchronous copies: while data is being transferred from CPU to GPU, the GPU can keep training in parallel. It requires pin_memory=True in the DataLoader.

max_epochs = 300
# early stopping
patience_counter = 0
patience = 20
best_loss = float('inf')

for epoch in range(max_epochs):
    model.train()
    train_loss = 0
    for x, y in train_loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        optimizer.zero_grad()
        y_hat = model(x)
        loss = criterion(y_hat, y)
        train_loss += loss.item()
        loss.backward()
        optimizer.step()
    train_loss = train_loss / len(train_loader)
    model.eval()
    val_loss = 0
    with torch.no_grad():
        for x,y in val_loader:
            x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
            y_hat = model(x)
            loss = criterion(y_hat,y)
            val_loss += loss.item()
        val_loss /= len(val_loader)
        if val_loss < best_loss:
            best_loss = val_loss
            patience_counter = 0
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f"Early stopping at epoch {epoch + 1}")
                break
    if (epoch + 1) % 10 == 0:
        print(f"epoch {epoch + 1}: train_loss: {train_loss},val_loss: {val_loss}")

model.eval()
test_loss = 0
with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        y_hat = model(x)
        loss = criterion(y_hat, y)
        test_loss += loss.item()
test_loss /= len(test_loader)
print(f"epoch {epoch + 1}: train_loss: {train_loss},val_loss: {val_loss},test_loss:{test_loss}")
epoch 10: train_loss: 0.29649700740208995,val_loss: 0.28805418656422543
epoch 20: train_loss: 0.2750496683785549,val_loss: 0.26191359758377075
epoch 30: train_loss: 0.2403819847565431,val_loss: 0.2530410255377109
epoch 40: train_loss: 0.2293749749660492,val_loss: 0.23420465336396143
epoch 50: train_loss: 0.2189344371167513,val_loss: 0.2368529702608402
epoch 60: train_loss: 0.21395021619705054,val_loss: 0.22983338511907137
epoch 70: train_loss: 0.21009109541773796,val_loss: 0.22714151327426618
epoch 80: train_loss: 0.20461482526018068,val_loss: 0.2212988195511011
epoch 90: train_loss: 0.2007397161080287,val_loss: 0.2203361988067627
epoch 100: train_loss: 0.20478695802963698,val_loss: 0.22049094621951765
epoch 110: train_loss: 0.19856098007697326,val_loss: 0.22147137041275317
epoch 120: train_loss: 0.1933052376485788,val_loss: 0.2151113244203421
epoch 130: train_loss: 0.19220517102915508,val_loss: 0.2174885834638889
epoch 140: train_loss: 0.1903463828449066,val_loss: 0.21797557977529672
epoch 150: train_loss: 0.18772307095619348,val_loss: 0.21591514692856714
epoch 160: train_loss: 0.18299155925902036,val_loss: 0.21724496896450335
epoch 170: train_loss: 0.19431308141121498,val_loss: 0.22148504165502694
Early stopping at epoch 175
epoch 175: train_loss: 0.18683575093746185,val_loss: 0.21517686889721796,test_loss:0.20920205905156977

Section 3 Classification

This section uses the Fashion-MNIST dataset as an example.

Fashion-MNIST contains 10 classes: t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.

import torchvision
from torchvision.transforms import Compose, ToTensor, Normalize
trans = ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(root="../data",transform=trans,train=True,download=True)
mnist_test = torchvision.datasets.FashionMNIST(root="../data",transform=trans,train=False, download=True)

The training set contains 60,000 images and the test set contains 10,000 images.

len(mnist_train), len(mnist_test)
(60000, 10000)

Each image is 28 pixels high and 28 pixels wide; the dataset consists of grayscale images, so the channel count is 1.

mnist_train[0][0].shape
torch.Size([1, 28, 28])

3.1 Data Preprocessing & Building the Dataset

Write the data preprocessing pipeline:

  • ToTensor: scales pixel values from 0~255 to 0~1 and converts each image to a [C,H,W] tensor ([N,C,H,W] after batching)
  • Normalize: standardizes the images, mapping values from 0~1 to -1~1
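Normalize((0.5,), (0.5,)) subtracts 0.5 and divides by 0.5 per channel, so 0~1 inputs map to -1~1. The arithmetic can be checked directly (this reproduces the transform by hand rather than calling torchvision):

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0])   # pixel values after ToTensor
x_norm = (x - 0.5) / 0.5            # what Normalize((0.5,), (0.5,)) computes per channel
```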

The dataset is split into training, validation, and test sets.

from torch.utils.data import random_split
def load_data_fashion_mnist(batch_size, val_ratio=0.2):
    trans = Compose([ToTensor(),Normalize((0.5,), (0.5,))])
    total_dataset = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    test_dataset = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    
    total_size = len(total_dataset)
    val_size = int(total_size * val_ratio)
    train_size = total_size - val_size
    train_dataset, val_dataset = random_split(
        total_dataset, [train_size, val_size]
    )
    train_loader = DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True
    )
    val_loader = DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
    )
    test_loader = DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
    )
    
    return (train_loader,val_loader,test_loader)
batch_size = 256
train_loader, val_loader, test_loader = load_data_fashion_mnist(batch_size)

3.2 Building the Network

Build the classifier with a fully connected network:

  • First, Flatten converts each 28×28 image into a 784-dimensional vector
  • Dropout is used to prevent overfitting
  • Finally, the network has 10 outputs; the index of the largest one is the predicted class
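Picking the class from the 10 outputs is just an argmax over the logits; a tiny sketch (the logit values are illustrative):

```python
import torch

# one sample's 10 logits
logits = torch.tensor([[0.1, 2.5, -1.0, 0.3, 0.0, 0.2, 0.1, 0.0, -0.5, 1.0]])
pred = torch.argmax(logits, dim=1)   # index of the largest output = predicted class
```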
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 10))
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        nn.init.zeros_(layer.bias)
model.to(device)
Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=512, bias=True)
  (2): ReLU()
  (3): Dropout(p=0.2, inplace=False)
  (4): Linear(in_features=512, out_features=256, bias=True)
  (5): ReLU()
  (6): Dropout(p=0.2, inplace=False)
  (7): Linear(in_features=256, out_features=10, bias=True)
)

3.3 Setting the Optimizer and Loss Function

The optimizer is Adam with a learning rate of 10^-3, plus L2 regularization (weight_decay) to curb overfitting; the loss function is cross-entropy loss.

optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
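nn.CrossEntropyLoss expects raw logits and integer class labels, and is equivalent to log_softmax followed by negative log-likelihood, which can be verified directly:

```python
import torch
from torch import nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])   # raw scores; no softmax beforehand
target = torch.tensor([0])                  # integer class index
ce = nn.CrossEntropyLoss()(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
```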

3.4 Writing the Training Code

The maximum number of epochs is set to 50, with early stopping: if the validation loss does not improve for 10 epochs, training stops.

max_epochs = 50
# early stopping
patience_counter = 0
patience = 10
best_loss = float('inf')

for epoch in range(max_epochs):
    model.train()
    train_samples = 0
    train_loss = 0
    train_acc = 0
    for x, y in train_loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        y_hat = model(x)
        loss = criterion(y_hat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_samples += len(x)
        train_loss += loss.item() * len(x)
        pred = torch.argmax(y_hat, axis=1)
        train_acc += (pred == y).sum().item()
    train_loss = train_loss / train_samples
    train_acc = train_acc / train_samples
    model.eval()
    val_samples = 0
    val_loss = 0
    val_acc = 0
    with torch.no_grad():
        for x,y in val_loader:
            x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
            y_hat = model(x)
            loss = criterion(y_hat, y)
            val_samples += len(x)
            val_loss += loss.item() * len(x)
            pred = torch.argmax(y_hat, axis=1)
            val_acc += (pred == y).sum().item()
        val_loss = val_loss / val_samples
        val_acc = val_acc / val_samples
        if val_loss < best_loss:
            best_loss = val_loss
            patience_counter = 0
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f"Early stopping at epoch {epoch + 1}")
                break
    if (epoch + 1) % 5 == 0:
        print(f"epoch {epoch + 1}: train_loss: {train_loss}, train_acc: {train_acc}, val_loss: {val_loss}, val_acc: {val_acc}")

model.eval()
test_samples = 0
test_loss = 0
test_acc = 0
with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        y_hat = model(x)
        loss = criterion(y_hat, y)
        test_samples += len(x)
        test_loss += loss.item() * len(x)
        pred = torch.argmax(y_hat, axis=1)
        test_acc += (pred == y).sum().item()
test_loss = test_loss / test_samples
test_acc = test_acc / test_samples
print(f"epoch {epoch + 1}: train_loss: {train_loss}, train_acc: {train_acc}, val_loss: {val_loss}, val_acc: {val_acc}, test_loss: {test_loss}, test_acc: {test_acc}")
epoch 5: train_loss: 0.3285662808418274, train_acc: 0.878875, val_loss: 0.3343446226119995, val_acc: 0.8786666666666667
epoch 10: train_loss: 0.2763984892368317, train_acc: 0.8970625, val_loss: 0.3130743578275045, val_acc: 0.8841666666666667
epoch 15: train_loss: 0.24112992405891417, train_acc: 0.9097916666666667, val_loss: 0.31883913882573445, val_acc: 0.8863333333333333
epoch 20: train_loss: 0.22001066426436106, train_acc: 0.9165833333333333, val_loss: 0.3008507702350616, val_acc: 0.8915833333333333
epoch 25: train_loss: 0.19760130242506663, train_acc: 0.9248958333333334, val_loss: 0.31735219049453733, val_acc: 0.8919166666666667
epoch 30: train_loss: 0.1833056865533193, train_acc: 0.9304791666666666, val_loss: 0.2940208122730255, val_acc: 0.8993333333333333
epoch 35: train_loss: 0.17255648855368297, train_acc: 0.93425, val_loss: 0.30067479848861695, val_acc: 0.8950833333333333
Early stopping at epoch 40
epoch 40: train_loss: 0.15898222970962525, train_acc: 0.9389166666666666, val_loss: 0.3018113072713216, val_acc: 0.9005833333333333, test_loss: 0.3316642808914185, test_acc: 0.8966