Chapter 3.16 typo and layout tweaks #39

Status: Open. This pull request wants to merge 4 commits into base `master`.

**docs/chapter03_DL-basics/3.10_mlp-pytorch.md** (10 changes: 5 additions & 5 deletions)

````diff
@@ -20,11 +20,11 @@ import d2lzh_pytorch as d2l
 num_inputs, num_outputs, num_hiddens = 784, 10, 256
 
 net = nn.Sequential(
-        d2l.FlattenLayer(),
-        nn.Linear(num_inputs, num_hiddens),
-        nn.ReLU(),
-        nn.Linear(num_hiddens, num_outputs),
-        )
+    d2l.FlattenLayer(),
+    nn.Linear(num_inputs, num_hiddens),
+    nn.ReLU(),
+    nn.Linear(num_hiddens, num_outputs),
+)
 
 for params in net.parameters():
     init.normal_(params, mean=0, std=0.01)
````
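
A note for readers of this hunk: `d2l.FlattenLayer` is the repo's own helper, and, as far as this diff shows, it simply reshapes batched `(batch, 1, 28, 28)` Fashion-MNIST images into `(batch, 784)` vectors before the first `nn.Linear`. A minimal sketch under that assumption (modern PyTorch would use `nn.Flatten`):

```python
import torch
from torch import nn

# Sketch of a flatten layer equivalent to what d2l.FlattenLayer is assumed to do:
# collapse every dimension except the batch dimension.
class FlattenLayer(nn.Module):
    def forward(self, x):
        return x.view(x.shape[0], -1)

print(FlattenLayer()(torch.zeros(2, 1, 28, 28)).shape)  # torch.Size([2, 784])
```
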
**docs/chapter03_DL-basics/3.13_dropout.md** (47 changes: 36 additions & 11 deletions)

````diff
@@ -49,7 +49,7 @@ def dropout(X, drop_prob):
     # 这种情况下把全部元素都丢弃
     if keep_prob == 0:
         return torch.zeros_like(X)
-    mask = (torch.randn(X.shape) < keep_prob).float()
+    mask = (torch.randn(X.shape).uniform_(0, 1) < keep_prob).float()
 
     return mask * X / keep_prob
 ```
````
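
The `mask` change is the substantive fix in this hunk: `torch.randn` draws from a standard normal, so `torch.randn(X.shape) < keep_prob` does not keep elements with probability `keep_prob`. The in-place `.uniform_(0, 1)` overwrites those draws with uniform samples, for which the comparison is a proper Bernoulli(`keep_prob`) mask (`torch.rand(X.shape)` would be an equivalent, more direct spelling). A quick standalone check, not part of the diff:

```python
import torch

keep_prob = 0.5
# Uniform draws fall below keep_prob with probability keep_prob...
uniform_mask = (torch.zeros(100000).uniform_(0, 1) < keep_prob).float()
# ...while standard-normal draws fall below 0.5 with probability ~0.69,
# which is why the pre-fix mask kept the wrong fraction of elements.
normal_mask = (torch.randn(100000) < keep_prob).float()
print(uniform_mask.mean())  # ~0.50
print(normal_mask.mean())   # ~0.69
```
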
````diff
@@ -61,14 +61,39 @@ X = torch.arange(16).view(2, 8)
 dropout(X, 0)
 ```
 
+输出:
+
+```
+tensor([[ 0.,  1.,  2.,  0.,  4.,  5.,  6.,  7.],
+        [ 8.,  9., 10.,  0., 12., 13., 14., 15.]])
+```
+
+输入:
+
 ``` python
 dropout(X, 0.5)
 ```
 
+输出:
+
+```
+tensor([[ 0.,  0.,  4.,  6.,  0., 10., 12.,  0.],
+        [ 0., 18., 20.,  0.,  0.,  0., 28., 30.]])
+```
+
+输入:
+
 ``` python
 dropout(X, 1.0)
 ```
 
+输出:
+
+```
+tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
+        [0., 0., 0., 0., 0., 0., 0., 0.]])
+```
+
 ### 3.13.2.1 定义模型参数
 
 实验中,我们依然使用3.6节(softmax回归的从零开始实现)中介绍的Fashion-MNIST数据集。我们将定义一个包含两个隐藏层的多层感知机,其中两个隐藏层的输出个数都是256。
````
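
The outputs added above also show the effect of the `mask * X / keep_prob` rescaling (inverted dropout): at `drop_prob=0.5` every surviving element is doubled, so the layer's output equals its input in expectation. A quick averaging check, assuming the `dropout` function from this file is in scope:

```python
import torch

X = torch.arange(16).view(2, 8).float()
# Average many dropout samples; with the 1/keep_prob rescaling the mean
# should come out elementwise close to X itself.
avg = torch.stack([dropout(X, 0.5) for _ in range(10000)]).mean(dim=0)
print(avg.round())  # approximately X
```
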
````diff
@@ -124,7 +149,7 @@ def evaluate_accuracy(data_iter, net):
     return acc_sum / n
 ```
 
-> 注:将上诉`evaluate_accuracy`写回`d2lzh_pytorch`后要重启一下jupyter kernel才会生效。
+> 注:将上述`evaluate_accuracy`写回`d2lzh_pytorch`后要重启一下jupyter kernel才会生效。
 
 ### 3.13.2.3 训练和测试模型
````
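
The fixed note recommends restarting the Jupyter kernel after writing the updated `evaluate_accuracy` back into `d2lzh_pytorch`. A lighter-weight alternative that may work, depending on how the module was imported (it refreshes `d2l.evaluate_accuracy` style access, but not names already bound via `from d2lzh_pytorch import *`):

```python
import importlib
import d2lzh_pytorch as d2l

# Re-read the edited module from disk without restarting the kernel;
# attributes accessed as d2l.evaluate_accuracy pick up the new definition.
importlib.reload(d2l)
```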

````diff
@@ -155,15 +180,15 @@ epoch 5, loss 0.0016, train acc 0.849, test acc 0.850
 
 ``` python
 net = nn.Sequential(
-        d2l.FlattenLayer(),
-        nn.Linear(num_inputs, num_hiddens1),
-        nn.ReLU(),
-        nn.Dropout(drop_prob1),
-        nn.Linear(num_hiddens1, num_hiddens2),
-        nn.ReLU(),
-        nn.Dropout(drop_prob2),
-        nn.Linear(num_hiddens2, 10)
-        )
+    d2l.FlattenLayer(),
+    nn.Linear(num_inputs, num_hiddens1),
+    nn.ReLU(),
+    nn.Dropout(drop_prob1),
+    nn.Linear(num_hiddens1, num_hiddens2),
+    nn.ReLU(),
+    nn.Dropout(drop_prob2),
+    nn.Linear(num_hiddens2, 10)
+)
 
 for param in net.parameters():
     nn.init.normal_(param, mean=0, std=0.01)
````
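
One PyTorch detail worth remembering alongside these `nn.Dropout` layers (standard framework behavior, not something this diff changes): dropout is only active in training mode, so the training loop is expected to toggle `net.train()` and `net.eval()`. A self-contained illustration:

```python
import torch
from torch import nn

drop = nn.Dropout(0.5)
x = torch.ones(1, 8)

drop.train()    # training mode: ~half the entries zeroed, survivors scaled by 2
print(drop(x))
drop.eval()     # evaluation mode: Dropout acts as the identity
print(drop(x))  # tensor([[1., 1., 1., 1., 1., 1., 1., 1.]])
```
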
**docs/chapter03_DL-basics/3.15_numerical-stability-and-init.md** (5 changes: 5 additions & 0 deletions)

````diff
@@ -14,6 +14,11 @@
 
 在神经网络中,通常需要随机初始化模型参数。下面我们来解释这样做的原因。
 
+<div align=center>
+<img width="350" src="../../img/chapter03/3.8_mlp.svg"/>
+</div>
+<div align=center> 图3.3 带有隐藏层的多层感知机</div>
+
 回顾3.8节(多层感知机)图3.3描述的多层感知机。为了方便解释,假设输出层只保留一个输出单元$o_1$(删去$o_2$和$o_3$以及指向它们的箭头),且隐藏层使用相同的激活函数。如果将每个隐藏单元的参数都初始化为相等的值,那么在正向传播时每个隐藏单元将根据相同的输入计算出相同的值,并传递至输出层。在反向传播中,每个隐藏单元的参数梯度值相等。因此,这些参数在使用基于梯度的优化算法迭代后值依然相等。之后的迭代也是如此。在这种情况下,无论隐藏单元有多少,隐藏层本质上只有1个隐藏单元在发挥作用。因此,正如在前面的实验中所做的那样,我们通常将神经网络的模型参数,特别是权重参数,进行随机初始化。
````
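
The paragraph below the newly inserted figure argues that symmetric (constant) initialization makes every hidden unit compute the same value and receive the same gradient. A small demo of that claim, with made-up layer sizes rather than the book's experiment:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
for p in net.parameters():
    nn.init.constant_(p, 0.1)  # every parameter set to the same value

x = torch.randn(2, 4)
net(x).sum().backward()
# Each row of the first layer's weight gradient corresponds to one hidden unit;
# with symmetric initialization all rows are identical, so gradient descent
# can never make the hidden units differ.
print(net[0].weight.grad)
```
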
**docs/chapter03_DL-basics/3.16_kaggle-house-price.md** (39 changes: 23 additions & 16 deletions)

````diff
@@ -13,12 +13,14 @@
 
 我们可以在房价预测比赛的网页上了解比赛信息和参赛者成绩,也可以下载数据集并提交自己的预测结果。该比赛的网页地址是 https://www.kaggle.com/c/house-prices-advanced-regression-techniques 。
 
+图3.8展示了房价预测比赛的网页信息。
+
 <div align=center>
 <img width="500" src="../../img/chapter03/3.16_house_pricing.png"/>
 </div>
-<div align=center> 图3.8 房价预测比赛的网页信息。比赛数据集可通过点击“Data”标签获取</div>
-图3.8展示了房价预测比赛的网页信息。
+<div align=center> 图3.8 房价预测比赛的网页信息。</div>
+
+比赛数据集可通过点击“Data”标签获取。
 
 ## 3.16.2 获取和读取数据集
````

````diff
@@ -97,7 +99,7 @@ all_features.shape # (2919, 354)
 
 可以看到这一步转换将特征数从79增加到了354。
 
-最后,通过`values`属性得到NumPy格式的数据,并转成`NDArray`方便后面的训练。
+最后,通过`values`属性得到NumPy格式的数据,并转成`Tensor`方便后面的训练。
 
 ``` python
 n_train = train_data.shape[0]
````
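
The wording fix is correct for the PyTorch port: `values` yields a NumPy array, which is then turned into a `Tensor` (MXNet's `NDArray` does not appear here). A toy version of the conversion, using hypothetical columns rather than the actual Kaggle features:

```python
import numpy as np
import pandas as pd
import torch

df = pd.DataFrame({'MSZoning': ['RL', 'RM', 'RL'], 'LotArea': [8450, 9600, 11250]})
features = pd.get_dummies(df, dummy_na=True)       # one-hot encode categoricals
tensor = torch.tensor(features.values.astype(np.float32))
print(features.shape, tensor.shape)                # column count grows with the dummies
```
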
````diff
@@ -131,10 +133,12 @@ def log_rmse(net, features, labels):
     with torch.no_grad():
         # 将小于1的值设成1,使得取对数时数值更稳定
         clipped_preds = torch.max(net(features), torch.tensor(1.0))
-        rmse = torch.sqrt(2 * loss(clipped_preds.log(), labels.log()).mean())
+        rmse = torch.sqrt(loss(clipped_preds.log(), labels.log()).mean())
     return rmse.item()
 ```
 
+> `torch.nn.MSELoss()` 的计算规则为 `(input - target) ** 2`,因此不需要像使用 `mxnet.gluon.loss.L2Loss()` 时那样再将结果 `*2`。
+
 下面的训练函数跟本章中前几节的不同在于使用了Adam优化算法。相对之前使用的小批量随机梯度下降,它对学习率相对不那么敏感。我们将在之后的“优化算法”一章里详细介绍它。
 
 ``` python
````
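
To see why the `2 *` factor comes out: `mxnet.gluon.loss.L2Loss` computes `0.5 * (x - y) ** 2`, while `torch.nn.MSELoss` averages the plain squared error, so the PyTorch version needs no compensating factor. A two-element sanity check:

```python
import torch

loss = torch.nn.MSELoss()                  # mean of (input - target) ** 2
a = torch.tensor([1.0, 3.0])
b = torch.tensor([2.0, 5.0])
print(loss(a, b))                          # ((1)**2 + (2)**2) / 2 = 2.5
print(torch.sqrt(loss(a.log(), b.log())))  # the log-rmse pattern used above
```
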
````diff
@@ -201,17 +205,6 @@ def k_fold(k, X_train, y_train, num_epochs,
         print('fold %d, train rmse %f, valid rmse %f' % (i, train_ls[-1], valid_ls[-1]))
     return train_l_sum / k, valid_l_sum / k
 ```
-输出:
-```
-fold 0, train rmse 0.241054, valid rmse 0.221462
-fold 1, train rmse 0.229857, valid rmse 0.268489
-fold 2, train rmse 0.231413, valid rmse 0.238157
-fold 3, train rmse 0.237733, valid rmse 0.218747
-fold 4, train rmse 0.230720, valid rmse 0.258712
-5-fold validation: avg train rmse 0.234155, avg valid rmse 0.241113
-```
-<img width="400" src="../../img/chapter03/3.16_output2.png"/>
-
 
 ## 3.16.6 模型选择
````
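
For readers skimming only this diff: `k_fold` relies on a fold-splitting helper defined earlier in the chapter. A sketch of what such a `get_k_fold_data` typically looks like (the repo's actual helper may differ in detail): fold `i` is held out for validation and the remaining `k - 1` folds are concatenated for training.

```python
import torch

def get_k_fold_data(k, i, X, y):
    # Split X and y into k folds and return (train, valid) for fold i.
    assert k > 1
    fold_size = X.shape[0] // k
    X_train, y_train = None, None
    for j in range(k):
        idx = slice(j * fold_size, (j + 1) * fold_size)
        X_part, y_part = X[idx, :], y[idx]
        if j == i:
            X_valid, y_valid = X_part, y_part
        elif X_train is None:
            X_train, y_train = X_part, y_part
        else:
            X_train = torch.cat((X_train, X_part), dim=0)
            y_train = torch.cat((y_train, y_part), dim=0)
    return X_train, y_train, X_valid, y_valid
```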

````diff
@@ -223,6 +216,19 @@
 train_l, valid_l = k_fold(k, train_features, train_labels, num_epochs, lr, weight_decay, batch_size)
 print('%d-fold validation: avg train rmse %f, avg valid rmse %f' % (k, train_l, valid_l))
 ```
 
+输出:
+
+```
+fold 0, train rmse 0.170228, valid rmse 0.156995
+fold 1, train rmse 0.162570, valid rmse 0.191748
+fold 2, train rmse 0.164106, valid rmse 0.168666
+fold 3, train rmse 0.168130, valid rmse 0.154564
+fold 4, train rmse 0.163757, valid rmse 0.183091
+5-fold validation: avg train rmse 0.165758, avg valid rmse 0.171013
+```
+
+<img width="400" src="../../img/chapter03/3.16_output2.png"/>
+
 有时候你会发现一组参数的训练误差可以达到很低,但是在$K$折交叉验证上的误差可能反而较高。这种现象很可能是由过拟合造成的。因此,当训练误差降低时,我们要观察$K$折交叉验证上的误差是否也相应降低。
 
 ## 3.16.7 预测并在Kaggle提交结果
````

````diff
@@ -246,7 +252,8 @@ def train_and_pred(train_features, test_features, train_labels, test_data,
 设计好模型并调好超参数之后,下一步就是对测试数据集上的房屋样本做价格预测。如果我们得到与交叉验证时差不多的训练误差,那么这个结果很可能是理想的,可以在Kaggle上提交结果。
 
 ``` python
-train_and_pred(train_features, test_features, train_labels, test_data, num_epochs, lr, weight_decay, batch_size)
+train_and_pred(train_features, test_features, train_labels, test_data,
+               num_epochs, lr, weight_decay, batch_size)
 ```
 输出:
 ```
````

**script/prepare_wwwdocs.sh** (17 changes: 11 additions & 6 deletions)

````diff
@@ -12,14 +12,15 @@ mkdir -p ${docs}
 echo '根据项目README.md自动生成目录文件 ......'
 cat README.md \
     | awk '/^## 目录/ {print "* [前言]()"} \
-           /^### / && /\.md)$/ {print "* "substr($0, 5)} \
-           /^### / && ! /\.md)$/ {dot=$2; gsub(/\./, "\\.", dot); print "* "dot " " $3;} \
-           /^\[/ {print $0} /\.\.\./ {print "  * "$0}' \
+           /^### / && /.md/ {print "* "substr($0, 5)} \
+           /^### / && ! /.md/ {dot=$2; gsub(/\./, "\\.", dot); print "* "dot " " $3;} \
+           /^\[/ {print $0} \
+           /\.\.\./ {print "  * "$0}' \
     | sed 's/https:\/\/github.com\/ShusenTang\/Dive-into-DL-PyTorch\/blob\/master\/docs\///g' \
     | sed 's/^\[/  \* \[/g' \
     > ${docs}/_sidebar.md
 
-echo '根据项目根目录下README.md以及docs/README.md合并生成项目所需${docs}导航 ......'
+echo "根据项目根目录下README.md以及docs/README.md合并生成项目所需${docs}导航 ......"
 sredme=`cat docs/README.md`
 cat README.md | awk -v sredme="${sredme}" '!/^### / && !/^\[/ && !/更新/ {print $0} /^## 目录/ {print sredme}' | sed 's/## 目录/## 说明/g' > ${docs}/README.md
````

````diff
@@ -112,7 +113,7 @@ ln -fs ../docs/chapter* .
 ln -fs ../img .
 cp ../script/docsify.js .
 
-port_used=`lsof -nP -iTCP -sTCP:LISTEN | grep 3000 | wc -l`
+port_used=`lsof -nP -iTCP -sTCP:LISTEN | grep ':3000' | wc -l`
 if [[ ${port_used} -gt 0 ]]; then
     echo '【警告】当前3000端口已被占用,请停止进程后再运行此脚本!'
     exit 1
````

````diff
@@ -123,5 +124,9 @@ if command -v docsify > /dev/null; then
     docsify serve .
 else
     #echo 'docsify-cli 没有安装,建议使用:npm i docsify-cli -g'
-    python -m SimpleHTTPServer 3000
+    if command -v python3 > /dev/null; then
+        python3 -m http.server 3000
+    else
+        python -m SimpleHTTPServer 3000
+    fi
 fi
````