mirror of
https://github.com/ZhangXinNan/DL-with-Python-and-PyTorch2.git
synced 2025-10-20 23:34:18 +08:00
1182 lines
227 KiB
Plaintext
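{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.5 Using Tensor and Autograd\n",
"Parameter learning is central to neural networks, and parameter learning depends on computing derivatives. How does PyTorch compute them? \n",
"\tMost deep learning frameworks today provide automatic differentiation, and PyTorch is no exception: the torch.autograd package handles it. autograd provides automatic differentiation for all operations on tensors. torch.Tensor and torch.autograd.Function are its two core classes; they are interconnected and together build a directed acyclic graph. We first outline how tensors support automatic differentiation, then introduce the computational graph, and finally implement these ideas in code."
]
},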
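{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.5.1 Key Points of Automatic Differentiation\n",
"\tThe autograd package differentiates tensors automatically. To make this work, keep the following points in mind. \n",
"1) Create the leaf-node tensors and use the requires_grad argument to specify whether operations on them should be recorded, so that gradients can later be computed with backward(). requires_grad defaults to False; set it to True for any tensor you want to differentiate with respect to. Every tensor that depends on such a tensor automatically has requires_grad=True. \n",
"2) The requires_grad_() method changes the requires_grad attribute of a tensor in place. Calling .detach(), or running code inside with torch.no_grad():, stops gradient computation and stops tracking the tensor's history. This is common when evaluating or testing a model. \n",
"3) A tensor created by an operation on tracked tensors (a non-leaf node) is automatically given a grad_fn attribute, which holds the gradient function. The grad_fn of a leaf node is None. \n",
"4) Calling backward() on the final tensor computes the gradients of all tracked variables and accumulates the results in their grad attributes. Once the computation finishes, the gradients of non-leaf nodes are released. \n",
"5) backward() accepts a gradient argument, which must have the same shape as the tensor on which backward() is called. If that tensor is a scalar (a single number), the argument can be omitted. \n",
"6) The intermediate buffers used by backpropagation are freed after each pass. To backpropagate more than once through the same graph, pass retain_graph=True to backward(). Gradients accumulate across repeated passes. \n",
"7) The gradients of non-leaf nodes are cleared once backward() returns. \n",
"8) Wrapping a block of code in torch.no_grad() prevents autograd from tracking the history of tensors whose requires_grad is True. This is used frequently at test time. \n",
"\tThroughout this process PyTorch organizes the computation as a computational graph. The graph is dynamic: it is rebuilt on every forward pass. Some other deep learning frameworks, such as TensorFlow 1.x (and Keras running on top of it), build a static graph instead. The next section introduces the computational graph, a directed acyclic graph (DAG), which makes the process easier to visualize.\n"
]
},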
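The points above can be tried out in a few lines. This is a minimal sketch, not taken from the book; the tensor names (`a`, `w`, `out`, `d`, `frozen`) and values are illustrative assumptions.

```python
import torch

# Leaf tensors are created directly by the user.
a = torch.ones(3)                       # requires_grad defaults to False
w = torch.ones(3, requires_grad=True)   # tracked by autograd

# The result depends on w, so it requires gradients and records a grad_fn.
out = (a * w).sum()
print(out.requires_grad, out.grad_fn is not None)   # True True
print(w.grad_fn)                                    # None, because w is a leaf

# requires_grad_() flips the flag in place.
a.requires_grad_(True)
print(a.requires_grad)                              # True

# detach() returns a tensor that is cut off from the graph.
d = out.detach()
print(d.requires_grad)                              # False

# Inside no_grad(), operations are not recorded at all.
with torch.no_grad():
    frozen = (a * w).sum()
print(frozen.requires_grad)                         # False
```

Note that `detach()` and `no_grad()` do not modify `w` itself; they only stop new results from being linked to the graph.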
{
"attachments": {
"image.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAKQAAAC+CAYAAAC7+KS8AAAAAXNSR0ICQMB9xQAAAAlwSFlzAAAOxAAADsQBlSsOGwAAABl0RVh0U29mdHdhcmUATWljcm9zb2Z0IE9mZmljZX/tNXEAABVfSURBVHja7Z1/SF1nmsfTUpKlEWpS0TTRqKklMbXGEFNTY2O0okIlxgVRKiEGaVklUzKtmAyZUrtuSOhsMYRQQVrCkkD+mO2G0AyZv6az/4wMdJChf0h3oe7AtIZO0mt+mkaWu+dz8c2+PeNV773nx/ue+zzw4NXjPfec+37P8z6/n1UffPDBKuHs497e3v06v/XWW1UmXJcsTkTB1tXVdXD//v3Du3bt+s3WrVsnV69ePb9q1aq44pqampjO27Ztm9WPr1279gHve+WVV/6N8xw6dKg1CNDKAkaAAUpTU9OJioqKCQW21tbW2ffffz9++fLl+BdffBGfm5uLp0KxWCzxvrGxsTjncUCZAC1A3bFjx+/a2tr+6e23394mgBROMGBAegEQgDI0NDR//fr1uN8EUK9cuRLv6+u7X1paejs/P/9vzsNw5vjx4xsEkFnInZ2d3WylgAHpBUDCpOnp6fjJkyfnnn322fuoB2ztAsgsYBYaadTR0XGbrdREQj1gay8rK5tKV9+UxTac33333RJ0QxYaaWQDTUxMxFEjXn311XFnK88VQEaEe3p6OktKSmJB6IZ+0EcffTRfVFT0fSrSUhbeXF1xoKWl5Vaq1rFpNDMzE6+urp45fPjwAQGkpewA8WxXV9ctq5HosswdS/zGSkApADBPMh5qb2//ISpgVISkr6ysvLnc9i0gMIgxANC52OaiSBg7WOACSEsYqxRDIJJoXKAjR47caWtr+5kA0gJet27dbb+l43fffRc/f/58gr/88sv41atXE6/5GQThMcCNFTgg0RVw5hKYJ8RFdEFnvnw9iK+zK6C/J1v8jRs3brzrNyAA5KlTp+IvvPBCIlZ98eJFwoCBARIDhzX3FZDEMTs6OnpffvnlXxNNAGg4RnHmEpgnxMXN66wkgQri66wH9KuqqmKcj/MSmuJzvIqbmsRk55AQEdTW+fXXXyeACDjv3r0bD5J48HgAPQUkwf19+/b9qri4+BvimG+88cbdCxcuxP2KJnBeQlN8Dp/H5+7du/djPzJOwmB2FB5iv8GAJBwcHEwAEUJCIi3V734T1nZOTs7cmTNn/iFjQHISJBSWEsH906dPxycnJ0NRjvnc0dHRONfB9XBdyW7SBubaWSi/HeFunVH9Dge1bgiTjLZsvixSjBy97w4SCvPdJOJ6uC6ur6Gh4Vyq8VNTGP3Z1MQJr8iRxPOOKjaaNiAxLAoLC2dIMQo71Wk54vpGRkbm8OWRpiXbtlmE3eDsZjNL2QBLOmltyzLRb5w0LSSObdIyyr7I7u7uWWdd3kzZMb4Axilbs0wUsf05oPwmmUVnqi7pGBn/ZZpalCmNj4/fr62tvZpyLJvF27Jly1/DMla8pqmpqXhVVdW0TaBEIGzfvv2ra9eu/RiFNfjwww8frASMfwdIvgjAaNsWvRxxP+Xl5X+xyX+JpKyurv4DOrGtKWjo9IQK6+vrP13pff/kFwfFlxzz/1Gk0LhAly5derR79+7f2uYKwmvgqB23bFOfPvnkk0ebN2++uVTceklAEqIjKhJFMCrCQMu0CCkMxvmPgUkkx3TdkuAFONqzZ89/pLMjPX5BrS3ljX4SwXyYeKp6TdhKvfabFoycSZvDiyooYULFoSJUosHBwceVh5nkHzx+QcDb7xusq6tLdEUgOkC4Ciamyk+O+U3oYnRwsD3MqGqyCQQgNQFn0Ho/Ri+ROpoSkGfQ3Nz8Sy909EAzTRQokYrEUHmtsk+CIke63EXSRCH+jY7JvQBOQIHkpIA/3W4VyUjvYqHnEpDLQNsWL+/p8RPHzQQBCG6so6MjAUaAiMQElEGRs+XdfeaZZ/4nLy
9vinYgUcocYh25J72fT05OzkOkGEAig0pnVLTPPvssTlKM+xiSl/e5+/z4nW0V6JatSKU8wUEF9dWWrTJNWLzq6uox5/cZ58u+QqjR5uSMpXyaSDGARI6pzuXl5X9wQPuQtEH3MSSv19IvJUBixdkemVmJ3rNYpglfvnP/lwEnIM2mpODc3Nxpk64ptATRMGi5WCrShC2vsLBwgoWio5hNEZ5IATIo109YlKrLh8Vy9NzTLFhJSckXtudbWglIlFW861Erw0Q3drbkG+lml6NLVVVVXVizZg0lFRdsdK5bCciFiE3Viy+++G1U4tncB608vOj+qmXMX2chkaA2l1BYAUh1oSRZ2J4CxfWT6eNHK2K+I3RM3EfonAsupFwBpA+AVAo+YSoMAdMzxRfbogcGBu5x/UEYJVjlWOds6VjrtmSrWwVIxVilBQUFMVuygIhSoAe3trYOhvGFAkaXC6lKAOkhIJW0JDWNkBFBdNP0S4wwiocIfxKlMCH6wjWwjTsP8yTbenNz8zHTokLWAlL/kgmiEzcllQtpFJZFzrbM51M7QxcMKtlM9Rli+NTU1IwiNTGITIkKWQ9InXF9II0AA5VyR48efUikx6/sZs7L+Zk2oMZT8Pm2VRfyvaktHRdSkFEh1dpGsfOQnEXvdYcNw/QceKXUV7ElEX4koA/rAX2c7XqrFDdo+V0/DvB4H+/nPCQIcE7Oj2Vrsl6Wwm6TiwtJRYUAQjIJv5DQkOvBZ24AgPqAJDejYoQpvX05KTekB/SJAOnNpNxTpfhdP75QfjvM+zmPrYX/qWyd3K8eFeKekVQAlu+IY15IU86dDIxPPfXUXNgPe2QX2VZWUSHnIb3z5JNPzrtBA3Az/Qz02MUA6cW5BZARY10qJmOOZ2LAsXWjw7rPabVRI+yPqwjp5QbLYowuyPab7mdhDOpbtSkhUAGCoezWw9Et0SPdwGR7T1fHxtrnHBik1rt9hL1hjAi322U53rlz5yfbt2//9+Li4t+vW7fuv3G887dUz7Nnz55/zc/P/yrV93nJ7odJQBEyUz5w8ODBv6tpyQZ+6aWXfnSrHQIKAwBJkVU2En5mAaQAUgApbBcg9TbPfpYoCyAFkCkBko4ifra4EUBmASABEwyQ6A5C7bsCmKqD12vh+b9kkpCGDgJIAWRGpCQbzRgAFMzf6BSiwIfvUQdkMuAJIAWQGZMCmAKnkoZqlBykA9J9TAApgPRFQrIFKwmpQMrAJAVIXvO/SFImevG72rZ1HVL9nx/GjQAyS3RIBSB9yKYuLWmBqFvRbj1St7L9tLYFkGJlG0UCSAGkAFJYACmAFEAKIIUFkALICDBtkp9//vn7VFcGzdXV1bOFhYVzYXw2nJeXN+cuYxZQhMyq5XIYDBgolwjr8xdrGS2gyGKOVOcKYQGkAFJYACksgBRACvvClLtSTquY5gDUZOt/g8PsoS4LlV0W/YblmhCE3cFCFirLWO9YsVizqbA7WMgiZSGrjhVuNqGDhSxQdm7due6tG91RjBrh0BjDRd+qTWmHLYuTxUyjKgCZSRc1AaRwWkz3XT2GTKOn2traM7Rw5jX6o348rE66slgRBB4A27t378e0x2YowULbvp9k2rgbPzG0QD/OYAHet2nTpm85z759+34FaP0Gqiyi5Yzuxzwc+rgzmQLgAbDR0dHEAIFMx7ZMTU0lznP69OkEaAEqICePExeS1/3fZVEtZG0I6BTDovr6+u4z6SKoEYCAnKRiZgQxIYPxLF5Fd2SBLZOGJPQ6EuoO2damDEdlgBWDtBioxWCtTCaWyUJbwGyLDQ0N55CGY2NjcVOHoTJykNGDmcyalAU3nNHTioqKvh8ZGZkzEoWLkJrGW1xc/E2qs3Vk0Q3mxsbGj9HTwponmSlNTk4mrHvnoRowHpDoGfi7sBDdjdCZ5IWrAUvOfQzlOcj5gGFxbW3t1eHh4TtRqC7s6uq65azzL4wBpJqFqEbM4d9i1DH+LixEt0+MWY
e4GrDk3MdQnnnqOAfuB86Hz80PF0RYFvTOnTv/c3x8/H4UwKiov7//dktLy9lQAAkwAAgSTp8Wq4ZwekVsZZwPn5tyQeAKwYlrqxR1jIHhd955516UwKgIYbKce8jzKAEgXL9+/T0AgoQLWv/BFYITFymquSGskJy4dRyK+TXiOWzCCi8sLJxZatqsJ18kTlpCTIAAEJryhSo3BA8ID4op49OSMSoNu0iU6eTJk3NNTU1nfAEkEpEtEictISZTiQeEB6W0tPQ2/jxTJSahP799jPR+pEGp6rLL76qxKX0j/SZULPR+TwHJgjpW4CUkoinRgpUS/jz8eo7ld9AkMGL4oWv7ff80HlXN79XvgBGg8PcghAPz0T0DJGCsqKiYcm7oka3bBnqtYzzMvv7666dMASQPCNcU1HeANKRzLhIyCMmoExGnZAnBKSvdW7Zs+SsOzyjQsWPHZuvr6z/NJgmpE/3FAWSQ5JmExCBw9MUbGApRIrbwurq6y9miQ0JKZ1RN8IMkz3RIoic4rKNIbJUm6JRBWdls0UFv04o8sbI7OzsPtbe3/+DnhaqCozAInbKgoODWUv4x8UNmTp75IXEwB7FVhwVI6L333ptvbGz8F4nU+EeeRGp4arGKgrhgAImOwcAeXvMzqK0FdQS1xJRY9rlz5x5ECYy9vb0xT2LZQbojFCAVKadtEIQx8fTTT8+Z4gbC0Dp+/HjMdiCifqDuHThw4OeeZPuQpUNiRFCA1CdG4bwNCpDq8ykLNcVpjkRhm7PVs0HQpLKy8iY2iGf5kOQskiYWxA2ouXrKwCHEFdSWjW91IcO5auvWrVdMASY6F4YA7ilbjB12myNHjtwhrJxq2eyKdBocmbZvHcsRsW4SMHRHtSnAZA2IwWOB21JT09bW9rN07lWyUBYomQVoEjCl6vD/49cbnnvuudtR9Y9dunTp0e7du3+7XGjPJImp12UDTkARVO4pkhBJHWpdNmWNlBtEDYwsorOwMyt9qhUw6c1tQpMmpCbXAShUdr4qC8GVlWneAV4PdkfOBwB5AJCESOrQO1eQiBAFV4T+pFdUVNxIp18NQKB7mCnA1B8YVTiHXxVDDQMRILknafX39//Y09OTGPHmPobk433EnVHZOB8A9LttX1quCKrIbAcjCcUOoKYzzSI3FZiLXad7ipYjYP55zZo1Ma7bfSysJOa03kSdLVuDbcm5EHowAf7S0tK/ePm02wJM9zVHZiwIWwNKdXd396wthezoVLhOyDbxK5HCJmBGck6N86W/SaaMSW4It0TEx0i5BTpVUK2LdWAS7Qo7kyhrAOl2Q1BIhUsgbKmJdYm1qSoOw6rTZtEdI2HURGBmxSQvjIQF523CBTE0NDQfRGKv3rOQrhhYl1ibplQY4lYyDZhZN1oOPdPR106wVRJ+xJ0AYPBppdvFQu9WwXk4J1LZ1dV1g0lfsqnAzOpZh3zxamg4Pi29z49iBVqdAZv+P3o/H87DOU1vAGAqMGX45gpBq7ONYEsXmOwmQaoYAkjhJYFZV1d3GoAg+d3A5Hd04kwkKe9favCm4jAbdQkYzANmLoDUgFnMTyIqgKWsrOx6JqBkhNxSYOShEAkpvBgwix36/RNPPPG/btBkAkq2aUbJLQbGvLy8qbCtf1l8QyWkkojJOBNQLrZ1A9KwpncJIA3XJfFEVFdXjyGx/AIl79XPxUMgRo3wigGKJU5ysFeg5LxKCnNeUyJIsugWbulkrOsATReUhHvZqk1yrckih8xMJ1iJKyaKTCDE7WISUITMhDuJwWcjkSHmTtETUAggBZDCAkgBpAByUVL9k5J10l3uuABSAOkp0UtJtbBJ57gAUgC5IqmnRn+o1s36hAX6JfE3jvF/9BrXAbfccQGkADIlAlAkMV+8eDEBpLq6up+MAaHvJseYwsD/pXJcACmATImQhLp0hAGYAhQgA2A66RJwueMCSAFkytIRAOrbNWDSQalAqv4HAOvdipc7LoAUQKYsJZ
F0sCJe63okW7P6m/56pccFkALIyJEAUgApgBQWQAogBZACSOH0AXnixImEvy8M/vzzz0P77Ndee+2BANIwZmQGjQ/C4LKysq9Wr179MKzPd/hP7uRgAUUWszQKEBZACiCFBZDCAkgBpLAAUlgAKYAUFkAKCyAFkMICSGEBpABSOChmYitdzhTX1NScpeGU/jc4zLZ8slBZJhGTNSs1pWmpLFSW8XJ9xsPsLy6AzFJO1mfchKalskBZunW7W0ab0rRUFihLmcRY0/qLCyCznFWfcVP6iwsgs4zpK65PSevq6vrHTZs2/dE9PU2sbGFP9UPlb6Reh1KBtWvXPkASMiXXPUtyMabVcrK5kn6PvpNFjAADQEDjSLtvN27ceNcBT4xJuVQzUkwVi8XSqgpcbPJuTk7OQ+aiLzjQ9wgghR+DcNeuXb9Z0AFjgGZqaiqQ8tWJiYkEQKuqqmJIX+ajezXJQRbXMh2wubn5l/n5+X8DhJcvXw69thrpOzY2FmeMNJIT6z0THVQW2hJubW0d3Lx5883BwcG56elpIwv/kZwU/xcWFs4gwQWQEWT0tOLi4m8GBgbupasLBk08MEjwioqKCYwsAWRE+MCBAz9HT5ucnLSyVcr169fjJSUlscOHDx8QQFrOLS0tZ3t7e+0QiUvQ3NxcvKmp6UZPT0+nANJSrq+v/9QxYKwHow5K5wG71dnZOSCAtIwdK/XN7u7u2aiAUadt27bNLhczFxAY5tYpKyubwSEdRUIXxkATQFrCtbW1l86fP/8okmhcILwFuLAEkBYwUY8gXDv6yBDFXk9YSEaEIomPCyANZ/x1xKGDAAVzaNRQpMUmfflJPHA8eAJIw9mxQLsdcNwOAhRIQyZ1wQARBpxBEQ9eMoe5gMEQxvrECg0CjCRkICXVds3vfkx7XYxwAeXk5Mwli3cLGAxi8hBZMD9JH6qkc1Bb9nKWtgDBICb2S7gtynTq1Kn5/fv3jwogLdm2Kysrb/otJcMi/KtFRUXfL5V1LkAwjBsaGs6NjIxEEpHt7e0/MHVCHOMWMcp+eXn5n69du/ZjlMA4PDx8p7a29upy9y8gMBSUjj75x/Hx8ftRAGN/f//txsbGj1dy7wIAgxmJYlNi7mI6I4m6bW1tv1jpPcvCG86qdMGE+plUCD24oKDgVqqlDLLoFjBZQFQYIm1MdgvhHaD0loIvjLN0ir1kwS1ipA2+SkJv+PNMSVOj/Lavr+/++vXr79GcIJOSWFloC5k4MM5lOktQvE+NNBV/QUpCJPXRo0cfEu6kQQF9J73oaiELbDnT3oQuEtREE3p0dM5ZAHrlyhVPEiYAH+dhK1bdK/gcJHVzc/Mxr7umyaJGzF3U1dV1EIDu2LHjd+Qdunv6qDYri/HQ0NC8u8cPzHnYilV/Hz/vQRYyS4wi1dnM3fhe56amphNhd0GTBRM2iuVLEDaK/w/PfCJz9/uFVwAAAABJRU5ErkJggg=="
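}
},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.5.2 Computational Graph\n",
"\tA computational graph is a directed acyclic graph that represents the relationships between operators and variables graphically, which is both intuitive and efficient. In Figure 2-9, circles denote variables and rectangles denote operators. For example, the expression z=wx+b can be split into two expressions: y=wx, then z=y+b. Here x, w, and b are variables created by the user; they do not depend on any other variable, so they are called leaf nodes. To compute the gradient of a leaf node, set the requires_grad attribute of the corresponding tensor to True so that its history is tracked automatically. y and z are computed variables, that is, non-leaf nodes, and z is the root node. mul and add are operators (also called operations or functions). Together these variables and operators form a complete computation, the forward pass.\n",
"\n",
"<center>Figure 2-9: Forward-propagation computational graph</center>\n"
]
},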
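The graph described above can also be inspected from Python. The sketch below assumes the `next_functions` attribute that autograd exposes on backward nodes; the helper `node_name` and the concrete values are hypothetical and only serve the illustration.

```python
import torch

x = torch.tensor([2.0])                      # leaf, no gradient needed
w = torch.tensor([3.0], requires_grad=True)  # leaf
b = torch.tensor([1.0], requires_grad=True)  # leaf
y = torch.mul(w, x)   # non-leaf
z = torch.add(y, b)   # root


def node_name(node):
    """Return a readable name for a backward node (None marks an untracked input)."""
    return type(node).__name__ if node is not None else "None"


root = z.grad_fn
print(node_name(root))                                      # AddBackward0
children = [node_name(fn) for fn, _ in root.next_functions]
print(children)                                             # ['MulBackward0', 'AccumulateGrad']
```

`AccumulateGrad` is the node that writes a gradient into the `grad` attribute of a leaf, here `b`.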
{
"attachments": {
"image.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAG4AAAC6CAYAAABP9HjCAAAAAXNSR0ICQMB9xQAAAAlwSFlzAAAOxAAADsQBlSsOGwAAABl0RVh0U29mdHdhcmUATWljcm9zb2Z0IE9mZmljZX/tNXEAABEKSURBVHja7Z1/aJXXGccVpA7NH2ksTaymSYq/Gn9StZq0DdH5ixLa9Y+iFCSK1M6QtcMGnTixxRaFraQMaUYQpFTqH4U5wZH9MeZfWzYmONgfwf6x0D/WFrWLabRxC+Pu/YQ87vjuJrnvuee85968zwNP7nvfe+/7vjnf85zznOc8P2a9++67s5Sn54MHD27et29fq/DRo0crQz6PgpKH33zzzRU7duz48dq1a38/f/7872bNmpVbt27d0KZNmx5wRUXFfc4vWbJkoKWl5Wd79+7dpcAFYhp/1apV/Q0NDcOdnZ33L126lBsaGspNRf39/bnTp0/nWltbhx5//PFbu3bt6kpDGhWwCQlDcmj8vr6+nC0NDg7mOjo67lZXVw+98sorrytwHjmSkHeWL1/+DZLjipDSPXv23EF6fUlfpkFrbm6+fPjw4bujo6M5H4T0RuAN+AAv06D19vbey3mm69evewEvk6Bt2bLlF6dOnRrNpUSAt3Tp0s/PnDnzPQXOfj22bsWKFXdyKdMHH3ww9sILL/QqcBZMj6fnIwEhiA5Dx1HgEvLu3bt/EGmRd3KBiHUhi3oFLiHTaDReUrp27VouWpcVDRza68KFC4cjRaVGgSuQ0eqqqqqsVf+zZ886kboDBw7ca2tr+2HJACdGWF5nilIyMjKSe/7553PYJN9///3xcxwDorwmpe7u7txzzz33UerA0XOZK7h5XV3d3/kHTCMsr3LONMC6VIXTmt8uX748zgDY1dU1fu7GjRvjIPIacp5LAlhNa2trN3Y4GoCeM512ZhpgH3300W/5vYvx3YYZnhimkjTyJ598krt69eo4cCJxX3755ficx2c2xPWWL19+3TtwSAoLVibV6OHHprOWT0b8jt9zHa6XtgQi9XQgm6ESRjkBMEYSebUZKi9evJhbv379b7wCh9W8sbHxb1gZXNnzuA7X47pcP80dALZrcoGJEYjpwxtw7e3tL7m2mseH0VWrVn3takFayMjBcG07YriiSHKHX3311T1egAO0bdu2fe3Lam4Onxs2bPgqLfCampo+7unpCQYa/y+dx8U0kVdtXrNmzW3foJn/DJKXxrDJUgWtNxRwLu2VJWHL82E9n4w3btz42wsXLvw7hLQ9+eSTt11p1Q+9YTeYjcUQvZH7cv8ULCg10fryq4hS/f/YEXfpzlAyk7fL8X86TmsOF2LDlo1bLxupaDpoPDYPxprGBbnSuEoJPB+gPQScreUccmWAdbntUQhHnWQvazsfSx5xGPIB2kPA4ROIe1kSwn6HtMmWhxwLJwWU+/McaVpUxDWPOdbVNEEHRBHx6aL3v4OEwx02OzG8HjhwYPwVgyznMRHZSiHPEcKWiWLEHIs9M2kHhhhyWSM+8cQTI4wavm2y43/efvvtem6Y5EHFSi6SJ2BGav1D2yBJqaam5u6LL754MIRvPooRxmiknvZ47bXXRs6fPz9uGDaXSADLOdzvjhw5Msba8JFHHhljgU9bpvGs1hJnDpW8ioQhfUiezbYHvZYGiJSGn3DdxYsX93McYo8PAKKhbt+zzz77GdZ8cwsLYDmHw2uo53twsGjRon8MDAzkQhL35zl4nqhRLkpDwXPnzh3iHI2ZVq/OZ3nZtGlTdylsDD84YKuBLYeQZG55MGwhcSZ48Jw5c0ZD7bJHQ+J5OlDoTeGHgAvtAQVxf57DtHJUVlYOCmizZ8/+TzQHvR6ioZhz6TQTmvO+kgEOjrSq4bRNQULcl/vnM3pLg7W0tJx67LHHBiKt9XTavZ54OelAjAQlBRzrDhaNIYCbypaHFKIEyHuOkcQ0gwmrq6uvm0N2WltRBW/roCkVEy
NmQxNRLf0Jhy6MxX2w7zUTc2p8rg2tpORtkKeffvoLm0WoDXEf7mfb+Egdw6cpkT6UkjhwoZWUSdcw0cMO+gaP63OfYtV7FAfmPYYznxonSgkgltRyIB94tbW1N30Nm1wXSXO5JmPeQXFgGPNheSkL4GTYZO5BcXAlfVyH7ZuJMFsvcxMaIMOn6y2isgHO1DYx89Dgtm4N/I7fc5009twmOt1FlBdXUl12wAnT4Njs4gbY+NpPjLBYywFrwYIF9/hdWpukkykvxSoTZQvcZAZYFs6mxiVGWKzlgBXK7dy0+ovyQmBKZoErV2bTtL6+/iqNb6O8KHCBGQCwvCS1OSpwJcBIHCCwfCjUGVeBKyFmwc7cV4jhWoErgjE4t7a2vuOSW1pa3mtoaPjdvHnzbj3zzDO/nOx7kcL1a0B2ff9C2dSMyw44tNaTJ0964bfeeivX3t4+6efR8Drl5745GtZHRSsuS+CySoRqK3AKXHkBh9ugretg/Dq2Afy218w0cKY/qDSWeF5jpiuUcEEkFtwlTXfNGQkcAODbSTYEcYmXV0l7IWR6WXOe39Bg4tgr15D3EKByTkDn+7BIMNeRTsCx3JtrcCxZGuRznkGeg84k184kcDQuDQWbQOBhbYJlHvM9yawgEkdDmh1Asi8IWCZwfI/vcw+egXsKMPI9juVZeI/TMPeSYVHey3NnCjhpKOnJ0mDS8FMBJxJnSiqMO73Z+HItWH4H896ULp4RICWOgnMSXxG/vzkymB0jM8CJ6zuNzbE0Kg2HJAgIAog0lpnHhFc5J0OXDI1xAESykRq5tgyX8h0+43e8F0mU++ebZ+V7mQJOhit6rBxLL46/5hvyzJ7Ob7kGkiTXke+KdPG5fEd+K9Io3zGfC5bvyjOaJNdWrXIGkgKnwClwCpwCN/OBY82TRW5sbBwpW+DEQSkU19XV3Qh172XLlv1FXA0zvwOedBM3asBLugNeZgxoxOqFdjdU4BIwYInfqM/IIAXOMePzIcDh2qfAlQlXVFR8ZXprp11aU4GzVErigY2hlRQFpgBmTsN1neARJI9jooBKLiJVOT+rQ6wCp8ApcMoKnAKnwAVjrPATgR7/F3ELk54xZNStgmQwjU9Bh/nz53/H3heBFvli3CFya8bj3MkkUUyosgKXHLDK5ubmCzQ+1ThsczOTuwXASVTgOzVj5kHDdEUinrNnzzqr/kFqENL9+iy3lmnQ2trajlFTzleqR8qtbdiw4Y8+wMssaJFC0bF79+5vfPuJXLly5V8+wMsqaHtffvnlf6bl5AN4keLy55IFjkkerUpUaZi8WnKuFHIZ47MR0VBadXWEXBd9crLekQrFFRUV99GqRJWGOzs7H5xj7SOVikNlWGXtlSQOzhXRUaiA6apOXlFWBMACkEIqFAtJpWJqctOIZvLsNBbVPG8ov0hyn+GlFgQ4qUVDErZiCwTS88l87jMFosk0Go2XlOJRrMVIXTQqjXoptTkVb9u27YyPgrcsXCkihNLgc5FdVVV112ZuM+uAF0t0eBfp7wv+4tatWz/q6Oi462tSx1qBpoea7gM45lSG56TPJfFuBCNKyK9IoA2YTCvoBKkAt3Pnzg+jHpvK3MDaygd4NgUxGMolRBnwJL6NWDcJIU5KrmrkFWJd+NH+/fu/TXMSx5rh2ouK4YlhKslzSEiyHAtxzlYz5XcoZV6BC7XmwQRFAVqXCouNxJkx3DJEQkibGalachIXoniEECWfKf0ceo6TMGAJGy42wY33OQ61n/qhNg8Xj7u2IaR84cKFw66krhit0iQB0Ja8a5XRPNMd9awxm4eTLATyakuUvaSCYuh1nCtKZR1nUwhQtC4z54iZlSAp2dTcybzlBLti0mFFJnKRNMmoY1vs1qzgqLbKAoBjXmEbP+nYL+DI5C0SZ55L+s/SgXR3oEDgbBQTNC8WqZKrSvJnSfzyVBlzpiKJR3O5NGDn+9ChQ8NpgeZjM3XyDyyyG5DySNIxSWViOW
djZZCC7tE/3cPzUHLFlRNOWtag1HfAabC0ashNRmJlQNrM+DSiZtA2i20MwJtxPiehVWfoxIkTY1u3bn1PTFb5iu8VWzlRvLzOnTvnzMsLpSqYl1do1RlinjW1sHh5aaTQxdDJ4nzz5s2/otgTndVWcWG7C7MamnBQv8qQRd1pBDZs42YrqVA8b968m42NjZ+57NFonIw0WFgAYLqdfYZYapdjKKCT8bxp7eiX5IKVHr9mzZrb+fxSGBpRUgCM4ZPoUNe750ggAIgvjUg4Ehk91x15T0wBBeeZb12tz5xt6zCEuBz/C6Fjx46NbN++/ef5nicuYQxJDKFpOB+1t7d/P5q7Pg7h5JQYOHpf1MMHivUvKZR6e3vvNTc3X046xAGe72EKaWdeLQU3w4KHDsBz7WviAjRTEsmEgC+nj4bi+rIkSdMzrWifE8Bj8sV049pchL8JBXRtQYvtarwjc6DLhiIGTuY2Mi6UDXDC2NswlqJNuVBCUL/x8KKArstGZuh0WaoaJchciri8dirAiS0TbQoti3iypBYWlhjHjx8fZaMU9duHT6XUBHexnuL/jS/+qTdXdsCZSgERnJjHpFIxbudIo5kgE6nivERusj7ER9O3EyzXR1KK3XFGKYkDF1pJcbpdQgMxxyCNZoJMpIrzIWKlaVwC7os1jYnZTYP3U2aihlAqbKoRK3CBGYNyMUqLAheQUTRsC7krcIGZ4RKlJakHmQJXIszOOlyodqjAlRAjdYUqLQpciTHzHfPedDsMClyRaykfxdGjdd6HJMlevXr1BS3g7qOneSzgToF2PNKm+vyNN97QAu62wGWVtJqVAlcewMXLSUNSI9WWtIB7CsDFy0lD4hKvBdwDAyexCwBgFpuVjAkCnFQGFuC0gHtg4ACJxjUbRBqURpPIIQFWSkdrAffAwPGPS3yeNJo5j8WLupsNqwXcAwInEiLDFb1ZwKRBzQLuwkiGFnAvAeUkXxH0fEXWzaLqWsBd13HeSYFT4BQ4BU6Bm/nAaQF3LeCetID6X0MWcF+5cuUfxC9Vd8ATsBZwL1PGN0ULuJcZ40SrBdzLkInO0QLuZcZmNKoWcC8zpUQLuJch4xqnBdzLmNUhVoFT4BQ4ZQVOgVPgFDgFToFT4BQ4BU6BU+CUhSVTrESEPvXUU3+qrKy8Ke9DZIZV4CZh9t0oCCXlscnNLBGh5CT79NNPH0SISi5mSfGb5o6BgjXBBO7jz0IyOap4JcmIK0m1qWEA4C7KjClw0zAWfuoDUBTQRZ0FACeLIElZfQ6jmR8Wo6HuCypyuPaBJA0yCVl9VVnONGiRhjjos64CGXB9VVnONGhp1Q7yAV4mgWP+8Z3RPS55kxXAUOAKZJKBk8k9bb9/lJalS5d+7srdIWsL6hoSeKddrVGoo6ODmj1dClxC3rFjx0+7uroSo0Y0aTxPig25rPmaKeBsq3MBnG2p0DhRV85FSv1MaZJYRZI2NOFNkm4DMnOUmPHhhRKmMhdlZDIDnE1JNTNxjJltAdDIpmADnKta4JkBDvshpqgkjSy5UeRYhk0kUDIp2EgwsW4KnEfgJAuQSJwknJEMQja5vBS4hIxCgGJg09Awc5skcytmjuvp6ck1NTV9rMAlWMNR1ycXmI4cOTLmIr4uU8sB9srSqjw5GbF95ML0lTVzVxfWi1Cg5au0rMAVNlxWVldXD1EhMgShHLnaHc+ckZnKkJT1TBs0V9pkZoGDacAkaXyLJYza7Ia7dGXIJHBplsgGtEiL/Lq9vf0ll/9DJoEzwevr6/MGGt5fLS0tt1yDlmngDPD6Dx065HyPjg5RW1t706ZOnQJXILe1tR2rr68fwqpRLIDMnTjR0iGKKeupwBXIbPtgimKHvLOz834S5YU9PkDHpIbig9u67+dV0PKYxiiECwDkM2ErCFfzeIEiTFd8JuWxAd3FBqkC54iZowjuyFcSjM9CJWRTcMqUtRHKlP8Lu/ilvgE6pU4AAAAASUVORK5CYI
I="
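}
},
"cell_type": "markdown",
"metadata": {},
"source": [
"Our goal is to compute the gradient of each leaf node. With the chain rule for composite functions, these gradients are easy to work out.\n",
"$$\\frac{\\partial z}{\\partial x}=\\frac{\\partial z}{\\partial y}\\frac{\\partial y}{\\partial x}=w\\tag{2.1}$$\n",
"$$\\frac{\\partial z}{\\partial w}=\\frac{\\partial z}{\\partial y}\\frac{\\partial y}{\\partial w}=x\\tag{2.2}$$\n",
"$$\\frac{\\partial z}{\\partial b}=1 \\tag{2.3}$$\n",
"When backward() is called, PyTorch computes the gradient of each node automatically. This is the backward pass, illustrated in Figure 2-10. During the backward pass, autograd follows Figure 2-10 from the current root node z back toward the leaves, applies the chain rule, computes the gradients of all leaf nodes, and accumulates the values in their grad attributes. The operation (or function) that produced a non-leaf node is recorded in its grad_fn attribute; the grad_fn of a leaf node is None.\n",
"\n",
"<center>Figure 2-10: Gradient backpropagation computational graph</center> \n",
"\tNext we implement this computational graph in code."
]
},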
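As a quick check of equations (2.1) to (2.3), the following sketch evaluates the three partial derivatives with `torch.autograd.grad`. The values x=2, w=3, b=1 are arbitrary assumptions, and x is marked as requiring gradients here only so that (2.1) can be verified too.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
z = w * x + b

# Returns one gradient per input, in the same order as the inputs.
dz_dx, dz_dw, dz_db = torch.autograd.grad(z, (x, w, b))
print(dz_dx, dz_dw, dz_db)   # tensor(3.) tensor(2.) tensor(1.)
```

The results equal w, x, and 1 respectively, as the chain rule predicts.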
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### 2.5.3 Scalar Backpropagation\n",
"PyTorch implements backpropagation with torch.autograd.backward. Its signature is: \n",
"<font color=blue>\n",
"torch.autograd.backward( \n",
"\t\ttensors, \n",
"\t\tgrad_tensors=None, \n",
"\t\tretain_graph=None, \n",
"\t\tcreate_graph=False, \n",
"\t\tgrad_variables=None)\n",
"</font> \n",
"The parameters are as follows. \n",
"- tensors: the tensor(s) whose gradients are to be computed.\n",
"- grad_tensors: needed when the gradient of a non-scalar is computed. Its shape should match that of the corresponding tensor in tensors.\n",
"- retain_graph: normally PyTorch frees the computational graph after a single call to backward. To call backward repeatedly on the same graph, set this parameter to True.\n",
"- create_graph: when set to True, a graph of the derivative is built so that higher-order gradients can be computed.\n",
"- grad_variables: a deprecated alias that is likely to be dropped in later versions; use grad_tensors instead. \n",
"Suppose x, w, and b are scalars and z=wx+b. Because z is a scalar, backward() can be called on it without arguments. The main steps of automatic differentiation are as follows. \n",
"1) Define the leaf nodes and the operator nodes."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"requires_grad of x, w, b: False, True, True\n"
]
}
],
"source": [
"import torch\n",
"import numpy as np\n",
"\n",
"# Define the input tensor x\n",
"x = torch.Tensor([2])\n",
"# Initialize the weight w and the bias b with requires_grad=True so that autograd tracks them\n",
"w = torch.randn(1, requires_grad=True)\n",
"b = torch.randn(1, requires_grad=True)\n",
"y = torch.mul(w, x)  # equivalent to w * x\n",
"z = torch.add(y, b)  # equivalent to y + b\n",
"# Check the requires_grad attribute of the leaf nodes x, w, b\n",
"print(\"requires_grad of x, w, b: {}, {}, {}\".format(x.requires_grad, w.requires_grad, b.requires_grad))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2) Inspect the other attributes of the leaf and non-leaf nodes."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"requires_grad of y, z: True, True\n",
"is_leaf of x, w, b, y, z: True, True, True, False, False\n",
"grad_fn of x, w, b: None, None, None\n",
"grad_fn of y, z: <MulBackward0 object at 0x000001CB4CC20DC8>, <AddBackward0 object at 0x000001CB4CC39108>\n"
]
}
],
"source": [
"# Check the requires_grad attribute of the non-leaf nodes\n",
"print(\"requires_grad of y, z: {}, {}\".format(y.requires_grad, z.requires_grad))\n",
"# y and z depend on w and b, so their requires_grad is also True, True\n",
"# Check whether each node is a leaf node\n",
"print(\"is_leaf of x, w, b, y, z: {}, {}, {}, {}, {}\".format(x.is_leaf, w.is_leaf, b.is_leaf, y.is_leaf, z.is_leaf))\n",
"# is_leaf of x, w, b, y, z: True, True, True, False, False\n",
"# Check the grad_fn attribute of the leaf nodes\n",
"print(\"grad_fn of x, w, b: {}, {}, {}\".format(x.grad_fn, w.grad_fn, b.grad_fn))\n",
"# x, w, b were created by the user rather than computed from other tensors, so their grad_fn is None, None, None\n",
"# Check the grad_fn attribute of the non-leaf nodes\n",
"print(\"grad_fn of y, z: {}, {}\".format(y.grad_fn, z.grad_fn))\n",
"# e.g. grad_fn of y, z: <MulBackward0 object at 0x7f923e85dda0>, <AddBackward0 object at 0x7f923e85d9b0>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3) Run automatic differentiation, that is, backpropagate the gradients."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Gradients of w, b, x: tensor([2.]), tensor([1.]), None\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\wumgapp\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:10: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.\n",
" # Remove the CWD from sys.path while we load stuff.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Gradients of the non-leaf nodes y, z: None, None\n"
]
}
],
"source": [
"# Backpropagate from the tensor z. After backward() runs, the graph's buffers are freed automatically.\n",
"# To call backward() more than once, pass retain_graph=True; the gradients then accumulate.\n",
"# z.backward(retain_graph=True)\n",
"z.backward()\n",
"# Check the gradients of the leaf nodes. x is a leaf but does not require a gradient, so its grad is None.\n",
"print(\"Gradients of w, b, x: {}, {}, {}\".format(w.grad, b.grad, x.grad))\n",
"# Gradients of w, b, x: tensor([2.]), tensor([1.]), None\n",
"\n",
"# Gradients of non-leaf nodes are not kept once backward() has run\n",
"print(\"Gradients of the non-leaf nodes y, z: {}, {}\".format(y.grad, z.grad))\n",
"# Gradients of the non-leaf nodes y, z: None, None"
]
},
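The warning printed above suggests `retain_grad()` for reading the gradient of a non-leaf tensor. The sketch below combines that with `retain_graph=True` to show gradient accumulation; it rebuilds the graph with fixed values (x=2, w=3, b=1), which are assumptions chosen to make the numbers easy to follow.

```python
import torch

x = torch.tensor([2.0])
w = torch.tensor([3.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

y = w * x
y.retain_grad()                  # ask autograd to keep the gradient of this non-leaf tensor
z = y + b

z.backward(retain_graph=True)    # keep the graph so that backward can run again
first_w_grad = w.grad.clone()
first_y_grad = y.grad.clone()
print(first_w_grad, first_y_grad)   # tensor([2.]) tensor([1.])

z.backward()                     # second pass: the new gradient is added to the old one
second_w_grad = w.grad.clone()
print(second_w_grad)             # tensor([4.])

w.grad.zero_()                   # reset before an unrelated computation
print(w.grad)                    # tensor([0.])
```

Optimizers expose the same reset through `zero_grad()`, which is why training loops call it once per step.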
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.5.4 Non-scalar Backpropagation\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"When the target tensor is not a scalar, the gradient argument of the tensor's backward method is required. The method signature is: backward(gradient=None, retain_graph=None, create_graph=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. A simple non-scalar example \n",
"\tWe start with a simple case in which the target tensor is not a scalar."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"ename": "RuntimeError",
"evalue": "grad can be implicitly created only for scalar outputs",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mRuntimeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-2-bcdd95b78431>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[0mX\u001b[0m\u001b[1;33m=\u001b[0m \u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mones\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mrequires_grad\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 4\u001b[0m \u001b[0mY\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mX\u001b[0m\u001b[1;33m**\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m+\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 5\u001b[1;33m \u001b[0mY\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\torch\\tensor.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(self, gradient, retain_graph, create_graph)\u001b[0m\n\u001b[0;32m 219\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 220\u001b[0m create_graph=create_graph)\n\u001b[1;32m--> 221\u001b[1;33m \u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgradient\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 222\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 223\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mregister_hook\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mhook\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\torch\\autograd\\__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables)\u001b[0m\n\u001b[0;32m 124\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 125\u001b[0m \u001b[0mgrad_tensors_\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0m_tensor_or_tensors_to_tuple\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mgrad_tensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtensors\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 126\u001b[1;33m \u001b[0mgrad_tensors_\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0m_make_grads\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgrad_tensors_\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 127\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mretain_graph\u001b[0m \u001b[1;32mis\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 128\u001b[0m \u001b[0mretain_graph\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\torch\\autograd\\__init__.py\u001b[0m in \u001b[0;36m_make_grads\u001b[1;34m(outputs, grads)\u001b[0m\n\u001b[0;32m 48\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mout\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mrequires_grad\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 49\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mout\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mnumel\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m!=\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 50\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mRuntimeError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"grad can be implicitly created only for scalar outputs\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 51\u001b[0m \u001b[0mnew_grads\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mones_like\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mout\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmemory_format\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mpreserve_format\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 52\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mRuntimeError\u001b[0m: grad can be implicitly created only for scalar outputs"
]
}
],
"source": [
"import torch\n",
"\n",
"X = torch.ones(2, requires_grad=True)\n",
"Y = X**2 + 3\n",
"# Y is not a scalar, so calling backward() without a gradient argument raises a RuntimeError\n",
"Y.backward()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How can this kind of error be avoided? Let us compute the derivative of Y by hand. We know that\n",
"$$X=[x_1,x_2]$$\n",
"$$Y=[x_1^2+3,x_2^2+3]$$\n",
"How do we obtain $\\frac{\\partial Y}{\\partial X}$?\n",
"\tY is a vector, so one option is to turn it into a scalar. For example, we can sum the elements of Y and differentiate the resulting scalar with respect to X. Because each $y_i$ depends only on $x_i$, the sum does not change the result:\n",
"$$Y_{sum}=\\sum y_i =x_1^2+x_2^2+6$$\n",
"$$\\frac{\\partial Y_{sum}}{\\partial x_1}=2x_1,\\frac{\\partial Y_{sum}}{\\partial x_2}=2x_2$$\n",
"This procedure can be written as the following code.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([2., 2.])\n"
]
}
],
"source": [
"x = torch.ones(2, requires_grad=True)\n",
"y = x**2 + 3\n",
"y.sum().backward()\n",
"print(x.grad)  # tensor([2., 2.])"
]
},
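Reducing with `sum()` is one way to obtain a scalar. A hedged sketch of the alternative follows: passing an all-ones tensor as the `gradient` argument gives the same result. The names `x1`, `x2`, and `y2` are illustrative only.

```python
import torch

x1 = torch.ones(2, requires_grad=True)
(x1**2 + 3).sum().backward()                  # reduce to a scalar first

x2 = torch.ones(2, requires_grad=True)
y2 = x2**2 + 3
y2.backward(gradient=torch.ones_like(y2))     # pass the all-ones vector instead

print(x1.grad, x2.grad)                       # tensor([2., 2.]) tensor([2., 2.])
```

Both calls compute the same quantity because the derivative of a plain sum with respect to each element is 1.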
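{
"cell_type": "markdown",
"metadata": {},
"source": [
"Summing y before computing the gradient raises no error, and the result is what we expect.\n",
"In fact, summing Y is equivalent to taking the dot product of Y with a vector or matrix of all ones, that is, $Y_{sum}=Y\\cdot V$ with $V=[1,1]$. This V is exactly what we need to pass as the grad_tensors argument. (Strictly speaking, the dot product is defined for one-dimensional vectors; for matrices and higher-dimensional tensors, think of it as an element-wise product followed by a sum over all elements.) \n",
"2. A more complex non-scalar example \n",
"(1) Define the leaf node and the computed node"
]
},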
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"# Define the leaf tensor x with shape 1x2\n",
"x = torch.tensor([[2, 3]], dtype=torch.float, requires_grad=True)\n",
"# Allocate the Jacobian matrix\n",
"J = torch.zeros(2, 2)\n",
"# Allocate the target tensor with shape 1x2\n",
"y = torch.zeros(1, 2)\n",
"# Define the mapping from x to y:\n",
"# y1 = x1**2 + 3*x2, y2 = x2**2 + 2*x1\n",
"y[0, 0] = x[0, 0] ** 2 + 3 * x[0, 1]\n",
"y[0, 1] = x[0, 1] ** 2 + 2 * x[0, 0]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(2) Compute the gradient of y with respect to x by hand \n",
"(The hand computation is omitted here; see the corresponding chapter of the book for details.)"
]
},
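For readers without the book at hand, here is a short sketch of that hand computation. It is derived directly from the mapping defined in step (1), not copied from the book, so the layout and notation are assumptions.

$$y_1=x_1^2+3x_2,\qquad y_2=x_2^2+2x_1$$

$$J=\begin{pmatrix}\frac{\partial y_1}{\partial x_1}&\frac{\partial y_1}{\partial x_2}\\ \frac{\partial y_2}{\partial x_1}&\frac{\partial y_2}{\partial x_2}\end{pmatrix}=\begin{pmatrix}2x_1&3\\ 2&2x_2\end{pmatrix}$$

$$J\Big|_{x=(2,3)}=\begin{pmatrix}4&3\\ 2&6\end{pmatrix}$$

Row i of J holds the gradient of $y_i$ with respect to x, which is the quantity each backward call in step (3) recovers.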
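{
"cell_type": "markdown",
"metadata": {},
"source": [
"(3) Call backward to obtain the gradient of y with respect to x \n",
"\tWe do this in two steps. First set v=(1,0) to get the gradient of y_1 with respect to x, then set v=(0,1) to get the gradient of y_2 with respect to x. Because backward() is called more than once, the first call needs retain_graph=True. The code is as follows:"
]
},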
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[4., 3.],\n",
" [2., 6.]])\n"
]
}
],
"source": [
"# Gradient of y1 with respect to x\n",
"y.backward(torch.Tensor([[1, 0]]), retain_graph=True)\n",
"J[0] = x.grad\n",
"# Gradients accumulate, so reset the gradient of x to zero\n",
"x.grad = torch.zeros_like(x.grad)\n",
"# Gradient of y2 with respect to x\n",
"y.backward(torch.Tensor([[0, 1]]))\n",
"J[1] = x.grad\n",
"# Show the Jacobian matrix\n",
"print(J)"
]
},
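The row-by-row result can be cross-checked with `torch.autograd.functional.jacobian`, which is available in recent PyTorch releases. This is a sketch under the assumption that the mapping is rewritten as a function of a 1-D input; the function name `f` is hypothetical.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    """Same mapping as above for a 1-D input: y1 = x1^2 + 3*x2, y2 = x2^2 + 2*x1."""
    return torch.stack([x[0] ** 2 + 3 * x[1], x[1] ** 2 + 2 * x[0]])


x = torch.tensor([2.0, 3.0])
J = jacobian(f, x)
print(J)
# tensor([[4., 3.],
#         [2., 6.]])
```

Using a 1-D input keeps the returned Jacobian a plain 2x2 matrix; with a 1x2 input the result would carry extra singleton dimensions.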
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This matches the hand-computed result in Eq. (2.5). \n",
"(4) A wrong choice of v gives a wrong result. \n",
"For example, taking v=[1,1] does not produce the Jacobian, as the following code shows:"
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"ename": "RuntimeError",
|
||
"evalue": "Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.",
|
||
"output_type": "error",
|
||
"traceback": [
|
||
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
|
||
"\u001b[1;31mRuntimeError\u001b[0m Traceback (most recent call last)",
|
||
"\u001b[1;32m<ipython-input-6-1087ea950c0d>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0my\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mTensor\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mgrad\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[1;31m#结果为tensor([[6., 9.]])\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
|
||
"\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\torch\\tensor.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(self, gradient, retain_graph, create_graph)\u001b[0m\n\u001b[0;32m 219\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 220\u001b[0m create_graph=create_graph)\n\u001b[1;32m--> 221\u001b[1;33m \u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgradient\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 222\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 223\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mregister_hook\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mhook\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
|
||
"\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\torch\\autograd\\__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables)\u001b[0m\n\u001b[0;32m 130\u001b[0m Variable._execution_engine.run_backward(\n\u001b[0;32m 131\u001b[0m \u001b[0mtensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgrad_tensors_\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 132\u001b[1;33m allow_unreachable=True) # allow_unreachable flag\n\u001b[0m\u001b[0;32m 133\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 134\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
|
||
"\u001b[1;31mRuntimeError\u001b[0m: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time."
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"y.backward(torch.Tensor([[1, 1]]))\n",
|
||
"print(x.grad)\n",
|
||
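"# Expected result: tensor([[6., 9.]]) (needs a graph that has not been freed, e.g. retain_graph=True on the earlier backward() call)"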
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
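"This result does not match our hand calculation, so it is clearly wrong. Where did it go wrong? The result was computed as follows:\n",
"$$J^T\\cdot v^T=\\left(\\begin{matrix}4&2 \\\\ 3&6\\end{matrix}\\right)\\left(\\begin{matrix}1 \\\\ 1\\end{matrix}\\right)=\\left(\\begin{matrix}6 \\\\ 9\\end{matrix}\\right)\\tag{2.7}$$\n",
"So the mistake lies in the choice of v: what this call returns is the v-weighted sum of the gradients of the components of y, not the gradient of y with respect to x.\n"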
|
||
]
|
||
},
|
||
{
|
||
"attachments": {
|
||
"image.png": {
|
||
"image/png": ""
|
||
}
|
||
},
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
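"3. Summary  \n",
"1) PyTorch's backward() does not differentiate a tensor with respect to a tensor; it only differentiates a scalar with respect to a tensor, and the result is a tensor with the same shape as the independent variable.  \n",
"2) To avoid differentiating a tensor directly, use the grad_tensors argument of torch.autograd.backward() to reduce it to a scalar first. y.backward(v) means: compute loss = torch.sum(y * v), then differentiate loss with respect to every variable x that can influence y. Here y and v are tensors of the same shape. In other words, v weights the components of y, the weighted sum is the actual loss, and the derivatives of that loss with respect to all related variables are computed.  \n",
"3) PyTorch computation graphs are dynamic, which has two consequences: the forward pass executes immediately, and the graph is destroyed right after the backward pass (unless retain_graph=True is specified). Figure 2-11 shows the life cycle of a computation graph under PyTorch's automatic differentiation.\n",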
|
||
""
|
||
]
|
||
},
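The equivalence stated in point 2 of the summary can be checked numerically. The following is a minimal sketch rather than the book's own code: the function `f` is an assumption, chosen so that its Jacobian at x = (2, 3) matches the matrix in equation (2.7), and all variable names are illustrative.

```python
import torch


def f(x):
    # Hypothetical vector-valued function: y1 = x1^2 + 3*x2, y2 = x2^2 + 2*x1
    y1 = x[0, 0] ** 2 + 3 * x[0, 1]
    y2 = x[0, 1] ** 2 + 2 * x[0, 0]
    return torch.stack([y1, y2]).unsqueeze(0)  # shape (1, 2)


v = torch.tensor([[1.0, 1.0]])

# (a) y.backward(v)
xa = torch.tensor([[2.0, 3.0]], requires_grad=True)
f(xa).backward(v)
grad_via_v = xa.grad.clone()

# (b) the same thing written as a scalar loss: sum(y * v)
xb = torch.tensor([[2.0, 3.0]], requires_grad=True)
torch.sum(f(xb) * v).backward()
grad_via_sum = xb.grad.clone()

# (c) the full Jacobian, one row per one-hot v, each on a fresh graph
rows = []
for k in range(2):
    xk = torch.tensor([[2.0, 3.0]], requires_grad=True)
    one_hot = torch.zeros(1, 2)
    one_hot[0, k] = 1.0
    f(xk).backward(one_hot)
    rows.append(xk.grad.squeeze(0))
jacobian = torch.stack(rows)

print(grad_via_v, grad_via_sum)
print(jacobian)
```

Variants (a) and (b) give the same tensor, which is the column sum of the Jacobian. Variant (c) recovers the individual rows, which is what a hand calculation of the gradient of each component of y would produce.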
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
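"### 2.5.5 Cutting Off Backpropagation for Some Branches\n",
"When training a network, we sometimes want to keep part of the parameters fixed and adjust only the rest, or train only a branch network without letting its gradients affect the main network. In these cases the detach() function can be used to cut off backpropagation through some branches.  \n",
"detach_() separates a tensor from the computation graph that created it and turns it into a leaf node, with grad_fn=None and requires_grad=False. detach() does the same but returns a new tensor and leaves the original in the graph.  \n",
"Suppose y is a function of x, and z is a function of both y and x. We want the gradient of z with respect to x but, for some reason, want to treat y as a constant. To do this we detach y to obtain a new variable c that has the same value as y but discards all information about how y was computed in the graph. In other words, gradients do not flow back through c to x. The backward pass below therefore computes the partial derivative of z=c*x with respect to x while treating c as a constant, giving $\\frac{\\partial z}{\\partial x}=c$, rather than the derivative of $z=x^3+3x$ with respect to x, that is, $\\frac{\\partial z}{\\partial x}\\neq 3x^2+3$.\n"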
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"False"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import torch\n",
|
||
"\n",
|
||
"x = torch.ones(2,requires_grad=True)\n",
|
||
"y = x**2+3\n",
|
||
"# Detach y to create a new variable c that is cut off from the graph\n",
|
||
"c = y.detach()\n",
|
||
"z = c*x\n",
|
||
"z.sum().backward()\n",
|
||
"x.grad==c ## tensor([True, True])\n",
|
||
"x.grad ## tensor([4., 4.])\n",
|
||
"c.grad_fn==None ## True\n",
|
||
"c.requires_grad ##False"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
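"Detaching only created the new variable c from the value of y; y itself still has its computation history, so calling backward on y yields the derivative of y = x**2 + 3 with respect to x, namely 2*x."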
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"tensor([True, True])"
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x.grad.zero_()\n",
|
||
"y.sum().backward()\n",
|
||
"x.grad == 2 * x ##tensor([True, True])"
|
||
]
|
||
},
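Another common way to keep part of a network fixed is to switch off `requires_grad` on its parameters with `requires_grad_(False)`. The sketch below is illustrative only: the two `nn.Linear` layers named `backbone` and `head`, their sizes, and the random input are hypothetical stand-ins, not taken from the text.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical two-part model: a frozen backbone and a trainable head
backbone = nn.Linear(3, 4)
head = nn.Linear(4, 1)

# Freeze the backbone: autograd stops tracking gradients for these parameters
for p in backbone.parameters():
    p.requires_grad_(False)

x = torch.randn(5, 3)
loss = head(backbone(x)).pow(2).mean()
loss.backward()

frozen_grads = [p.grad for p in backbone.parameters()]
head_grads = [p.grad for p in head.parameters()]

print(frozen_grads)                         # no gradient was stored for the frozen part
print([g.shape for g in head_grads])        # the head still receives gradients
```

Compared with detach(), which cuts the graph at one tensor, this approach marks whole parameters as constants, so an optimizer that is given only the head's parameters leaves the backbone unchanged.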
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"source": [
|
||
"## 2.6 Implementing Machine Learning with NumPy"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# -*- coding: utf-8 -*-\n",
|
||
"import numpy as np\n",
|
||
"%matplotlib inline\n",
|
||
"from matplotlib import pyplot as plt\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"np.random.seed(100) \n",
|
||
"x = np.linspace(-1, 1, 100).reshape(100,1) \n",
|
||
"y = 3*np.power(x, 2) +2+ 0.2*np.random.rand(x.size).reshape(100,1) \n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Plot the data\n",
|
||
"plt.scatter(x, y)\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")\n",
|
||
"plt.show()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Randomly initialize the parameters\n",
|
||
"w1 = np.random.rand(1,1)\n",
|
||
"b1 = np.random.rand(1,1) \n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"lr = 0.001  # learning rate\n",
"\n",
"for i in range(800):\n",
"    # Forward pass\n",
"    y_pred = np.power(x,2)*w1 + b1\n",
"    # Compute the loss (half the sum of squared errors)\n",
"    loss = 0.5 * (y_pred - y) ** 2\n",
"    loss = loss.sum()\n",
"    # Compute the gradients by hand\n",
"    grad_w=np.sum((y_pred - y)*np.power(x,2))\n",
"    grad_b=np.sum((y_pred - y))\n",
"    # Gradient descent step to minimize the loss\n",
"    w1 -= lr * grad_w\n",
"    b1 -= lr * grad_b\n"
|
||
]
|
||
},
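The hand-written gradients in the loop follow from the loss 0.5 * sum((w*x^2 + b - y)^2): differentiating with respect to w gives sum((y_pred - y) * x^2), and with respect to b gives sum(y_pred - y). A quick way to gain confidence in such formulas is a finite-difference check. The sketch below is an illustration under assumptions: the helper names `loss_fn` and `analytic_grads`, the test point (w0, b0), and the step size `eps` are all made up for this example.

```python
import numpy as np

np.random.seed(100)
x = np.linspace(-1, 1, 100).reshape(100, 1)
y = 3 * np.power(x, 2) + 2 + 0.2 * np.random.rand(x.size).reshape(100, 1)


def loss_fn(w, b):
    # Same loss as in the training loop: half the sum of squared errors
    y_pred = np.power(x, 2) * w + b
    return 0.5 * np.sum((y_pred - y) ** 2)


def analytic_grads(w, b):
    # Same gradient formulas as in the training loop
    residual = np.power(x, 2) * w + b - y
    return np.sum(residual * np.power(x, 2)), np.sum(residual)


w0, b0, eps = 0.5, 0.1, 1e-6  # arbitrary test point and step size

# Central differences approximate the partial derivatives numerically
num_grad_w = (loss_fn(w0 + eps, b0) - loss_fn(w0 - eps, b0)) / (2 * eps)
num_grad_b = (loss_fn(w0, b0 + eps) - loss_fn(w0, b0 - eps)) / (2 * eps)
grad_w, grad_b = analytic_grads(w0, b0)

print(grad_w, num_grad_w)
print(grad_b, num_grad_b)
```

The two pairs of numbers should agree to several decimal places; a large mismatch would point to an error in the derivation.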
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
},
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"[[2.98927619]] [[2.09818307]]\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"plt.plot(x, y_pred,'r-',label='predict',linewidth=4)\n",
|
||
"plt.scatter(x, y,color='blue',marker='o',label='true') # true data\n",
|
||
"plt.xlim(-1,1)\n",
|
||
"plt.ylim(2,6) \n",
|
||
"plt.legend()\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")\n",
|
||
"plt.show()\n",
|
||
"print(w1,b1)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
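"## 2.7 Implementing Machine Learning with Tensor and autograd\n",
"Here PyTorch tensors and automatic differentiation (autograd) replace the hand-derived gradient computation used for backpropagation in Section 2.6."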
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import torch \n",
|
||
"%matplotlib inline\n",
|
||
"from matplotlib import pyplot as plt\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"torch.manual_seed(100) \n",
|
||
"dtype = torch.float\n",
|
||
"# Generate the x coordinates; x is a tensor of shape 100x1\n",
"x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)\n",
"# Generate the y coordinates; y is a tensor of shape 100x1 with some added noise\n",
"y = 3*x.pow(2) +2+ 0.2*torch.rand(x.size())\n",
"\n",
"# Plot the data, converting the tensors to NumPy arrays\n",
|
||
"plt.scatter(x.numpy(), y.numpy())\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")\n",
|
||
"plt.show()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Randomly initialize the parameters; w and b are to be learned, so they need requires_grad=True\n",
|
||
"w = torch.randn(1,1, dtype=dtype,requires_grad=True)\n",
|
||
"b = torch.zeros(1,1, dtype=dtype, requires_grad=True) \n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"lr = 0.001  # learning rate\n",
"\n",
"for ii in range(800):\n",
"    # Forward pass: compute the loss\n",
"    y_pred = x.pow(2).mm(w) + b\n",
"    loss = 0.5 * (y_pred - y) ** 2\n",
"    loss = loss.sum()\n",
"    \n",
"    # Backward pass: autograd computes the gradients\n",
"    loss.backward()\n",
"    \n",
"    # Update the parameters by hand; this must happen inside torch.no_grad()\n",
"    with torch.no_grad():\n",
"        w -= lr * w.grad\n",
"        b -= lr * b.grad\n",
"    \n",
"        # Gradients computed by autograd accumulate in .grad, so reset them every iteration\n",
"        w.grad.zero_()\n",
"        b.grad.zero_()\n"
|
||
]
|
||
},
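The reason the loop resets `.grad` on every iteration is that backward() adds to the stored gradient instead of overwriting it. The short sketch below shows this on a toy tensor; the tensor values and the variable names are chosen only for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2*w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added on top
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Reset, then run backward again: back to the single-pass gradient
w.grad.zero_()
(w ** 2).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```

Without the reset, each update step in the training loop would use the sum of all gradients seen so far, which generally makes plain gradient descent diverge or overshoot.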
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
},
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"tensor([[2.9645]], requires_grad=True) tensor([[2.1146]], requires_grad=True)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"plt.plot(x.numpy(), y_pred.detach().numpy(),'r-',label='predict',linewidth=4)#predict\n",
|
||
"plt.scatter(x.numpy(), y.numpy(),color='blue',marker='o',label='true') # true data\n",
|
||
"plt.xlim(-1,1)\n",
|
||
"plt.ylim(2,6) \n",
|
||
"plt.legend()\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")\n",
|
||
"plt.show()\n",
|
||
" \n",
|
||
"print(w, b)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
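"## 2.8 Implementing Machine Learning with an Optimizer and Automatic Differentiation (autograd)\n",
"This version uses PyTorch's built-in loss function, optimizer, and automatic differentiation."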
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 42,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import torch\n",
|
||
"from torch import nn\n",
|
||
"%matplotlib inline\n",
|
||
"from matplotlib import pyplot as plt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"torch.manual_seed(100) \n",
|
||
"dtype = torch.float\n",
|
||
"# Generate the x coordinates; x is a tensor of shape 100x1\n",
"x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)\n",
"# Generate the y coordinates; y is a tensor of shape 100x1 with some added noise\n",
"y = 3*x.pow(2) +2+ 0.2*torch.rand(x.size())\n",
"\n",
"# Plot the data, converting the tensors to NumPy arrays\n",
|
||
"plt.scatter(x.numpy(), y.numpy())\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Randomly initialize the parameters; w and b are to be learned, so they need requires_grad=True\n",
|
||
"w = torch.randn(1,1, dtype=dtype,requires_grad=True)\n",
|
||
"b = torch.zeros(1,1, dtype=dtype, requires_grad=True) "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Define the loss function and the optimizer\n",
|
||
"loss_func = nn.MSELoss()\n",
|
||
"optimizer = torch.optim.SGD([w,b],lr = 0.001)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"for ii in range(10000):\n",
"    # Forward pass: compute the loss\n",
"    y_pred = x.pow(2).mm(w) + b\n",
"    loss=loss_func(y_pred,y)\n",
"    \n",
"    # Backward pass: autograd computes the gradients\n",
"    loss.backward()\n",
"    \n",
"    # Update the parameters\n",
"    optimizer.step()\n",
"    # Gradients computed by autograd accumulate in .grad, so reset them every iteration\n",
"    optimizer.zero_grad()\n"
|
||
]
|
||
},
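One difference from Section 2.7 is easy to miss: `nn.MSELoss()` averages the squared errors by default, whereas the earlier loop summed half of them. With N = 100 samples the gradient of the mean loss is therefore 2/N times the gradient of the half-sum loss, which is a plausible reason why this section runs many more iterations at the same learning rate. The sketch below compares the two gradients on the same data; it is an illustration, and the names `grad_sum`, `grad_mean`, and `ratio` are not from the text.

```python
import torch
from torch import nn

torch.manual_seed(100)
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
y = 3 * x.pow(2) + 2 + 0.2 * torch.rand(x.size())

w = torch.randn(1, 1, requires_grad=True)
b = torch.zeros(1, 1, requires_grad=True)

y_pred = x.pow(2).mm(w) + b

# Loss used in Section 2.7: half the sum of squared errors
sum_loss = (0.5 * (y_pred - y) ** 2).sum()
grad_sum = torch.autograd.grad(sum_loss, w, retain_graph=True)[0]

# Loss used in this section: mean squared error (default reduction='mean')
mean_loss = nn.MSELoss()(y_pred, y)
grad_mean = torch.autograd.grad(mean_loss, w)[0]

# The two gradients differ by the constant factor N / 2
ratio = (grad_sum / grad_mean).item()
print(ratio)
```

Passing `reduction='sum'` to `nn.MSELoss` would give the summed squared error, which differs from the Section 2.7 loss only by the factor 0.5.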
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 47,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
},
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"tensor([[2.6369]], requires_grad=True) tensor([[2.2360]], requires_grad=True)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"plt.plot(x.numpy(), y_pred.detach().numpy(),'r-',label='predict',linewidth=4)#predict\n",
|
||
"plt.scatter(x.numpy(), y.numpy(),color='blue',marker='o',label='true') # true data\n",
|
||
"plt.xlim(-1,1)\n",
|
||
"plt.ylim(2,6) \n",
|
||
"plt.legend()\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")\n",
|
||
"plt.show()\n",
|
||
" \n",
|
||
"print(w, b)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 2.9 Converting a Dataset into a Batched Iterator"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 48,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import torch\n",
|
||
"import numpy as np\n",
|
||
"from torch import nn\n",
|
||
"%matplotlib inline\n",
|
||
"from matplotlib import pyplot as plt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 49,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"torch.manual_seed(100) \n",
|
||
"dtype = torch.float\n",
|
||
"# Generate the x coordinates; x is a tensor of shape 100x1\n",
"x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)\n",
"# Generate the y coordinates; y is a tensor of shape 100x1 with some added noise\n",
"y = 3*x.pow(2) +2+ 0.2*torch.rand(x.size())\n",
"\n",
"# Plot the data, converting the tensors to NumPy arrays\n",
|
||
"plt.scatter(x.numpy(), y.numpy())\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Randomly initialize the parameters; w and b are to be learned, so they need requires_grad=True\n",
|
||
"w = torch.randn(1,1, dtype=dtype,requires_grad=True)\n",
|
||
"b = torch.zeros(1,1, dtype=dtype, requires_grad=True) "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 51,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Define the loss function and the optimizer\n",
|
||
"loss_func = nn.MSELoss()\n",
|
||
"optimizer = torch.optim.SGD([w,b],lr = 0.001)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import numpy as np\n",
"\n",
"# Build a data-pipeline iterator that yields mini-batches\n",
"def data_iter(features, labels, batch_size=4):\n",
"    num_examples = len(features)\n",
"    indices = list(range(num_examples))\n",
"    np.random.shuffle(indices)  # read the samples in random order\n",
"    for i in range(0, num_examples, batch_size):\n",
"        indexs = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])\n",
"        yield features.index_select(0, indexs), labels.index_select(0, indexs)\n"
|
||
]
|
||
},
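PyTorch also ships utilities that do the same job as the hand-written `data_iter` generator. The sketch below is an alternative, not the approach used in this section: it wraps the same kind of synthetic data in `TensorDataset` and lets `DataLoader` handle shuffling and batching. The variable names `loader` and `batch_shapes` are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(100)
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
y = 3 * x.pow(2) + 2 + 0.2 * torch.rand(x.size())

# TensorDataset pairs each feature row with its label row;
# DataLoader reshuffles the samples at the start of every pass
loader = DataLoader(TensorDataset(x, y), batch_size=10, shuffle=True)

batch_shapes = []
for features, labels in loader:
    batch_shapes.append((tuple(features.shape), tuple(labels.shape)))

print(len(batch_shapes), batch_shapes[0])
```

In a training loop, `for features, labels in loader:` can take the place of `for features, labels in data_iter(x, y, 10):` with the rest of the loop body unchanged.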
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 53,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"for ii in range(1000):\n",
"    for features, labels in data_iter(x,y,10):\n",
"        # Forward pass: compute the loss\n",
"        y_pred = features.pow(2).mm(w) + b\n",
"        loss=loss_func(y_pred,labels)\n",
"        \n",
"        # Backward pass: autograd computes the gradients\n",
"        loss.backward()\n",
"        \n",
"        # Update the parameters\n",
"        optimizer.step()\n",
"        # Gradients computed by autograd accumulate in .grad, so reset them every iteration\n",
"        optimizer.zero_grad()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 54,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
},
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"tensor([[2.6370]], requires_grad=True) tensor([[2.2360]], requires_grad=True)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"y_p=x.pow(2).mm(w).detach().numpy() + b.detach().numpy()\n",
|
||
"plt.plot(x.numpy(), y_p,'r-',label='predict',linewidth=4)#predict\n",
|
||
"plt.scatter(x.numpy(), y.numpy(),color='blue',marker='o',label='true') # true data\n",
|
||
"plt.xlim(-1,1)\n",
|
||
"plt.ylim(2,6) \n",
|
||
"plt.legend()\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")\n",
|
||
"plt.show()\n",
|
||
" \n",
|
||
"print(w, b)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 2.10 Implementing Machine Learning with the TensorFlow Framework"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import tensorflow as tf\n",
|
||
"import numpy as np\n",
|
||
"from matplotlib import pyplot as plt\n",
|
||
"%matplotlib inline"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Generate the training data\n",
"np.random.seed(100)\n",
"x = np.linspace(-1, 1, 100).reshape(100,1)\n",
"y = 3*np.power(x, 2) +2+ 0.2*np.random.rand(x.size).reshape(100,1)\n",
"\n",
"# Create the variables w and b; w is initialized randomly and b with zeros.\n",
"# TensorFlow variables keep their values across the whole computation graph.\n",
|
||
"w = tf.Variable(tf.random.uniform([1], 0, 1.0))\n",
|
||
"b = tf.Variable(tf.zeros([1]))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 57,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Define the model\n",
"class CustNet:\n",
"    # Forward pass\n",
"    def __call__(self,x):\n",
"        return np.power(x,2)*w + b\n",
"\n",
"    # Loss function\n",
"    def loss_func(self,y_true,y_pred):\n",
"        return tf.reduce_mean((y_true - y_pred)**2/2)\n",
"\n",
"model=CustNet()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Training the Model"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"epochs=14000\n",
"\n",
"for epoch in tf.range(1,epochs):\n",
"    with tf.GradientTape() as tape:\n",
"        predictions = model(x)\n",
"        loss = model.loss_func(y, predictions)\n",
"    # Backward pass: compute the gradients\n",
"    dw,db = tape.gradient(loss,[w,b])\n",
"    # Gradient descent update of the parameters\n",
"    w.assign(w - 0.001*dw)\n",
"    b.assign(b - 0.001*db)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Text(0, 0.5, 'y')"
|
||
]
|
||
},
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Visualize the result\n",
|
||
"plt.figure() \n",
|
||
"plt.scatter(x,y,color='blue',marker='o',label='true')\n",
|
||
"plt.plot (x, b + w*x**2,'r-',label='predict',linewidth=4)\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"y\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.7.4"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 2
|
||
}
|