Merged

2026 #206

Changes from all commits

34 commits
75a6844
updated Liquid model name in lab 3
shrika-eddula Dec 23, 2025
cc8abbd
should be state instead of hidden_state as per solutions
shrika-eddula Dec 23, 2025
84e329f
ptp was removed from the ndarray class in NumPy 2.0. Use np.ptp(arr, …
shrika-eddula Dec 23, 2025
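For reference: NumPy 2.0 removed the `ndarray.ptp` method, while the module-level `np.ptp` function remains. A minimal sketch of the migration (the array here is illustrative):

```python
import numpy as np

arr = np.array([[1.0, 5.0], [2.0, 8.0]])  # illustrative data

# NumPy < 2.0 allowed the method form:
# rng = arr.ptp(axis=0)        # AttributeError in NumPy 2.0+

# NumPy 2.0+: call the module-level function instead.
rng = np.ptp(arr, axis=0)      # peak-to-peak (max - min) per column
print(rng)                     # [1. 3.]
```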
dc2b4a1
In Keras 3, add_weight changed its function signature. Need to pass s…
shrika-eddula Dec 31, 2025
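For reference: in Keras 3, `add_weight` no longer accepts the weight's name as its first positional argument; `shape` leads and `name` must be passed as a keyword. A minimal sketch of the pattern the diff below adopts (the layer and sizes are illustrative, not the lab's solution):

```python
import tensorflow as tf

class TinyDense(tf.keras.layers.Layer):  # illustrative layer
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        d = int(input_shape[-1])
        # Keras 2 accepted add_weight("weight", shape=...);
        # Keras 3 wants shape first and name as a keyword.
        self.W = self.add_weight(shape=(d, self.units), name="weight")
        self.b = self.add_weight(shape=(1, self.units), name="bias")

    def call(self, x):
        return tf.sigmoid(tf.matmul(x, self.W) + self.b)

layer = TinyDense(3)
print(layer(tf.constant([[1.0, 2.0]])).shape)  # (1, 3)
```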
930e1c3
Was running into an issue with reset_states() not running on 'Sequent…
shrika-eddula Dec 31, 2025
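For reference: Keras 3's `Sequential` no longer exposes a model-level `reset_states()`, which is why the diffs below reset state layer by layer. A minimal sketch of that loop wrapped as a helper (the function name is ours; the guarded loop mirrors the fix in this PR):

```python
import tensorflow as tf

def reset_recurrent_state(model: tf.keras.Model) -> None:
    """Reset state on every layer that supports it.

    Sequential.reset_states() is gone in Keras 3; stateful RNN layers
    still carry their own reset method, so walk the layers and guard
    with hasattr to skip stateless ones (Dense, Embedding, ...).
    """
    for layer in model.layers:
        if hasattr(layer, "reset_states"):
            layer.reset_states()
```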
a8ba334
update PTLab1 links
avaamini Jan 3, 2026
eaa9fc5
update 2026
avaamini Jan 3, 2026
9b5e542
update 2026
avaamini Jan 3, 2026
e20e060
update 2026 + link
avaamini Jan 3, 2026
6805092
update 2026
avaamini Jan 3, 2026
f48bf4f
update 2026 + submission link
avaamini Jan 3, 2026
11248eb
update 2026
avaamini Jan 3, 2026
6211f55
update 2026 + submission link
avaamini Jan 3, 2026
dc1434c
update 2026
avaamini Jan 3, 2026
f33f764
Update copyright year and submission link
avaamini Jan 3, 2026
88af21f
update 2026
avaamini Jan 3, 2026
7743df7
update 2026 + submission link
avaamini Jan 3, 2026
389f46c
update 2026
avaamini Jan 3, 2026
13734e6
update 2026 + submission link
avaamini Jan 4, 2026
ac37a5a
update 2026
avaamini Jan 4, 2026
d447d99
update 2026 + submission link
avaamini Jan 4, 2026
b4ddb15
img pointers to master
avaamini Jan 4, 2026
ba56203
img pointers to master
avaamini Jan 4, 2026
11fdf5c
update img pointers to master
avaamini Jan 4, 2026
c3c26b0
update img pointers to master
avaamini Jan 4, 2026
0fc9548
update img pointers to master
avaamini Jan 4, 2026
9065e63
update img pointers to master
avaamini Jan 4, 2026
9c28b17
update img pointers to master
avaamini Jan 4, 2026
dad4310
update img pointers to master
avaamini Jan 4, 2026
87e323d
Update copyright year and submission link
avaamini Jan 4, 2026
9f21659
update copyright year and submission link
avaamini Jan 4, 2026
1fb79a5
lfm2 finetuning, gemini2.5 judge, opik eval
avaamini Jan 4, 2026
d46e580
removing the mention of stop for json
avaamini Jan 4, 2026
1338884
student version w/ LFM, Gemini, Opik updates
avaamini Jan 4, 2026
10 changes: 5 additions & 5 deletions lab1/PT_Part1_Intro.ipynb
@@ -27,7 +27,7 @@
},
"outputs": [],
"source": [
"# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"# Copyright 2026 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"#\n",
"# Licensed under the MIT License. You may not use this file except in compliance\n",
"# with the License. Use and/or modification of this code outside of MIT Introduction\n",
@@ -53,7 +53,7 @@
"\n",
"## 0.1 Install PyTorch\n",
"\n",
"[PyTorch](https://pytorch.org/) is a popular deep learning library known for its flexibility and ease of use. Here we'll learn how computations are represented and how to define a simple neural network in PyTorch. For all the labs in Introduction to Deep Learning 2025, there will be a PyTorch version available.\n",
"[PyTorch](https://pytorch.org/) is a popular deep learning library known for its flexibility and ease of use. Here we'll learn how computations are represented and how to define a simple neural network in PyTorch. For all the labs in Introduction to Deep Learning 2026, there will be a PyTorch version available.\n",
"\n",
"Let's install PyTorch and a couple of dependencies."
]
@@ -203,7 +203,7 @@
"\n",
"A convenient way to think about and visualize computations in a machine learning framework like PyTorch is in terms of graphs. We can define this graph in terms of tensors, which hold data, and the mathematical operations that act on these tensors in some order. Let's look at a simple example, and define this computation using PyTorch:\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/add-graph.png)"
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/add-graph.png)"
]
},
{
@@ -235,7 +235,7 @@
"\n",
"Now let's consider a slightly more complicated example:\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/computation-graph.png)\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/computation-graph.png)\n",
"\n",
"Here, we take two inputs, `a, b`, and compute an output `e`. Each node in the graph represents an operation that takes some input, does some computation, and passes its output to another node.\n",
"\n",
@@ -306,7 +306,7 @@
"\n",
"Let's consider the example of a simple perceptron defined by just one dense (aka fully-connected or linear) layer: $ y = \\sigma(Wx + b) $, where $W$ represents a matrix of weights, $b$ is a bias, $x$ is the input, $\\sigma$ is the sigmoid activation function, and $y$ is the output.\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/computation-graph-2.png)\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/computation-graph-2.png)\n",
"\n",
"We will use `torch.nn.Module` to define layers -- the building blocks of neural networks. Layers implement common neural networks operations. In PyTorch, when we implement a layer, we subclass `nn.Module` and define the parameters of the layer as attributes of our new class. We also define and override a function [``forward``](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward), which will define the forward pass computation that is performed at every step. All classes subclassing `nn.Module` should override the `forward` function.\n",
"\n",
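The markdown cells touched above describe a single dense layer, y = sigma(Wx + b), built by subclassing `nn.Module` and overriding `forward`. A minimal sketch of that pattern (names and sizes are illustrative, not the lab's solution):

```python
import torch
import torch.nn as nn

class SimplePerceptron(nn.Module):  # illustrative, not the lab solution
    def __init__(self, num_inputs, num_outputs):
        super().__init__()
        # W and b are registered as trainable parameters; init is random.
        self.W = nn.Parameter(torch.randn(num_inputs, num_outputs))
        self.b = nn.Parameter(torch.randn(1, num_outputs))

    def forward(self, x):
        # y = sigmoid(x W + b), applied to a batch of row vectors
        return torch.sigmoid(x @ self.W + self.b)

layer = SimplePerceptron(2, 3)
print(layer(torch.tensor([[1.0, 2.0]])).shape)  # torch.Size([1, 3])
```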
24 changes: 12 additions & 12 deletions lab1/PT_Part2_Music_Generation.ipynb
@@ -27,7 +27,7 @@
},
"outputs": [],
"source": [
"# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"# Copyright 2026 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"#\n",
"# Licensed under the MIT License. You may not use this file except in compliance\n",
"# with the License. Use and/or modification of this code outside of MIT Introduction\n",
@@ -399,7 +399,7 @@
"* [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html): Our LSTM network, with size `hidden_size`.\n",
"* [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html): The output layer, with `vocab_size` outputs.\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2019/lab1/img/lstm_unrolled-01-01.png\" alt=\"Drawing\"/>\n",
"<img src=\"https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/lstm_unrolled-01-01.png\" alt=\"Drawing\"/>\n",
"\n",
"\n",
"\n",
@@ -415,7 +415,7 @@
"* [`tf.keras.layers.Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense): The output layer, with `vocab_size` outputs.\n",
"\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2019/lab1/img/lstm_unrolled-01-01.png\" alt=\"Drawing\"/> -->"
"<img src=\"https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/lstm_unrolled-01-01.png\" alt=\"Drawing\"/> -->"
]
},
{
@@ -652,6 +652,11 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GuGUJB0ZT_Uo"
},
"outputs": [],
"source": [
"### compute the loss on the predictions from the untrained model from earlier. ###\n",
"y.shape # (batch_size, sequence_length)\n",
@@ -663,12 +668,7 @@
"\n",
"print(f\"Prediction shape: {pred.shape} # (batch_size, sequence_length, vocab_size)\")\n",
"print(f\"scalar_loss: {example_batch_loss.mean().item()}\")"
-],
-"metadata": {
-"id": "GuGUJB0ZT_Uo"
-},
-"execution_count": null,
-"outputs": []
+]
},
{
"cell_type": "markdown",
@@ -875,7 +875,7 @@
"\n",
"* At each time step, the updated RNN state is fed back into the model, so that it now has more context in making the next prediction. After predicting the next character, the updated RNN states are again fed back into the model, which is how it learns sequence dependencies in the data, as it gets more information from the previous predictions.\n",
"\n",
"![LSTM inference](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2019/lab1/img/lstm_inference.png)\n",
"![LSTM inference](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/lstm_inference.png)\n",
"\n",
"Complete and experiment with this code block (as well as some of the aspects of network definition and training!), and see how the model performs. How do songs generated after training with a small number of epochs compare to those generated after a longer duration of training?"
]
@@ -906,7 +906,7 @@
"\n",
" for i in tqdm(range(generation_length)):\n",
" '''TODO: evaluate the inputs and generate the next character predictions'''\n",
" predictions, hidden_state = model('''TODO''', '''TODO''', return_state=True) # TODO\n",
" predictions, state = model('''TODO''', '''TODO''', return_state=True) # TODO\n",
"\n",
" # Remove the batch dimension\n",
" predictions = predictions.squeeze(0)\n",
@@ -1004,7 +1004,7 @@
"* What if you alter or augment the dataset?\n",
"* Does the choice of start string significantly affect the result?\n",
"\n",
"Try to optimize your model and submit your best song! **Participants will be eligible for prizes during the January 2025 offering. To enter the competition, you must upload the following to [this submission link](https://www.dropbox.com/request/U8nND6enGjirujVZKX1n):**\n",
"Try to optimize your model and submit your best song! **Participants will be eligible for prizes during the January 2026 offering. To enter the competition, you must upload the following to [this submission link](https://www.dropbox.com/request/4hqfsOnLtX4jH1W3ynfp):**\n",
"\n",
"* a recording of your song;\n",
"* iPython notebook with the code you used to generate the song;\n",
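The generation loop patched above (`hidden_state` renamed to `state`, matching the solutions) feeds each prediction and the updated recurrent state back into the model. A sketch of one sampling step under that convention; the `model(x, state, return_state=True)` signature is the lab's custom wrapper, assumed here, not a generic PyTorch API:

```python
import torch

def sample_next_char(model, input_ids, state):
    # Assumed lab convention: returns (logits, state), with logits
    # shaped (batch, seq_len, vocab_size).
    predictions, state = model(input_ids, state, return_state=True)
    predictions = predictions.squeeze(0)       # drop the batch dimension
    last_logits = predictions[-1]              # logits at the final position
    probs = torch.softmax(last_logits, dim=-1)
    next_id = torch.multinomial(probs, num_samples=1)  # sample, not argmax
    return next_id, state
```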
20 changes: 13 additions & 7 deletions lab1/TF_Part1_Intro.ipynb
@@ -27,7 +27,7 @@
},
"outputs": [],
"source": [
"# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"# Copyright 2026 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"#\n",
"# Licensed under the MIT License. You may not use this file except in compliance\n",
"# with the License. Use and/or modification of this code outside of MIT Introduction\n",
@@ -53,7 +53,7 @@
"\n",
"## 0.1 Install TensorFlow\n",
"\n",
"TensorFlow is a software library extensively used in machine learning. Here we'll learn how computations are represented and how to define a simple neural network in TensorFlow. For all the TensorFlow labs in Introduction to Deep Learning 2025, we'll be using TensorFlow 2, which affords great flexibility and the ability to imperatively execute operations, just like in Python. You'll notice that TensorFlow 2 is quite similar to Python in its syntax and imperative execution. Let's install TensorFlow and a couple of dependencies.\n"
"TensorFlow is a software library extensively used in machine learning. Here we'll learn how computations are represented and how to define a simple neural network in TensorFlow. For all the TensorFlow labs in Introduction to Deep Learning 2026, we'll be using TensorFlow 2, which affords great flexibility and the ability to imperatively execute operations, just like in Python. You'll notice that TensorFlow 2 is quite similar to Python in its syntax and imperative execution. Let's install TensorFlow and a couple of dependencies.\n"
]
},
{
@@ -208,7 +208,7 @@
"\n",
"A convenient way to think about and visualize computations in TensorFlow is in terms of graphs. We can define this graph in terms of Tensors, which hold data, and the mathematical operations that act on these Tensors in some order. Let's look at a simple example, and define this computation using TensorFlow:\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/add-graph.png)"
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/add-graph.png)"
]
},
{
@@ -240,7 +240,7 @@
"\n",
"Now let's consider a slightly more complicated example:\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/computation-graph.png)\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/computation-graph.png)\n",
"\n",
"Here, we take two inputs, `a, b`, and compute an output `e`. Each node in the graph represents an operation that takes some input, does some computation, and passes its output to another node.\n",
"\n",
@@ -311,7 +311,7 @@
"\n",
"Let's first consider the example of a simple perceptron defined by just one dense layer: $ y = \\sigma(Wx + b)$, where $W$ represents a matrix of weights, $b$ is a bias, $x$ is the input, $\\sigma$ is the sigmoid activation function, and $y$ is the output. We can also visualize this operation using a graph:\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/computation-graph-2.png)\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/computation-graph-2.png)\n",
"\n",
"Tensors can flow through abstract types called [```Layers```](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer) -- the building blocks of neural networks. ```Layers``` implement common neural networks operations, and are used to update weights, compute losses, and define inter-layer connectivity. We will first define a ```Layer``` to implement the simple perceptron defined above."
]
@@ -339,8 +339,14 @@
" d = int(input_shape[-1])\n",
" # Define and initialize parameters: a weight matrix W and bias b\n",
" # Note that parameter initialization is random!\n",
" self.W = self.add_weight(\"weight\", shape=[d, self.n_output_nodes]) # note the dimensionality\n",
" self.b = self.add_weight(\"bias\", shape=[1, self.n_output_nodes]) # note the dimensionality\n",
" self.W = self.add_weight(\n",
" shape=(d, self.n_output_nodes),\n",
" name=\"weight\",\n",
" )\n",
" self.b = self.add_weight(\n",
" shape=(1, self.n_output_nodes),\n",
" name=\"bias\",\n",
" )\n",
"\n",
" def call(self, x):\n",
" '''TODO: define the operation for z (hint: use tf.matmul)'''\n",
12 changes: 7 additions & 5 deletions lab1/TF_Part2_Music_Generation.ipynb
@@ -27,7 +27,7 @@
},
"outputs": [],
"source": [
"# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"# Copyright 2026 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"#\n",
"# Licensed under the MIT License. You may not use this file except in compliance\n",
"# with the License. Use and/or modification of this code outside of MIT Introduction\n",
@@ -399,7 +399,7 @@
"* [`tf.keras.layers.Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense): The output layer, with `vocab_size` outputs.\n",
"\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2019/lab1/img/lstm_unrolled-01-01.png\" alt=\"Drawing\"/>"
"<img src=\"https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/lstm_unrolled-01-01.png\" alt=\"Drawing\"/>"
]
},
{
@@ -858,7 +858,7 @@
"\n",
"* At each time step, the updated RNN state is fed back into the model, so that it now has more context in making the next prediction. After predicting the next character, the updated RNN states are again fed back into the model, which is how it learns sequence dependencies in the data, as it gets more information from the previous predictions.\n",
"\n",
"![LSTM inference](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2019/lab1/img/lstm_inference.png)\n",
"![LSTM inference](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/lstm_inference.png)\n",
"\n",
"Complete and experiment with this code block (as well as some of the aspects of network definition and training!), and see how the model performs. How do songs generated after training with a small number of epochs compare to those generated after a longer duration of training?"
]
@@ -884,7 +884,9 @@
" text_generated = []\n",
"\n",
" # Here batch size == 1\n",
" model.reset_states()\n",
" for layer in model.layers:\n",
" if hasattr(layer, \"reset_states\"):\n",
" layer.reset_states()\n",
" tqdm._instances.clear()\n",
"\n",
" for i in tqdm(range(generation_length)):\n",
@@ -991,7 +993,7 @@
"* What if you alter or augment the dataset?\n",
"* Does the choice of start string significantly affect the result?\n",
"\n",
"Try to optimize your model and submit your best song! **Participants will be eligible for prizes during the January 2025 offering. To enter the competition, you must upload the following to [this submission link](https://www.dropbox.com/request/U8nND6enGjirujVZKX1n):**\n",
"Try to optimize your model and submit your best song! **Participants will be eligible for prizes during the January 2025 offering. To enter the competition, you must upload the following to [this submission link](https://www.dropbox.com/request/4hqfsOnLtX4jH1W3ynfp):**\n",
"\n",
"* a recording of your song;\n",
"* iPython notebook with the code you used to generate the song;\n",
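On the TensorFlow side, the same inference loop samples the next character from the model's output logits, typically with `tf.random.categorical`. A minimal sketch with stand-in logits (batch size, sequence length, and vocab size are illustrative):

```python
import tensorflow as tf

logits = tf.random.normal((1, 5, 83))   # stand-in for model output:
                                        # (batch, seq_len, vocab_size)
last = logits[:, -1, :]                 # logits at the final time step
next_id = tf.random.categorical(last, num_samples=1)  # sampled index
print(int(tf.squeeze(next_id)))         # a character id in [0, 83)
```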
8 changes: 4 additions & 4 deletions lab1/solutions/PT_Part1_Intro_Solution.ipynb
@@ -27,7 +27,7 @@
},
"outputs": [],
"source": [
"# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"# Copyright 2026 MIT Introduction to Deep Learning. All Rights Reserved.\n",
"#\n",
"# Licensed under the MIT License. You may not use this file except in compliance\n",
"# with the License. Use and/or modification of this code outside of MIT Introduction\n",
@@ -241,7 +241,7 @@
"\n",
"A convenient way to think about and visualize computations in a machine learning framework like PyTorch is in terms of graphs. We can define this graph in terms of tensors, which hold data, and the mathematical operations that act on these tensors in some order. Let's look at a simple example, and define this computation using PyTorch:\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/add-graph.png)"
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/add-graph.png)"
]
},
{
@@ -282,7 +282,7 @@
"\n",
"Now let's consider a slightly more complicated example:\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/computation-graph.png)\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/computation-graph.png)\n",
"\n",
"Here, we take two inputs, `a, b`, and compute an output `e`. Each node in the graph represents an operation that takes some input, does some computation, and passes its output to another node.\n",
"\n",
@@ -364,7 +364,7 @@
"\n",
"Let's consider the example of a simple perceptron defined by just one dense (aka fully-connected or linear) layer: $ y = \\sigma(Wx + b) $, where $W$ represents a matrix of weights, $b$ is a bias, $x$ is the input, $\\sigma$ is the sigmoid activation function, and $y$ is the output.\n",
"\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2025/lab1/img/computation-graph-2.png)\n",
"![alt text](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/master/lab1/img/computation-graph-2.png)\n",
"\n",
"We will use `torch.nn.Module` to define layers -- the building blocks of neural networks. Layers implement common neural networks operations. In PyTorch, when we implement a layer, we subclass `nn.Module` and define the parameters of the layer as attributes of our new class. We also define and override a function [``forward``](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward), which will define the forward pass computation that is performed at every step. All classes subclassing `nn.Module` should override the `forward` function.\n",
"\n",