diff --git a/week02_value_based/seminar_vi.ipynb b/week02_value_based/seminar_vi.ipynb
index 63868a8f9..188448bbe 100644
--- a/week02_value_based/seminar_vi.ipynb
+++ b/week02_value_based/seminar_vi.ipynb
@@ -870,7 +870,7 @@
 "source": [
 "# HW Part 1: Value iteration convergence\n",
 "\n",
- "### Find an MDP for which value iteration takes long to converge (0.5 pts)\n",
+ "### Find an MDP for which value iteration takes long to converge (1 pts)\n",
 "\n",
 "When we ran value iteration on the small frozen lake problem, the last iteration where an action changed was iteration 6--i.e., value iteration computed the optimal policy at iteration 6. Are there any guarantees regarding how many iterations it'll take value iteration to compute the optimal policy? There are no such guarantees without additional assumptions--we can construct the MDP in such a way that the greedy policy will change after arbitrarily many iterations.\n",
 "\n",
@@ -926,7 +926,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "### Value iteration convervence proof (0.5 pts)\n",
+ "### Value iteration convervence proof (1 pts)\n",
 "**Note:** Assume that $\\mathcal{S}, \\mathcal{A}$ are finite.\n",
 "\n",
 "Update of value function in value iteration can be rewritten in a form of Bellman operator:\n",
@@ -963,7 +963,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "### Bonus. Asynchronious value iteration (2 pts)\n",
+ "### Asynchronious value iteration (2 pts)\n",
 "\n",
 "Consider the following algorithm:\n",
 "\n",
@@ -997,7 +997,7 @@
 "source": [
 "# HW Part 2: Policy iteration\n",
 "\n",
- "## Policy iteration implementateion (2 pts)\n",
+ "## Policy iteration implementateion (3 pts)\n",
 "\n",
 "Let's implement exact policy iteration (PI), which has the following pseudocode:\n",
 "\n",
@@ -1223,9 +1223,22 @@
 }
 ],
 "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
 "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
 "name": "python",
- "pygments_lexer": "ipython3"
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.10"
 }
 },
 "nbformat": 4,