google-deepmind · isayahc · Dec 1, 2023
diff --git a/notebooks/AlphaFold.ipynb b/notebooks/AlphaFold.ipynb
@@ -10,6 +10,8 @@
         "\n",
         "This Colab notebook allows you to easily predict the structure of a protein using a slightly simplified version of [AlphaFold v2.3.2](https://doi.org/10.1038/s41586-021-03819-2). \n",
         "\n",
+        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb)\n",
+        "\n",
         "**Differences to AlphaFold v2.3.2**\n",
         "\n",
         "In comparison to AlphaFold v2.3.2, this Colab notebook uses **no templates (homologous structures)** and a selected portion of the [BFD database](https://bfd.mmseqs.com/). We have validated these changes on several thousand recent PDB structures. While accuracy will be near-identical to the full AlphaFold system on many targets, a small fraction have a large drop in accuracy due to the smaller MSA and lack of templates. For best reliability, we recommend instead using the [full open source AlphaFold](https://github.com/deepmind/alphafold/), or the [AlphaFold Protein Structure Database](https://alphafold.ebi.ac.uk/).\n",
@@ -103,14 +105,14 @@
         "      %shell rm -rf /opt/conda\n",
         "      %shell wget -q -P /tmp \\\n",
         "        https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \\\n",
-        "          \u0026\u0026 bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \\\n",
-        "          \u0026\u0026 rm /tmp/Miniconda3-latest-Linux-x86_64.sh\n",
+        "          && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \\\n",
+        "          && rm /tmp/Miniconda3-latest-Linux-x86_64.sh\n",
         "      pbar.update(9)\n",
         "\n",
         "      PATH=%env PATH\n",
         "      %env PATH=/opt/conda/bin:{PATH}\n",
         "      %shell conda install -qy conda==23.5.2 \\\n",
-        "          \u0026\u0026 conda install -qy -c conda-forge \\\n",
+        "          && conda install -qy -c conda-forge \\\n",
         "            python=3.10 \\\n",
         "            openmm=7.7.0 \\\n",
         "            pdbfixer\n",
@@ -184,9 +186,9 @@
         "\n",
         "import jax\n",
         "if jax.local_devices()[0].platform == 'tpu':\n",
-        "  raise RuntimeError('Colab TPU runtime not supported. Change it to GPU via Runtime -\u003e Change Runtime Type -\u003e Hardware accelerator -\u003e GPU.')\n",
+        "  raise RuntimeError('Colab TPU runtime not supported. Change it to GPU via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')\n",
         "elif jax.local_devices()[0].platform == 'cpu':\n",
-        "  raise RuntimeError('Colab CPU runtime not supported. Change it to GPU via Runtime -\u003e Change Runtime Type -\u003e Hardware accelerator -\u003e GPU.')\n",
+        "  raise RuntimeError('Colab CPU runtime not supported. Change it to GPU via Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.')\n",
         "else:\n",
         "  print(f'Running with {jax.local_devices()[0].device_kind} GPU')\n",
         "\n",
@@ -206,7 +208,7 @@
       "source": [
         "## Making a prediction\n",
         "\n",
-        "Please paste the sequence of your protein in the text box below, then run the remaining cells via _Runtime_ \u003e _Run after_. You can also run the cells individually by pressing the _Play_ button on the left.\n",
+        "Please paste the sequence of your protein in the text box below, then run the remaining cells via _Runtime_ > _Run after_. You can also run the cells individually by pressing the _Play_ button on the left.\n",
         "\n",
         "Note that the search against databases and the actual prediction can take some time, from minutes to hours, depending on the length of the protein and what type of GPU you are allocated by Colab (see FAQ below)."
       ]
@@ -301,21 +303,21 @@
         "\n",
         "# Check whether total length exceeds limit.\n",
         "total_sequence_length = sum([len(seq) for seq in sequences])\n",
-        "if total_sequence_length \u003e MAX_LENGTH:\n",
+        "if total_sequence_length > MAX_LENGTH:\n",
         "  raise ValueError('The total sequence length is too long: '\n",
         "                   f'{total_sequence_length}, while the maximum is '\n",
         "                   f'{MAX_LENGTH}.')\n",
         "\n",
         "# Check whether we exceed the monomer limit.\n",
         "if model_type_to_use == ModelType.MONOMER:\n",
-        "  if len(sequences[0]) \u003e MAX_MONOMER_MODEL_LENGTH:\n",
+        "  if len(sequences[0]) > MAX_MONOMER_MODEL_LENGTH:\n",
         "    raise ValueError(\n",
         "        f'Input sequence is too long: {len(sequences[0])} amino acids, while '\n",
         "        f'the maximum for the monomer model is {MAX_MONOMER_MODEL_LENGTH}. You may '\n",
         "        'be able to run this sequence with the multimer model by selecting the '\n",
         "        'use_multimer_model_for_monomers checkbox above.')\n",
         "    \n",
-        "if total_sequence_length \u003e MAX_VALIDATED_LENGTH:\n",
+        "if total_sequence_length > MAX_VALIDATED_LENGTH:\n",
         "  print('WARNING: The accuracy of the system has not been fully validated '\n",
         "        'above 3000 residues, and you may experience long running times or '\n",
         "        f'run out of memory. Total sequence length is {total_sequence_length} '\n",
@@ -417,7 +419,7 @@
         "]\n",
         "\n",
         "# Search UniProt and construct the all_seq features only for heteromers, not homomers.\n",
-        "if model_type_to_use == ModelType.MULTIMER and len(set(sequences)) \u003e 1:\n",
+        "if model_type_to_use == ModelType.MULTIMER and len(set(sequences)) > 1:\n",
         "  MSA_DATABASES.extend([\n",
         "      # Swiss-Prot and TrEMBL are concatenated together as UniProt.\n",
         "      {'db_name': 'uniprot',\n",
@@ -451,7 +453,7 @@
         "  for sequence_index, sequence in enumerate(sorted(set(sequences)), 1):\n",
         "    fasta_path = f'target_{sequence_index:02d}.fasta'\n",
         "    with open(fasta_path, 'wt') as f:\n",
-        "      f.write(f'\u003equery\\n{sequence}')\n",
+        "      f.write(f'>query\\n{sequence}')\n",
         "    sequence_to_fasta_path[sequence] = fasta_path\n",
         "\n",
         "  # Run the search against chunks of genetic databases (since the genetic\n",
@@ -512,7 +514,7 @@
         "      num_templates=0, num_res=len(sequence)))\n",
         "\n",
         "  # Construct the all_seq features only for heteromers, not homomers.\n",
-        "  if model_type_to_use == ModelType.MULTIMER and len(set(sequences)) \u003e 1:\n",
+        "  if model_type_to_use == ModelType.MULTIMER and len(set(sequences)) > 1:\n",
         "    valid_feats = msa_pairing.MSA_FEATURES + (\n",
         "        'msa_species_identifiers',\n",
         "    )\n",
@@ -676,7 +678,7 @@
         "banded_b_factors = []\n",
         "for plddt in plddts[best_model_name]:\n",
         "  for idx, (min_val, max_val, _) in enumerate(PLDDT_BANDS):\n",
-        "    if plddt \u003e= min_val and plddt \u003c= max_val:\n",
+        "    if plddt >= min_val and plddt <= max_val:\n",
         "      banded_b_factors.append(idx)\n",
         "      break\n",
         "banded_b_factors = np.array(banded_b_factors)[:, None] * final_atom_mask\n",
@@ -689,14 +691,14 @@
         "  f.write(relaxed_pdb)\n",
         "\n",
         "\n",
-        "# --- Visualise the prediction \u0026 confidence ---\n",
+        "# --- Visualise the prediction & confidence ---\n",
         "show_sidechains = True\n",
         "def plot_plddt_legend():\n",
         "  \"\"\"Plots the legend for pLDDT.\"\"\"\n",
-        "  thresh = ['Very low (pLDDT \u003c 50)',\n",
-        "            'Low (70 \u003e pLDDT \u003e 50)',\n",
-        "            'Confident (90 \u003e pLDDT \u003e 70)',\n",
-        "            'Very high (pLDDT \u003e 90)']\n",
+        "  thresh = ['Very low (pLDDT < 50)',\n",
+        "            'Low (70 > pLDDT > 50)',\n",
+        "            'Confident (90 > pLDDT > 70)',\n",
+        "            'Very high (pLDDT > 90)']\n",
         "\n",
         "  colors = [x[2] for x in PLDDT_BANDS]\n",
         "\n",
@@ -812,13 +814,13 @@
         "id": "jeb2z8DIA4om"
       },
       "source": [
-        "## FAQ \u0026 Troubleshooting\n",
+        "## FAQ & Troubleshooting\n",
         "\n",
         "\n",
         "*   How do I get a predicted protein structure for my protein?\n",
         "    *   Click on the _Connect_ button on the top right to get started.\n",
         "    *   Paste the amino acid sequence of your protein (without any headers) into the “Enter the amino acid sequence to fold”.\n",
-        "    *   Run all cells in the Colab, either by running them individually (with the play button on the left side) or via _Runtime_ \u003e _Run all._ Make sure you run all 5 cells in order.\n",
+        "    *   Run all cells in the Colab, either by running them individually (with the play button on the left side) or via _Runtime_ > _Run all._ Make sure you run all 5 cells in order.\n",
         "    *   The predicted protein structure will be downloaded once all cells have been executed. Note: This can take minutes to hours - see below.\n",
         "*   How long will this take?\n",
         "    *   Downloading the AlphaFold source code can take up to a few minutes.\n",
@@ -827,8 +829,8 @@
         "    *   Running AlphaFold and generating the prediction can take minutes to hours, depending on the length of your protein and on which GPU-type Colab has assigned you.\n",
         "*   My Colab no longer seems to be doing anything, what should I do?\n",
         "    *   Some steps may take minutes to hours to complete.\n",
-        "    *   If nothing happens or if you receive an error message, try restarting your Colab runtime via _Runtime_ \u003e _Restart runtime_.\n",
-        "    *   If this doesn’t help, try resetting your Colab runtime via _Runtime_ \u003e _Factory reset runtime_.\n",
+        "    *   If nothing happens or if you receive an error message, try restarting your Colab runtime via _Runtime_ > _Restart runtime_.\n",
+        "    *   If this doesn’t help, try resetting your Colab runtime via _Runtime_ > _Factory reset runtime_.\n",
         "*   How does this compare to the open-source version of AlphaFold?\n",
         "    *   This Colab version of AlphaFold searches a selected portion of the BFD dataset and currently doesn’t use templates, so its accuracy is reduced in comparison to the full version of AlphaFold that is described in the [AlphaFold paper](https://doi.org/10.1038/s41586-021-03819-2) and [Github repo](https://github.com/deepmind/alphafold/) (the full version is available via the inference script).\n",
         "*   What is a Colab?\n",
@@ -837,7 +839,7 @@
         "    *   The resources allocated to your Colab vary. See the [Colab FAQ](https://research.google.com/colaboratory/faq.html) for more details.\n",
         "    *   You can execute the Colab nonetheless.\n",
         "*   I received an error “Colab CPU runtime not supported” or “No GPU/TPU found”, what do I do?\n",
-        "    *   Colab CPU runtime is not supported. Try changing your runtime via _Runtime_ \u003e _Change runtime type_ \u003e _Hardware accelerator_ \u003e _GPU_.\n",
+        "    *   Colab CPU runtime is not supported. Try changing your runtime via _Runtime_ > _Change runtime type_ > _Hardware accelerator_ > _GPU_.\n",
         "    *   The type of GPU allocated to your Colab varies. See the [Colab FAQ](https://research.google.com/colaboratory/faq.html) for more details.\n",
         "    *   If you receive “Cannot connect to GPU backend”, you can try again later to see if Colab allocates you a GPU.\n",
         "    *   [Colab Pro](https://colab.research.google.com/signup) offers priority access to GPUs.\n",