AFDB tools
This module contains utility functions for the AlphaFold Database (AFDB) and UniProt.
+
src.AFDB_tools.descr(pdb_path)
Extracts the pLDDT score (stored in the B-factor column) of the first atom of each residue in a PDB file and returns a descriptive statistics object.
Parameters:
pdb_path (str): The path to the PDB file.
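As a rough illustration of the extraction described above, the sketch below reads the B-factor field of the first ATOM record of each residue and summarizes the values. It assumes standard fixed-width PDB ATOM records. The helper names `plddt_per_residue` and `summarize_scores` are hypothetical and are not part of `src.AFDB_tools`; the standard-library `statistics` module stands in for whatever statistics object `descr` actually returns.

```python
import statistics


def plddt_per_residue(pdb_text):
    """Return the B-factor (pLDDT) of the first atom of each residue."""
    scores = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed-width PDB columns: chain ID (22), residue number plus
        # insertion code (23-27), B-factor (61-66); indices are 0-based here.
        residue_key = (line[21], line[22:27])
        if residue_key in seen:
            continue  # only the first atom of each residue is used
        seen.add(residue_key)
        scores.append(float(line[60:66]))
    return scores


def summarize_scores(scores):
    """Hypothetical summary; the real return type of descr is not documented here."""
    return {
        "n": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }
```

With a PDB file read into a string, `summarize_scores(plddt_per_residue(text))` gives one summary per structure.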
src.AFDB_tools.filter_plddt(pdb_path, thresh=0.6, minthresh=0.5)
Extracts the pLDDT score (stored in the B-factor column) of the first atom of each residue in a PDB file and returns a bool indicating whether the structure is accepted.
Parameters:
pdb_path (str): The path to the PDB file.
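The documentation above does not say how `thresh` and `minthresh` are applied. One plausible reading, sketched below purely as an assumption, is that the mean per-residue score must reach `thresh` and the lowest score must reach `minthresh`, with scores rescaled from the 0 to 100 range to fractions. The function name `passes_plddt_filter` is hypothetical.

```python
def passes_plddt_filter(scores, thresh=0.6, minthresh=0.5):
    """Accept a structure if its pLDDT scores look confident enough.

    Assumption (not confirmed by the source): scores are on a 0-100 scale,
    thresh bounds the mean and minthresh bounds the minimum, both as fractions.
    """
    if not scores:
        return False
    scaled = [s / 100.0 for s in scores]
    mean_ok = sum(scaled) / len(scaled) >= thresh
    min_ok = min(scaled) >= minthresh
    return mean_ok and min_ok
```

Under this reading, a structure with one poorly predicted residue is rejected even when its average confidence is high.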
src.AFDB_tools.grab_struct(uniID, structfolder, rejected=None, overwrite=False)
Downloads a protein structure file from the AlphaFold website and saves it to the specified folder.
Parameters:
uniID (str): The UniProt ID of the protein for which the structure is being downloaded.
structfolder (str): The path to the folder where the structure file should be saved.
overwrite (bool, optional): Whether to overwrite an existing file with the same name in the specified folder. Defaults to False.
Returns:
None: If the file is successfully downloaded or if overwrite is set to True and a file with the same name is found in the specified folder.
str: If an error occurs during the download or if a file with the same name is found in the specified folder and overwrite is set to False.
Examples:
>>> grab_struct('P00533', '/path/to/structures/')
None
>>> grab_struct('P00533', '/path/to/structures/', overwrite=True)
None
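The return contract above (None on success, a string otherwise) can be sketched as follows. This is not the module's implementation: the function name `fetch_structure`, the URL pattern with its `v4` model version, the output file name, and the injectable `download` argument are all assumptions made for illustration.

```python
import os
import urllib.request

# Assumption: AlphaFold DB file URL pattern; the model version suffix
# changes between database releases.
AFDB_URL = "https://alphafold.ebi.ac.uk/files/AF-{uniID}-F1-model_v4.pdb"


def fetch_structure(uniID, structfolder, overwrite=False,
                    download=urllib.request.urlretrieve):
    """Download one structure; return None on success, a message otherwise."""
    os.makedirs(structfolder, exist_ok=True)
    target = os.path.join(structfolder, uniID + ".pdb")  # hypothetical file name
    if os.path.exists(target) and not overwrite:
        return "file exists: " + target
    try:
        download(AFDB_URL.format(uniID=uniID), target)
    except OSError as err:  # urllib's URLError and HTTPError derive from OSError
        return "download failed: " + str(err)
    return None
```

Passing the downloader as an argument keeps the sketch testable without network access; a caller would normally leave it at its default.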
src.AFDB_tools.chunk(data, csize)
src.AFDB_tools.unirequest_tab(name, verbose=False)
Requests information about a protein from the UniProt API in tab-separated format and returns it as a DataFrame.
Parameters:
name (str): The name of the protein for which information is being requested.
verbose (bool, optional): Whether to print the returned data to the console. Defaults to False.
Returns:
pd.DataFrame: A DataFrame containing information about the protein, with one row for each hit in the search.
Examples:
>>> unirequest_tab('P00533')
                                             id  …                                   sequence
0  sp|P00533|1A2K_HUMAN RecName: Full=Alpha-2-…  …  MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN…
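To illustrate how a tab-separated response body becomes one row per hit, here is a minimal standard-library sketch. The function name `parse_tsv`, the column names, and the sample values are made up for the example; the actual request and the columns UniProt returns are not shown in this documentation.

```python
import csv
import io


def parse_tsv(text):
    """Turn a tab-separated response body into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Placeholder data, not a real UniProt response.
sample = "Entry\tSequence\nID1\tAAAA\nID2\tCCCC\n"
rows = parse_tsv(sample)

# With pandas available, the equivalent step would be:
#   pd.read_csv(io.StringIO(sample), sep="\t")
```

Each dict in `rows` corresponds to one row of the resulting table, keyed by the header line of the response.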
src.AFDB_tools.grab_entries(ids, verbose=True)
Makes requests to the UniProt API for information about proteins with the given IDs.
Parameters:
ids (list): A list of UniProt IDs for the proteins for which information is being requested.
verbose (bool, optional): Whether to print the returned data to the console. Defaults to True.
Returns:
pd.DataFrame: A DataFrame containing information about the proteins, with one row for each hit in the search.
Examples:
>>> grab_entries(['P00533', 'P15056'])
                                             id  …                                   sequence
0  sp|P00533|1A2K_HUMAN RecName: Full=Alpha-2-…  …  MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN…
1  sp|P15056|1A01_HUMAN RecName: Full=Alpha-1-…  …  MAAARLLPLLPLLLALALALTETSCPPASQGQRASVGDRV…
Notes:
If a request succeeds, the returned data is processed and added to the DataFrame. If a request fails, an error message is printed to the console.
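The behavior described in the notes, where successful requests contribute rows and failed ones only print a message, can be sketched like this. The name `collect_entries` and the `fetch` callable are hypothetical stand-ins for the real request logic, which is not shown here.

```python
def collect_entries(ids, fetch):
    """Gather rows for every ID, skipping IDs whose request fails.

    fetch is any callable that takes one ID and returns a list of row dicts,
    raising an exception when the request does not succeed.
    """
    rows = []
    for uid in ids:
        try:
            rows.extend(fetch(uid))
        except Exception as err:
            # A failed request is reported and does not stop the loop.
            print("request for {} failed: {}".format(uid, err))
    return rows
```

A single bad ID therefore costs one missing row rather than the whole batch.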
src.AFDB_tools.res2fasta(unires_df)
Converts a DataFrame containing protein information into a FASTA format string.
Parameters:
unires_df (pd.DataFrame): A DataFrame containing information about proteins, with columns 'query' and 'Sequence' representing the name and sequence of each protein, respectively.
Returns:
str: A string in FASTA format representing the proteins in the input DataFrame.
Examples:
>>> unires_df = pd.DataFrame([{'query': 'P00533', 'Sequence': 'MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN'}])
>>> res2fasta(unires_df)
'> P00533\nMPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN\n'
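A minimal sketch of the conversion, assuming each protein is available as a dict with 'query' and 'Sequence' keys (for a DataFrame, `unires_df.to_dict("records")` yields such dicts). The function name `records_to_fasta` is hypothetical, and the header layout with a space after ">" simply follows the example output above.

```python
def records_to_fasta(records, name_key="query", seq_key="Sequence"):
    """Join records into one FASTA string: a header line, then the sequence."""
    return "".join(
        "> {}\n{}\n".format(rec[name_key], rec[seq_key]) for rec in records
    )
```

Each record contributes exactly two lines, so the output can be written straight to a `.fasta` file.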