
AFDB tools

This module contains utility functions for working with the AlphaFold Database (AFDB) and UniProt.

src.AFDB_tools.descr(pdb_path)

Extracts the pLDDT score (stored in the B-factor column) of the first atom of each residue in a PDB file and returns a descriptive statistics object.

Parameters:

pdb_path (str) – The path to the PDB file.
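The per-residue extraction described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the helper names `first_atom_plddt`, `summarize`, and `_atom` are hypothetical, and the summary is a plain dictionary standing in for whatever statistics object `descr` actually returns. It relies only on the fixed-width PDB ATOM record layout (chain ID in column 22, residue number and insertion code in columns 23-27, temperature factor in columns 61-66).

```python
import statistics


def first_atom_plddt(pdb_text):
    """Return the B-factor (pLDDT) of the first ATOM record of each residue."""
    scores = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Chain ID is column 22; residue number plus insertion code is columns 23-27.
        residue_key = (line[21], line[22:27])
        if residue_key in seen:
            continue  # only the first atom of each residue is used
        seen.add(residue_key)
        # The temperature factor (pLDDT in AlphaFold models) is columns 61-66.
        scores.append(float(line[60:66]))
    return scores


def summarize(scores):
    """Hypothetical stand-in for a descriptive statistics object."""
    return {
        "nobs": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
    }


def _atom(serial, name, resname, chain, resseq, bfactor):
    # Build a minimal fixed-width ATOM record (coordinates zeroed) for the demo.
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


demo = "\n".join(
    [
        _atom(1, "N", "MET", "A", 1, 91.5),
        _atom(2, "CA", "MET", "A", 1, 91.5),  # same residue, skipped
        _atom(3, "N", "PRO", "A", 2, 72.25),
        _atom(4, "N", "THR", "A", 3, 48.0),
    ]
)
scores = first_atom_plddt(demo)
stats = summarize(scores)
```

Reading a real file would only require passing the file's text to `first_atom_plddt`; the demo records are synthetic and carry no structural meaning.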

src.AFDB_tools.filter_plddt(pdb_path, thresh=0.6, minthresh=0.5)

Extracts the pLDDT score (stored in the B-factor column) of the first atom of each residue in a PDB file and returns a bool indicating whether the structure is accepted.

Parameters:

pdb_path (str) – The path to the PDB file.
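The acceptance rule behind `thresh` and `minthresh` is not spelled out here, so the following is only one plausible reading, offered as a sketch. It assumes that pLDDT values are on the usual 0-100 scale and are compared as fractions, that `thresh` bounds the mean, and that `minthresh` bounds the lowest per-residue value. The function name `accept_structure` and the `scale` parameter are hypothetical.

```python
def accept_structure(plddts, thresh=0.6, minthresh=0.5, scale=100.0):
    """Return True if a list of per-residue pLDDT values passes both cutoffs.

    ASSUMPTION: the mean must reach `thresh` and the minimum must reach
    `minthresh`, both expressed as fractions of `scale`. The real rule used
    by filter_plddt may differ.
    """
    if not plddts:
        return False  # nothing to evaluate, so reject
    fractions = [value / scale for value in plddts]
    mean_ok = sum(fractions) / len(fractions) >= thresh
    min_ok = min(fractions) >= minthresh
    return mean_ok and min_ok


confident = accept_structure([91.5, 72.25, 58.0])   # high mean, no weak residue
weak_residue = accept_structure([91.5, 72.25, 48.0])  # one residue below minthresh
low_mean = accept_structure([55.0, 56.0, 57.0])      # mean below thresh
```

Under these assumptions only the first call is accepted; the other two fail on the minimum and the mean respectively.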
src.AFDB_tools.grab_struct(uniID, structfolder, rejected=None, overwrite=False)

Downloads a protein structure file from the AlphaFold website and saves it to the specified folder.

Parameters:
uniID (str): The UniProt ID of the protein for which the structure is being downloaded.
structfolder (str): The path to the folder where the structure file should be saved.
overwrite (bool, optional): A flag indicating whether to overwrite an existing file with the same name in the specified folder. Defaults to False.

Returns:
None: If the file is successfully downloaded or if overwrite is set to True and a file with the same name is found in the specified folder.
str: If an error occurs during the download or if a file with the same name is found in the specified folder and overwrite is set to False.

Examples:
>>> grab_struct('P00533', '/path/to/structures/')
None
>>> grab_struct('P00533', '/path/to/structures/', overwrite=True)
None
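The download-or-skip behaviour described above might look roughly like the sketch below. Several things here are assumptions rather than facts about `grab_struct`: the URL template (the AlphaFold Database file naming with a model version number, which changes between releases), the output file name `<uniID>.pdb`, the wording of the returned messages, and the injectable `fetch` argument, which exists only so the demo can run without network access.

```python
import os
import tempfile
import urllib.request

# ASSUMPTION: AFDB file naming; the version suffix changes between releases.
AFDB_URL = "https://alphafold.ebi.ac.uk/files/AF-{uni_id}-F1-model_v{version}.pdb"


def download_structure(uni_id, structfolder, overwrite=False, version=4,
                       fetch=urllib.request.urlretrieve):
    """Return None on success, or a message string if skipped or failed."""
    os.makedirs(structfolder, exist_ok=True)
    target = os.path.join(structfolder, f"{uni_id}.pdb")  # hypothetical file name
    if os.path.exists(target) and not overwrite:
        return f"{target} already exists"
    url = AFDB_URL.format(uni_id=uni_id, version=version)
    try:
        fetch(url, target)
    except OSError as err:  # urllib's URLError and HTTPError derive from OSError
        return f"download failed for {uni_id}: {err}"
    return None


def _fake_fetch(url, target):
    # Stand-in for urlretrieve so the demo runs offline.
    with open(target, "w") as handle:
        handle.write("HEADER    DEMO\n")


with tempfile.TemporaryDirectory() as folder:
    first = download_structure("P00533", folder, fetch=_fake_fetch)
    second = download_structure("P00533", folder, fetch=_fake_fetch)
    third = download_structure("P00533", folder, overwrite=True, fetch=_fake_fetch)
```

In the demo, the first call writes the file, the second returns a message because the file already exists, and the third overwrites it. Dropping the `fetch` argument would make the function perform a real HTTP download.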

src.AFDB_tools.chunk(data, csize)

src.AFDB_tools.unirequest_tab(name, verbose=False)

Makes a request to the UniProt API and returns information about a protein in tab-separated format.

Parameters:
name (str): The name of the protein for which information is being requested.
verbose (bool, optional): A flag indicating whether to print the returned data to the console. Defaults to False.

Returns:
pd.DataFrame: A DataFrame containing information about the protein, with one row for each hit in the search.

Examples:
>>> unirequest_tab('P00533')
                     id                           …  sequence
0  sp|P00533|1A2K_HUMAN  RecName: Full=Alpha-2-…  …  MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN…
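As a rough illustration of the request-and-parse pattern described for `unirequest_tab`, the sketch below builds a tab-separated search URL and turns a tab-separated response into one record per hit. The endpoint and query parameters are an assumption based on the public UniProt REST search interface, not taken from this module; the helper names and the sample column names `Entry` and `Sequence` are hypothetical, and a list of dictionaries stands in for the DataFrame. No request is actually sent.

```python
import csv
import io
import urllib.parse

# ASSUMPTION: UniProt REST search endpoint; the module may use a different one.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(name):
    """Build a search URL that asks for tab-separated output."""
    query = urllib.parse.urlencode({"query": name, "format": "tsv"})
    return f"{UNIPROT_SEARCH}?{query}"


def parse_tsv(text):
    """Turn a tab-separated response body into one dict per hit."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


url = build_search_url("P00533")

# Synthetic response body; real responses carry many more columns.
sample = "Entry\tSequence\nQ00001\tMKV\nQ00002\tMAA\n"
rows = parse_tsv(sample)
```

With pandas available, the same response text could be read with `pd.read_csv(io.StringIO(text), sep="\t")` to get a DataFrame with one row per hit.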
src.AFDB_tools.grab_entries(ids, verbose=True)

Makes requests to the UniProt API for information about proteins with the given IDs.

Parameters:
ids (list): A list of UniProt IDs for the proteins for which information is being requested.
verbose (bool, optional): A flag indicating whether to print the returned data to the console. Defaults to True.

Returns:
pd.DataFrame: A DataFrame containing information about the proteins, with one row for each hit in the search.

Examples:
>>> grab_entries(['P00533', 'P15056'])
                     id                           …  sequence
0  sp|P00533|1A2K_HUMAN  RecName: Full=Alpha-2-…  …  MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN…
1  sp|P15056|1A01_HUMAN  RecName: Full=Alpha-1-…  …  MAAARLLPLLPLLLALALALTETSCPPASQGQRASVGDRV…

Notes:
If a request succeeds, the returned data is processed and added to the DataFrame. If a request fails, an error message is printed to the console.

src.AFDB_tools.res2fasta(unires_df)

Converts a DataFrame containing protein information into a FASTA format string.

Parameters:
unires_df (pd.DataFrame): A DataFrame containing information about proteins, with columns 'query' and 'Sequence' representing the name and sequence of each protein, respectively.

Returns:
str: A string in FASTA format representing the proteins in the input DataFrame.

Examples:
>>> unires_df = pd.DataFrame([{'query': 'P00533', 'Sequence': 'MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN'}])
>>> res2fasta(unires_df)
'> P00533\nMPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN\n'
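The conversion that `res2fasta` performs can be sketched without pandas by working on plain records. This is an illustrative stand-in, not the module's code: `records_to_fasta` is a hypothetical name, and it assumes the same 'query' and 'Sequence' keys and the same header style (a space after `>`) shown in the example output.

```python
def records_to_fasta(records, name_key="query", seq_key="Sequence"):
    """Join records into a FASTA string: one header line and one sequence line each."""
    return "".join(f"> {row[name_key]}\n{row[seq_key]}\n" for row in records)


records = [
    {"query": "P00533", "Sequence": "MPTSVLLLALLLAPAALVHVCRSRFPKCVVLVNVTGLFGN"},
]
fasta = records_to_fasta(records)
```

A DataFrame with those two columns could be fed to this helper through `unires_df.to_dict("records")`.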