extract_data module
- colav.extract_data.calculate_coverage_matching_scores(reference_strucs, sample_strucs, resnum_bounds, rmsd_threshold=1.0, simultaneous=True, verbose=False)
Calculates the coverage and matching metrics for a sample set of structures/conformational ensemble compared to a reference set of structures/conformational ensemble.
The coverage and matching metrics used are defined by Xu et al. (2021) ICLR. The coverage metric measures the diversity of the sample set compared to the reference set. The matching metric measures the similarity of the sample set to the reference set.
Parameters:
- reference_strucs : list of str
Array containing the file paths to reference structures.
- sample_strucs : list of str
Array containing the file paths to sample/generated structures.
- resnum_bounds : tuple
Tuple containing the minimum and maximum (inclusive) residue number values.
- rmsd_threshold : float
RMSD threshold below which two structures are considered similar.
- simultaneous : bool
Indicator for simultaneous calculation of RMSD; otherwise, calculations are sequential.
- verbose : bool
Indicator for verbose output.
Returns:
- coverage : float
Coverage score that compares the diversity of the supplied conformational ensembles.
- matching : float
Matching score that compares the similarity of the supplied conformational ensembles.
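The two scores can be illustrated from a precomputed RMSD table. The sketch below follows the definitions in Xu et al. (2021) as commonly stated: coverage is the fraction of reference structures whose closest sample lies below the threshold, and matching is the mean of those closest-sample RMSDs. The function name coverage_matching and the strict "<" comparison are assumptions made for illustration; this is not colav's implementation, which also computes the RMSDs from the structure files.

```python
def coverage_matching(rmsd, threshold=1.0):
    """Illustrative coverage/matching from an RMSD table.

    rmsd[i][j] is the RMSD between reference structure i and sample j.
    Hypothetical helper, not part of colav.
    """
    # Closest sample for every reference structure.
    best = [min(row) for row in rmsd]
    # Coverage: fraction of references with a sample under the threshold.
    coverage = sum(1 for b in best if b < threshold) / len(best)
    # Matching: mean RMSD to the closest sample (lower is better).
    matching = sum(best) / len(best)
    return coverage, matching


# Three reference structures compared against two samples.
rmsd = [[0.4, 2.0], [1.5, 1.2], [3.0, 0.9]]
coverage, matching = coverage_matching(rmsd, threshold=1.0)
```

Here two of the three reference structures have a sample within the threshold, so a higher coverage and a lower matching value both indicate that the sample ensemble reproduces the reference ensemble more closely.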
- colav.extract_data.calculate_dh_pw(i, j, u_pca, pw_pca, resnum_bounds, psi_idx, phi_idx, omg_idx)
- colav.extract_data.calculate_dh_rc(raw_dh_loading, quadrature=False)
Adjusts raw dihedral loading for interpretability.
Calculates a residue contribution from a raw loading of dihedral angle features to account for the application of sine and cosine functions.
Returns a residue contribution.
Parameters:
- raw_dh_loading : array_like, (N,)
Array of raw loading from PCA.
- quadrature : bool, optional
Indicator to calculate residue contributions in quadrature.
Returns:
- transformed_dh_loading : array_like, (N/2,)
Array of residue contribution to determine relative angle influence in the given loading.
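Because each dihedral angle enters the data matrix as a sine and a cosine feature, a loading of length N maps back to N/2 angles. The sketch below shows one way such a reduction could work. The feature layout (all sine loadings first, then all cosine loadings), the use of absolute values, and the helper name dh_residue_contribution are assumptions for illustration only; the layout and arithmetic used by calculate_dh_rc may differ.

```python
import math


def dh_residue_contribution(raw_loading, quadrature=False):
    """Combine sine and cosine loadings into one value per angle.

    Hypothetical helper. ASSUMES the first half of raw_loading holds the
    sine loadings and the second half the matching cosine loadings.
    """
    if len(raw_loading) % 2 != 0:
        raise ValueError("expected an even number of loading entries")
    half = len(raw_loading) // 2
    sin_part, cos_part = raw_loading[:half], raw_loading[half:]
    if quadrature:
        # Root of the summed squares of the sine and cosine loadings.
        return [math.hypot(s, c) for s, c in zip(sin_part, cos_part)]
    # Otherwise add the magnitudes of the two loadings.
    return [abs(s) + abs(c) for s, c in zip(sin_part, cos_part)]


loading = [3.0, -1.0, 4.0, 2.0]  # two angles: (sin, cos) = (3, 4) and (-1, 2)
summed = dh_residue_contribution(loading)
in_quadrature = dh_residue_contribution(loading, quadrature=True)
```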
- colav.extract_data.calculate_dh_sa(i, j, u_pca, sa_pca, shared_atom_set, psi_idx, phi_idx, omg_idx)
- colav.extract_data.calculate_pw_rc(raw_pw_loading, resnum_bounds)
Adjusts raw pairwise distance loading for interpretability.
Calculates a residue contribution from a raw loading of pairwise distance features to account for all pairings of residues.
Returns a residue contribution.
Parameters:
- raw_pw_loading : array_like, (N,)
Array of raw loading from PCA.
- resnum_bounds : tuple
Tuple containing the minimum and maximum (inclusive) residue number values.
Returns:
- transformed_pw_loading : array_like, (N/2,)
Array of residue contribution to determine relative residue influence in the given loading.
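Each pairwise distance feature involves two residues, so a per-residue contribution has to gather every pairing a residue takes part in. The sketch below sums the absolute loading of each pair onto both of its residues. The pair ordering (itertools.combinations over the residue numbers), the use of absolute values, and the helper name pw_residue_contribution are assumptions for illustration; calculate_pw_rc may order features or combine loadings differently.

```python
from itertools import combinations


def pw_residue_contribution(raw_loading, resnum_bounds):
    """Accumulate pairwise-distance loadings onto individual residues.

    Hypothetical helper. ASSUMES features are ordered like
    itertools.combinations over the inclusive residue range.
    """
    first, last = resnum_bounds
    residues = list(range(first, last + 1))
    pairs = list(combinations(residues, 2))
    if len(pairs) != len(raw_loading):
        raise ValueError("loading length does not match the number of pairs")
    contribution = {res: 0.0 for res in residues}
    for (res_a, res_b), weight in zip(pairs, raw_loading):
        # A pair's loading counts toward both residues in the pair.
        contribution[res_a] += abs(weight)
        contribution[res_b] += abs(weight)
    return [contribution[res] for res in residues]


# Residues 1-3 give the pairs (1, 2), (1, 3), (2, 3).
per_residue = pw_residue_contribution([0.5, -1.0, 2.0], (1, 3))
```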
- colav.extract_data.calculate_pw_sa(i, j, pw_pca, sa_pca, resnum_bounds, shared_atom_set)
- colav.extract_data.calculate_sa_rc(raw_sa_loading, shared_atom_list)
Adjusts raw strain or shear loading for interpretability.
Calculates a residue contribution from a raw loading of strain or shear tensor features.
Returns a residue contribution.
Parameters:
- raw_sa_loading : array_like
Array of raw loading from PCA.
- shared_atom_list : array_like
Sorted list of shared atoms between all structures used for strain analysis.
Returns:
- transformed_sa_loading : array_like
Array of residue contribution to determine relative residue influence in the given loading.
- colav.extract_data.generate_dihedral_matrix(structure_list, resnum_bounds, no_psi=False, no_omega=False, no_phi=False, save=False, save_prefix=None, verbose=False)
Extracts dihedral angles from given structures.
Extracts and returns a data matrix of (observations x features) with the given structures as observations and the linearized dihedral angles (by applying sine and cosine functions) as features. Cannot handle missing coordinates and skips structures with missing backbone atoms within the given residue numbers.
Parameters:
- structure_list : list of str
Array containing the file paths to PDB structures.
- resnum_bounds : tuple
Tuple containing the minimum and maximum (inclusive) residue number values.
- no_psi : bool, optional
Indicator to exclude psi dihedral angle from returned dihedral angles.
- no_omega : bool, optional
Indicator to exclude omega dihedral angle from returned dihedral angles.
- no_phi : bool, optional
Indicator to exclude phi dihedral angle from returned dihedral angles.
- save : bool, optional
Indicator to save results.
- save_prefix : str
If saving results, prefix for pickle save file.
- verbose : bool, optional
Indicator for verbose output.
Returns:
- dh_data_matrix : array_like
Array containing dihedral angles within resnum_bounds for structures in structure_list, excluding structures missing desired atoms.
- dh_strucs : list of str
List of structures ordered as stored in dh_data_matrix.
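Linearizing with sine and cosine removes the discontinuity at ±180°, so two nearly identical angles on either side of the wrap-around produce nearly identical features. A minimal sketch of that transform for one structure follows. The helper name linearize_dihedrals and the feature order (sines first, then cosines) are assumptions for illustration; the column order produced by generate_dihedral_matrix is not specified here.

```python
import math


def linearize_dihedrals(angles_rad):
    """Map dihedral angles (radians) to sine and cosine features.

    Hypothetical helper. ASSUMES sines are listed first, then cosines.
    """
    sines = [math.sin(angle) for angle in angles_rad]
    cosines = [math.cos(angle) for angle in angles_rad]
    return sines + cosines


# One structure with three dihedral angles gives six features.
row = linearize_dihedrals([0.0, math.pi / 2, math.pi])

# Angles just either side of the wrap-around stay close after the transform.
near_pos = linearize_dihedrals([math.pi - 0.01])
near_neg = linearize_dihedrals([-math.pi + 0.01])
gap = math.dist(near_pos, near_neg)
```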
- colav.extract_data.generate_pw_matrix(structure_list, resnum_bounds, save=False, save_prefix=None, verbose=False)
Extracts pairwise distances from given structures.
Extracts and returns a data matrix of (observations x features) with the given structures as observations and the pairwise distances between alpha carbon (CA) atoms as features. Cannot handle missing coordinates and skips structures with missing CA atoms within the given residue numbers.
Parameters:
- structure_list : list of str
Array containing the file paths to PDB structures.
- resnum_bounds : tuple
Tuple containing the minimum and maximum (inclusive) residue number values.
- save : bool, optional
Indicator to save results.
- save_prefix : str
If saving results, prefix for pickle save file.
- verbose : bool, optional
Indicator for verbose output.
Returns:
- pw_data_matrix : array_like
Array containing pairwise distances between desired CA atoms for structures in structure_list, excluding structures missing desired atoms.
- pw_strucs : list of str
List of structures ordered as stored in pw_data_matrix.
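For one structure, the feature row is the set of distances between every pair of CA atoms in the residue range. The sketch below builds such a row from already-extracted coordinates. The helper name pairwise_ca_distances and the pair ordering (itertools.combinations) are assumptions for illustration; generate_pw_matrix additionally parses the PDB files and skips structures with missing CA atoms.

```python
import math
from itertools import combinations


def pairwise_ca_distances(ca_coords):
    """Return distances between all unique pairs of CA coordinates.

    Hypothetical helper. ASSUMES pairs are ordered like
    itertools.combinations over the residue order.
    """
    return [math.dist(a, b) for a, b in combinations(ca_coords, 2)]


# Three CA atoms give three unique pairs: (0, 1), (0, 2), (1, 2).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
features = pairwise_ca_distances(coords)
```

A range of n residues yields n(n-1)/2 features per structure, so the feature count grows quadratically with the width of resnum_bounds.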
- colav.extract_data.generate_strain_matrix(structure_list, reference_pdb, data_type, resnum_bounds, atoms=['N', 'C', 'CA', 'CB', 'O'], alt_locs=['', 'A'], save=True, save_prefix=None, save_additional=False, verbose=False)
Extracts strain tensors, shear tensors, or shear energies from given structures.
Extracts and returns a data matrix of (observations x features) with the given structures as observations and strain tensors, shear tensors, or shear energies as features. For tensor features, only the off-diagonal elements are included. Cannot handle missing coordinates and skips structures with missing backbone atoms within the given residue numbers.
Parameters:
- structure_list : list of str
Array containing the file paths to PDB structures.
- reference_pdb : str
File path to the reference PDB structure; this structure can be contained in structure_list.
- data_type : {'straint', 'sheart', 'sheare'}
Indicator for type of data to build data matrix.
- resnum_bounds : tuple
Tuple containing the minimum and maximum (inclusive) residue number values.
- atoms : array_like, optional
Array containing atom names.
- alt_locs : array_like, optional
Array containing alternate location names.
- save : bool, optional
Indicator to save results.
- save_prefix : str
If saving results, prefix for pickle save file.
- save_additional : bool, optional
Indicator to save results of nested calculations.
- verbose : bool, optional
Indicator for verbose output.
Returns:
- sa_data_matrix : array_like
Array containing strain or shear tensor information for structures in structure_list, excluding structures missing desired atoms.
- sa_strucs : list of str
List of structures ordered as stored in sa_data_matrix.
- colav.extract_data.load_dihedral_matrix(dh_pkl)
Loads the dihedral data matrix and corresponding structures.
Loads a dictionary that stores the dihedral data matrix under the data_matrix key and the corresponding structures under the structures key.
Parameters:
- dh_pkl : str
File path to the dihedral dictionary pickle file.
Returns:
- dh_data_matrix : array_like
Array containing dihedral angles as calculated by generate_dihedral_matrix.
- dh_strucs : list of str
List of structures ordered as stored in dh_data_matrix.
- colav.extract_data.load_pw_matrix(pw_pkl)
Loads the pairwise distance data matrix and corresponding structures.
Loads a dictionary that stores the pairwise distance data matrix under the data_matrix key and the corresponding structures under the structures key.
Parameters:
- pw_pkl : str
File path to the pairwise distance dictionary pickle file.
Returns:
- pw_data_matrix : array_like
Array containing pairwise distances as calculated by generate_pw_matrix.
- pw_strucs : list of str
List of structures ordered as stored in pw_data_matrix.
- colav.extract_data.load_strain_matrix(strain_pkl)
Loads the strain data matrix and corresponding structures.
Loads a dictionary that stores the strain data matrix under the data_matrix key and the corresponding structures under the structures key.
Parameters:
- strain_pkl : str
File path to the strain dictionary pickle file.
Returns:
- sa_data_matrix : array_like
Array containing strain or shear data as calculated by generate_strain_matrix.
- sa_strucs : list of str
List of structures ordered as stored in sa_data_matrix.
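The three loaders read the same kind of file: a pickled dictionary with a data_matrix key and a structures key. The round trip below sketches that layout using only the standard library. The helper name load_matrix_pickle, the file name, and the toy payload are hypothetical; real files are written by the generate_* functions when save is enabled, and the matrix is stored as an array rather than nested lists.

```python
import os
import pickle
import tempfile


def load_matrix_pickle(pkl_path):
    """Read a pickled dict and return (data_matrix, structures).

    Hypothetical stand-in for the load_*_matrix functions.
    """
    with open(pkl_path, "rb") as handle:
        stored = pickle.load(handle)
    return stored["data_matrix"], stored["structures"]


# Toy payload with the two keys described above.
payload = {
    "data_matrix": [[0.0, 1.0], [1.0, 0.0]],
    "structures": ["a.pdb", "b.pdb"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "example_matrix.pkl")
    with open(pkl_path, "wb") as handle:
        pickle.dump(payload, handle)
    data_matrix, structures = load_matrix_pickle(pkl_path)
```

Row i of data_matrix corresponds to structures[i], which is why the loaders return the two objects together. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.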