Usage

Features in a tomogram that resemble a structural 'template' can be localized automatically using 'template matching'. In this approach, a 3D template is correlated with a tomogram. All possible rotations and translations are sampled exhaustively, using the algorithm described in Förster et al. (2010).

Requirements

To use the software you need at least two things: a set of reconstructed tomograms in MRC format, and a template structure in MRC format. Tomograms in IMOD format (.rec) also work, but they need to be renamed (or soft-linked!) to have the correct extension (.mrc). Tomograms should ideally be binned 6x or 8x to avoid excessive runtimes. The template can be an EM reconstruction (from the EMDB) or a PDB model converted to a density (for example via Chimera's molmap).
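Soft-linking avoids duplicating large tomograms. Below is a minimal standard-library sketch that creates an .mrc link next to every .rec file in a directory. The helper name `link_rec_as_mrc` is hypothetical and not part of pytom.

```python
from pathlib import Path
from typing import List


def link_rec_as_mrc(directory: str) -> List[Path]:
    """Create an .mrc soft link next to every IMOD .rec tomogram in `directory`.

    Hypothetical helper (not part of pytom). Existing .mrc files or links are
    left untouched. Returns the links that were created.
    """
    created = []
    for rec in sorted(Path(directory).glob("*.rec")):
        link = rec.with_suffix(".mrc")
        if link.exists() or link.is_symlink():
            continue  # never overwrite an existing file or (dangling) link
        # Relative target, so the link keeps working if the folder is moved
        link.symlink_to(rec.name)
        created.append(link)
    return created
```

For example, `link_rec_as_mrc("tomograms/")` would turn `tomograms/tomo101.rec` into a usable `tomograms/tomo101.mrc`.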

Template matching workflow

Using template matching in this software consists of the following steps:

  1. Creating a template and mask
  2. Matching the template in a tomogram
  3. Extracting particles
  4. Merging annotations for export to other software

1. Creating a template and mask

Keep in mind:

  • The template and mask need to have the same box size.
  • The template needs to have the same contrast as the tomogram (e.g. the particles are black in both the tomogram and template). Contrast can be adjusted with the --invert option.

pytom_create_template.py

Using an EM map as a reference structure generally gives the best results. Alternatively, a structure from the PDB can be converted with the molmap command in Chimera(X) to create an MRC file that models the electrostatic potential. A good ballpark for the template box size is 2 to 3 times the particle diameter (measured along its longest axis).
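The box-size rule of thumb can be turned into pixels at the output voxel size. Below is a small sketch; the helper `template_box_size` is hypothetical. Rounding up to an even size is an assumed convention for FFT-based processing, not a documented pytom requirement.

```python
import math


def template_box_size(diameter_angstrom: float, voxel_size_angstrom: float,
                      factor: float = 2.5) -> int:
    """Box size in pixels of roughly `factor` x the particle diameter (2-3x).

    Hypothetical helper; rounding up to an even size is an assumption.
    """
    size = math.ceil(factor * diameter_angstrom / voxel_size_angstrom)
    return size + size % 2  # make odd sizes even


print(template_box_size(290, 15))            # 50
print(template_box_size(290, 15, factor=3))  # 58
```

The result would be passed to `-b/--box-size`. According to the help text below, this only has an effect if it is larger than the downsampled box of the input map.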

usage: pytom_create_template.py [-h] -i INPUT_MAP [-o OUTPUT_FILE]
                                [--input-voxel-size-angstrom INPUT_VOXEL_SIZE_ANGSTROM]
                                --output-voxel-size-angstrom
                                OUTPUT_VOXEL_SIZE_ANGSTROM [--center CENTER]
                                [--low-pass LOW_PASS] [-b BOX_SIZE]
                                [--invert INVERT] [-m MIRROR] [--log LOG]

Generate template from MRC density. -- Marten Chaillet (@McHaillet)

options:
  -h, --help            show this help message and exit
  -i INPUT_MAP, --input-map INPUT_MAP
                        Map to generate template from; MRC file.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Provide path to write output, needs to end in .mrc.
                        If not provided file is written to current directory
                        in the following format:
                        template_{input_map.stem}_{voxel_size}A.mrc
  --input-voxel-size-angstrom INPUT_VOXEL_SIZE_ANGSTROM
                        Voxel size of input map, in Angstrom. If not provided
                        will be read from MRC input (so make sure it is
                        annotated correctly!).
  --output-voxel-size-angstrom OUTPUT_VOXEL_SIZE_ANGSTROM
                        Output voxel size of the template, in Angstrom. Needs
                        to be equal to the voxel size of the tomograms for
                        template matching. Input map will be downsampled to
                        this spacing.
  --center CENTER       Set this flag to automatically center the density in
                        the volume by measuring the center of mass.
  --low-pass LOW_PASS   Apply a low pass filter to this resolution, in
                        Angstrom. By default a low pass filter is applied to a
                        resolution of (2 * output_spacing_angstrom) before
                        downsampling the input volume.
  -b BOX_SIZE, --box-size BOX_SIZE
                        Specify a desired size for the output box of the
                        template. Only works if it is larger than the
                        downsampled box size of the input.
  --invert INVERT       Multiply template by -1. WARNING: not needed if ctf
                        with defocus is already applied!
  -m MIRROR, --mirror MIRROR
                        Mirror the final template before writing to disk.
  --log LOG             Can be set to `info` or `debug`

pytom_create_mask.py

The mask around the template can be quite tight, which removes as much noise as possible around the particle of interest. We recommend an overhang of around 10%-20% relative to the particle radius. For particles that are poorly approximated by a sphere, you can also generate an ellipsoidal mask. You will probably need to reorient this mask in Chimera and resample it onto the grid of the template. Alternatively, you can create a structured mask around the template in external software (for example via thresholding and dilation). Keep in mind that non-spherical masks roughly double the template matching computation time.
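Below is a NumPy sketch of a spherical or ellipsoidal mask with a Gaussian edge. It illustrates the radius, the overhang and the `-s/--sigma` drop-off. It mirrors the `-r`, `--radius-minor1`, `--radius-minor2` and `-s` parameters only conceptually and is not pytom's implementation. The following are all assumptions:
- the axis order (x, y, z);
- placing the center at voxel `box_size // 2`;
- the way the edge distance is approximated for ellipsoids.

```python
import numpy as np


def soft_ellipsoid_mask(box_size, radius, radius_minor1=None, radius_minor2=None,
                        sigma=0.0):
    """Ellipsoid with radii (in pixels) along x, y, z and a Gaussian fall-off.

    Sketch only, not pytom_create_mask.py. Assumes axis order (x, y, z) and a
    center at voxel box_size // 2. Outside the surface, the distance is
    approximated by scaling with the smallest radius (exact for a sphere).
    """
    ry = radius if radius_minor1 is None else radius_minor1
    rz = radius if radius_minor2 is None else radius_minor2
    coords = np.arange(box_size) - box_size // 2
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    # Normalised radius: < 1 inside the ellipsoid, exactly 1 on its surface
    r = np.sqrt((x / radius) ** 2 + (y / ry) ** 2 + (z / rz) ** 2)
    mask = (r <= 1.0).astype(np.float32)
    if sigma > 0:
        outside = r > 1.0
        dist = (r[outside] - 1.0) * min(radius, ry, rz)  # approx. pixels outside
        mask[outside] = np.exp(-dist ** 2 / (2 * sigma ** 2))
    return mask


# ~15 % overhang on a 145 A particle radius at 15 A voxels: 11.1 -> 11 px
radius_px = round(145 * 1.15 / 15)
mask = soft_ellipsoid_mask(box_size=50, radius=radius_px, sigma=1.0)
```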

usage: pytom_create_mask.py [-h] -b BOX_SIZE [-o OUTPUT_FILE]
                            [--voxel-size VOXEL_SIZE] -r RADIUS
                            [--radius-minor1 RADIUS_MINOR1]
                            [--radius-minor2 RADIUS_MINOR2] [-s SIGMA]

Create a mask for template matching. -- Marten Chaillet (@McHaillet)

options:
  -h, --help            show this help message and exit
  -b BOX_SIZE, --box-size BOX_SIZE
                        Shape of square box for the mask.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Provide path to write output, needs to end in .mrc. If
                        not provided file is written to current directory in
                        the following format:
                        ./mask_b[box_size]px_r[radius]px.mrc
  --voxel-size VOXEL_SIZE
                        Provide a voxel size to annotate the MRC (currently
                        not used for any mask calculation).
  -r RADIUS, --radius RADIUS
                        Radius of the spherical mask in number of pixels. In
                        case minor1 and minor2 are provided, this will be the
                        radius of the ellipsoidal mask along the x-axis.
  --radius-minor1 RADIUS_MINOR1
                        Radius of the ellipsoidal mask along the y-axis in
                        number of pixels.
  --radius-minor2 RADIUS_MINOR2
                        Radius of the ellipsoidal mask along the z-axis in
                        number of pixels.
  -s SIGMA, --sigma SIGMA
                        Sigma of gaussian drop-off around the mask edges in
                        number of pixels. Values in the range from 0.5-1.0 are
                        usually sufficient for tomograms with 20A-10A voxel
                        sizes.

2. Matching the template in a tomogram

pytom_match_template.py

This script requires at least the following to run:
- a tomogram, a template and a mask;
- the minimum and maximum tilt angles (for the missing wedge constraint);
- an angular search (via --particle-diameter or --angular-search);
- a GPU index.

The search can be limited along any dimension with the --search-x, --search-y, and --search-z parameters. For example, you can skip empty regions along z outside the ice layer, or exclude a region with reconstruction artifacts along x. With the --volume-split option, a tomogram can be split into chunks that fit in GPU memory (useful for large tomograms). If you provide multiple GPUs, the program splits the angular search (or the subvolume search) over the cards to speed up the computation.

The software can automatically calculate the angular search from the available resolution and the provided particle diameter. The required angular increment (in degrees) follows from the Crowther criterion \(\Delta \alpha = \frac{180}{\pi r_{max} d}\). Here \(r_{max}\) is the maximal spatial frequency (one over the resolution) and \(d\) is the particle diameter. For the maximal resolution, the Nyquist limit (2 × voxel size) is used. The exception is when a low-pass filter is specified, because it limits the maximal available resolution. You can exploit this to reduce the angular search! For non-spherical particles, we suggest measuring the particle diameter along the longest axis of the macromolecule.
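The Crowther criterion above can be evaluated by hand to see how a low-pass filter coarsens the angular step. Below is a sketch of that arithmetic only. The function name is hypothetical, and pytom computes this internally from --particle-diameter.

```python
import math
from typing import Optional


def crowther_angular_increment(particle_diameter_angstrom: float,
                               voxel_size_angstrom: float,
                               low_pass_angstrom: Optional[float] = None) -> float:
    """Angular increment in degrees: 180 / (pi * r_max * d).

    r_max = 1 / resolution, where the resolution is Nyquist (2 * voxel size)
    unless a low-pass filter resolution is given; d is the particle diameter.
    """
    if low_pass_angstrom is None:
        resolution = 2 * voxel_size_angstrom
    else:
        resolution = low_pass_angstrom
    r_max = 1.0 / resolution
    return 180.0 / (math.pi * r_max * particle_diameter_angstrom)


# ~290 A particle in a 15 A tomogram: Nyquist (30 A) vs. a 40 A low-pass
print(round(crowther_angular_increment(290, 15), 1))      # 5.9
print(round(crowther_angular_increment(290, 15, 40), 1))  # 7.9
```

The larger step with the 40 Å low-pass means fewer rotations need to be sampled.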

In case the template matching is run with a non-spherical mask, it is essential to set the --non-spherical-mask flag. It requires a slight modification of the calculation that will roughly double the computation time, so only use non-spherical masks if absolutely necessary.

Optimizing results: per tilt weighting with CTFs and dose accumulation

Optimal results are obtained by also incorporating per-tilt information (tilt angles, CTF and accumulated dose) into the 3D weighting. You can pass the following files (and parameters):

  • Tilt angles: a .rawtlt or .tlt file to the --tilt-angles parameter with all the tilt angles used to reconstruct the tomogram. You should then also set the --per-tilt-weighting flag.
  • CTF data: an IMOD .defocus file or a .txt file passed to --defocus. The .txt file should specify the defocus of each tilt in \(\mu m\), one value per line. You can also give a single defocus value (in \(\mu m\)). The CTF also requires input for --voltage, --amplitude-contrast, and --spherical-aberration.
  • Dose weighting: a .txt file to --dose-accumulation with the accumulated dose per tilt (assuming the same ordering as .tlt). Each line contains a single float specifying the accumulated dose in \(e^{-}/\text{Å}^{2}\). Dose weighting only works in combination with --per-tilt-weighting.

(As a side note, you can also enable --per-tilt-weighting on its own, without dose accumulation and CTFs, or combine it with only one of the two.)
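Below is a sketch of writing a --dose-accumulation file in the same order as the .tlt file, given the order in which tilts were acquired. The helper and its convention are hypothetical. It assumes that each tilt received the same dose and that "accumulated dose" means the dose received *before* that exposure. If your pipeline counts the exposure itself, adapt the sum.

```python
from pathlib import Path
from typing import List, Sequence


def write_dose_accumulation(path: str, acquisition_order: Sequence[int],
                            dose_per_tilt: float) -> List[float]:
    """Write accumulated dose (e-/A^2), one value per line, in .tlt order.

    Hypothetical helper. Assumptions: acquisition_order[i] is the line index
    (in the .tlt file) of the i-th exposure, every tilt received the same
    dose_per_tilt, and the accumulated dose is the dose received *before*
    that exposure.
    """
    n_tilts = len(acquisition_order)
    if sorted(acquisition_order) != list(range(n_tilts)):
        raise ValueError("acquisition_order must be a permutation of 0..n-1")
    accumulated = [0.0] * n_tilts
    for n_previous, tlt_index in enumerate(acquisition_order):
        accumulated[tlt_index] = n_previous * dose_per_tilt
    Path(path).write_text("".join(f"{dose:.2f}\n" for dose in accumulated))
    return accumulated
```

Consider a .tlt file ordered -6, -3, 0, 3, 6, with the illustrative acquisition order 0, 3, -3, 6, -6. The call `write_dose_accumulation("dose.txt", [2, 3, 1, 4, 0], 3.0)` returns `[12.0, 6.0, 0.0, 3.0, 9.0]`.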

When enabling the CTF model here (with the defocus file), it is important that the template is not multiplied with a CTF before passing it to this script. The template only needs to be scaled to the correct pixel size and the contrast should be adjusted to match the contrast in the tomograms.

Secondly, the tomogram may have been CTF corrected, for example with IMOD's strip-based CTF correction or with NovaCTF. In that case it is important to add the parameter --tomogram-ctf-model phase-flip, which modifies the template CTF to match the CTF correction of the tomogram.

Background corrections

The software contains two background correction methods that might improve results: --spectral-whitening or --random-phase-correction (from STOPGAP). In our experience the random phase correction is most reliable, while spectral whitening never seemed to clearly improve results.
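The random phase correction matches a phase-randomized copy of the template, which has the same amplitude spectrum but random phases. The resulting 'noise' score map is then subtracted. Below is a NumPy sketch of how such a copy can be generated. It is an illustration of the idea, not pytom's or STOPGAP's code. Taking the phases from the FFT of real white noise is a design choice of this sketch: it keeps the spectrum Hermitian, so the inverse transform is exactly real.

```python
import numpy as np


def phase_randomized(template: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Volume with the template's amplitude spectrum but random phases.

    Sketch only. Phases come from the real FFT of white noise, so they are
    Hermitian-consistent and the amplitude spectrum is preserved exactly.
    """
    amplitudes = np.abs(np.fft.rfftn(template))
    noise_phases = np.angle(np.fft.rfftn(rng.standard_normal(template.shape)))
    return np.fft.irfftn(amplitudes * np.exp(1j * noise_phases), s=template.shape)


rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility, cf. --rng-seed
template = rng.standard_normal((24, 24, 24))
noise_template = phase_randomized(template, rng)
```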

usage: pytom_match_template.py [-h] -t TEMPLATE -v TOMOGRAM [-d DESTINATION]
                               -m MASK
                               [--non-spherical-mask NON_SPHERICAL_MASK]
                               [--particle-diameter PARTICLE_DIAMETER]
                               [--angular-search ANGULAR_SEARCH]
                               [--z-axis-rotational-symmetry Z_AXIS_ROTATIONAL_SYMMETRY]
                               [-s VOLUME_SPLIT VOLUME_SPLIT VOLUME_SPLIT]
                               [--search-x SEARCH_X SEARCH_X]
                               [--search-y SEARCH_Y SEARCH_Y]
                               [--search-z SEARCH_Z SEARCH_Z]
                               [--tomogram-mask TOMOGRAM_MASK] -a TILT_ANGLES
                               [TILT_ANGLES ...]
                               [--per-tilt-weighting PER_TILT_WEIGHTING]
                               [--voxel-size-angstrom VOXEL_SIZE_ANGSTROM]
                               [--low-pass LOW_PASS] [--high-pass HIGH_PASS]
                               [--dose-accumulation DOSE_ACCUMULATION]
                               [--defocus DEFOCUS]
                               [--amplitude-contrast AMPLITUDE_CONTRAST]
                               [--spherical-aberration SPHERICAL_ABERRATION]
                               [--voltage VOLTAGE] [--phase-shift PHASE_SHIFT]
                               [--tomogram-ctf-model {phase-flip}]
                               [--defocus-handedness {-1,0,1}]
                               [--spectral-whitening SPECTRAL_WHITENING]
                               [-r RANDOM_PHASE_CORRECTION]
                               [--half-precision HALF_PRECISION]
                               [--rng-seed RNG_SEED] -g GPU_IDS [GPU_IDS ...]
                               [--log LOG]

Run template matching. -- Marten Chaillet (@McHaillet)

options:
  -h, --help            show this help message and exit

Template, search volume, and output:
  -t TEMPLATE, --template TEMPLATE
                        Template; MRC file. Object should match the contrast
                        of the tomogram: if the tomogram has black ribosomes,
                        the reference should be black.
                        (pytom_create_template.py has an option to invert
                        contrast)
  -v TOMOGRAM, --tomogram TOMOGRAM
                        Tomographic volume; MRC file.
  -d DESTINATION, --destination DESTINATION
                        Folder to store the files produced by template
                        matching.

Mask:
  -m MASK, --mask MASK  Mask with same box size as template; MRC file.
  --non-spherical-mask NON_SPHERICAL_MASK
                        Flag to set when the mask is not spherical. It adds
                        the required computations for non-spherical masks and
                        roughly doubles computation time.

Angular search:
  --particle-diameter PARTICLE_DIAMETER
                        Provide a particle diameter (in Angstrom) to
                        automatically determine the angular sampling using the
                        Crowther criterion. For the max resolution, (2 * pixel
                        size) is used unless a low-pass filter is specified,
                        in which case the low-pass resolution is used. For
                        non-globular macromolecules choose the diameter along
                        the longest axis.
  --angular-search ANGULAR_SEARCH
                        This option overrides the angular search calculation
                        from the particle diameter. If given a float it will
                        generate an angle list with healpix for Z1 and X1 and
                        linear search for Z2. The provided angle will be used
                        as the maximum for the linear search and for the mean
                        angle difference from healpix. Alternatively, a .txt
                        file can be provided with three Euler angles (in
                        radians) per line that define the angular search.
                        Angle format is ZXZ anti-clockwise (see:
                        https://www.ccpem.ac.uk/user_help/rotation_conventions.php).
  --z-axis-rotational-symmetry Z_AXIS_ROTATIONAL_SYMMETRY
                        Integer value indicating the rotational symmetry of
                        the template around the z-axis. The length of the
                        rotation search will be shortened through division by
                        this value. Only works for template symmetry around
                        the z-axis.

Volume control:
  -s VOLUME_SPLIT VOLUME_SPLIT VOLUME_SPLIT, --volume-split VOLUME_SPLIT VOLUME_SPLIT VOLUME_SPLIT
                        Split the volume into smaller parts for the search,
                        can be relevant if the volume does not fit into GPU
                        memory. Format is x y z, e.g. --volume-split 1 2 1
  --search-x SEARCH_X SEARCH_X
                        Start and end indices of the search along the x-axis,
                        e.g. --search-x 10 490
  --search-y SEARCH_Y SEARCH_Y
                        Start and end indices of the search along the y-axis,
                        e.g. --search-y 10 490
  --search-z SEARCH_Z SEARCH_Z
                        Start and end indices of the search along the z-axis,
                        e.g. --search-z 30 230
  --tomogram-mask TOMOGRAM_MASK
                        Here you can provide a mask for matching with
                        dimensions (in pixels) equal to the tomogram. If a
                        subvolume only has values <= 0 for this mask it will
                        be skipped.

Filter control:
  -a TILT_ANGLES [TILT_ANGLES ...], --tilt-angles TILT_ANGLES [TILT_ANGLES ...]
                        Tilt angles of the tilt-series, either the minimum and
                        maximum values of the tilts (e.g. --tilt-angles -59.1
                        60.1) or a .rawtlt/.tlt file with all the angles (e.g.
                        --tilt-angles tomo101.rawtlt). In case all the tilt
                        angles are provided a more elaborate Fourier space
                        constraint can be used
  --per-tilt-weighting PER_TILT_WEIGHTING
                        Flag to activate per-tilt-weighting, only makes sense
                        if a file with all tilt angles has been provided. In
                        case not set, while a tilt angle file is provided, the
                        minimum and maximum tilt angle are used to create a
                        binary wedge. The base functionality creates a fanned
                        wedge where each tilt is weighted by cos(tilt_angle).
                        If dose accumulation and CTF parameters are provided
                        these will all be incorporated in the tilt-weighting.
  --voxel-size-angstrom VOXEL_SIZE_ANGSTROM
                        Voxel spacing of tomogram/template in angstrom, if not
                        provided will try to read from the MRC files. Argument
                        is important for band-pass filtering!
  --low-pass LOW_PASS   Apply a low-pass filter to the tomogram and template.
                        Generally desired if the template was already filtered
                        to a certain resolution. Value is the resolution in A.
  --high-pass HIGH_PASS
                        Apply a high-pass filter to the tomogram and template
                        to reduce correlation with large low frequency
                        variations. Value is a resolution in A, e.g. 500 could
                        be appropriate as the CTF is often incorrectly
                        modelled up to 50nm.
  --dose-accumulation DOSE_ACCUMULATION
                        Here you can provide a file that contains the
                        accumulated dose at each tilt angle, assuming the same
                        ordering of tilts as the tilt angle file. Format
                        should be a .txt file with on each line a dose value
                        in e-/A2.
  --defocus DEFOCUS     Here you can provide an IMOD defocus (.defocus) file
                        (version 2 or 3), a text (.txt) file with a single
                        defocus value per line (in μm), or a single defocus
                        value (in μm). The value(s), together with the other
                        ctf parameters (amplitude contrast, voltage, spherical
                        aberration), will be used to create a 3D CTF weighting
                        function. IMPORTANT: if you provide this, the input
                        template should not be modulated with a CTF
                        beforehand. If it is a reconstruction it should
                        ideally be Wiener filtered.
  --amplitude-contrast AMPLITUDE_CONTRAST
                        Amplitude contrast fraction for CTF.
  --spherical-aberration SPHERICAL_ABERRATION
                        Spherical aberration for CTF in mm.
  --voltage VOLTAGE     Voltage for CTF in kV.
  --phase-shift PHASE_SHIFT
                        Phase shift (in degrees) for the CTF to model phase
                        plates.
  --tomogram-ctf-model {phase-flip}
                        Optionally, you can specify if and how the CTF was
                        corrected during reconstruction of the input tomogram.
                        This allows match-pick to match the weighting of the
                        template to the tomogram. Not using this option is
                        appropriate if the CTF was left uncorrected in the
                        tomogram. Option 'phase-flip' : appropriate for IMOD's
                        strip-based phase flipping or reconstructions
                        generated with novaCTF/3dctf.
  --defocus-handedness {-1,0,1}
                        Specify the defocus handedness for defocus gradient
                        correction of the CTF in each subvolume. The more
                        subvolumes in x and z, the finer the defocus gradient
                        will be corrected, at the cost of increased computing
                        time. It will only have effect for very clean and
                        high-resolution data, such as isolated macromolecules.
                        IMPORTANT: only works in combination with
                        --volume-split! A value of 0 means no defocus gradient
                        correction (default), 1 means correction assuming
                        correct handedness (as specified in Pyle and Zanetti
                        (2021)), -1 means the handedness will be inverted. If
                        uncertain, better to leave it off, as an inverted
                        correction might hamper results.
  --spectral-whitening SPECTRAL_WHITENING
                        Calculate a whitening filter from the power
                        spectrum of the tomogram; apply it to the tomogram
                        patch and template. Effectively puts more weight on
                        high resolution features and sharpens the correlation
                        peaks.

Additional options:
  -r RANDOM_PHASE_CORRECTION, --random-phase-correction RANDOM_PHASE_CORRECTION
                        Run template matching simultaneously with a phase
                        randomized version of the template, and subtract this
                        'noise' map from the final score map. For this method
                        please see STOPGAP as a reference:
                        https://doi.org/10.1107/S205979832400295X .
  --half-precision HALF_PRECISION
                        Return and save all output in float16 instead of the
                        default float32
  --rng-seed RNG_SEED   Specify a seed for the random number generator used
                        for phase randomization for consistent results!

Device control:
  -g GPU_IDS [GPU_IDS ...], --gpu-ids GPU_IDS [GPU_IDS ...]
                        GPU indices to run the program on.

Logging/debugging:
  --log LOG             Can be set to `info` or `debug`

3. Extracting particles

Both scripts below operate on the job file created by pytom_match_template.py, which contains details about the correlation statistics and the output files. The job file always has the format [TOMO_ID]_job.json.

IMPORTANT: For both scripts, the [-r, --radius-px] option needs careful consideration. During extraction, a sphere with this radius is masked out around each peak in the score volume, which prevents selecting the same macromolecule twice. The radius is specified as an integer number of pixels (not Ångström!) and should ideally equal the radius of the particle of interest. You get it by dividing the particle radius by the pixel size. For example, a ribosome (r = 290 Å / 2 = 145 Å) in a 15 Å tomogram gets a pixel radius of about 9.7. The value must be an integer and ribosomes are not perfect spheres, so it is best to round it down to 9 pixels.
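The conversion from Ångström to the integer --radius-px value is a one-liner. Below is a sketch with a hypothetical helper name.

```python
import math


def radius_px(particle_diameter_angstrom: float, voxel_size_angstrom: float) -> int:
    """Extraction radius in whole pixels: particle radius / voxel size, rounded down."""
    return math.floor(particle_diameter_angstrom / 2 / voxel_size_angstrom)


print(radius_px(290, 15))  # ribosome in a 15 A tomogram -> 9
```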

pytom_extract_candidates.py

STAR file metadata

STAR files resulting from extraction have three columns with extraction statistics (LCCmax, CutOff, SearchStd). Dividing the LCCmax and the CutOff by the SearchStd expresses them as a number of \(\sigma\), i.e. a 3D SNR, similar to Rickgauer et al. (2017).

STAR files written out by the template matching module have RELION-compliant column headers (e.g. rlnCoordinateX and rlnAngleRot) to simplify integration with other software. The Euler angles that are written out therefore also follow the same conventions as RELION and Warp: rlnAngleRot, rlnAngleTilt, and rlnAnglePsi are intrinsic clockwise ZYZ Euler angles. Hence they can be used directly for subtomogram averaging in RELION. See here for more info: https://www.ccpem.ac.uk/user_help/rotation_conventions.php.

Please see the For developers section for more details on the metadata.

Default true positive estimation

The particle extraction has been updated to use the formula in Rickgauer et al. (2017), which finds the extraction threshold based on the false alarm rate. This was not yet described in our IJMS publication but is very similar to the Gaussian fit that we used there. However, it is more reliable and specific to the standard deviation \(\sigma\) of the search in each tomogram. pytom_match_template.py keeps track of \(\sigma\) and stores it in the job file.

The user can specify the number of false positives to allow per tomogram (--number-of-false-positives, default 1). Increasing it makes the extraction threshold more lenient, which might increase the number of true positives at the expense of more false positives. Values between 0 and 1 make the threshold more restrictive. The parameter should roughly correspond to the number of false positives that end up in the extracted particle list.

Template matching has a huge search space \(N_{voxels} * N_{rotations}\), which consists mainly of false positives. In comparison, it contains only a tiny fraction of true positives. If the background scores follow a Gaussian (with expected mean 0 and some standard deviation), the false alarm rate can be calculated for a given cut-off value; it depends on the size of the search space. For example, a false alarm rate of \((N_{voxels} * N_{rotations})^{-1}\) means that 1 false positive is expected in the whole search. This can be calculated with the complementary error function,

\[N^{-1} = \text{erfc}( \theta / ( \sigma \sqrt{2} ) ) / (2 n_{\text{FP}})\]

where \(\theta\) is the cut-off, \(\sigma\) is the standard deviation of the Gaussian, and \(N = N_{voxels} * N_{rotations}\) is the size of the search space. \(n_{\text{FP}}\) is the user-specified number of tolerated false positives.
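Solving the equation above for the cut-off gives \(\theta = \sigma \sqrt{2}\, \text{erfc}^{-1}(2 n_{\text{FP}} / N)\). Below is a sketch of that inversion using SciPy. It is not pytom's own function, and the example numbers (σ, tomogram size, number of rotations) are purely illustrative.

```python
import math

from scipy.special import erfcinv


def extraction_cut_off(sigma: float, n_voxels: int, n_rotations: int,
                       n_false_positives: float = 1.0) -> float:
    """Cut-off theta solving 1/N = erfc(theta / (sigma*sqrt(2))) / (2 * n_FP).

    N = n_voxels * n_rotations is the search space; sigma is the standard
    deviation of the background scores (SearchStd in the STAR output).
    """
    search_space = n_voxels * n_rotations
    return sigma * math.sqrt(2) * erfcinv(2 * n_false_positives / search_space)


# Illustrative: sigma = 0.01, a 200 x 200 x 100 voxel search, 50,000 rotations
theta = extraction_cut_off(0.01, 200 * 200 * 100, 50_000)
lenient = extraction_cut_off(0.01, 200 * 200 * 100, 50_000, n_false_positives=10)
```

A larger n_FP yields a lower (more lenient) cut-off, as described for --number-of-false-positives.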

Tophat transform filter

This option can be used to filter the score map for sharp peaks (steep local maxima) which usually correspond to true positives. This will be described in a forthcoming publication. For now, you can check out Marten's poster at CCPEM that shows some preliminary results: 10.5281/zenodo.13165643.
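A white top-hat transform subtracts the grey-scale opening from the score map. Narrow peaks survive, while broad plateaus are largely removed. Below is a SciPy sketch of that operation on a synthetic score map. It illustrates the concept only and is not pytom's implementation. The connectivity argument mirrors --tophat-connectivity via `ndimage.generate_binary_structure`.

```python
import numpy as np
from scipy import ndimage


def tophat_sharp_peaks(scores: np.ndarray, connectivity: int = 1) -> np.ndarray:
    """White top-hat of a 3D score map: scores minus their grey opening.

    Sketch only. Connectivity 1 gives a 6-neighbour cross footprint, 3 the
    full 3x3x3 cube.
    """
    footprint = ndimage.generate_binary_structure(3, connectivity)
    return ndimage.white_tophat(scores, footprint=footprint)


grid = np.indices((21, 21, 21)) - 10
scores = np.exp(-(grid ** 2).sum(axis=0) / (2 * 5.0 ** 2))  # broad blob at center
scores[3, 3, 3] += 1.0                                      # isolated sharp spike
filtered = tophat_sharp_peaks(scores)
# The spike survives the top-hat; the smooth blob's maximum is mostly removed.
```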

usage: pytom_extract_candidates.py [-h] -j JOB_FILE
                                   [--tomogram-mask TOMOGRAM_MASK]
                                   [--ignore_tomogram_mask IGNORE_TOMOGRAM_MASK]
                                   -n NUMBER_OF_PARTICLES
                                   [--number-of-false-positives NUMBER_OF_FALSE_POSITIVES]
                                   -r RADIUS_PX [-c CUT_OFF]
                                   [--tophat-filter TOPHAT_FILTER]
                                   [--tophat-connectivity TOPHAT_CONNECTIVITY]
                                   [--relion5-compat RELION5_COMPAT]
                                   [--log LOG] [--tophat-bins TOPHAT_BINS]
                                   [--plot-bins PLOT_BINS]

Run candidate extraction. -- Marten Chaillet (@McHaillet)

options:
  -h, --help            show this help message and exit
  -j JOB_FILE, --job-file JOB_FILE
                        JSON file that contains all data on the template
                        matching job, written out by pytom_match_template.py
                        in the destination path.
  --tomogram-mask TOMOGRAM_MASK
                        Here you can provide a mask for the extraction with
                        dimensions (in pixels) equal to the tomogram. All
                        values in the mask that are smaller or equal to 0 will
                        be removed, all values larger than 0 are considered
                        regions of interest. It can be used to extract
                        annotations only within a specific cellular region. If
                        the job was run with a tomogram mask, this file will
                        be used instead of the job mask
  --ignore_tomogram_mask IGNORE_TOMOGRAM_MASK
                        Flag to ignore the input and TM job tomogram mask.
                        Useful if the scores mrc looks reasonable, but this
                        finds 0 particles to extract
  -n NUMBER_OF_PARTICLES, --number-of-particles NUMBER_OF_PARTICLES
                        Maximum number of particles to extract from tomogram.
  --number-of-false-positives NUMBER_OF_FALSE_POSITIVES
                        Number of false positives to determine the false alarm
                        rate. Here one can increase the recall of the particle
                        of interest at the expense of more false positives.
                        The default value of 1 is recommended for particles
                        that can be distinguished well from the background
                        (high specificity). The value can also be set between
                        0 and 1 to make the cut-off more restrictive.
  -r RADIUS_PX, --radius-px RADIUS_PX
                        Particle radius in pixels in the tomogram. It is used
                        during extraction to remove areas around peaks
                        preventing double extraction.
  -c CUT_OFF, --cut-off CUT_OFF
                        Override automated extraction cutoff estimation and
                        instead extract the number-of-particles down to this
                        LCCmax value. Setting to 0 will keep extracting until
                        number-of-particles, or until there are no positive
                        values left in the score map. Values larger than 1
                        make no sense as the correlation cannot be higher than
                        1.
  --tophat-filter TOPHAT_FILTER
                        Attempt to filter only sharp correlation peaks with a
                        tophat transform
  --tophat-connectivity TOPHAT_CONNECTIVITY
                        Set kernel connectivity for ndimage binary structure
                        used for the tophat transform. Integer value in range
                        1-3. 1 is the most restrictive, 3 the least
                        restrictive. Generally recommended to leave at 1.
  --relion5-compat RELION5_COMPAT
                        Write out centered coordinates in Angstrom for
                        RELION5.
  --log LOG             Can be set to `info` or `debug`
  --tophat-bins TOPHAT_BINS
                        Number of bins to use in the histogram of occurrences
                        in the tophat transform code (for both the estimation
                        and the plotting).
  --plot-bins PLOT_BINS
                        Number of bins to use for the occurrences vs LCC_max
                        plot.

pytom_estimate_roc.py

This script runs the Gaussian fit as described in the IJMS publication. It requires installation with the plotting dependencies, because it writes out or displays a figure showing the Gaussian fit and the estimated ROC curve. The benefit is that it estimates some classification statistics (such as false discovery rate and sensitivity). You can use it to estimate an extraction threshold for a representative tomogram. Then supply this threshold as the [-c, --cut-off] parameter for pytom_extract_candidates.py.

usage: pytom_estimate_roc.py [-h] -j JOB_FILE -n NUMBER_OF_PARTICLES -r
                             RADIUS_PX [--bins BINS]
                             [--gaussian-peak GAUSSIAN_PEAK]
                             [--force-peak FORCE_PEAK] [--crop-plot CROP_PLOT]
                             [--show-plot SHOW_PLOT] [--log LOG]
                             [--ignore_tomogram_mask IGNORE_TOMOGRAM_MASK]

Estimate ROC curve from TMJob file. -- Marten Chaillet (@McHaillet)

options:
  -h, --help            show this help message and exit
  -j JOB_FILE, --job-file JOB_FILE
                        JSON file that contains all data on the template
                        matching job, written out by pytom_match_template.py
                        in the destination path.
  -n NUMBER_OF_PARTICLES, --number-of-particles NUMBER_OF_PARTICLES
                        The number of particles to extract and estimate the
                        ROC on; it is recommended to use 3 times the expected
                        number of particles.
  -r RADIUS_PX, --radius-px RADIUS_PX
                        Particle radius in pixels in the tomogram. It is used
                        during extraction to remove areas around peaks
                        preventing double extraction.
  --bins BINS           Number of bins for the histogram to fit Gaussians on.
  --gaussian-peak GAUSSIAN_PEAK
                        Expected index of the histogram peak of the Gaussian
                        fitted to the particle population.
  --force-peak FORCE_PEAK
                        Force the particle peak to the provided peak index.
  --crop-plot CROP_PLOT
                        Flag to crop the plot relative to the height of the
                        particle population.
  --show-plot SHOW_PLOT
                        Flag to use a pop-up window for the plot instead of
                        writing it to the location of the job file.
  --log LOG             Can be set to `info` or `debug`
  --ignore_tomogram_mask IGNORE_TOMOGRAM_MASK
                        Flag to ignore the TM job tomogram mask. Useful if the
                        scores mrc looks reasonable, but this finds 0
                        particles

4. Merging annotations for export to other software

After running template matching and candidate extraction on multiple tomograms, each tomogram has an individual STAR file with particle annotations. Each STAR file contains the MicrographName column, which refers back to the tomogram name. Multiple STAR files can therefore be appended into one large list, which other software (such as RELION or WarpM) can load as annotations. These programs link the annotations to specific tilt-series using the MicrographName column.

pytom_merge_stars.py

Without any parameters, the script tries to merge all STAR files in the current working directory and saves the result to a new file, particles.star.
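To illustrate what merging means here, below is a minimal sketch that appends the particle rows of several single-block STAR files. It is not the implementation of pytom_merge_stars.py. It assumes every file holds one `loop_` block with identical column labels, as produced by candidate extraction for different tomograms.

```python
from pathlib import Path


def merge_star_files(input_dir: str, output_file: str = "particles.star") -> int:
    """Append the data rows of all .star files in input_dir into output_file.

    Minimal sketch. Assumes each file holds a single loop_ block and all files
    share the same column labels. Returns the number of merged particle rows.
    """
    header, labels, rows = None, None, []
    out_path = Path(output_file).resolve()
    for star in sorted(Path(input_dir).glob("*.star")):
        if star.resolve() == out_path:
            continue  # do not read a previous merge result back in
        lines = star.read_text().splitlines()
        label_idx = [i for i, ln in enumerate(lines) if ln.lstrip().startswith("_")]
        if not label_idx:
            raise ValueError(f"{star.name}: no column labels found")
        file_labels = [lines[i].split()[0] for i in label_idx]
        if labels is None:
            header, labels = lines[: label_idx[-1] + 1], file_labels
        elif file_labels != labels:
            raise ValueError(f"{star.name}: columns differ from the first file")
        rows.extend(ln for ln in lines[label_idx[-1] + 1:] if ln.strip())
    if header is None:
        raise FileNotFoundError(f"no .star files found in {input_dir}")
    out_path.write_text("\n".join(header + rows) + "\n")
    return len(rows)
```

Because every row keeps its MicrographName value, downstream software can still assign each particle to its tomogram after merging.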

usage: pytom_merge_stars.py [-h] [-i INPUT_DIR] [-o OUTPUT_FILE] [--log LOG]

Merge multiple star files in the same directory. -- Marten Chaillet
(@McHaillet)

options:
  -h, --help            show this help message and exit
  -i INPUT_DIR, --input-dir INPUT_DIR
                        Directory with star files, script will try to merge
                        all files that end in '.star'.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output star file name.
  --log LOG             Can be set to `info` or `debug`