2.1. frags package
2.1.1. Submodules
2.1.2. frags.context module
2.1.3. frags.core module
Contains generic functions used by FRAGS.
-
frags.core.build_graph(ref, k)
Index each k-mer of a genome.
Aho-Corasick implementation; requires the PyPI package pyahocorasick.
- Parameters
ref (str) – the reference to index
k (int) – k-mer size
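The sketch below shows what indexing each k-mer produces. It swaps the Aho-Corasick automaton for a plain Python dict so that it runs without pyahocorasick. The function name build_kmer_index is hypothetical. Keeping only the first occurrence of a repeated k-mer is an assumption, and build_graph may behave differently.

```python
def build_kmer_index(ref: str, k: int) -> dict:
    """Map each k-mer of ref to the start position of its first occurrence.

    Hypothetical dict-based stand-in for build_graph (no Aho-Corasick automaton).
    """
    index = {}
    for pos in range(len(ref) - k + 1):
        # setdefault keeps the first position when a k-mer repeats (assumption)
        index.setdefault(ref[pos:pos + k], pos)
    return index


index = build_kmer_index("ACGTACGA", 3)
print(index)  # {'ACG': 0, 'CGT': 1, 'GTA': 2, 'TAC': 3, 'CGA': 5}
```

With pyahocorasick, you would add each k-mer as a word of an ahocorasick.Automaton and then call make_automaton(). For fixed-length k-mers, the dict answers the same exact-match lookups.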
-
frags.core.find_hits(graph, a_read)
Find all k-mers of the reference that are present in a read.
All hits have the form start_pos_read: start_pos_ref.
- Parameters
graph (pyahocorasick) – the graph to parse
a_read (str) – the read to search in the index
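This sketch covers the lookup side and again uses a dict index in place of the Aho-Corasick automaton. With pyahocorasick, Automaton.iter would instead report each match as (end_index, value) in the read. Both function names are hypothetical. Each hit maps a start position in the read to a start position in the reference, which matches the form described above.

```python
def build_kmer_index(ref: str, k: int) -> dict:
    """Map each k-mer of ref to its first start position (hypothetical helper)."""
    index = {}
    for pos in range(len(ref) - k + 1):
        index.setdefault(ref[pos:pos + k], pos)
    return index


def find_kmer_hits(index: dict, read: str, k: int) -> dict:
    """Return {start_pos_read: start_pos_ref} for every read k-mer found in index."""
    hits = {}
    for pos in range(len(read) - k + 1):
        ref_pos = index.get(read[pos:pos + k])
        if ref_pos is not None:
            hits[pos] = ref_pos
    return hits


index = build_kmer_index("AACGTTGCA", 3)
hits = find_kmer_hits(index, "TCGTTGA", 3)
print(hits)  # {1: 2, 2: 3, 3: 4}
```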
-
frags.core.get_all_queries(file, nb_proc, k, gap, graph1, graph2=None)
Launch all parallel processes to get all queries from a file.
- Parameters
file (string) – the filename of the file where to take sequences from
nb_proc (int) – number of processes to run in parallel
k (int) – size of kmers
gap (int) – maximum authorized gap size for continuous hits
graph1 – the graph to parse for genome1
graph2 – the graph to parse for genome2
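One way to give each process its share of the input is to cut the file into byte ranges. This fits the offset_start/offset_end parameters of get_recombinations. The helper below is hypothetical. It assumes each worker owns the records whose first line starts inside its range. Each (start, end) pair could then be passed to a multiprocessing.Pool, for instance via starmap.

```python
def split_offsets(file_size: int, nb_proc: int) -> list:
    """Split the byte range [0, file_size) into at most nb_proc contiguous (start, end) ranges."""
    if file_size <= 0:
        return []
    step = -(-file_size // nb_proc)  # ceiling division so the ranges cover the whole file
    return [(start, min(start + step, file_size)) for start in range(0, file_size, step)]


ranges = split_offsets(10, 3)
print(ranges)  # [(0, 4), (4, 8), (8, 10)]
```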
-
frags.core.get_recombinations(offset_start, offset_end, file, k, gap, graph1, graph2=None)
Main parallelized function: retrieves each read in an offset range and finds its matches and breakpoints.
- Parameters
offset_start (int) – where to start taking sequences in the file
offset_end (int) – where to stop taking sequences in the file
file (string) – the filename of the file where to take sequences from
k (int) – size of kmers
gap (int) – maximum authorized gap size for continuous hits
graph1 – the graph to parse for genome1
graph2 – the graph to parse for genome2
-
frags.core.get_reference(input_file)
Get the reference genome as a single string.
Only the first sequence of the file is taken. The file can be FASTA or FASTQ, gzipped or not.
- Parameters
input_file (str) – fasta/fastq file to use as reference
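This is a minimal sketch of the same idea using only the standard library. It detects gzip from the file's two magic bytes, then concatenates the lines of the first record. The function name is hypothetical. It assumes well-formed input and FASTQ sequences that fit on a single line.

```python
import gzip
import os
import tempfile


def read_first_sequence(path: str) -> str:
    """Return the first sequence of a FASTA/FASTQ file, gzipped or not, as one string."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as fh:
        first = fh.readline()
        if first.startswith("@"):
            # FASTQ: assume the sequence is the single line after the header
            return fh.readline().strip()
        chunks = []
        for line in fh:
            if line.startswith(">"):  # stop at the second record
                break
            chunks.append(line.strip())
        return "".join(chunks)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ref.fa.gz")
    with gzip.open(path, "wt") as out:
        out.write(">chr1 first\nACGT\nTTGA\n>chr2\nGGGG\n")
    seq = read_first_sequence(path)
print(seq)  # ACGTTTGA
```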
-
frags.core.next_read(file, offset_start, offset_end)
Return each sequence within an offset range of a file as a (header, seq) tuple, using a generator. The file can be FASTA or FASTQ, gzipped or not.
WARNING: spaces in headers are replaced by _
- Parameters
file (str) – fasta/fastq file to read
offset_start (int) – offset in the file from where to read
offset_end (int) – offset in the file until where to read
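The sketch below handles offsets for an uncompressed FASTA file. Each range owns the records whose header line starts inside it, so consecutive ranges neither skip nor duplicate a read. The function name and this ownership rule are assumptions. Gzipped input and FASTQ are left out, because byte offsets into a compressed stream need a different strategy. As documented above, spaces in headers are replaced by _.

```python
import os
import tempfile


def fasta_records_in_range(path: str, offset_start: int, offset_end: int):
    """Yield (header, seq) for every FASTA record whose header line starts in [offset_start, offset_end)."""
    with open(path, "rb") as fh:  # binary mode so tell()/seek() are plain byte offsets
        if offset_start > 0:
            fh.seek(offset_start - 1)
            fh.readline()  # move to the first line that starts at or after offset_start
        header, chunks = None, []
        while True:
            line_start = fh.tell()
            line = fh.readline()
            if not line or (line.startswith(b">") and line_start >= offset_end):
                break  # end of file, or the next record belongs to the next range
            if line.startswith(b">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:].strip().decode().replace(" ", "_")
                chunks = []
            elif header is not None:  # ignore the tail of a record begun in an earlier range
                chunks.append(line.strip().decode())
        if header is not None:
            yield header, "".join(chunks)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.fa")
    with open(path, "wb") as out:
        out.write(b">r1 a\nAC\nGT\n>r2\nTT\n>r3 x y\nGG\n")  # 30 bytes
    first = list(fasta_records_in_range(path, 0, 15))
    second = list(fasta_records_in_range(path, 15, 30))
print(first)   # [('r1_a', 'ACGT'), ('r2', 'TT')]
print(second)  # [('r3_x_y', 'GG')]
```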
-
frags.core.prepare_blast_file(breakpoint_file, all_queries, minsizeblast)
Prepare a FASTA file to be Blasted, containing all breakpoints of at least minsizeblast nucleotides.
WARNING: this writes a FASTA file, regardless of the format of the original file.
WARNING: the headers of the original file are modified to record which breakpoint(s) of a specific read are Blasted: original_header_#bp
- Parameters
breakpoint_file (string) – the filename of the file to be written
all_queries (list(Read)) – all queries that may contain breakpoints
minsizeblast (int) – minimal size of breakpoint accepted
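This is a minimal sketch of the size filter and the header scheme. It uses plain (header, sequence, breakpoints) tuples instead of Read objects. The function name and the tuple layout are assumptions made for illustration. Numbering breakpoints by their index within the read is also an assumption.

```python
import io


def write_breakpoints_fasta(out, reads, min_size: int) -> int:
    """Write every breakpoint of at least min_size nt as a FASTA record; return how many were written.

    reads: iterable of (header, sequence, breakpoints), each breakpoint being (beg_pos_read, size).
    """
    written = 0
    for header, seq, breakpoints in reads:
        for bp_id, (beg, size) in enumerate(breakpoints):
            if size >= min_size:
                # the _<bp_id> suffix records which breakpoint of the read this is
                out.write(f">{header}_{bp_id}\n{seq[beg:beg + size]}\n")
                written += 1
    return written


reads = [
    ("read1", "AAAACCCCGGGG", [(0, 2), (4, 4)]),
    ("read2", "TTTTAAAA", [(2, 5)]),
]
buf = io.StringIO()
count = write_breakpoints_fasta(buf, reads, min_size=3)
print(buf.getvalue(), end="")
# >read1_1
# CCCC
# >read2_0
# TTAAA
```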
-
frags.core.process_blast_res(compressed_file, res_blast_file, sep, all_breakpoints)
Compress the Blast result to show only the best hits, and output the result in a FASTA-like file. The header is the original header with the breakpoint id, e-value and bit-score.
WARNING: this writes a FASTA file, regardless of the format of the original file.
- Parameters
compressed_file (string) – the filename of the file to be written
res_blast_file (string) – Blast result file
sep (list(char)) – separator to use in the result file
all_breakpoints (dict) – dict of Breakpoints/index created before the Blast
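The expected Blast output format is not specified. The sketch below assumes the 12-column tabular format (-outfmt 6) and keeps only the best hits per query. The function name is hypothetical. The tie-handling rule, which keeps every hit that shares the top bit-score, is also hypothetical.

```python
def best_hits(lines) -> dict:
    """Return {query: [(subject, evalue, bitscore), ...]} keeping only top-bit-score hits.

    Assumes BLAST tabular output (-outfmt 6): column 11 is the e-value, column 12 the bit-score.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        current = best.get(query)
        if current is None or bitscore > current[0][2]:
            best[query] = [(subject, evalue, bitscore)]
        elif bitscore == current[0][2]:
            current.append((subject, evalue, bitscore))
    return best


blast_lines = [
    "read1_1\tchrA\t99.0\t100\t1\t0\t1\t100\t500\t599\t1e-40\t180\n",
    "read1_1\tchrB\t90.0\t100\t10\t0\t1\t100\t20\t119\t1e-20\t120\n",
    "read2_0\tchrA\t100.0\t50\t0\t0\t1\t50\t10\t59\t1e-20\t95.1\n",
]
result = best_hits(blast_lines)
print(result)  # {'read1_1': [('chrA', 1e-40, 180.0)], 'read2_0': [('chrA', 1e-20, 95.1)]}
```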
2.1.4. frags.read module
Contains the classes and functions related to read definition and use.
-
class frags.read.Breakpoint(beg_pos_read, size)
Bases: object
Define a breakpoint.
- Parameters
beg_pos_read (int) – starting position in the read of this breakpoint
size (int) – size of the breakpoint
-
class frags.read.Match(beg_pos_read, beg_pos_ref, strand, ref, size, inserts, seq_l)
Bases: object
Define a match.
- Parameters
beg_pos_read (int) – starting position in the read of this match
beg_pos_ref (int) – starting position in the ref of this match
strand (int) – strand of this match
ref (int) – the ref index for this match
size (int) – size of the match
inserts (list(int)) – size of potential insertions (possible to have several insertions in ONE match)
seq_l (int) – size of the read (needed for rev comp computation)
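seq_l is needed because a position measured on the reverse complement of a read must be converted back to forward-strand coordinates. For a 0-based start, a common conversion is forward_start = seq_l - beg_pos_read - size. This formula is an assumption and is not taken from the Match source. The sketch checks it on a small read, and its function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_forward_start(beg_pos_read: int, size: int, seq_l: int) -> int:
    """0-based start on the forward strand of an interval given on the reverse complement."""
    return seq_l - beg_pos_read - size


read = "AAACCGT"
rc = reverse_complement(read)  # "ACGGTTT"
start = to_forward_start(1, 3, len(read))
# rc[1:4] and read[start:start + 3] are reverse complements of each other
print(start, rc[1:4], read[start:start + 3])  # 3 CGG CCG
```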
-
is_include_in(other)
Check whether this match is included in another match.
- Parameters
other (Match) – the match to compare with
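The inclusion test can be sketched as interval containment on the read. SimpleMatch is a hypothetical stand-in for Match that keeps only the two fields needed. Comparing read coordinates alone is an assumption, and the real method may also compare the reference, the strand, or the reference positions.

```python
from dataclasses import dataclass


@dataclass
class SimpleMatch:
    """Hypothetical stand-in for frags.read.Match, keeping only read coordinates."""
    beg_pos_read: int
    size: int

    def is_include_in(self, other: "SimpleMatch") -> bool:
        # Included when [beg, beg + size) lies entirely inside other's interval
        return (other.beg_pos_read <= self.beg_pos_read
                and self.beg_pos_read + self.size <= other.beg_pos_read + other.size)


inner, outer = SimpleMatch(5, 10), SimpleMatch(0, 20)
print(inner.is_include_in(outer), outer.is_include_in(inner))  # True False
```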
-
class frags.read.Read(header, sequence)
Bases: object
Define a read.
- Parameters
header (str) – header of the read
sequence (str) – sequence of the read
-
add_a_match(match)
Test whether this match should be added.
It is added only if it is not a subpart of an already added match. In some cases, already added matches are subparts of the match to add; if so, they are removed.
- Parameters
match (Match) – the match to add
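The add/remove rule described above can be sketched on plain half-open read intervals. The function name and the (start, end) representation are hypothetical. In Read, the containment check would presumably be Match.is_include_in.

```python
def add_interval(kept: list, new: tuple) -> list:
    """Return the kept list after trying to add new; intervals are half-open (start, end)."""

    def inside(a, b):
        return b[0] <= a[0] and a[1] <= b[1]

    if any(inside(new, old) for old in kept):
        return kept  # new is a subpart of an already added interval: do not add it
    # drop already added intervals that are subparts of new, then add new
    return [old for old in kept if not inside(old, new)] + [new]


kept = []
for candidate in [(10, 20), (12, 18), (5, 25), (30, 40)]:
    kept = add_interval(kept, candidate)
print(kept)  # [(5, 25), (30, 40)]
```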
-
get_matches(hits, gap, k, strand, ref)
Populate the matches list from all hits, for one strand.
- Parameters
hits (dict) – matching position on read and ref
gap (int) – maximum authorized gap size for continuous hits
k (int) – k-mer size
strand (int) – the strand of this hit
ref (int) – the reference index of this hit
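How hits become matches is not detailed here. The sketch below assumes a hypothetical rule: two successive hits are continuous when both the read and the reference advance by 1 to gap + 1 positions. Under that rule, it chains a {start_pos_read: start_pos_ref} dict (the form returned by find_hits) into (beg_pos_read, beg_pos_ref, size) tuples. It ignores strands, references, and the inserts bookkeeping of Match.

```python
def chain_hits(hits: dict, gap: int, k: int) -> list:
    """Chain k-mer hits into (beg_pos_read, beg_pos_ref, size) matches; size is measured on the read."""
    matches = []
    start = prev = None
    for read_pos in sorted(hits):
        ref_pos = hits[read_pos]
        if prev is not None:
            d_read, d_ref = read_pos - prev[0], ref_pos - prev[1]
            if 1 <= d_read <= gap + 1 and 1 <= d_ref <= gap + 1:
                prev = (read_pos, ref_pos)  # continuous: extend the current match
                continue
            # discontinuity: close the current match (last k-mer start + k)
            matches.append((start[0], start[1], prev[0] - start[0] + k))
        start = prev = (read_pos, ref_pos)
    if start is not None:
        matches.append((start[0], start[1], prev[0] - start[0] + k))
    return matches


hits = {0: 100, 1: 101, 2: 102, 4: 104, 10: 300, 11: 301}
matches = chain_hits(hits, gap=1, k=3)
print(matches)  # [(0, 100, 7), (10, 300, 4)]
```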
2.1.5. Module contents
Contains everything related to the FRAGS software.