2.1. frags package

2.1.1. Submodules

2.1.2. frags.context module

2.1.3. frags.core module

Contains generic functions used by FRAG

frags.core.build_graph(ref, k)

Index each k-mer of a genome using an Aho-Corasick implementation. Requires the PyPI package pyahocorasick.

Parameters

ref (str) – the reference to index
k (int) – k-mer size
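
To make the idea of a k-mer index concrete, here is a minimal sketch that maps each k-mer of a reference to the positions where it starts. It deliberately uses a plain Python dict rather than the pyahocorasick automaton that build_graph relies on, and the name build_kmer_index is hypothetical. With pyahocorasick itself, one would typically register each k-mer with Automaton.add_word and then finalize the structure with Automaton.make_automaton.

```python
from collections import defaultdict


def build_kmer_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer of ref to the list of its start positions.

    Hypothetical stand-in: a dict index, not an Aho-Corasick automaton.
    """
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        index[ref[pos:pos + k]].append(pos)
    return dict(index)


print(build_kmer_index("ACGACG", 3))
# {'ACG': [0, 3], 'CGA': [1], 'GAC': [2]}
```

A dict answers "where does this exact k-mer occur?"; an Aho-Corasick automaton additionally lets a read be scanned once for all indexed k-mers, which is why it suits the read-by-read search.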

frags.core.find_hits(graph, a_read)

Find all k-mers of the reference present in a read. Each hit has the form start_pos_read: start_pos_ref.

Parameters

graph (pyahocorasick) – the graph to parse
a_read (str) – the read to search in the index
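
The hit format described above can be illustrated with a small dict-based sketch. Both helper names are hypothetical. Unlike find_hits, the sketch takes k explicitly, because a plain dict does not know its key length, and it keeps only the first reference position of a repeated k-mer, which is an assumption about how repeats are resolved.

```python
def build_kmer_index(ref: str, k: int) -> dict[str, int]:
    """Map each k-mer of ref to its first start position (hypothetical stand-in for the graph)."""
    index: dict[str, int] = {}
    for pos in range(len(ref) - k + 1):
        index.setdefault(ref[pos:pos + k], pos)  # assumption: first occurrence wins
    return index


def find_kmer_hits(index: dict[str, int], read: str, k: int) -> dict[int, int]:
    """Return {start_pos_read: start_pos_ref} for every k-mer of read found in the index."""
    hits: dict[int, int] = {}
    for read_pos in range(len(read) - k + 1):
        ref_pos = index.get(read[read_pos:read_pos + k])
        if ref_pos is not None:
            hits[read_pos] = ref_pos
    return hits


index = build_kmer_index("TTACGTAA", 3)
print(find_kmer_hits(index, "GACGT", 3))  # {1: 2, 2: 3}
```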

frags.core.get_all_queries(file, nb_proc, k, gap, graph1, graph2=None)

Launch all parallel processes to get all queries from a file.

Parameters

file (string) – name of the file to read sequences from
nb_proc (int) – number of processes to run in parallel
k (int) – size of kmers
gap (int) – maximum authorized gap size for continuous hits
graph1 – the graph to parse for genome1
graph2 – the graph to parse for genome2
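
get_all_queries runs several workers, and get_recombinations receives an (offset_start, offset_end) pair, which suggests the input file is cut into offset ranges. The sketch below shows one simple way to compute such ranges. split_offsets and offsets_for are hypothetical helpers, and the equal-size split is an assumption, not necessarily what FRAG does.

```python
import os


def split_offsets(file_size: int, nb_proc: int) -> list[tuple[int, int]]:
    """Cut [0, file_size) into nb_proc contiguous ranges; the last one absorbs the remainder.

    Hypothetical helper: equal-size byte ranges are an assumption.
    """
    chunk = file_size // nb_proc
    ranges = []
    for i in range(nb_proc):
        start = i * chunk
        end = file_size if i == nb_proc - 1 else (i + 1) * chunk
        ranges.append((start, end))
    return ranges


def offsets_for(path: str, nb_proc: int) -> list[tuple[int, int]]:
    """Byte ranges for a file on disk, one per worker process."""
    return split_offsets(os.path.getsize(path), nb_proc)


print(split_offsets(10, 3))  # [(0, 3), (3, 6), (6, 10)]
```

Each pair could then be handed to a worker, for instance with multiprocessing.Pool.starmap. A record that straddles a range boundary must still be read by exactly one worker, which the reader of each range has to handle.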

frags.core.get_recombinations(offset_start, offset_end, file, k, gap, graph1, graph2=None)

Main parallelized function: retrieve each read within an offset range of the file and find its matches and breakpoints.

Parameters

offset_start (int) – offset in the file at which to start reading sequences
offset_end (int) – offset in the file at which to stop reading sequences
file (string) – name of the file to read sequences from
k (int) – size of kmers
gap (int) – maximum authorized gap size for continuous hits
graph1 – the graph to parse for genome1
graph2 – the graph to parse for genome2

frags.core.get_reference(input_file)

Get the reference genome as a single string. Only the first sequence of the file is used. The file can be FASTA or FASTQ, gzipped or not.

Parameters: input_file (str) – fasta/fastq file to use as reference
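
The following sketch shows one way to read the first record of a FASTA or FASTQ file, gzipped or not. The helper names are hypothetical, gzip is detected from its two magic bytes (1f 8b), and the FASTQ branch assumes the usual four-line layout with the sequence on a single line.

```python
import gzip
from typing import IO


def open_maybe_gzipped(path: str) -> IO[str]:
    """Open path as text, decompressing transparently when it starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_first_sequence(path: str) -> str:
    """Return the first FASTA or FASTQ record of path as a single string (hypothetical sketch)."""
    with open_maybe_gzipped(path) as handle:
        first = handle.readline()
        if first.startswith("@"):
            # FASTQ, assumed four-line records: the sequence is the next line.
            return handle.readline().strip()
        parts = []
        for line in handle:
            # FASTA: concatenate sequence lines until the next header.
            if line.startswith(">"):
                break
            parts.append(line.strip())
        return "".join(parts)
```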

frags.core.next_read(file, offset_start, offset_end)

Yield each sequence within an offset range of a file as a (header, seq) tuple. The file can be FASTA or FASTQ, gzipped or not. WARNING: spaces in headers are replaced by underscores (_).

Parameters

file (str) – fasta/fastq file to read
offset_start (int) – offset in the file from where to read
offset_end (int) – offset in the file until where to read
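
Reading by offset ranges requires a rule for records that straddle a boundary. The sketch below assumes a record belongs to the range in which its header line starts, so that consecutive ranges yield every record exactly once. It handles uncompressed FASTA only, strips the leading '>' from headers (an assumption), and next_fasta_read is a hypothetical name.

```python
from typing import Iterator


def next_fasta_read(path: str, offset_start: int, offset_end: int) -> Iterator[tuple[str, str]]:
    """Yield (header, seq) for each FASTA record whose '>' line starts in [offset_start, offset_end).

    Hypothetical sketch: FASTA only, no gzip support.
    """
    with open(path, "rb") as handle:
        if offset_start > 0:
            # Step back one byte and finish that line, landing on the first line start >= offset_start.
            handle.seek(offset_start - 1)
            handle.readline()
        header, chunks = None, []
        while True:
            pos = handle.tell()
            line = handle.readline()
            if not line or line.startswith(b">"):
                if header is not None:
                    yield header, "".join(chunks)
                    header, chunks = None, []
                if not line or pos >= offset_end:
                    return  # EOF, or the next record belongs to the following range
                header = line[1:].strip().decode().replace(" ", "_")
            elif header is not None:
                # Lines before the first header belong to a record owned by the previous range.
                chunks.append(line.strip().decode())
```

Because each range resumes at the first line start at or after offset_start and stops at the first header at or after offset_end, adjacent ranges such as (0, m) and (m, size) together yield the same records as a single pass over (0, size).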

frags.core.prepare_blast_file(breakpoint_file, all_queries, minsizeblast)

Prepare a FASTA file, to be Blasted, containing all breakpoints of at least minsizeblast nucleotides. WARNING: this writes a FASTA file, regardless of the format of the original file. WARNING: headers of the original file are modified to record which breakpoint(s) of a specific read are Blasted: original_header_#bp

Parameters

breakpoint_file (string) – the filename of the file to be written
all_queries (list(Read)) – all queries that may contain breakpoints
minsizeblast (int) – minimal size of breakpoint accepted
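
A minimal sketch of the FASTA output described above. It assumes each read is given as a (header, sequence, breakpoints) tuple with breakpoints as (beg_pos_read, size) pairs, and that #bp is the 0-based index of the breakpoint within its read; write_breakpoints_fasta is a hypothetical name.

```python
import io


def write_breakpoints_fasta(out, reads, minsizeblast):
    """Write each breakpoint of at least minsizeblast nt as a FASTA record named <header>_<bp index>.

    Hypothetical sketch: reads is an iterable of (header, sequence, [(beg_pos_read, size), ...]).
    """
    for header, sequence, breakpoints in reads:
        for bp_index, (beg, size) in enumerate(breakpoints):
            if size >= minsizeblast:  # assumption: the index counts all breakpoints, kept or not
                out.write(f">{header}_{bp_index}\n{sequence[beg:beg + size]}\n")


buffer = io.StringIO()
write_breakpoints_fasta(buffer, [("read1", "AAAACCCCGGGG", [(4, 4), (10, 1)])], minsizeblast=2)
print(buffer.getvalue())  # ">read1_0\nCCCC\n": the 1-nt breakpoint is dropped
```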

frags.core.process_blast_res(compressed_file, res_blast_file, sep, all_breakpoints)

Compress the Blast result to keep only the best hits, and output the result in a FASTA-like file. Each header is the original header extended with the breakpoint id, e-value and bit-score. WARNING: this writes a FASTA file, regardless of the format of the original file.

Parameters

compressed_file (string) – the filename of the file to be written
res_blast_file (string) – Blast result file
sep (list(char)) – separator to use in the result file
all_breakpoints (dict) – dict of Breakpoints/index created before the Blast
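
The sketch below illustrates the "keep only the best hits" step on BLAST tabular output. It assumes the Blast result was produced with -outfmt 6 (twelve tab-separated columns, e-value in column 11 and bit-score in column 12) and keeps a single hit per query ranked by bit-score. best_hits is a hypothetical name, and the actual selection rule in process_blast_res may differ.

```python
def best_hits(blast_lines):
    """Keep, for each query, the subject, e-value and bit-score of its highest-scoring line.

    Assumes -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, evalue, bitscore = fields[0], float(fields[10]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (fields[1], evalue, bitscore)
    return best
```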

frags.core.reverse_complement(seq)

Take an input sequence and return its reverse complement.

Parameters: seq (str) – the seq to compute
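
A reverse complement complements each base and then reverses the result. In this sketch, revcomp is a hypothetical name, and the handling of N and lowercase bases is an assumption; any other character, such as an IUPAC ambiguity code, is passed through uncomplemented.

```python
# Assumption: only A, C, G, T, N (either case) are complemented.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


print(revcomp("AACGN"))  # NCGTT
```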

frags.core.write_header(output_file, sep='\t')

Write header of CSV output files

Parameters

output_file (str) – CSV file to write in
sep (char) – Separator to use between CSV columns

2.1.4. frags.read module

Contains classes and functions related to the definition and use of reads

class frags.read.Breakpoint(beg_pos_read, size)

Bases: object

Define a breakpoint.

Parameters

beg_pos_read (int) – starting position of this breakpoint in the read
size (int) – size of the breakpoint

output(sep)

Proper output of a line in the result file

Parameters: sep (list(char)) – Separator to use in CSV

class frags.read.Match(beg_pos_read, beg_pos_ref, strand, ref, size, inserts, seq_l)

Bases: object

Define a match.

Parameters

beg_pos_read (int) – starting position in the read of this match
beg_pos_ref (int) – starting position in the ref of this match
strand (int) – strand of this match
ref (int) – the ref index for this match
size (int) – size of the match
inserts (list(int)) – sizes of potential insertions (a single match can contain several insertions)
seq_l (int) – size of the read (needed for rev comp computation)
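
The read length matters because position i on the reverse complement of a read of length seq_l corresponds to position seq_l - 1 - i on the forward read. A match covering [beg_pos_read, beg_pos_read + size) on the reverse complement therefore covers [seq_l - beg_pos_read - size, seq_l - beg_pos_read) on the forward read. The helper below is a hypothetical illustration; the documentation does not say which coordinate system Match stores.

```python
def revcomp_to_forward(beg_pos: int, size: int, seq_l: int) -> int:
    """Start, on the forward read, of a match found at beg_pos on the reverse complement.

    Hypothetical helper: index i on the reverse complement maps to seq_l - 1 - i,
    so the last base of the match (beg_pos + size - 1) becomes the new start.
    """
    return seq_l - beg_pos - size


print(revcomp_to_forward(2, 3, 10))  # 5
```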

is_include_in(other)

Check if this match is included in another match

Parameters: other (Match) – the match to compare with
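
On read coordinates alone, inclusion is an interval-containment test. The sketch uses a hypothetical Span class holding only beg_pos_read and size; the real is_include_in may also compare reference positions, strand or reference index.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical minimal stand-in for Match: read coordinates only."""
    beg_pos_read: int
    size: int

    def is_include_in(self, other: "Span") -> bool:
        """True when [beg_pos_read, beg_pos_read + size) lies inside other's interval."""
        return (other.beg_pos_read <= self.beg_pos_read
                and self.beg_pos_read + self.size <= other.beg_pos_read + other.size)
```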

output_read(sep)

Proper output of the read information

Parameters: sep (list(char)) – Separator to use in CSV

output_ref(sep)

Proper output of the reference information

Parameters: sep (list(char)) – Separator to use in CSV

class frags.read.Read(header, sequence)

Bases: object

Define a read.

Parameters

header (str) – header of the read
sequence (str) – sequence of the read

add_a_match(match)

Test whether this match should be added. It is added only if it is not a subpart of an already added match. Conversely, some already added matches may be subparts of the match to add; if so, they are removed.

Parameters: match (Match) – the match to add
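
The add/remove rule above keeps a list in which no match is a subpart of another. The sketch applies it to (beg, size) intervals on the read; add_interval is a hypothetical name, and it assumes "subpart" means interval containment.

```python
Interval = tuple[int, int]  # (beg_pos_read, size)


def _inside(a: Interval, b: Interval) -> bool:
    """True when interval a is contained in interval b."""
    return b[0] <= a[0] and a[0] + a[1] <= b[0] + b[1]


def add_interval(kept: list[Interval], new: Interval) -> list[Interval]:
    """Return the updated list after applying the subpart rule (hypothetical sketch)."""
    if any(_inside(new, old) for old in kept):
        return kept  # new is a subpart of an already added interval: do not add it
    # Drop already added intervals that are subparts of the new one, then add it.
    return [old for old in kept if not _inside(old, new)] + [new]


m = add_interval([], (10, 5))
m = add_interval(m, (11, 2))  # contained in (10, 5): ignored
m = add_interval(m, (0, 30))  # contains (10, 5): replaces it
print(m)  # [(0, 30)]
```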

get_breakpoints()

Populate the breakpoints list using all hits, for both strands.

get_matches(hits, gap, k, strand, ref)

Populate the matches list from all hits, for one strand.

Parameters

hits (dict) – matching position on read and ref
gap (int) – maximum authorized gap size for continuous hits
k (int) – k-mer size
strand (int) – the strand of this hit
ref (int) – the reference index of this hit
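
A simplified sketch of turning hits into matches: a {read_pos: ref_pos} dict of hits is scanned in read order, and consecutive hits are chained while they stay on the same diagonal (read and reference advance by the same amount) and skip at most gap k-mer start positions. The gap interpretation is an assumption, chain_hits is a hypothetical name, and unlike Match the sketch records no insertions, strand or reference index.

```python
def chain_hits(hits: dict[int, int], gap: int, k: int) -> list[tuple[int, int, int]]:
    """Chain {read_pos: ref_pos} hits into (beg_pos_read, beg_pos_ref, size) matches."""
    matches: list[tuple[int, int, int]] = []
    start = prev = None  # (read_pos, ref_pos) of the current match's first and last hit
    for read_pos in sorted(hits):
        ref_pos = hits[read_pos]
        if prev is not None:
            step = read_pos - prev[0]
            # Assumption: extend only on the same diagonal, skipping at most `gap` positions.
            if ref_pos - prev[1] == step and step - 1 <= gap:
                prev = (read_pos, ref_pos)
                continue
            # The last k-mer starts at prev[0], so the match spans prev[0] - start[0] + k bases.
            matches.append((start[0], start[1], prev[0] - start[0] + k))
        start = prev = (read_pos, ref_pos)
    if start is not None:
        matches.append((start[0], start[1], prev[0] - start[0] + k))
    return matches


hits = {0: 100, 1: 101, 2: 102, 5: 105, 20: 50, 21: 51}
print(chain_hits(hits, gap=2, k=4))  # [(0, 100, 9), (20, 50, 5)]
```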

get_ref()

Compute the ref of this Read (0=nothing / 1=ref1 / 2=ref2 / 3=ref1 AND ref2).

get_strand()

Compute the strand of this Read (-1=nothing / 0=normal / 1=revcomp / 2=normal AND revcomp).
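
The two encodings differ: the ref code behaves as a two-bit flag (value 1 for ref1, value 2 for ref2, combined with bitwise OR), whereas the strand code is a plain enumeration. The helpers below are hypothetical and take booleans instead of inspecting the Read's matches.

```python
def ref_code(on_ref1: bool, on_ref2: bool) -> int:
    """0=nothing, 1=ref1, 2=ref2, 3=ref1 AND ref2 (hypothetical helper)."""
    return (1 if on_ref1 else 0) | (2 if on_ref2 else 0)


def strand_code(on_normal: bool, on_revcomp: bool) -> int:
    """-1=nothing, 0=normal, 1=revcomp, 2=normal AND revcomp (hypothetical helper)."""
    if on_normal and on_revcomp:
        return 2
    if on_revcomp:
        return 1
    if on_normal:
        return 0
    return -1
```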

output(sep)

Proper output of a line of the result file

Parameters: sep (list(char)) – Separator to use in CSV

2.1.5. Module contents

Contains everything related to FRAGS software