Mirror of https://github.com/biopython/biopython.git (synced 2025-10-20 13:43:47 +08:00)
Use C code to parse alignments in which dashes represent gaps (#4737)
* update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * fix * update * update * update * update * update * Use int64_t instead of long * avoid testing numpy array output * update * update * update * test * test * test * test * update * update * update * update * update * no more compiler warnings * update * update * update * update * change submodule name --------- Co-authored-by: Michiel de Hoon <mdehoon@tkx249.genome.gsc.riken.jp>
@@ -150,8 +150,14 @@ aligned sequences as follows:
 CGGTTTTT
 AG-TTT--
 AGGTTT--
+>>> lines = [line.encode() for line in lines] # convert to bytes
+>>> lines
+[b'CGGTTTTT', b'AG-TTT--', b'AGGTTT--']
 >>> sequences, coordinates = Alignment.parse_printed_alignment(lines)
 >>> sequences
+[b'CGGTTTTT', b'AGTTT', b'AGGTTT']
+>>> sequences = [sequence.decode() for sequence in sequences]
+>>> sequences
 ['CGGTTTTT', 'AGTTT', 'AGGTTT']
 >>> coordinates
 array([[0, 2, 3, 6, 8],
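
Note on the hunk above: the doctest now passes the printed alignment as bytes, gets the gap-free sequences back as bytes, and reads the block boundaries from ``coordinates``. The following is a pure-Python sketch of that behavior, written only to illustrate what the documented call returns. It is not the C code added by this commit; the helper name ``parse_gapped_lines`` is hypothetical, and it returns plain lists where the real method returns a NumPy array.

```python
def parse_gapped_lines(lines):
    """Illustrative sketch: strip dashes and derive alignment coordinates.

    `lines` is a list of equal-length bytes objects in which b"-" marks a gap.
    Returns (sequences, coordinates), where coordinates[i] lists the position
    in sequence i at every column where the gap pattern changes.
    """
    if len({len(line) for line in lines}) > 1:
        raise ValueError("all lines must have the same length")
    sequences = [line.replace(b"-", b"") for line in lines]
    width = len(lines[0]) if lines else 0
    positions = [0] * len(lines)            # letters consumed so far, per row
    coordinates = [[0] for _ in lines]      # every row starts at position 0
    previous = None
    for column in range(width):
        # True where this row has a gap in the current column
        pattern = tuple(line[column:column + 1] == b"-" for line in lines)
        if previous is not None and pattern != previous:
            # the gap pattern changed: record a block boundary
            for row, position in zip(coordinates, positions):
                row.append(position)
        for i, is_gap in enumerate(pattern):
            if not is_gap:
                positions[i] += 1
        previous = pattern
    if width:
        # close the final block
        for row, position in zip(coordinates, positions):
            row.append(position)
    return sequences, coordinates


lines = [b"CGGTTTTT", b"AG-TTT--", b"AGGTTT--"]
sequences, coordinates = parse_gapped_lines(lines)
print(sequences)       # [b'CGGTTTTT', b'AGTTT', b'AGGTTT']
print(coordinates[0])  # [0, 2, 3, 6, 8]
```

The first coordinate row agrees with the ``array([[0, 2, 3, 6, 8],`` line that ends the hunk.
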
@@ -4568,6 +4574,12 @@ dictionary. Please refer to the test script ``test_Align_bigbed.py`` in
 the ``Tests`` subdirectory in the Biopython distribution for more
 examples of writing alignment files in the bigBed format.

+Optional arguments are ``compress`` (default value is ``True``), ``blockSize``
+(default value is 256), and ``itemsPerSlot`` (default value is 512). See the
+documentation of UCSC's ``bedToBigBed`` program for a description of these
+arguments. Searching a ``bigBed`` file can be faster by using
+``compress=False`` and ``itemsPerSlot=1`` when creating the bigBed file.
+
 .. _`subsec:align_psl`:

 Pattern Space Layout (PSL)
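
Note on the hunk above: the new paragraph says that searching can be faster with ``compress=False`` and ``itemsPerSlot=1``. The sketch below is a deliberately simplified model of why that can hold, assuming that items are bundled into slots and that each slot is compressed as a unit. It does not reproduce the real bigBed binary layout, and ``pack_slots`` and ``fetch`` are hypothetical helper names.

```python
import zlib


def pack_slots(records, items_per_slot, compress):
    """Bundle records into slots; optionally zlib-compress each slot as a unit."""
    slots = []
    for start in range(0, len(records), items_per_slot):
        chunk = records[start:start + items_per_slot]
        payload = b"\n".join(chunk)
        if compress:
            payload = zlib.compress(payload)
        slots.append((start, len(chunk), payload))
    return slots


def fetch(slots, index, compress):
    """Return (record, number of records decoded in order to reach it)."""
    for start, count, payload in slots:
        if start <= index < start + count:
            if compress:
                payload = zlib.decompress(payload)  # whole slot is inflated
            chunk = payload.split(b"\n")
            return chunk[index - start], len(chunk)
    raise IndexError(index)


# Toy BED-like records; the values are made up for the illustration.
records = [b"chr1\t%d\t%d" % (i * 100, i * 100 + 50) for i in range(2000)]

default_slots = pack_slots(records, items_per_slot=512, compress=True)
tuned_slots = pack_slots(records, items_per_slot=1, compress=False)

record_a, decoded_a = fetch(default_slots, 1000, compress=True)
record_b, decoded_b = fetch(tuned_slots, 1000, compress=False)
print(decoded_a, decoded_b)  # 512 1
```

In this model a single lookup with the default settings inflates and splits a full slot of 512 records, while the tuned settings touch exactly one record. The likely trade-off is a larger file, since nothing is compressed and every item needs its own slot.
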
@@ -4929,6 +4941,12 @@ See section :ref:`subsec:align_psl` for an explanation on how the
 number of matches, mismatches, repeat region matches, and matches to
 unknown nucleotides are obtained.

+Further optional arguments are ``blockSize`` (default value is 256), and
+``itemsPerSlot`` (default value is 512). See the documentation of UCSC's
+``bedToBigBed`` program for a description of these arguments. Searching a
+``bigPsl`` file can be faster by using ``compress=False`` and
+``itemsPerSlot=1`` when creating the bigPsl file.
+
 .. _`subsec:align_maf`:

 Multiple Alignment Format (MAF)
@@ -5185,8 +5203,8 @@ bigMaf

 A bigMaf file is a bigBed file with a BED3+1 format consisting of the 3
 required BED fields plus a custom field that stores a MAF alignment
-block as a string, crearing an indexed binary version of a MAF file (see
-section :ref:`subsec:align_bigmaf`). The associated AutoSql file
+block as a string, creating an indexed binary version of a MAF file (see
+section :ref:`subsec:align_maf`). The associated AutoSql file
 `bigMaf.as <https://genome.ucsc.edu/goldenPath/help/examples/bigMaf.as>`__
 is provided by UCSC. To create a bigMaf file, you can either use the
 ``mafToBigMaf`` and ``bedToBigBed`` programs from UCSC. or you can use
@@ -5344,6 +5362,9 @@ be of the form ``reference.chromosome``, where ``reference`` refers to
 the reference species. ``Bio.Align.write`` has the additional keyword
 argument ``compress`` (``True`` by default) specifying whether the data
 should be compressed using zlib.
+Further optional arguments are ``blockSize`` (default value is 256), and
+``itemsPerSlot`` (default value is 512). See the documentation of UCSC's
+``bedToBigBed`` program for a description of these arguments.

 As a bigMaf file is a special case of a bigBed file, you can use the
 ``search`` method on the ``alignments`` object to find alignments to
@@ -5374,6 +5395,9 @@ start and end positions may be ``None`` to start searching from position
 respectively. Note that we can search on genomic position for the
 reference species only.

+Searching a ``bigMaf`` file can be faster by using ``compress=False`` and
+``itemsPerSlot=1`` when creating the bigMaf file.
+
 .. _`subsec:align_chain`:

 UCSC chain file format
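
Note on the bigMaf hunks above: the added text names ``compress``, ``blockSize`` and ``itemsPerSlot`` as optional arguments when writing. A minimal sketch of passing the search-friendly values follows. It assumes that ``Bio.Align.write`` forwards keyword arguments to the format-specific writer as the documentation describes, and that ``"bigmaf"`` is the format name. The wrapper ``write_for_fast_search`` and the constant ``FAST_SEARCH_OPTIONS`` are hypothetical names introduced here.

```python
try:
    from Bio import Align  # only needed when a file is actually written
except ImportError:  # keep the sketch importable without Biopython
    Align = None

# Settings the documentation suggests for faster searching.
FAST_SEARCH_OPTIONS = {"compress": False, "itemsPerSlot": 1}


def write_for_fast_search(alignments, path, fmt="bigmaf", **extra):
    """Hypothetical wrapper: write alignments with search-friendly settings.

    Assumes Align.write(alignments, target, fmt, **kwargs) passes the keyword
    arguments on to the writer for `fmt`; any `extra` keywords override the
    defaults chosen here.
    """
    if Align is None:
        raise RuntimeError("Biopython is required to write alignment files")
    options = {**FAST_SEARCH_OPTIONS, **extra}
    return Align.write(alignments, path, fmt, **options)
```

A call such as ``write_for_fast_search(alignments, "output.bb")`` would then write the file with the settings from the note above; ``alignments`` and the output path are placeholders.
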