137 Commits

Author SHA1 Message Date
6fbcec6d1d Ignore but warn on invalid EMBL DR lines (e.g. from RepBase)
Solves issue #1579.

Adds example with invalid DR structure, Tests/EMBL/RepBase23.02.embl
2018-04-03 13:35:53 +01:00
99d9c00a10 Cope with corner case of no seq line
See Tests/EMBL/embl_with_0_line.embl
2017-08-22 10:28:28 +01:00
4a7b63d913 Handle EMBL SQ lines with no coordinates
This should fix issue #1368.
2017-08-22 10:28:28 +01:00
83c088af2e Warn about malformed qualifier values
As suggested in https://github.com/biopython/biopython/pull/1299#issuecomment-311092091
2017-07-17 15:09:03 +01:00
f39f4beb8b Ignore leading spaces in value
Fixes parsing of quoted, multi-line qualifier values in GenBank feature tables. The following would previously raise `ValueError: Problem with 'CDS' feature:[...]` because the quote was not properly detected:

```
FH   Key             Location/Qualifiers
FT   CDS             1..756
FT                   /*tag=  a
FT                   /product= "Lactobacillus kefir T76I/V95M/S96L/E145A/F147L
FT                   /V148I/T152A/L153M/Y190G/A202F/M206C/Y249F mutant
FT                   ketoreductase (KRED) protein"
FT                   /partial
FT                   /note= "No stop codon is shown"
```
2017-07-17 15:09:03 +01:00
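The quote detection described in commit f39f4beb8b above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the actual Bio.GenBank code: the helper name `is_open_quoted_value` is hypothetical, and escaped quotes inside values are not handled.

```python
def is_open_quoted_value(value):
    """Return True if value starts a quoted string that is not yet closed.

    Hypothetical helper for illustration only.
    """
    # Ignore leading spaces so that '/product= "..."' is still seen as quoted.
    value = value.lstrip()
    if not value.startswith('"'):
        return False
    # The value is closed only if a second quote ends the text.
    return not (len(value) > 1 and value.endswith('"'))


# First line of the multi-line /product value: quote opened, not yet closed.
first_line_open = is_open_quoted_value(
    ' "Lactobacillus kefir T76I/V95M/S96L/E145A/F147L'
)
# Single-line /note value: quote opened and closed on the same line.
note_open = is_open_quoted_value(' "No stop codon is shown"')
```

Without the `lstrip()` call, the space after `/product=` would hide the opening quote, and the continuation lines would not be recognised as part of the same value.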
1abacd6739 Use USA spelling of initialise 2017-06-16 16:42:34 +01:00
c4ac80d66a docstring work for Bio.GenBank
Making both pydocstyle and RST validator happy.
2017-06-16 15:33:16 +01:00
80988e819a corrections for pydocstyle rule D204 2017-05-07 22:36:29 +01:00
d4c407e8dd Fixes for pydocstyle rule D209 2017-04-22 21:35:43 +01:00
69d20d74f4 docstring capitalization changes per pydocstyle rule D403 2017-04-22 13:25:35 +01:00
7bfb2903a8 autopep8 --in-place --select E305 Bio/*/*.py 2017-04-20 17:15:31 +01:00
2290ca8544 Tidy up GenBank processing code
Squashed commit of pull request #1031 

- Remove redundant parentheses
- Decorate static methods as such
- Alpha-order long lists of attributes
- Define 'constant' attributes during object instantiation, instead of repeatedly setting them in methods
- Fix line indents
2017-01-08 11:16:05 +09:00
d8767369a6 Write structured GenBank comments; robust parsing
Squashed commit of pull request #1029, itself a replacement of pull request #945.

The original implementation was not robust against malformed structured comments, causing crashes.

Any structured comment information was previously discarded from the record when written.

Adding output files for unit tests as well.
2016-12-22 14:39:52 +09:00
2b06528144 Explicitly record EMBL/GenBank molecule type 2016-11-24 10:51:41 +00:00
b61e52b3cc Update IMGT parser for new IPD-IMGT/HLA database files.
After v3.16.0, IPD-IMGT/HLA adopted a new format ID line.

Closes GitHub issue 988.
2016-11-23 17:38:50 +00:00
b50051937f Handle EMBL patent files from KIPO 2016-11-23 17:12:44 +00:00
55504850d8 Strip white space in old EMBL patent ID lines 2016-11-23 14:55:06 +00:00
fd7b171993 Explicitly handle EMBL/GenBank topology in scanner/consumer
This fixed a few corner cases not capturing the topology.
2016-11-18 17:30:32 +00:00
e3b7b81a44 Decorate static methods and return dummy data as tuple
The tuple return is to prevent errant type conversion introduced in a recent commit
2016-10-03 15:09:30 +01:00
3231fa0636 fix issue #615
https://github.com/biopython/biopython/issues/615
Ensure that the DEFINITION field ends with a period, as in the GenBank
format specification.
2016-08-29 16:53:21 +01:00
81a2cee65d To deal with 0 nt sequence line in embl SQ section 2016-08-22 16:54:55 +01:00
642f78eb71 Warn if GenBank identifier over 16 chars 2016-06-10 09:57:10 +01:00
1119425f45 PEP8 E402 module level imports vs __docformat__ placement
This was mostly due to the latest version of the pep8
tool being stricter and wanting the __docformat__ line
after the module level imports.

Rather than moving them all, I removed them - and we'll
switch to using reStructuredText as the default when
converting the docstrings into API HTML pages for the
website.

This commit also includes assorted other PEP8 fixes which
our recommended git pre-commit hook spotted, and I fixed by
hand.
2016-05-10 17:13:46 +01:00
4f41c6eb8a Python 2.6 fix for structured comments work 2016-01-04 20:45:18 +09:00
5c05de6183 Don't create empty structured comments in record annotation dict 2016-01-04 19:21:01 +09:00
2d917f664d Parse GenBank structured comments.
Peter: This is a squashed commit of GitHub pull request #613 by Brian,
with minor PEP8 white space changes, and leaving out the test output
changes for test_GenBank.py (see my subsequent commit).
2016-01-04 19:01:41 +09:00
6f8a70b1c2 Two kinds of EMBL PR lines, patent priority vs project references
The patent lines are described within this document,

http://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/external_services/patentdata/Non-redundant%20databases-user%20manual_v4.pdf
2015-12-03 14:06:29 +00:00
ed3fa5d669 Cope with multi-line DBLINK entries in GenBank files 2015-12-03 09:53:36 +00:00
2fa24c349c Don't append empty lineage '.' onto GenBank ORGANISM field
Also fixed two cases of PEP8 spacing.
2015-08-19 14:41:23 +01:00
6fc6ad695c GenBank: Improve error message from EMBL parser
When there is content in an EMBL file after SQ or CO lines that is not // or whitespace, the
parser throws an AssertionError. Unfortunately, the error message is less than helpful.
Improve the error message.

This fixes issue #431

Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
2015-08-05 17:36:21 +01:00
bab0067ce5 Tolerate GenBank locations not split at comma
(Spotted via mismatched brackets; issues a warning)
2015-06-03 09:23:10 +01:00
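One way to picture the approach in commit bab0067ce5 above is to keep joining continuation lines while the brackets are unbalanced, warning when a break does not fall at a comma. The sketch below is an assumption about the general idea, not the parser's real code; `join_location_lines` is a hypothetical name.

```python
import warnings


def join_location_lines(lines):
    """Join wrapped location lines until the brackets balance.

    Hypothetical helper for illustration only.
    """
    location = ""
    for part in lines:
        if location and not location.endswith(","):
            # The previous line was wrapped somewhere other than a comma.
            warnings.warn("Location line not split at a comma: %r" % location)
        location += part.strip()
        if location.count("(") == location.count(")"):
            # Brackets match, so the location is complete.
            break
    return location


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    joined = join_location_lines(["join(1..10,20..", "30)"])
```

Here the first line ends mid-range rather than at a comma, so the helper warns once and still returns the joined location string.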
17aa4f4de6 GenBank: Avoid infinite loop while parsing
While parsing input files that for some reason end while in the Features
table, the GenBank code designed to skip empty lines triggered an
infinite loop.

This patch fixes the infinite loop by breaking out of the "consume empty
lines" loop when readline() returns '' (readline()'s way of
saying "end of file") while still supporting the original "consume empty
lines" use case where readline() will return '\n'.

Please note that the provided test case causes the unpatched code to get
stuck in an infinite loop without the provided patch.
This fixes issue #510.

Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
2015-04-08 11:53:18 +02:00
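The end-of-file check described in commit 17aa4f4de6 above can be sketched as follows. This is a minimal illustration, not the actual scanner code; the function name `skip_blank_lines` is hypothetical.

```python
import io


def skip_blank_lines(handle):
    """Consume blank lines; return the next non-blank line, or '' at EOF.

    Hypothetical helper for illustration only.
    """
    while True:
        line = handle.readline()
        if not line:
            # readline() returns '' only at end of file, so stop here
            # instead of looping forever.
            return ""
        if line.strip():
            return line
        # Otherwise the line was blank (for example '\n'); keep consuming.


truncated = io.StringIO("\n\n")  # input ends while blank lines are skipped
normal = io.StringIO("\n\nFT   CDS\n")  # blank lines followed by content
eof_result = skip_blank_lines(truncated)
next_line = skip_blank_lines(normal)
```

A loop that only tested `line.strip()` would treat `''` and `'\n'` alike and never terminate on the truncated input.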
ab7ac2968b Resolve slash-n in RST docstrings.
Removes the need for the temporary disabling of RST markup in commit
3cfb6334a17ce8b783c93f8e00baf214cdcb8668, using the simple
trick of putting the docstrings in raw string mode.
2014-11-14 16:08:47 +09:00
3cfb6334a1 epydoc RST does not like the slash-n in the docstring/doctest 2014-11-11 17:16:33 +09:00
1e47cee152 explicit docformat definitions 2014-11-11 17:06:07 +09:00
9c81e9815a restructured text progress 2 2014-11-11 17:04:56 +09:00
c4ba18bb45 PEP8 fixes E265, GenBank 2014-10-24 09:14:27 +03:00
fdc32f5621 Fixes for PEP8 E113 (unexpected indentation). 2014-10-20 18:58:54 +01:00
6fba5dfbd5 Fix PEP8 E111 (indentation is not a multiple of four). 2014-10-20 18:58:54 +01:00
0f8f1fc597 PEP8 fixes for E231 (missing whitespace after delimiters). 2014-10-20 10:33:37 +02:00
29d490b07f Update Scanner.py 2014-01-30 10:55:24 +00:00
942e98bf20 Changed a typo in comment 2014-01-30 10:55:07 +00:00
b06ab99c96 Handle EMBL-bank patent files (no sequence, only checksum)
See http://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/external_services/patentdata/Non-redundant_databases-user_manual_v3.pdf
2013-10-24 21:38:31 +01:00
f065811560 Selected changes based on the 2to3 filter fixer
I've generally replaced the 2to3 fixer's default dummy
variable of _f with something else.
2013-10-05 14:38:56 +01:00
4429497000 Fix a few more stray print statements
Aim here is to minimise the differences from running 2to3
to facilitate moving to a single code-base without needing
to run 2to3 at all.
2013-09-28 23:26:55 +01:00
561438347a Selected changes from $ 2to3 --no-diffs -n -w -f next Bio
I have not included changes of the 'next' method on
our objects to '__next__' since that changes the API
and may break things... this issue needs more review.
2013-09-28 14:46:00 +01:00
de12c5e08f Add: from __future__ import print_function
This is currently redundant as we are carefully only
using this simple print style which is both a print
statement (with redundant brackets) under Python 2
and a print function under Python 3:

print(variable)

However, adding the __future__ import to any file using
a print should catch any accidental usage of the print
statement in the near future (even if not testing under
Python 3 where it would be spotted since we've turned
off the print fixer during the 2to3 conversion).

This was automated as follows:

<python>
MAGIC = "from __future__ import print_function"

import os
import sys

def should_mark(filename):
    # Plain text mode; the "rU" mode is rejected by recent Python 3 releases.
    with open(filename) as handle:
        lines = [line.strip() for line in handle if "print" in line]
    if MAGIC in lines:
        #print("%s is marked" % filename)
        return False
    if "print" in lines:
        print("TODO - %s has a naked print" % filename)
        sys.exit(1)
    for line in lines:
        if "print" not in line:
            continue
        #print(line)
        line = line.strip(" #")
        if line.startswith(">>>") or line.startswith("..."):
            #doctest
            line = line[3:].strip()
        if line.startswith("print ") or line.startswith("print("):
            return True
    print("%s has no print statements" % filename)
    return False

def mark_file(filename, marker=MAGIC):
    with open(filename) as h:
        lines = list(h.readlines())
    with open(filename, "w") as h:
        while (lines[0].startswith("#") or not lines[0].strip()):
            h.write(lines.pop(0))
        if lines[0].startswith('"""') or lines[0].startswith('r"""'):
            # Module docstring
            if lines[0].strip() == '"""':
                print("Non-PEP8 module docstring in %s" % filename)
            if lines[0].rstrip().endswith('"""') and lines[0].strip() != '"""':
                # One liner
                print("One line module docstring in %s" % filename)
                h.write(lines.pop(0))
            else:
                h.write(lines.pop(0))
                while not lines[0].strip().endswith('"""'):
                    h.write(lines.pop(0))
                h.write(lines.pop(0))
        while (lines[0].startswith("#") or not lines[0].strip()):
            h.write(lines.pop(0))
        h.write(marker + "\n\n")
        h.write("".join(lines))

for dirpath, dirnames, filenames in os.walk("."):
    if dirpath.startswith("./build/"):
        continue
    for f in filenames:
        if not f.endswith(".py"):
            continue
        f = os.path.join(dirpath, f)
        if should_mark(f):
            print("Marking %s" % f)
            mark_file(f)
</python>
2013-09-09 21:17:13 +01:00
51a4653f8a Use print function style in misc modules & example scripts 2013-09-08 17:16:08 +01:00
fb6bc576b6 Import StringIO via Bio._py3k 2013-09-07 13:05:16 +01:00
7378e8aa50 Partially migrated to print-function-like syntax
For now we only handle the 'print' statement with a single argument,
  i.e.:

      print ... -> print(...)

  Migration was performed using a 2to3 fixer class:

      from lib2to3 import fixer_base, patcomp
      from lib2to3.fixer_util import Name, Call

      parend_expr = patcomp.compile_pattern(
          """atom< '(' [atom|term|testlist_gexp|STRING|NAME] ')' >""")

      class FixSinglePrint(fixer_base.BaseFix):
          PATTERN = "print_stmt"
          BM_compatible = True

          def transform(self, node, results):
              assert results
              assert node.children[0] == Name(u"print")
              args = node.children[1:]
              if len(args) != 1 or parend_expr.match(args[0]):
                  # We only fix 'print' statements which have _exactly_ one
                  # non-parenthesized argument.
                  return

              l_args = [arg.clone() for arg in args]
              l_args[0].prefix = u""
              n_stmt = Call(Name(u"print"), l_args)
              n_stmt.prefix = node.prefix
              return n_stmt
2013-08-31 00:54:26 +04:00