Fixes parsing of quoted, multi-line qualifier values in GenBank feature tables. The following would previously raise `ValueError: Problem with 'CDS' feature:[...]` because the quote was not properly detected:
```
FH   Key             Location/Qualifiers
FT   CDS             1..756
FT                   /*tag= a
FT                   /product= "Lactobacillus kefir T76I/V95M/S96L/E145A/F147L
FT                   /V148I/T152A/L153M/Y190G/A202F/M206C/Y249F mutant
FT                   ketoreductase (KRED) protein"
FT                   /partial
FT                   /note= "No stop codon is shown"
```
Squashed commit of pull request #1031
- Remove redundant parentheses
- Decorate static methods as such
- Alpha-order long lists of attributes
- Define 'constant' attributes during object instantiation, instead of repeatedly setting them in methods
- Fix line indents
Squashed commit of pull request #1029, itself a replacement of pull request #945.
The original implementation was not robust against malformed structured comments, causing crashes.
Any structured comment information was previously discarded from the record when written.
Adding output files for unit tests as well.
This was mostly due to the latest version of the pep8
tool being stricter and wanting the __docformat__ line
after the module level imports.
Rather than moving them all, I removed them - and we'll
switch to using reStructuredText as the default when
converting the docstrings into API HTML pages for the
website.
This commit also includes assorted other PEP8 fixes which
our recommended git pre-commit hook spotted, and which I
fixed by hand.
Peter: This is a squashed commit of GitHub pull request #613 by Brian,
with minor PEP8 white space changes, and leaving out the test output
changes for test_GenBank.py (see my subsequent commit).
When there is content in an EMBL file after SQ or CO lines that is not // or whitespace, the
parser throws an AssertionError. Unfortunately, the error message is less than helpful.
Improve the error message.
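As a rough illustration of the kind of message that helps here (the function name and wording are hypothetical, not necessarily what the parser uses), the check can name the offending line instead of relying on a bare assertion:

```python
def check_line_after_sequence(line):
    """Raise a descriptive error for unexpected content after SQ or CO lines."""
    stripped = line.strip()
    # Only whitespace or the '//' record terminator is acceptable here
    if stripped and stripped != "//":
        raise ValueError("Unexpected content after SQ or CO line: %r" % line)
```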
This fixes issue #431
Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
While parsing input files that for some reason end while in the Features
table, the GenBank code designed to skip empty lines triggered an
infinite loop.
This patch fixes the infinite loop by breaking out of the "consume empty
lines" loop when readline() returns '' (readline()'s way of
saying "end of file") while still supporting the original "consume empty
lines" use case where readline() will return '\n'.
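A minimal sketch of the loop shape described above, assuming a hypothetical helper `skip_blank_lines` rather than the actual GenBank code:

```python
import io


def skip_blank_lines(handle):
    """Return the first non-blank line, or '' at end of file."""
    while True:
        line = handle.readline()
        if not line:
            # readline() returned '': end of file, so stop looping
            return ""
        if line.strip():
            return line
        # Otherwise the line was blank (e.g. '\n'); keep consuming


truncated = io.StringIO("\n\n")
normal = io.StringIO("\n  \nFT   CDS\n")
```

Here `skip_blank_lines(truncated)` returns `''` instead of spinning forever, while `skip_blank_lines(normal)` still skips the blank lines and returns the `FT` line.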
Please note that the provided test case causes the unpatched code to get
stuck in an infinite loop.
This fixes issue #510.
Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
Resolves the temporary disabling of RST markup in commit
3cfb6334a17ce8b783c93f8e00baf214cdcb8668 by the simple
trick of putting the docstrings in raw string mode.
I have not included changes of the 'next' method on
our objects to '__next__' since that changes the API
and may break things... this issue needs more review.
This is currently redundant as we are carefully only
using this simple print style which is both a print
statement (with redundant brackets) under Python 2
and a print function under Python 3:
    print(variable)
However, adding the __future__ import to any file using
a print should catch any accidental usage of the print
statement in the near future (even if not testing under
Python 3 where it would be spotted since we've turned
off the print fixer during the 2to3 conversion).
This was automated as follows:
<python>
MAGIC = "from __future__ import print_function"
import os
import sys


def should_mark(filename):
    handle = open(filename, "rU")
    lines = [line.strip() for line in handle if "print" in line]
    handle.close()
    if MAGIC in lines:
        #print("%s is marked" % filename)
        return False
    if "print" in lines:
        print("TODO - %s has a naked print" % filename)
        sys.exit(1)
    for line in lines:
        if "print" not in line:
            continue
        #print(line)
        line = line.strip(" #")
        if line.startswith(">>>") or line.startswith("..."):
            #doctest
            line = line[3:].strip()
        if line.startswith("print ") or line.startswith("print("):
            return True
    print("%s has no print statements" % filename)
    return False


def mark_file(filename, marker=MAGIC):
    with open(filename, "rU") as h:
        lines = list(h.readlines())
    with open(filename, "w") as h:
        while (lines[0].startswith("#") or not lines[0].strip()):
            h.write(lines.pop(0))
        if lines[0].startswith('"""') or lines[0].startswith('r"""'):
            # Module docstring
            if lines[0].strip() == '"""':
                print("Non-PEP8 module docstring in %s" % filename)
            if lines[0].rstrip().endswith('"""') and lines[0].strip() != '"""':
                # One liner
                print("One line module docstring in %s" % filename)
                h.write(lines.pop(0))
            else:
                h.write(lines.pop(0))
                while not lines[0].strip().endswith('"""'):
                    h.write(lines.pop(0))
                h.write(lines.pop(0))
            while (lines[0].startswith("#") or not lines[0].strip()):
                h.write(lines.pop(0))
        h.write(marker + "\n\n")
        h.write("".join(lines))


for dirpath, dirnames, filenames in os.walk("."):
    if dirpath.startswith("./build/"):
        continue
    for f in filenames:
        if not f.endswith(".py"):
            continue
        f = os.path.join(dirpath, f)
        if should_mark(f):
            print("Marking %s" % f)
            mark_file(f)
</python>
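For illustration only (not part of the script above; `rejects_print_statement` is a hypothetical helper): with the `print_function` feature enabled, a print statement no longer compiles, while the bracketed form still does. The sketch passes the feature's compiler flag to `compile()` explicitly, which has the same effect as the `__future__` import at the top of a module.

```python
import __future__

# Compiler flag equivalent to "from __future__ import print_function"
PRINT_FUNCTION_FLAG = __future__.print_function.compiler_flag


def rejects_print_statement(source):
    """Return True if source fails to compile with print_function enabled."""
    try:
        compile(source, "<example>", "exec", PRINT_FUNCTION_FLAG)
    except SyntaxError:
        return True
    return False


statement_rejected = rejects_print_statement("print variable\n")
function_rejected = rejects_print_statement("print(variable)\n")
```

Here `statement_rejected` is `True` and `function_rejected` is `False`, which is how the marker catches an accidental print statement.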
For now we only handle the 'print' statement with a single argument,
i.e.:
    print ... -> print(...)
Migration was performed using a 2to3 fixer class:
    from lib2to3 import fixer_base, patcomp
    from lib2to3.fixer_util import Name, Call

    parend_expr = patcomp.compile_pattern(
        """atom< '(' [atom|term|testlist_gexp|STRING|NAME] ')' >""")

    class FixSinglePrint(fixer_base.BaseFix):
        PATTERN = "print_stmt"
        BM_compatible = True

        def transform(self, node, results):
            assert results
            assert node.children[0] == Name(u"print")
            args = node.children[1:]
            if len(args) != 1 or parend_expr.match(args[0]):
                # We only fix 'print' statements which have _exactly_ one
                # non-parenthesized argument.
                return
            l_args = [arg.clone() for arg in args]
            l_args[0].prefix = u""
            n_stmt = Call(Name(u"print"), l_args)
            n_stmt.prefix = node.prefix
            return n_stmt