biopython

mirror of https://github.com/biopython/biopython.git synced 2025-11-11 14:41:41 +08:00

Author	SHA1	Message	Date
Oscar GG	6fbcec6d1d	Ignore but warn on invalid EMBL DR lines (e.g. from RepBase) Solves issue #1579. Adds example with invalid DR structure, Tests/EMBL/RepBase23.02.embl	2018-04-03 13:35:53 +01:00
Peter Cock	99d9c00a10	Cope with corner case of no seq line See Tests/EMBL/embl_with_0_line.embl	2017-08-22 10:28:28 +01:00
Peter Cock	4a7b63d913	Handle EMBL SQ lines with no coordinates This should fix issue #1368.	2017-08-22 10:28:28 +01:00
Erik Cederstrand	83c088af2e	Warn about malformed qualifier values As suggested in https://github.com/biopython/biopython/pull/1299#issuecomment-311092091	2017-07-17 15:09:03 +01:00
Erik Cederstrand	f39f4beb8b	Ignore leading spaces in value Fixes parsing of quoted, multi-line qualifier values in GenBank feature tables. The following would previously raise `ValueError: Problem with 'CDS' feature:[...]` because the quote was not properly detected:
```
FH Key Location/Qualifiers
FT CDS 1..756
FT /*tag= a
FT /product= "Lactobacillus kefir T76I/V95M/S96L/E145A/F147L
FT /V148I/T152A/L153M/Y190G/A202F/M206C/Y249F mutant
FT ketoreductase (KRED) protein"
FT /partial
FT /note= "No stop codon is shown"
```	2017-07-17 15:09:03 +01:00
Peter Cock	1abacd6739	Use USA spelling of initialise	2017-06-16 16:42:34 +01:00
Peter Cock	c4ac80d66a	docstring work for Bio.GenBank Making both pydocstyle and RST validator happy.	2017-06-16 15:33:16 +01:00
morrme	80988e819a	corrections for pydocstyle rule D204	2017-05-07 22:36:29 +01:00
morrme	d4c407e8dd	Fixes for pydocstyle rule D209	2017-04-22 21:35:43 +01:00
morrme	69d20d74f4	docstring capitalization changes per pydocstyle rule D403	2017-04-22 13:25:35 +01:00
Peter Cock	7bfb2903a8	autopep8 --in-place --select E305 Bio/*/*.py	2017-04-20 17:15:31 +01:00
Steve Bond	2290ca8544	Tidy up GenBank processing code Squashed commit of pull request #1031 - Remove redundant parentheses - Decorate static methods as such - Alpha-order long lists of attributes - Define 'constant' attributes during object instantiation, instead of repeatedly setting them in methods - Fix line indents	2017-01-08 11:16:05 +09:00
Steve Bond	d8767369a6	Write structured GenBank comments; robust parsing Squashed commit of pull request #1029, itself a replacement of pull request #945. The original implementation was not robust against malformed structured comments, causing crashes. Any structured comment information was previously discarded from the record when written. Adding output files for unit tests as well.	2016-12-22 14:39:52 +09:00
peterjc	2b06528144	Explicitly record EMBL/GenBank molecule type	2016-11-24 10:51:41 +00:00
peterjc	b61e52b3cc	Update IMGT parser for new IPD-IMGT/HLA database files. After v3.16.0, IPD-IMGT/HLA adopted a new format ID line. Closes GitHub issue 988.	2016-11-23 17:38:50 +00:00
peterjc	b50051937f	Handle EMBL patent files from KIPO	2016-11-23 17:12:44 +00:00
peterjc	55504850d8	Strip white space in old EMBL patent ID lines	2016-11-23 14:55:06 +00:00
Peter Cock	fd7b171993	Explicitly handle EMBL/GenBank topology in scanner/consumer This fixed a few corner cases not capturing the topology.	2016-11-18 17:30:32 +00:00
biologyguy	e3b7b81a44	Decorate static methods and return dummy data as tuple The tuple return is to prevent errant type conversion introduced in a recent commit	2016-10-03 15:09:30 +01:00
Bertrand Néron	3231fa0636	fix issue #615 https://github.com/biopython/biopython/issues/615 ensure that the field DEFINITION ends with a period as in GenBank format specifications.	2016-08-29 16:53:21 +01:00
xzhuo	81a2cee65d	To deal with 0 nt sequence line in embl SQ section	2016-08-22 16:54:55 +01:00
peterjc	642f78eb71	Warn if GenBank identifier over 16 chars	2016-06-10 09:57:10 +01:00
peterjc	1119425f45	PEP8 E402 module level imports vs __docformat__ placement This was mostly due to the latest version of the pep8 tool being stricter and wanting the __docformat__ line after the module level imports. Rather than moving them all, I removed them - and we'll switch to using reStructuredText as the default when converting the docstrings into API HTML pages for the website. This commit also includes assorted other PEP8 fixes which our recommended git pre-commit hook spotted, and I fixed by hand.	2016-05-10 17:13:46 +01:00
Peter Cock	4f41c6eb8a	Python 2.6 fix for structured comments work	2016-01-04 20:45:18 +09:00
Peter Cock	5c05de6183	Don't create empty structured comments in record annotation dict	2016-01-04 19:21:01 +09:00
Brian Osborne	2d917f664d	Parse GenBank structured comments. Peter: This is a squashed commit of GitHub pull request #613 by Brian, with minor PEP8 white space changes, and leaving out the test output changes for test_GenBank.py (see my subsequent commit).	2016-01-04 19:01:41 +09:00
peterjc	6f8a70b1c2	Two kinds of EMBL PR lines, patent priority vs project references The patent lines are described within this document, http://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/external_services/patentdata/Non-redundant%20databases-user%20manual_v4.pdf	2015-12-03 14:06:29 +00:00
Peter Cock	ed3fa5d669	Cope with multi-line DBLINK entries in GenBank files	2015-12-03 09:53:36 +00:00
peterjc	2fa24c349c	Don't append empty lineage '.' onto GenBank ORGANISM field Also fixed two cases of PEP8 spacing.	2015-08-19 14:41:23 +01:00
Kai Blin	6fc6ad695c	GenBank: Improve error message from EMBL parser When there is content in an EMBL file after SQ or CO lines that is not // or whitespace, the parser throws an AssertionError. Unfortunately, the error message is less than helpful. Improve the error message. This fixes issue #431 Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>	2015-08-05 17:36:21 +01:00
peterjc	bab0067ce5	Tolerate GenBank locations not split at comma (Spotting by mis-matched brackets; issues a warning)	2015-06-03 09:23:10 +01:00
Kai Blin	17aa4f4de6	GenBank: Avoid infinite loop while parsing While parsing input files that for some reason end while in the Features table, the GenBank code designed to skip empty lines triggered an infinite loop. This patch fixes the infinite loop by breaking out of the "consume empty lines" loop when readline() returns '' (readline()'s way of saying "end of file") while still supporting the original "consume empty lines" use case where readline() will return '\n'. Please note that the provided test case causes the unpatched code to get stuck in an infinite loop without the provided patch. This fixes issue #510. Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>	2015-04-08 11:53:18 +02:00
Peter Cock	ab7ac2968b	Resolve slash-n in RST docstrings. Solves temporarily disabling RST markup as of commit 3cfb6334a17ce8b783c93f8e00baf214cdcb8668 by the simple trick of putting the docstrings in raw string mode.	2014-11-14 16:08:47 +09:00
peterjc	3cfb6334a1	epydoc RST does not like the slash-n in the docstring/doctest	2014-11-11 17:16:33 +09:00
Travis Wrightsman	1e47cee152	explicit docformat definitions	2014-11-11 17:06:07 +09:00
Travis Wrightsman	9c81e9815a	restructured text progress 2	2014-11-11 17:04:56 +09:00
carlosp420	c4ba18bb45	PEP8 fixes E265, GenBank	2014-10-24 09:14:27 +03:00
Christian Brueffer	fdc32f5621	Fixes for PEP8 E113 (unexpected indentation).	2014-10-20 18:58:54 +01:00
Christian Brueffer	6fba5dfbd5	Fix PEP8 E111 (indentation is not a multiple of four).	2014-10-20 18:58:54 +01:00
Christian Brueffer	0f8f1fc597	PEP8 fixes for E231 (missing whitespace after delimiters).	2014-10-20 10:33:37 +02:00
Zhaorong Ma	29d490b07f	Update Scanner.py	2014-01-30 10:55:24 +00:00
Zhaorong Ma	942e98bf20	Changed a typo in comment	2014-01-30 10:55:07 +00:00
Peter Cock	b06ab99c96	Handle EMBL-bank patent files (no sequence, only checksum) See http://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/external_services/patentdata/Non-redundant_databases-user_manual_v3.pdf	2013-10-24 21:38:31 +01:00
peterjc	f065811560	Selected changes based on the 2to3 filter fixer I've generally replaced the 2to3 fixer's default dummy variable of _f with something else.	2013-10-05 14:38:56 +01:00
peterjc	4429497000	Fix a few more stray print statements Aim here is to minimise the differences from running 2to3 to facilitate moving to a single code-base without needing to run 2to3 at all.	2013-09-28 23:26:55 +01:00
peterjc	561438347a	Selected changes from $ 2to3 --no-diffs -n -w -f next Bio I have not included changes of the 'next' method on our objects to '__next__' since that changes the API and may break things... this issue needs more review.	2013-09-28 14:46:00 +01:00
Peter Cock	de12c5e08f	Add: from __future__ import print_statement This is currently redundant as we are carefully only using this simple print style which is both a print statement (with redundant brackets) under Python 2 and a print function under Python 3: print(variable) However, adding the __future__ import to any file using a print should catch any accidental usage of the print statement in the near future (even if not testing under Python 3 where it would be spotted since we've turned off the print fixer during the 2to3 conversion). This was automated as follows:
<python>
MAGIC = "from __future__ import print_function"
import os
import sys

def should_mark(filename):
    handle = open(filename, "rU")
    lines = [line.strip() for line in handle if "print" in line]
    handle.close()
    if MAGIC in lines:
        #print("%s is marked" % filename)
        return False
    if "print" in lines:
        print("TODO - %s has a naked print" % filename)
        sys.exit(1)
    for line in lines:
        if "print" not in line:
            continue
        #print(line)
        line = line.strip(" #")
        if line.startswith(">>>") or line.startswith("..."):
            #doctest
            line = line[3:].strip()
        if line.startswith("print ") or line.startswith("print("):
            return True
    print("%s has no print statements" % filename)
    return False

def mark_file(filename, marker=MAGIC):
    with open(filename, "rU") as h:
        lines = list(h.readlines())
    with open(filename, "w") as h:
        while (lines[0].startswith("#") or not lines[0].strip()):
            h.write(lines.pop(0))
        if lines[0].startswith('"""') or lines[0].startswith('r"""'):
            # Module docstring
            if lines[0].strip() == '"""':
                print("Non-PEP8 module docstring in %s" % filename)
            if lines[0].rstrip().endswith('"""') and lines[0].strip() != '"""':
                # One liner
                print("One line module docstring in %s" % filename)
                h.write(lines.pop(0))
            else:
                h.write(lines.pop(0))
                while not lines[0].strip().endswith('"""'):
                    h.write(lines.pop(0))
                h.write(lines.pop(0))
        while (lines[0].startswith("#") or not lines[0].strip()):
            h.write(lines.pop(0))
        h.write(marker + "\n\n")
        h.write("".join(lines))

for dirpath, dirnames, filenames in os.walk("."):
    if dirpath.startswith("./build/"):
        continue
    for f in filenames:
        if not f.endswith(".py"):
            continue
        f = os.path.join(dirpath, f)
        if should_mark(f):
            print("Marking %s" % f)
            mark_file(f)
</python>	2013-09-09 21:17:13 +01:00
Peter Cock	51a4653f8a	Use print function style in misc modules & example scripts	2013-09-08 17:16:08 +01:00
Peter Cock	fb6bc576b6	Import StringIO via Bio._py3k	2013-09-07 13:05:16 +01:00
Sergei Lebedev	7378e8aa50	Partially migrated to print-function-like syntax For now we only handle the 'print' statement with a single argument, i. e.: print ... -> print(...) Migration was performed using a 2to3 fixer class:

    from lib2to3 import fixer_base, patcomp
    from lib2to3.fixer_util import Name, Call

    parend_expr = patcomp.compile_pattern(
        """atom< '(' [atom|term|testlist_gexp|STRING|NAME] ')' >""")

    class FixSinglePrint(fixer_base.BaseFix):
        PATTERN = "print_stmt"
        BM_compatible = True

        def transform(self, node, results):
            assert results
            assert node.children[0] == Name(u"print")
            args = node.children[1:]
            if len(args) != 1 or parend_expr.match(args[0]):
                # We only fix 'print' statements which have _exactly_ one
                # non-parenthesized argument.
                return
            l_args = [arg.clone() for arg in args]
            l_args[0].prefix = u""
            n_stmt = Call(Name(u"print"), l_args)
            n_stmt.prefix = node.prefix
            return n_stmt
	2013-08-31 00:54:26 +04:00
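
Note on commit 17aa4f4de6 (avoid infinite loop while parsing): the message describes breaking out of a "consume empty lines" loop when readline() returns '' (end of file) while still skipping lines that are just '\n'. The sketch below illustrates that distinction only. It is not the actual Bio.GenBank scanner code, and the helper name skip_blank_lines and the sample input are hypothetical.

```python
import io


def skip_blank_lines(handle):
    """Return the next non-blank line, or '' once end of file is reached."""
    while True:
        line = handle.readline()
        if not line:
            # readline() returns '' only at end of file, so stop here
            # instead of looping forever.
            return ""
        if line.strip():
            return line
        # Otherwise the line was blank (for example '\n'): keep consuming.


# Hypothetical input that ends with blank lines after one content line.
handle = io.StringIO("\n\nFEATURES             Location/Qualifiers\n\n\n")
first = skip_blank_lines(handle)
second = skip_blank_lines(handle)
print(repr(first))
print(repr(second))
```

A loop that only tests `line.strip()` treats '' and '\n' the same way, which is how an end-of-file condition can turn into an endless loop; checking for the empty string first separates the two cases.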

137 Commits
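
Note on the "ignore but warn" commits (for example 6fbcec6d1d for invalid EMBL DR lines and 83c088af2e for malformed qualifier values): the general pattern is to issue a warning and skip the bad input rather than raise. The sketch below shows that pattern with the standard warnings module. The ParserWarning class, the parse_dr_line helper, the assumed "database; identifier." layout, and both sample inputs are hypothetical and are not taken from the Biopython source.

```python
import warnings


class ParserWarning(UserWarning):
    """Hypothetical stand-in for a parser-specific warning class."""


def parse_dr_line(data):
    """Return (database, identifier) from DR line content, or None if malformed."""
    # Assumed layout for this sketch: "database; identifier." with optional
    # further ';'-separated fields.
    parts = [part.strip() for part in data.rstrip(".").split(";")]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        # Warn and carry on instead of raising, so that one bad line does
        # not abort parsing of the whole record.
        warnings.warn("Malformed DR line ignored: %r" % data, ParserWarning)
        return None
    return parts[0], parts[1]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    good = parse_dr_line("MD5; 1e51ca3a5450c43524b9185c236cc5cc.")
    bad = parse_dr_line("[1] (Consensus)")

print(good)
print(bad)
print(len(caught))
```

Callers who prefer strict behaviour can turn such warnings into errors with `warnings.simplefilter("error", ParserWarning)`.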