From a102a1325881c44e5050bd8efbfeb04c101bb4a5 Mon Sep 17 00:00:00 2001
From: Peter Cock
Date: Tue, 5 Nov 2019 09:56:49 +0000
Subject: [PATCH] Sample problematic GCG MSF file from IPD-IMGT/HLA database

https://github.com/ANHIG/IMGTHLA/blob/3300/msf/W_prot.msf as of commit
d99d8aca3f01f7431741a998ea5cc2417d53ac9c (26 Oct 2017), i.e. from v3.30.0
of the IMGTHLA database.

This file has a discrepancy between the alignment length (99 columns)
and four of the sequences (only 93 letters without trailing gap padding).
The initial Biopython GCG MSF parser will accept this file (and apply the
missing padding) with a warning.
---
 Tests/msf/W_prot.msf | 61 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 Tests/msf/W_prot.msf

diff --git a/Tests/msf/W_prot.msf b/Tests/msf/W_prot.msf
new file mode 100644
index 000000000..71d886eed
--- /dev/null
+++ b/Tests/msf/W_prot.msf
@@ -0,0 +1,61 @@
+!!AA_MULTIPLE_ALIGNMENT
+
+ MSF: 99 Type: P Oct 18, 2017 11:35 Check: 0 ..
+
+ Name: W*01:01:01:01 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:02 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:03 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:04 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:05 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:06 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*02:01 Len: 93 Check: 9483 Weight: 1.00
+ Name: W*03:01:01:01 Len: 93 Check: 9974 Weight: 1.00
+ Name: W*03:01:01:02 Len: 93 Check: 9974 Weight: 1.00
+ Name: W*04:01 Len: 93 Check: 9169 Weight: 1.00
+ Name: W*05:01 Len: 99 Check: 7331 Weight: 1.00
+//
+
+ W*01:01:01:01 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:02 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:03 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:04 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:05 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:06 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*02:01 GLTPSNGYTA ATWTRTAASS VGMNIPYDGA SYLVRNQELR SWTAADKAAQ
+ W*03:01:01:01 GLTPSSGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*03:01:01:02 GLTPSSGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*04:01 GLTPSNGYTA ATWTRTAASS VGMNIPYDGA SYLVRNQELR SWTAADKAAQ
+ W*05:01 GLTPSSGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+
+ W*01:01:01:01 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:02 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:03 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:04 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:05 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:06 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*02:01 MPWRRNMQSC SKPTCREGGR SGSAKSLRMG RRRCTAQNPK RLT
+ W*03:01:01:01 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK RLT
+ W*03:01:01:02 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK RLT
+ W*04:01 MPWRRNMQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK RLT
+ W*05:01 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+