From a102a1325881c44e5050bd8efbfeb04c101bb4a5 Mon Sep 17 00:00:00 2001
From: Peter Cock
Date: Tue, 5 Nov 2019 09:56:49 +0000
Subject: [PATCH] Sample problematic GCG MSF file from IPD-IMGT/HLA database
https://github.com/ANHIG/IMGTHLA/blob/3300/msf/W_prot.msf
as of commit d99d8aca3f01f7431741a998ea5cc2417d53ac9c
(26 Oct 2017), i.e. from v3.30.0 of the IMGTHLA database.
This file has a discrepancy between the alignment length
(99 columns) and four of the sequences (only 93 letters
without trailing gap padding).
The initial Biopython GCG MSF parser will accept this file
(and apply the missing padding) with a warning.
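As a rough illustration of that padding-with-a-warning behaviour (a
standalone sketch, not the Biopython parser itself; the helper name
pad_to_alignment_length and the "-" gap character are assumptions made
for this example), using two rows copied from the file:

```python
import warnings


def pad_to_alignment_length(seqs, aln_length, gap="-"):
    """Right-pad short sequences with gaps, warning once per padded row.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    padded = {}
    for name, seq in seqs.items():
        missing = aln_length - len(seq)
        if missing < 0:
            raise ValueError(
                f"{name} has {len(seq)} letters, more than the {aln_length} columns"
            )
        if missing:
            warnings.warn(
                f"{name} has {len(seq)} letters, padding with {missing} trailing gaps"
            )
        padded[name] = seq + gap * missing
    return padded


# Two rows taken from W_prot.msf; the spaces between 10-letter blocks are removed.
rows = {
    "W*02:01": (
        "GLTPSNGYTA ATWTRTAASS VGMNIPYDGA SYLVRNQELR SWTAADKAAQ"
        "MPWRRNMQSC SKPTCREGGR SGSAKSLRMG RRRCTAQNPK RLT"
    ).replace(" ", ""),
    "W*05:01": (
        "GLTPSSGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ"
        "MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL"
    ).replace(" ", ""),
}

# The MSF header declares 99 columns; the short row triggers a UserWarning.
padded = pad_to_alignment_length(rows, aln_length=99)
```

Here only W*02:01 (93 letters) is padded; W*05:01 already spans all
99 columns and is returned unchanged.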
---
Tests/msf/W_prot.msf | 61 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
create mode 100644 Tests/msf/W_prot.msf
diff --git a/Tests/msf/W_prot.msf b/Tests/msf/W_prot.msf
new file mode 100644
index 000000000..71d886eed
--- /dev/null
+++ b/Tests/msf/W_prot.msf
@@ -0,0 +1,61 @@
+!!AA_MULTIPLE_ALIGNMENT
+
+ MSF: 99 Type: P Oct 18, 2017 11:35 Check: 0 ..
+
+ Name: W*01:01:01:01 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:02 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:03 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:04 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:05 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*01:01:01:06 Len: 99 Check: 7236 Weight: 1.00
+ Name: W*02:01 Len: 93 Check: 9483 Weight: 1.00
+ Name: W*03:01:01:01 Len: 93 Check: 9974 Weight: 1.00
+ Name: W*03:01:01:02 Len: 93 Check: 9974 Weight: 1.00
+ Name: W*04:01 Len: 93 Check: 9169 Weight: 1.00
+ Name: W*05:01 Len: 99 Check: 7331 Weight: 1.00
+//
+
+ W*01:01:01:01 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:02 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:03 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:04 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:05 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*01:01:01:06 GLTPFNGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*02:01 GLTPSNGYTA ATWTRTAASS VGMNIPYDGA SYLVRNQELR SWTAADKAAQ
+ W*03:01:01:01 GLTPSSGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*03:01:01:02 GLTPSSGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+ W*04:01 GLTPSNGYTA ATWTRTAASS VGMNIPYDGA SYLVRNQELR SWTAADKAAQ
+ W*05:01 GLTPSSGYTA ATWTRTAVSS VGMNIPYHGA SYLVRNQELR SWTAADKAAQ
+
+ W*01:01:01:01 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:02 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:03 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:04 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:05 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*01:01:01:06 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+ W*02:01 MPWRRNMQSC SKPTCREGGR SGSAKSLRMG RRRCTAQNPK RLT
+ W*03:01:01:01 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK RLT
+ W*03:01:01:02 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK RLT
+ W*04:01 MPWRRNMQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK RLT
+ W*05:01 MPWRRNRQSC SKPTCREGGR SGSAKSLRMG RRGCSAQNPK DSHDPPPHL
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+