1Department of Bioinformatics, Sathyabama University, Chennai, India, 2Department of Biochemistry and Molecular Biology, University of Nebraska Medical Center, Omaha, NE 68198-5870, USA, 3Bioinformatics Bishop Heber College, Tiruchirappalli, Tamil Nadu, India.
Email: shashank.shrishrimal@unmc.edu
Received: 27 Nov 2014 Revised and Accepted: 25 Dec 2014
ABSTRACT
Objective: To evaluate the variation of the neuraminidase (NA) protein among various H1N1 strains for drug design.
Methods: We used 12 NA protein sequences from various countries, retrieved from the UniProtKB database. We performed structural analysis and antigenic and glycosylation site prediction on the NA proteins of these influenza A strains.
Results: Antigenic variants in the NA sequence of the H1N1 strain from Italy were unique and were not present in any other NA H1N1 sequence. Strains from Italy and Thailand were distantly related, while the others were closely related. Disorder prediction analysis of the different strains showed the maximum similarity from position 84 to 448. Positions 1 to 83 and 449 to 469 showed the maximum dissimilarity among the NA proteins.
Conclusion: This study focuses on the regions of sequence similarity and dissimilarity of NA in H1N1 strains from different countries. The results in this paper are based on currently available sequences for NA of H1N1 strains and on bioinformatic tools. Our study will help in understanding the regions of high variability due to mutations and the conserved domains that can be potential targets for drug development.
Keywords: Influenza, Neuraminidase, Glycosylation, Vaccine.
INTRODUCTION
H1N1 has caused widespread outbreaks, including epidemics and pandemics, of acute upper or lower respiratory tract infection. The 2009 re-emergence of the strain led the World Health Organization (WHO) to declare a global H1N1 pandemic, the first global pandemic since the 1968 Hong Kong flu. By 2010, the strain had spread to more than 214 countries and caused 18,138 deaths [1]. This strain is popularly known as “Swine Flu” (swine influenza) because it originated in pigs by genetic re-assortment [2].
H1N1, also called influenza type A subtype H1N1, belongs to the family Orthomyxoviridae. The segmented RNA genome of the virus consists of eight strands, derived from a single species or from multiple species, which confers cross-species infectivity on the virus. Influenza type A subtype H1N1 strains consist of one strand derived from human flu strains, two from avian (bird) strains, and five from swine strains. The nucleic acid of the influenza virus is translated into approximately 10 proteins, two of which are viral membrane glycoproteins: hemagglutinin (H) and neuraminidase (N). These are used to classify the different subtypes of influenza A virus [1] and are essential for viral infection and release from infected cells. There are 16 known H proteins and 9 known N proteins, forming different subtypes such as H1N1. The name H1N1 corresponds to the hemagglutinin type 1 (H1) and neuraminidase type 1 (N1) antigens present on the viral coat [3].
Neuraminidase, a major surface glycoprotein, possesses enzymatic activity that cleaves the sialic acid receptor, enabling the virus to be released from the host cell after replication. Most anti-influenza drugs target the neuraminidase (NA) protein, inhibiting its function and providing some treatment. However, drug-resistant mutant strains emerge, limiting the capacity of treatment by most drugs. Mutant strains are also resistant to antibodies generated from the available vaccines [4-7]. The need to design new influenza vaccines every year to keep up with the new strains is challenging, and an understanding of the variations among the strains is critical for designing future drugs.
In our study, we evaluate the variation of the neuraminidase protein among various H1N1 strains by comparative analysis. Nucleotide and protein sequence similarity analysis was performed using available tools such as T-Coffee, Garnier and Pepstats. We also analyse glycosylation and antigenic variants among the different NA protein sequences of H1N1 strains. Divergence among the sequences of the NA receptor protein of H1N1 was assessed by constructing a phylogenetic tree using the distance method. Protein disorder analysis helps in understanding the function of short motifs.
MATERIALS AND METHODS
Collection of influenza NA H1N1 strain sequence
The NA protein sequences of H1N1 strains from various countries were retrieved [8] from the UniProtKB database [9]. The sequences are from the following locations: Malaysia (1954), New Jersey (USA) (1976), India (1980), Memphis (USA) (1996), New Zealand (2000), Russia (2006), Italy (2009), Thailand (2010), Texas (USA) (2010), Brazil (2011), China (2012) and Kenya (2013).
Sequence similarity analysis
The multiple sequence alignment (MSA) technique [10] was used to identify the divergence and mutations in the NA protein sequences of H1N1 strains obtained from various countries. The T-Coffee tool [11] (http://tcoffee.crg.cat/apps/tcoffee/do:expresso) was used to obtain the MSA. It generates the multiple sequence alignment on the basis of pairwise alignments between all possible pairs of sequences.
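Once an alignment is available, divergence between two strains can be summarised as percent identity over the aligned columns. The following is a minimal sketch of that calculation, not part of the T-Coffee workflow itself; the function name `percent_identity` and the choice to skip gapped columns are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of an alignment.

    Hypothetical helper: columns where either row has a gap ('-')
    are skipped, which is one common convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

For example, two aligned fragments that differ at one of five ungapped columns give 80% identity.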
Protein sequence analysis
Primary and secondary structure analysis of all the protein sequences was done using the Garnier and Pepstats tools from the EMBOSS 2.10.0-0.8 package [12].
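The amino acid composition reported by a tool such as Pepstats is, at its core, a per-residue percentage. A minimal sketch of that calculation is shown here, assuming a plain one-letter protein string; the function name `amino_acid_percentages` is hypothetical, and Pepstats itself reports many additional statistics.

```python
from collections import Counter


def amino_acid_percentages(sequence: str) -> dict[str, float]:
    """Percentage of each residue in a protein sequence, rounded to 0.1.

    Hypothetical helper illustrating the composition column only.
    """
    seq = sequence.upper()
    total = len(seq)
    if total == 0:
        return {}
    counts = Counter(seq)
    return {aa: round(100.0 * n / total, 1) for aa, n in sorted(counts.items())}
```

Applied to each of the 12 sequences, this yields one column of a composition table per strain.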
Analysis of glycosylation and antigenic variants
Variations in the NA glycosylation sites were determined using the NetNGlyc 1.0 online tool [13]. Antigenic divergence was determined using the CTLPred tool [14].
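Candidate N-glycosylation sites follow the sequon pattern Asn-X-Ser/Thr, where X is any residue except proline. The sketch below only scans for that pattern and is not a substitute for NetNGlyc, which additionally scores each sequon; the function name `find_sequons` and the four-residue window used for reporting are illustrative assumptions.

```python
import re

# Zero-width lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, 4-residue window) for each N-X-S/T sequon.

    Hypothetical helper: pattern matching only, no likelihood scoring.
    """
    seq = sequence.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 4]) for m in SEQUON.finditer(seq)]
```

A proline at the X position (for example N-P-S) is excluded by the pattern.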
Phylogenetic analysis
The phylogenetic tree was constructed with a distance method, UPGMA (Unweighted Pair Group Method with Arithmetic Mean), using MEGA (Molecular Evolutionary Genetics Analysis) 5.2 [15].
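UPGMA repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The following is a minimal sketch of that procedure, not the MEGA implementation; the function name `upgma`, the dictionary-of-frozensets distance format and the example distances in the usage note are assumptions for illustration.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns (tree, merges): tree is a nested tuple, merges is a list of
    (cluster_a, cluster_b, node_height) in the order the merges happened.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)                       # working copy of the distances
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to every remaining cluster is the size-weighted average.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
        merges.append((a, b, height))
    return next(iter(sizes)), merges
```

With made-up distances A-B = 2, A-C = 6 and B-C = 6, A and B merge first at height 1.0 and C joins at height 3.0.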
Disorder analysis
The disordered regions within the protein sequences were predicted using the PONDR® VLXT software [16-18].
RESULTS AND DISCUSSION
Protein sequence analysis
The primary structures of the proteins were analysed using the Pepstats tool from the EMBOSS package. The results (table 1) indicate that the amino acid composition is highly similar across the NA protein sequences of the H1N1 strains. The secondary structure of the proteins was analysed using the Garnier tool from the EMBOSS package; the results (table 2) show slight variation among the protein sequences.
Table 1: Amino acid composition among various NA H1N1 strains
Amino Acid % | Malaysia | New Jersey (USA) | India | Memphis (USA) | New Zealand | Russia | Italy | Thailand | Texas (USA) | Brazil | China | Kenya |
Ala | 3.4 | 3.8 | 3.8 | 4.3 | 4.3 | 4.3 | 3.8 | 3.3 | 3.4 | 3.4 | 3.4 | 3.4 |
Cys | 4.0 | 4.1 | 4.0 | 3.8 | 3.8 | 3.8 | 4.5 | 4.2 | 4.1 | 4.1 | 4.1 | 4.1 |
Asp | 6.3 | 4.6 | 5.7 | 5.5 | 5.1 | 5.1 | 4.8 | 4.0 | 4.3 | 4.1 | 4.3 | 4.3 |
Glu | 3.6 | 3.6 | 3.6 | 3.8 | 3.8 | 3.6 | 4.5 | 4.5 | 4.3 | 4.3 | 4.3 | 4.3 |
Phe | 3.4 | 3.6 | 3.4 | 3.4 | 3.6 | 3.6 | 4.2 | 4.0 | 3.8 | 3.7 | 3.8 | 3.8 |
Gly | 9.6 | 9.8 | 9.6 | 9.6 | 9.6 | 9.6 | 10.4 | 9.6 | 9.6 | 9.7 | 9.6 | 9.6 |
His | 1.7 | 1.7 | 1.7 | 1.7 | 1.9 | 1.9 | 1.3 | 1.4 | 1.3 | 1.3 | 1.3 | 1.3 |
Ile | 9.6 | 9.8 | 8.9 | 8.9 | 8.9 | 9.1 | 8.1 | 9.4 | 9.8 | 9.7 | 10.0 | 9.4 |
Lys | 4.5 | 4.2 | 4.5 | 4.2 | 5.1 | 5.3 | 4.3 | 4.0 | 4.3 | 4.1 | 4.3 | 4.3 |
Leu | 4.0 | 5.1 | 4.0 | 4.7 | 4.5 | 4.5 | 3.8 | 4.0 | 3.8 | 3.9 | 3.8 | 3.8 |
Met | 2.3 | 2.1 | 2.1 | 2.1 | 1.5 | 1.7 | 1.0 | 1.0 | 1.5 | 1.5 | 1.5 | 1.5 |
Asn | 6.4 | 7.0 | 6.6 | 7.0 | 7.4 | 7.4 | 8.1 | 9.1 | 9.1 | 9.1 | 8.3 | 8.3 |
Pro | 4.7 | 4.4 | 4.7 | 4.5 | 4.3 | 4.3 | 5.1 | 4.5 | 4.7 | 4.7 | 4.7 | 4.7 |
Gln | 2.8 | 2.9 | 2.6 | 2.6 | 2.3 | 2.6 | 2.0 | 3.3 | 3.2 | 3.2 | 3.2 | 3.2 |
Arg | 4.5 | 4.0 | 4.7 | 4.7 | 3.6 | 3.4 | 4.3 | 4.0 | 3.6 | 3.7 | 3.6 | 3.8 |
Ser | 10.4 | 11.3 | 10.6 | 10.2 | 10.6 | 10.6 | 12.4 | 12.2 | 11.5 | 11.4 | 11.9 | 12.2 |
Thr | 6.6 | 6.2 | 7.0 | 7.0 | 7.0 | 7.0 | 5.1 | 5.4 | 5.5 | 5.6 | 5.8 | 5.8 |
Val | 5.7 | 4.9 | 6.0 | 5.3 | 6.1 | 5.7 | 6.1 | 6.1 | 6.1 | 6.0 | 5.8 | 6.0 |
Trp | 3.4 | 3.4 | 3.4 | 3.4 | 3.4 | 3.4 | 3.3 | 3.3 | 3.4 | 3.4 | 3.4 | 3.4 |
Tyr | 3.0 | 3.0 | 3.0 | 3.2 | 3.0 | 3.0 | 3.0 | 3.3 | 3.0 | 3.0 | 3.0 | 3.0 |
Table 2: Secondary structure of selected H1N1 strains
Viral Strain | Helix (%) | Strand (%) | Turns (%) | Random coil (%) |
Malaysia | 8.5 | 33.7 | 38.7 | 19.1 |
New Jersey (USA) | 6.5 | 33.7 | 37.2 | 22.6 |
India | 8.1 | 34.5 | 37.7 | 19.8 |
Memphis (USA) | 8.1 | 32.6 | 38.9 | 20.4 |
New Zealand | 10.0 | 32.6 | 38.1 | 20.6 |
Russia | 9.1 | 32.1 | 37.4 | 21.3 |
Italy | 5.8 | 29.8 | 38.6 | 25.8 |
Thailand | 4.7 | 33.3 | 37.2 | 24.8 |
Texas (USA) | 6.6 | 33.3 | 35.4 | 24.7 |
Brazil | 5.4 | 33.8 | 36.0 | 24.8 |
China | 6.6 | 33.3 | 35.6 | 24.5 |
Kenya | 6.6 | 32.2 | 37.1 | 24.1 |
Table 3: Glycosylation sites in NA H1N1 strains
Malaysia | New Jersey (USA) | India | Memphis (USA) |
Position | Sequence | Position | Sequence |
63 | NQTY | 50 | NQSV |
88 | NSSL | 63 | NQTY |
146 | NGTV | 68 | NISN |
235 | NGSC | 146 | NGTV |
455 | NWSW | 235 | NGSC |
New Zealand | Russia | Italy | Thailand |
Position | Sequence | Position | Sequence |
88 | NSSL | 88 | NSSL |
146 | NGTV | 146 | NGTV |
235 | NGCS | 455 | NWSW |
455 | NWSW | ||
Texas (USA) | Brazil | China | Kenya |
Position | Sequence | Position | Sequence |
63 | NQTY | 63 | NQTY |
68 | NISN | 68 | NISN |
88 | NSSL | 88 | NSSL |
146 | NGTI | 146 | NGTI |
235 | NGSC | 235 | NGSC |
386 | NFSI | 386 | NFSI |
Analysis of glycosylation sites
The glycosylation sites (table 3) were identified using NetNGlyc 1.0 (http://www.cbs.dtu.dk/services/NetNGlyc/) on the CBS server, to compare the post-translational modification of the protein sequences. The prediction shows common glycosylation sites at positions 146 (NGTV) and 455 (NWSW) in the NA H1N1 sequences from Malaysia, India, USA, New Zealand and Russia. The Italy NA H1N1 protein sequence has a glycosylation site, NISN at position 1, that was absent from all earlier strains except the USA NA H1N1 strain. This site is present in all the available sequences after 2009. Sequences from Brazil, China and Kenya share the glycosylation sites NQTY, NISN, NSSL, NGTI, NGSC and NFSI at positions 63, 68, 88, 146, 235 and 386, respectively.
Prediction of antigenic variants
Antigenic variants in all the protein sequences were predicted using the CTLPred tool (http://www.imtech.res.in/raghava/ctlpred/index.html) on the IMTECH server. It predicts cytotoxic T lymphocyte (CTL) epitopes, which helps in the design of subunit vaccines. The results indicate that positions 55-63 (TYENNTWVM), 167-175 (PSPYNSRFE) and 228-236 (ESECVCVNG) are conserved in the NA sequences, but the 2009 NA H1N1 sequence shows unique antigenic variants, i.e. SKDNSIRIG, SASACHDGI and IITDTIKSW at positions 33-42, 113-121 and 144-152, respectively.
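The epitopes reported here are nine residues long, so conservation across strains can be checked by comparing the sets of 9-residue windows in each sequence. The sketch below does only that set comparison and does not predict epitopes the way CTLPred does; the function names `peptide_windows` and `conserved_peptides` are hypothetical.

```python
def peptide_windows(sequence: str, k: int = 9) -> set[str]:
    """All overlapping k-residue windows of a sequence (hypothetical helper)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def conserved_peptides(sequences: list[str], k: int = 9) -> set[str]:
    """k-residue peptides that occur in every sequence, at any position."""
    window_sets = [peptide_windows(s, k) for s in sequences]
    return set.intersection(*window_sets) if window_sets else set()
```

A peptide returned by `conserved_peptides` is shared by all input sequences; peptides found in only one sequence are candidates for strain-specific variants.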
Phylogenetic analysis
Phylogenetic analysis indicates the separation of one sequence from another, with divergence measured in terms of branch length. The phylogenetic tree indicated that the NA H1N1 protein sequences from New Zealand, Russia, USA, Malaysia and India are closely related to those from Brazil, China and Kenya, with a branch length of 0.0606, whereas the NA H1N1 sequences from Italy and Thailand are distantly related.
Disorder prediction
The disordered regions of the proteins, predicted using the PONDR® VLXT software, are reported in both graphical and text form (table 5). The threshold value was set to 0.5 for the prediction of disordered regions. A peak above the threshold indicates a disordered region, and residues scoring below the threshold are considered a normal (ordered) region.
The antigenic variants (table 4) and disorder prediction (table 5) also show similarity among the NA H1N1 sequences from Malaysia, India, New Zealand and Russia, with the exception of USA. Likewise, there is similarity among Thailand, Brazil, China and Russia. From both tables, we can state that the Italy NA H1N1 sequence is unique among the sequences taken for the study.
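The thresholding step can be illustrated with a short sketch that turns a list of per-residue scores into residue ranges. It assumes the scores have already been obtained from a predictor; the function name `disordered_regions` is hypothetical and the example scores in the usage note are made up.

```python
def disordered_regions(scores, threshold=0.5):
    """Return 1-based (start, end) ranges where the score exceeds the threshold.

    Hypothetical helper: 'scores' is one predicted disorder value per residue.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos  # a new disordered stretch begins here
        elif start is not None:
            regions.append((start, pos - 1))  # stretch ended at previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # stretch runs to the C-terminus
    return regions
```

For the made-up scores `[0.9, 0.8, 0.2, 0.7, 0.1, 0.6, 0.6]` the ranges are 1-2, 4 and 6-7, the same notation used for the positions in table 5.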
Table 4: Antigenic variants for NA H1N1 sequences
Malaysia | New Jersey (USA) | India | Memphis (USA) |
Position | Sequence | Position | Sequence |
220 | ESECVCVNG | 220 | ESECVCING |
366 | SSRKGFEMI | 44 | SNPKVCNQS |
55 | TYENNTWVN | 55 | TYENNTWVN |
167 | PSPYNSRFE | 167 | PSPYNSRFE |
179 | WASSACNDG | 179 | WSASACHDG |
New Zealand | Russia | Italy | Thailand |
Position | Sequence | Position | Sequence |
206 | LTQGALLND | 228 | ESECVCMNG |
230 | LMSEPLGEA | 242 | MTDGPSNGA |
377 | SFNQNLDYQ | 167 | PSPYNSKFE |
386 | IGYICSGVF | 179 | WSASACHDG |
22 | ESINFLENA | 235 | NGSCFTIMT |
Texas (USA) | Brazil | China | Kenya |
Position | Sequence | Position | Sequence |
42 | NQNQIETCN | 228 | ESECACVNG |
55 | TYENNTWVN | 42 | NQNQIETCN |
167 | PSPYNSRFE | 55 | TYENNTWVN |
220 | ESECACVNG | 167 | PSPYNSRFE |
239 | FTIMTDGPS | 239 | FTIMTDGPS |
Table 5: Disorder region of NA sequence
Strain | Position | Disordered sequence | No. of disordered regions |
Malaysia | 1-2, 4, 76-82, 148-169, 332-337 | MN, N, AGKDTTS, TVKDRSPYRALMSCPIGEAPSPY, KGSCDP | 5 |
New Jersey (USA) | 70-89, 148-169, 334-337, 460-464 | SNTNIAAGQGVTPIILAGNS, TVKDRSPYRTLMSCPIGEAPSP, NCGP, GADLP | 4 |
India | 1-2, 4, 34-37, 79-82, 148-169, 215-224, 332-337, 461-465 | MN, N, VSHS, DTTS, TVKDRSPYRALMSCPIGEAPSP, TIKSWRKRIL, KGSCDP, GAELP | 8 |
Memphis (USA) | 1-2, 4, 34-38, 80-83, 148-169, 217-224, 329-339, 358-372, 461-465 | MN, N, ASHSI, KTSM, TVKDRSPYRALMSCPLGEAPSP, KSWKKRIL, KDGEGSCNPVT, WIGRTKSNRLRKGFE, GAELP | 9 |
New Zealand | 1-2, 4, 33-38, 148-165, 217-224, 329-339, 363-371, 461-465 | MN, N, WASHSI, TVKDRSPYRALMSCPLGE, KSWKKRI, KDGEGSCNPVT, KSNRLRKGF, GAELP | 8 |
Russia | 1-2, 4, 33-38, 148-165, 215, 333-338, 363-370, 461-465 | MN, N, WASHSI, TVKDRSPYRALMSCPLGE, T, GSCNPV, KSNRLRKG, GAELP | 8 |
Italy | 17-23, 82-100, 265-271, 391-392, 394-396 | KLAGNSS, IKDRSPYRTLMSCPIGEVP, TGSCGPV, PD, AEL | 5 |
Thailand | 63-69, 128-146, 311-317 | KLAGNSS, IKDRSPYRTLMSCPIGEVP, TGSCGPV | 3 |
Texas (USA) | 1-2, 4, 84-90, 149-167, 332-338, 461-464 | MN, N, KLAGNSS, IKDRSPYRTLMSCPIGEVP, KGSCGPV, AELP | 6 |
Brazil | 1-2, 4, 84-89, 149-167, 332-338, 456-464 | MN, N, KLAGNS, IKDRSPYRTLMSCPIGEVP, TGSCGPV, SWPDGAELP | 6 |
China | 1-2, 4, 84-90, 149-167, 332-338, 461-464 | MN, N, KLAGNSS, IKDRSPYRTLMSCPIGEVP, TGSCGPV, AELP | 6 |
Kenya | 1-2, 4, 84-90, 149-167, 332-338, 461-464 | MN, N, KLAGNSS, IKDRSPYRTLMSCPIGEVP, TGSCGPV, AELP | 6 |
Fig. 1: A phylogenetic tree of the NA sequences obtained from different countries
Fig. 2: Graphical representation of the disordered regions (panels A to L). The X-axis represents the residue number of the protein sequence, while the Y-axis represents the score value. The threshold is the cut-off value for prediction of a disordered region
CONCLUSION
The influenza virus spreads through the human body by means of its neuraminidase (NA) protein, an enzyme present on its surface. The NA enzyme facilitates the release and subsequent growth of progeny virions following the intracellular viral replication cycle. NA exhibits its main function during the initial stages of infection, when it cleaves sialic acid from the cell surface as well as from the progeny virions. This enables their release from the infected cells, so that the virus spreads further into the body by infecting other healthy cells [19]. Antibodies against the NA enzyme can inhibit it and regulate the infection, but antigenic variation in the NA enzyme makes vaccine-induced antibodies ineffective [20].
In this study, we considered NA protein sequences of H1N1 strains from different countries to identify regions of similarity and dissimilarity. The sequence analysis revealed only slight differences among these sequences, except that the NA H1N1 protein sequence from Italy shows multiple variations and is the unique one. We also suggest that the NA H1N1 protein sequences from Thailand, Brazil, China and Kenya are similar in characteristics.
CONFLICT OF INTERESTS
Declared None.
REFERENCES