Active Alu element "A-tails": Size does matter

Astrid M. Roy-Engel, Tulane University School of Public Health and Tropical Medicine
Abdel Halim Salem, Louisiana State University
Oluwatosin O. Oyeniran, Tulane University School of Public Health and Tropical Medicine
Lisa Deininger, Tulane University School of Public Health and Tropical Medicine
Dale J. Hedges, Louisiana State University
Gail E. Kilroy, Louisiana State University
Mark A. Batzer, Louisiana State University
Prescott L. Deininger, Tulane University School of Public Health and Tropical Medicine

Abstract

Long and short interspersed elements (LINEs and SINEs) are retroelements that make up almost half of the human genome. L1 and Alu represent the most prolific human LINE and SINE families, respectively. Only a few Alu elements are able to retropose, and the factors determining their retroposition capacity are poorly understood. The data presented in this paper indicate that the length of Alu "A-tails" is one of the principal factors in determining the retropositional capability of an Alu element. The A stretches of the Alu subfamilies analyzed, both old (Alu S and J) and young (Ya5), had a Poisson distribution of A-tail lengths, with mean sizes of 21 and 26, respectively. In contrast, the A-tails of very recent (disease-causing) Alu insertions were all between 40 and 97 bp in length. The L1 elements analyzed displayed a similar tendency: the "disease"-associated elements have much longer A-tails (mean of 77) than do even the elements from the young Ta subfamily (mean of 41). Analysis of the draft sequence of the human genome showed that only about 1000 of the more than one million Alu elements have tails of 40 or more adenosine residues. The presence of these long A stretches shows a strong bias toward the actively amplifying subfamilies, consistent with their playing a major role in the amplification process. Evaluation of the 19 Alu elements retrieved from the draft sequence of the human genome that are identical to the Alu Ya5a2 insert in the NF1 gene showed that only five have tails with 40 or more adenosine residues. Sequence analysis of the loci with the Alu elements containing the longest A-tails (7 of the 19) from the genomes of the NF1 patient and the father revealed that there are at least two loci with A-tails long enough to serve as source elements within our model. Analysis of the A-tail lengths of 12 Ya5a2 elements in diverse human population groups showed substantial variability in both the Alu A-tail length and sequence homogeneity. On the basis of these observations, a model is presented for the role of A-tail length in determining which Alu elements are active.
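
The 40-residue cutoff used throughout the abstract can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names `a_tail_length` and `has_long_tail` are hypothetical, the example sequences are invented, and the sketch assumes the tail is an uninterrupted run of A residues at the 3' end of the element, which ignores the heterogeneous tails noted above.

```python
def a_tail_length(seq: str) -> int:
    """Count the uninterrupted run of A residues at the 3' end of seq.

    Assumption: any non-A base ends the tail, so interrupted
    (heterogeneous) tails are undercounted.
    """
    seq = seq.upper()
    # Strip trailing A's and measure how many characters were removed.
    return len(seq) - len(seq.rstrip("A"))


def has_long_tail(seq: str, threshold: int = 40) -> bool:
    """Return True if the A-tail meets the cutoff (40 taken from the abstract)."""
    return a_tail_length(seq) >= threshold


# Invented toy sequences, for illustration only.
toy_elements = {
    "short_tail": "GGCCGGGCGC" + "A" * 21,
    "long_tail": "GGCCGGGCGC" + "A" * 45,
}

for name, seq in toy_elements.items():
    print(name, a_tail_length(seq), has_long_tail(seq))
```

Applied to every annotated Alu element in a genome assembly, a count of this kind would give a figure comparable to the roughly 1000 long-tailed elements reported above, although the exact number would depend on how interrupted tails are treated.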