The marsupial inactive X chromosome expresses a long noncoding RNA (lncRNA) called Rsx that has been proposed to be the functional analog of eutherian Xist. Despite the possibility that Xist and Rsx encode related functions, the two lncRNAs share no linear sequence similarity. However, both lncRNAs contain domains of tandemly repeated sequence, and in Xist these repeat domains are known to be critical for function. Using k-mer based comparison, we show that the repeat domains of Xist and Rsx unexpectedly partition into two major clusters, each harboring substantial levels of nonlinear sequence similarity. Xist Repeats B, C, and D were most similar to each other and to Rsx Repeat 1, whereas Xist Repeats A and E were most similar to each other and to Rsx Repeats 2, 3, and 4. Similarities at the level of k-mers corresponded to domain-specific enrichment of protein-binding motifs, and within individual domains these motifs were often enriched to extreme levels. Our data support the hypothesis that Xist and Rsx encode similar functions through different spatial arrangements of functionally analogous protein-binding domains. We propose that the two clusters of repeat domains in Xist and Rsx function in part to cooperatively recruit PRC1 and PRC2 to chromatin. The physical manner in which these domains engage with protein cofactors may be just as critical to their function as the identity of the cofactors themselves. The general approaches we outline in this report should prove useful in the study of any set of RNAs.
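A k-mer based comparison scores similarity between sequences without aligning them, which is what lets it detect the "nonlinear" similarity described above. The sketch below illustrates one such comparison under stated assumptions: it counts every DNA k-mer per kilobase and compares two profiles by Pearson correlation. The function names (`kmer_profile`, `pearson`, `kmer_similarity`) and the default `k=4` are hypothetical choices for illustration; this is not the authors' exact pipeline, which may normalize counts differently (for example, against a reference set of sequences).

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Return counts per kilobase for all 4**k DNA k-mers, in a fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other non-ACGT characters
            counts[j] += 1
    scale = 1000 / max(len(seq), 1)  # assumption: normalize to counts per kb
    return [c * scale for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def kmer_similarity(seq_a, seq_b, k=4):
    """Alignment-free similarity: correlation of the two k-mer profiles."""
    return pearson(kmer_profile(seq_a, k), kmer_profile(seq_b, k))


if __name__ == "__main__":
    # Same repeat blocks in a different order: profiles differ only at the junction.
    a = "ACGTTGCA" * 5 + "GGGCCC" * 5
    b = "GGGCCC" * 5 + "ACGTTGCA" * 5
    print(round(kmer_similarity(a, b), 3))
    print(round(kmer_similarity(a, "A" * 70), 3))
```

Because the profile ignores where each k-mer falls, two sequences assembled from the same repeat blocks in a different order score close to 1, while a sequence with unrelated composition scores near or below 0.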

Sprague, D., Waters, S. A., Kirk, J. M., Wang, J. R., Samollow, P. B., Waters, P. D., & Calabrese, J. M.
