Open Access

Effect of exogenous transcription factors integration sites on safety and pluripotency of induced pluripotent stem cells



Introduction

In 2006, Takahashi and Yamanaka [1] generated induced pluripotent stem cells (iPSCs) by introducing Oct4, Sox2, Klf4 and c-Myc into mouse fibroblasts. Numerous subsequent studies demonstrated that iPSCs are similar to embryonic stem cells (ESCs) in cell morphology, self-renewal and the capacity to differentiate into almost all cell types. In particular, Zhao et al. [2] and Zhao et al. [3] reported the generation of viable and fertile mice through tetraploid complementation, the assay considered to be the most stringent test of pluripotency and developmental potency. In this assay, the two blastomeres of a 2-cell embryo are fused to create a tetraploid (4N) embryo. Tetraploid cells can contribute to extraembryonic tissues such as the placenta, but a tetraploid embryo cannot develop into a viable animal on its own. When iPSCs are injected into such tetraploid embryos, any resulting embryo or animal must therefore be derived entirely from the injected iPSCs. Successful tetraploid complementation thus confirmed that iPSCs can attain true pluripotency, closely resembling that of ESCs derived from in vivo or nuclear transfer embryos [4,5]. Together with the fact that iPSCs are considered to raise fewer ethical issues than ESCs, this makes them important and valuable resources for both medical applications and fundamental research [6,7].

With their promising potential for clinical application, human iPSCs offer new approaches to disease research, cell therapy and even organ transplantation. Human iPSCs can be derived from a patient's somatic cells and differentiated into cellular models of diseases such as Parkinson's and Huntington's disease [8,9]. Compared with traditional immortalized cell lines, patient-derived iPSCs can more accurately reflect patients' drug responses, making them a valuable tool for screening new drug candidates [10,11]. In addition, iPSCs are potential sources of cells for treating some genetic diseases. Hanna et al. [12] demonstrated that a mouse model of sickle cell anemia can be rescued by transplanting hematopoietic progenitors derived from gene-corrected autologous iPSCs. Our laboratory also reported that injection of iPSCs carrying the normal human β-globin gene into β+-thalassemia (β+-thal) IVS-II-654 (C>T) (HBB: c.316-197C>T) (http://globin.cse.psu.edu) blastocysts conditionally reversed the anemia phenotype [13]. In 2014, retinal cells derived from autologous iPSCs were applied clinically for the first time [14,15], a landmark in the transition of iPSCs from basic research to clinical application. Recently, a series of clinical trials have been proposed and performed in Japan, in which iPSC-derived cells were used to treat heart disease and Parkinson's disease [16, 17, 18]. However, the safety of iPSCs remains an important issue that must be resolved before they can be applied in stem cell therapy and regenerative medicine. When iPSCs are induced with retroviruses, integration of the viral DNA into the host genome is an essential step, and random integration of the exogenous genes may affect the pluripotency and safety of iPSCs [19], especially by inactivating functional genes or activating proto-oncogenes.
To investigate the influence of exogenous transcription factors (TFs) on the pluripotency and safety of iPSCs, this study identified the integration sites of the exogenous genes and analyzed the functions of the flanking genes in three iPSC lines that were shown to be fully pluripotent by tetraploid complementation assay in our previous study [2].

Materials and methods

Nested Inverse Polymerase Chain Reaction. Nested inverse polymerase chain reaction (iPCR) was used to detect the integration sites of the four exogenous transcription factors in three iPSC lines (Figure 1). Genomic DNA fragmentation and circularization: iPSCs IP14D-1, IP14D-6 and IP14D-101 [2] and mouse tail fibroblast cells (B6D2F1) were harvested. Genomic DNA was extracted [TIANamp genomic DNA kit; Tiangen Biotech, Beijing, People's Republic of China (PRC)] and digested in a 40 μL reaction containing 2 μg of genomic DNA, 2 μL BamHI, 2 μL BglII and 4 μL 10× FastDigest Buffer (Fermentas Life Science, St. Leon-Rot, Germany). After incubation at 37 °C for 4 hours, 5 μL of the digest was electrophoresed on a 0.9% agarose gel. The remaining digest was purified with the QIAquick Gel Extraction Kit (Qiagen GmbH, Hilden, Germany) and then self-ligated with 1000 U T4 DNA ligase (New England Biolabs Ltd., Hitchin, Hertfordshire, UK) in a 200 μL reaction. After overnight incubation at 22 °C, the ligated DNA was purified and dissolved in 30 μL water.
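The circularization step relies on the two enzymes leaving compatible ends: BamHI cuts G^GATCC and BglII cuts A^GATCT, so both leave 5'-GATC overhangs and a fragment can be self-ligated whichever enzyme cut each of its ends. The minimal Python sketch that follows predicts the fragment sizes that complete BamHI/BglII digestion would give for a known sequence (for example, a provirus plus flanking genome), which may help anticipate band sizes. It assumes complete digestion and ignores methylation sensitivity; the function names and the toy sequence are hypothetical.

```python
import re
from typing import List

# Recognition sequences and top-strand cut offsets for the two enzymes in the protocol.
# BamHI cuts G^GATCC and BglII cuts A^GATCT; both leave 5'-GATC overhangs,
# so any BamHI end can be ligated to any BglII end.
ENZYMES = {"BamHI": ("GGATCC", 1), "BglII": ("AGATCT", 1)}


def cut_positions(seq: str) -> List[int]:
    """Sorted top-strand cut positions (0-based; the cut falls just before this index)."""
    seq = seq.upper()
    cuts = set()
    for site, offset in ENZYMES.values():
        # Both sites are palindromic, so scanning the top strand finds every site.
        # The lookahead also reports overlapping matches.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str) -> List[int]:
    """Lengths of the linear fragments left after complete double digestion."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the study.
    toy = "AAAA" + "GGATCC" + "TTTTTTTT" + "AGATCT" + "CC"
    print(cut_positions(toy))     # [5, 19]
    print(fragment_lengths(toy))  # [5, 14, 7]
```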

Figure 1

Schematic diagram of nested iPCR.

Nested iPCR Primer Design. Forward primers were designed to anneal to the vector backbone, and reverse primers to anneal to the exogenous TF sequences (plasmids pMXs-Sox2: No. 13367, pMXs-Oct4: No. 13366, pMXs-Klf4: No. 13370, pMXs-c-Myc: No. 13375; Addgene, Watertown, MA, USA) (Table 1).

Table 1. The primer sequences used for nested inverse polymerase chain reaction.

| Primer | Sequence (5'→3') |
|---|---|
| First iPCR-F | AAA ATA ATA ATA ACC GGG CAG GCC A |
| First iPCR-Sox2-R | CCT TCT TCA TGA GCG TCT TGG TTT T |
| First iPCR-Oct4-R | GTG TCC CTG TAG CCT CAT ACT CTT C |
| First iPCR-Klf4-R | CTT TGC TAA CAC TGA TGA CCG AAG G |
| First iPCR-c-Myc-R | TCT TCT CCA CAG ACA CCA CAT CAA T |
| Second iPCR-F | CAG CAC AGT GGT CGA CGA TAA AAT A |
| Second iPCR-Sox2-R | TTC AGC TCC GTC TCC ATC ATG TTA T |
| Second iPCR-Oct4-R | TTT GCA TAT CTC CTG AAG GTT CTC A |
| Second iPCR-Klf4-R | GGG TTA GCG AGT TGG AAA GGA TAA A |
| Second iPCR-c-Myc-R | CCT CCA AGT AAC TCG GTC ATC ATC T |

iPCR: inverse polymerase chain reaction; F: forward; R: reverse.
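As a quick sanity check of the primers in Table 1, a short Python sketch can report length, GC content and a rough melting temperature for each. It uses the basic formula Tm = 64.9 + 41 × (nGC − 16.4) / N, an approximation that ignores salt, Mg2+ and primer concentration; it is offered only as an illustration, not as the design procedure used in the study.

```python
from typing import Dict

# Primer sequences copied from Table 1 (5'->3'); spaces are removed before analysis.
PRIMERS: Dict[str, str] = {
    "First iPCR-F": "AAA ATA ATA ATA ACC GGG CAG GCC A",
    "First iPCR-Sox2-R": "CCT TCT TCA TGA GCG TCT TGG TTT T",
    "First iPCR-Oct4-R": "GTG TCC CTG TAG CCT CAT ACT CTT C",
    "First iPCR-Klf4-R": "CTT TGC TAA CAC TGA TGA CCG AAG G",
    "First iPCR-c-Myc-R": "TCT TCT CCA CAG ACA CCA CAT CAA T",
    "Second iPCR-F": "CAG CAC AGT GGT CGA CGA TAA AAT A",
    "Second iPCR-Sox2-R": "TTC AGC TCC GTC TCC ATC ATG TTA T",
    "Second iPCR-Oct4-R": "TTT GCA TAT CTC CTG AAG GTT CTC A",
    "Second iPCR-Klf4-R": "GGG TTA GCG AGT TGG AAA GGA TAA A",
    "Second iPCR-c-Myc-R": "CCTCCAAGTAACTCGGTCATCATCT",
}


def clean(seq: str) -> str:
    """Strip the triplet spacing used in the table."""
    return seq.replace(" ", "").upper()


def gc_count(seq: str) -> int:
    return sum(base in "GC" for base in clean(seq))


def gc_percent(seq: str) -> float:
    return 100.0 * gc_count(seq) / len(clean(seq))


def basic_tm(seq: str) -> float:
    """Rough Tm in deg C: 64.9 + 41 * (nGC - 16.4) / N (no salt or concentration terms)."""
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(clean(seq))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(clean(seq))} nt, GC {gc_percent(seq):.1f}%, Tm ~{basic_tm(seq):.1f} C")
```

All ten primers come out at 25 nt, so the GC percentage directly determines the rough Tm in this formula.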

First iPCR. The circularized DNA from iPSCs IP14D-1, IP14D-6 and IP14D-101 and from mouse B6D2F1 cells was used as the template for PCR amplification in a 25 μL reaction containing 1 μL of circularized DNA, 0.75 μL first iPCR forward primer, 0.75 μL first iPCR reverse primer, 2.5 μL 10× PCR Buffer for KOD-Plus-, 2.5 μL dNTPs (2 mM), 1 μL MgSO4 (2 mM) and 0.5 μL KOD-Plus- (Toyobo Co. Ltd., Osaka, Japan). Cycling conditions were 94 °C for 2 min, followed by 25 cycles of 94 °C for 15 seconds, 60 °C for 30 seconds and 68 °C for 3.5 min.
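The listed components of the first-round mix add up to 9 μL of the 25 μL reaction. A small Python sketch can make the implied filler volume and final concentrations explicit. It assumes that the remainder is nuclease-free water (the text does not say) and applies C_final = C_stock × V_added / V_total only to the buffer and dNTPs; the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Component:
    name: str
    volume_ul: float
    stock: Optional[float] = None  # stock concentration; None when not used in this sketch
    unit: str = ""


# First-round iPCR mix as listed in the text (volumes in microlitres).
FIRST_IPCR: List[Component] = [
    Component("circularized DNA", 1.0),
    Component("first iPCR forward primer", 0.75),
    Component("first iPCR reverse primer", 0.75),
    Component("10x PCR Buffer for KOD-Plus-", 2.5, 10.0, "x"),
    Component("dNTPs", 2.5, 2.0, "mM"),
    Component("MgSO4", 1.0),  # stock concentration not used here
    Component("KOD-Plus-", 0.5),
]


def water_to_add(components: List[Component], total_ul: float) -> float:
    """Volume needed to bring the mix to total_ul, assuming water is the filler."""
    used = sum(c.volume_ul for c in components)
    if used > total_ul:
        raise ValueError(f"components ({used} uL) exceed the {total_ul} uL reaction")
    return total_ul - used


def final_conc(component: Component, total_ul: float) -> float:
    """C_final = C_stock * V_added / V_total."""
    if component.stock is None:
        raise ValueError(f"no stock concentration given for {component.name}")
    return component.stock * component.volume_ul / total_ul


if __name__ == "__main__":
    total = 25.0
    print(f"water: {water_to_add(FIRST_IPCR, total)} uL")  # water: 16.0 uL
    for c in FIRST_IPCR:
        if c.stock is not None:
            print(f"{c.name}: {final_conc(c, total):.2f} {c.unit} final")
```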

Second iPCR. The second-round iPCR used 1 μL of the first iPCR product as template with 0.75 μL second iPCR forward primer and 0.75 μL second iPCR reverse primer, and followed the same cycling program as the first iPCR (94 °C for 2 min, followed by 25 cycles of 94 °C for 15 seconds, 60 °C for 30 seconds and 68 °C for 3.5 min).

The second iPCR product was electrophoresed on a 1.0% agarose gel, and the specific fragments were recovered for sequencing. To locate the integration sites of the exogenous TFs, the sequences of the target fragments and the vectors were aligned using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
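In the layout assumed here, each sequenced nested iPCR product starts in vector sequence, crosses the vector-genome junction into genomic DNA and continues until it reaches the BamHI/BglII ligation junction created during circularization; only that genomic stretch is informative when searching the mouse genome with BLAST. The minimal Python sketch that follows trims a read to that stretch. It assumes the last bases of the integrated vector are known (`vector_end` is a hypothetical placeholder, since the exact junction sequence is not given in the text) and that the first GGATCC, AGATCT, GGATCT or AGATCC motif after the junction marks the ligation point.

```python
import re
from typing import Optional

# After self-ligation, BamHI and BglII ends rejoin as GGATCC, AGATCT, GGATCT or AGATCC.
LIGATION_JUNCTION = re.compile(r"[AG]GATC[CT]")


def genomic_flank(read: str, vector_end: str) -> Optional[str]:
    """Return the sequence between the end of the vector and the ligation junction.

    `vector_end` is a hypothetical stand-in for the last bases of the integrated vector.
    Returns None if it is not found in the read. If no junction is found, the rest of
    the read is returned. A naturally occurring hybrid site in the genomic flank would
    truncate the result early (a simplification of this sketch).
    """
    read = read.upper()
    i = read.find(vector_end.upper())
    if i == -1:
        return None
    start = i + len(vector_end)
    junction = LIGATION_JUNCTION.search(read, start)
    end = junction.start() if junction else len(read)
    return read[start:end]
```

The returned fragment, rather than the whole read, would then be submitted to BLAST against the mouse genome.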

Microarray Data Analysis of Flanking Genes. After the integration sites were confirmed, the expression patterns of the genes flanking them in IP14D-1, IP14D-6, IP14D-101, mouse embryonic fibroblasts (MEF) and ESCs were compared by global expression analysis (part of the microarray data were obtained from the Gene Expression Omnibus repository, accession number GSE16925; https://ncbi.nlm.nih.gov/geo/).
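The text does not state the numerical criterion behind "no distinct expression difference". One common way to make such a comparison explicit is to compute a log2 fold change between each iPSC line and a reference (for example ESCs) and flag genes above a cut-off. In the Python sketch that follows, the two-fold threshold (|log2FC| ≥ 1) and the pseudocount are assumptions, and intensities are assumed to be on a linear scale; for data that are already log2-transformed, the fold change is simply the difference of the two values.

```python
import math
from typing import Dict, List


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2((a + pc) / (b + pc)) for linear-scale intensities.

    The pseudocount avoids division by zero for genes that are not expressed.
    """
    return math.log2((a + pseudocount) / (b + pseudocount))


def differing_genes(
    expr: Dict[str, Dict[str, float]],
    sample: str,
    reference: str,
    threshold: float = 1.0,
) -> List[str]:
    """Genes whose |log2 fold change| between `sample` and `reference` reaches `threshold`.

    expr maps gene -> {sample name -> intensity}. threshold=1.0 means a two-fold
    difference; this cut-off is an assumption, not a criterion stated in the study.
    """
    hits = []
    for gene, values in expr.items():
        if abs(log2_fold_change(values[sample], values[reference])) >= threshold:
            hits.append(gene)
    return sorted(hits)
```

An empty result for every iPSC line against ESCs (and against MEF) would correspond to the "no distinct difference" outcome under this particular threshold.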

Gene Ontology and Cluster Analysis of Flanking Genes. To determine whether the flanking genes were related to development, cell differentiation or cancer, their functions were analyzed using the online MGI database (http://www.informatics.jax.org/). Cluster analysis was also used to investigate the functions of the flanking genes and to identify whether any important signaling pathways were involved. In addition, we analyzed the genes located within 500 kb upstream and downstream of each integration site using mouse genome information. To determine whether these genes were associated with tumorigenesis, they were compared with the known oncogenes in the database described by Akagi et al. [20].
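The ±500 kb screen described above amounts to an interval-overlap query followed by a set intersection. The minimal Python sketch that follows assumes genes are available as hypothetical (symbol, chromosome, start, end) records and the cancer-gene list as a set of symbols (for instance, exported from the database described by Akagi et al. [20]); any coordinates used with it here are invented.

```python
from typing import List, NamedTuple, Set


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_window(chrom: str, pos: int, genes: List[Gene], window: int = 500_000) -> List[Gene]:
    """Genes on `chrom` that overlap [pos - window, pos + window]."""
    lo, hi = pos - window, pos + window
    return [g for g in genes if g.chrom == chrom and g.end >= lo and g.start <= hi]


def cancer_genes_near(
    chrom: str,
    pos: int,
    genes: List[Gene],
    cancer_symbols: Set[str],
    window: int = 500_000,
) -> List[str]:
    """Symbols of cancer-list genes that overlap the window around an integration site."""
    nearby = {g.symbol for g in genes_in_window(chrom, pos, genes, window)}
    return sorted(nearby & cancer_symbols)
```

A gene counts as "within 500 kb" if any part of it overlaps the window; requiring the whole gene, or only its transcription start site, to lie inside would be an equally valid but different choice.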

Results

Identification of Transgene Integration Sites. Electrophoresis of the digested genomic DNA showed broad smears, indicating that the genomic DNA was completely digested [Figure 2(A)], while the nested iPCR products showed multiple bands, indicating multiple integration sites of the exogenous TFs in the iPSCs [Figure 2(B)]. By aligning the sequences of the specific fragments, we identified 22 integration sites of the four TFs in the three iPSC lines, of which 17 (77.3%) were located in intergenic regions and five (22.7%) in introns far from the transcription start sites.
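Classifying each site as intronic or intergenic, and checking the reported proportions (17/22 ≈ 77.3% and 5/22 ≈ 22.7%), can be sketched in Python as follows. The gene-model representation (gene span plus exon intervals, 1-based inclusive coordinates) is a hypothetical simplification rather than the annotation pipeline used in the study, and the distance to the transcription start site is not computed here.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class GeneModel(NamedTuple):
    symbol: str
    chrom: str
    start: int                    # gene span, 1-based inclusive
    end: int
    exons: List[Tuple[int, int]]  # exon intervals, 1-based inclusive


def classify_site(chrom: str, pos: int, genes: List[GeneModel]) -> str:
    """'exonic' or 'intronic' if pos lies inside a gene span, otherwise 'intergenic'.

    With overlapping genes, the first matching gene in the list decides (a simplification).
    """
    for g in genes:
        if g.chrom == chrom and g.start <= pos <= g.end:
            if any(s <= pos <= e for s, e in g.exons):
                return "exonic"
            return "intronic"
    return "intergenic"


def percentages(labels: List[str]) -> Dict[str, float]:
    """Share of each label, in percent, rounded to one decimal place."""
    counts = Counter(labels)
    return {label: round(100 * n / len(labels), 1) for label, n in counts.items()}


if __name__ == "__main__":
    # The counts reported in the text: 17 intergenic and 5 intronic sites.
    print(percentages(["intergenic"] * 17 + ["intronic"] * 5))
    # {'intergenic': 77.3, 'intronic': 22.7}
```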

Figure 2

Nested iPCR. (A) Restriction digestion of genomic DNA. M: 1 kb marker. (B) The result of nested iPCR. M: 1 kb marker; Oct4, Sox2, Klf4, c-Myc: exogenous TFs.

Expression Profile Analysis of Flanking Genes of Integration Sites. The 22 integration sites were flanked by 39 genes (Table 2). Of these 39 flanking genes, five (LOC101055956, LOC105244150, Olfr456, Olfr455 and Nanos3) were not expressed in the iPSCs, MEF or ESCs. The expression patterns of the remaining 34 genes were compared among the three iPSC lines, MEF and ESCs, and no distinct expression difference was observed (Figure 3), indicating that integration of the exogenous genes did not detectably alter the expression of the flanking genes.

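Table 2. The identified integration sites and their flanking genes.

| Reprogramming factor | iPSC line | Chromosome | GenBank Acc. No. | Integration site | 5' flanking gene | Distance (kb) | 3' flanking gene | Distance (kb) |
|---|---|---|---|---|---|---|---|---|
| Sox2 | IP14D-1 | 10 | NC_000076.6 | intergenic | Phlda1 | 8.3 | Krr1 | 450.0 |
| | | 10 | NC_000076.6 | intergenic | Aldh8a1 | 301.0 | Sgk1 | 185.0 |
| | IP14D-6 | 1 | NC_000076.6 | intergenic | Cops7b | 28.2 | Nppc | 35.7 |
| | IP14D-101 | 11 | NC_000077.6 | introns | Vat1 | | | |
| | | 8 | NC_000074.6 | intergenic | Nanos3 | 20.9 | Zswim4 | 14.3 |
| | IP14D-1 | 15 | NC_000081.6 | intergenic | Cdh6 | 2390.0 | LOC101055956 | 461.1 |
| | | 14 | NC_000080.6 | intergenic | Rarb | 287.3 | Thrb | 554.7 |
| Oct4 | IP14D-6 | 16 | NC_000082.6 | intergenic | Atp13a3 | 111.0 | Tmem44 | 14.1 |
| | | 7 | NC_000073.6 | introns | Samd4 | | | |
| | IP14D-101 | 5 | NC_000071.6 | intergenic | Alox5ap | 102.8 | Medag | 15.7 |
| | | 2 | NC_000068.6 | introns | Slc13a3 | | | |
| | | 7 | NC_000073.6 | intergenic | Rassf10 | 196.2 | Arntl | 128.1 |
| | IP14D-1 | 8 | NC_000074.6 | intergenic | Kbtbd11 | 2.7 | Myom2 | 25.4 |
| | | 5 | NC_000071.6 | intergenic | Fryl | 57.7 | LOC105244150 | 5.8 |
| Klf4 | | 6 | NC_000072.6 | introns | Jagn1 | | | |
| | IP14D-6 | 11 | NC_000077.6 | intergenic | Hspa4 | 6.3 | Zcchc10 | 18.1 |
| | IP14D-101 | 4 | NC_000070.6 | intergenic | Lsm10 | 13.8 | Stk40 | 6.2 |
| | | 6 | NC_000072.6 | intergenic | Ankrd7 | 1579.0 | Kcnd2 | 766.4 |
| | IP14D-1 | 3 | NC_000069.6 | intergenic | Abcd3 | 126.2 | Arhgap29 | 33.7 |
| c-Myc | | 6 | NC_000072.6 | intergenic | Olfr456 | 1.4 | Olfr455 | 49.3 |
| | IP14D-6 | 8 | NC_000074.6 | introns | Dctn6 | | | |
| | IP14D-101 | 6 | NC_000072.6 | intergenic | Kcna4 | 459.7 | Mettl15 | 1289.0 |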

iPSCs: induced pluripotent stem cells; No.: number; Acc. No.: accession number.

Figure 3

Expression profiles of flanking genes from microarray analysis. (A) Expression profiles of Oct4 flanking genes. (B) Expression profiles of Sox2 flanking genes. (C) Expression profiles of Klf4 flanking genes. (D) Expression profiles of c-Myc flanking genes.

Functional Analysis of Flanking Genes of Integration Sites. None of the 39 flanking genes was found to function as an early embryonic development gene or a cancer-related gene (Table 3). Of the five unexpressed flanking genes mentioned above, Olfr456 and Olfr455 are olfactory receptors specifically expressed in olfactory cells [21,22], while Nanos3 is involved in spermatogenesis and specifically expressed in germ cells [23]. LOC101055956 and LOC105244150 are predicted genes, and the other 34 genes are required for basic cellular activities. In addition, cluster analysis showed that these genes have different functions and are involved in different metabolic activities. No common function in early embryonic development or differentiation was found among these genes, except that some of them, including Lsm10, were related to cartilage development (Figure 4). We further examined the functions of the genes located within 500 kb upstream and downstream of the integration sites, and no tumorigenesis-related gene was found in that range. Taken together, we concluded that the integration of the exogenous genes did not compromise the safety of these iPSCs.

Figure 4

The GO classification of flanking genes.

Table 3. The molecular functions of the flanking genes.

| Gene | Function | Main processes involved |
|---|---|---|
| Phlda1 | protein binding | cell apoptosis; programmed cell death |
| Krr1 | RNA binding; POLY-A binding | ribosome biogenesis; rRNA processing |
| Vat1 | Zn2+ binding; oxidoreductase activity | negative regulation of mitochondrial fusion; redox |
| Lsm10 | RNA binding; histone pre-mRNA binding | positive regulation of the mitotic G1/S transition |
| Stk40 | kinase activity; ATP binding | embryonic ectoderm differentiation; glucose metabolism |
| Nanos3 | RNA binding; Zn2+ binding | oogenesis; spermatogenesis |
| Zswim4 | metal ion binding; Zn2+ binding | biochemical processes |
| Samd4 | POLY-A binding | negative regulation process of synapse; translation |
| Alox5ap | protein binding; enzyme binding | leukotriene metabolic process |
| Medag | catalytic activity; binding | adipocyte differentiation |
| Rassf10 | catalytic activity; binding | biochemical processes; signal transduction |
| Arntl | DNA binding; RNA binding; protein binding | regulation of circadian clock-related gene expression |
| Hspa4 | ATP binding; protein complex binding | import of mitochondrial outer membrane proteins |
| Zcchc10 | metal ion binding | biochemical processes |
| Ankrd7 | catalytic activity; binding | support of cell maturation |
| Kcnd2 | ion channel activity | K+ transmembrane transport |
| Dctn6 | dynein binding | mitochondrial cavity |
| Olfr456; Olfr455 | protein binding | olfactory receptors; signal transduction |
| Atp13a3 | ATP binding; ATPase activity | Na+ transmembrane transport; cation transport |
| Tmem44 | transmembrane protein binding | biochemical processes |
| Kbtbd11 | N/A | transcriptional regulation |
| Myom2 | structural component of cytoskeleton | muscle contraction |
| Jagn1 | molecular function (basic cell activity) | neutrophil-mediated immune response; fungal-induced immune response |
| Fryl | protein binding | biochemical processes |
| Cops7b | transmembrane protein functional molecule | deubiquitination |
| Nppc | peptide hormone receptor binding; hormone activity | vasodilation; endochondral growth |
| Aldh8a1 | aldehyde dehydrogenase activity | retinoic acid metabolism |
| Sgk1 | ATP binding; protein binding; kinase activity | positive regulation of Na+ transport; negative regulation of apoptosis |
| Abcd3 | ATP binding; protein binding | transmembrane transport; long-chain fatty acid metabolism |
| Arhgap29 | GDP enzyme activity; protein binding | intercellular signal transduction |
| Cdh6 | Ca2+ binding; metal ion binding | cell adhesion |
| Slc13a3 | cotransporter activity; transmembrane transporter activity | succinate transport; Na+ transport |
| Rarb | binding; retinoic acid receptor | positive regulation of cell proliferation; digestive system development; neurogenesis |
| Thrb | thyroxine; thyroxine receptor activity | intracellular receptor signaling pathway; organ morphogenesis |
| Mettl15 | methyltransferase activity | methylation; ribosomal methylation |
| Kcna4 | K+ binding; ion channel activity | ion transport; K+ transmembrane transport |

POLY-A: polyadenylic acid; Zn2+: zinc ions; ATP: adenosine triphosphate; N/A: not available; GDP enzyme: guanosine 5'-diphosphate enzyme; Ca2+: calcium ions; K+: potassium ions.

LOC101055956 and LOC105244150 are predicted genes; their functions and the processes they are involved in are as yet unknown.

Discussion

Soon after the first successful induction of pluripotent stem cells from adult cells, Hanna et al. [12] demonstrated that iPSCs could be used to treat sickle cell anemia in a mouse model. Later, Raya et al. [6] obtained disease-corrected, patient-specific iPSCs that could be used for cell therapy without immune rejection. Using chimera models, Yang et al. [13] demonstrated that gene-modified iPSCs derived from the β+-thal IVS-II-654 mouse significantly ameliorated the disease in β+-thal IVS-II-654 mice, especially when the chimerism of iPSCs carrying the normal human β-globin gene exceeded 30.0%.

The traditional method of inducing pluripotent cells is to transduce MEF with retroviruses carrying the exogenous TFs. However, retroviruses integrate the exogenous genes into the host genome at random sites and in multiple copies, which may inactivate tumor suppressor genes, activate proto-oncogenes or cause frameshift mutations [24]. Retroviral transduction could therefore affect the pluripotency and safety of iPSCs. Although the pluripotency of these iPSCs has been confirmed through the tetraploid complementation assay [2], the integration sites of the exogenous TFs could still affect the development and health of the iPSC-derived mice.

To investigate possible problems associated with these integration sites, nested iPCR [25] was performed in three iPSC lines previously shown to be pluripotent by the tetraploid complementation assay [2]. A total of 22 integration sites were identified, of which 17 (77.3%) were located in intergenic regions and five (22.7%) within introns far from the transcription start sites. The expression profiles of the 39 flanking genes were analyzed in the iPSCs, ESCs and MEF, and the functions of these genes were reviewed. The flanking genes showed no distinct difference in expression level among the three iPSC lines, MEF and ESCs. Moreover, none of the 39 flanking genes was related to early embryonic development or differentiation, and most of them were housekeeping genes, which are necessary for basic cellular functions.

It is generally recognized that retroviral integration sites are randomly distributed across multiple chromosomes [26]. In this study, the transgene integration sites were widely distributed throughout the mouse chromosomes and no common integration sites were detected among the iPSC lines, consistent with previous studies of iPSCs derived from adult mouse cells [27,28]. However, retroviral vectors such as murine leukemia virus (MLV) and human immunodeficiency virus (HIV) have been reported to show integration site preferences in the human genome: HIV tends to integrate into active genes, whereas MLV favors regions near transcription start sites [29]. In our study, the transgene integration sites mainly resided in intergenic regions and appeared to have no effect on the function of the flanking genes. Moreover, many of the flanking genes were housekeeping genes, suggesting a permissive environment for the expression of the exogenous TFs during iPSC induction. We therefore speculate that clones carrying retroviral integrations in active genes or transcription start regions may be unable to complete reprogramming correctly and generate iPSCs, whereas clones without integrations at such locations can be induced into iPSCs. In addition, species-specific differences may also give rise to integration bias of retroviral vectors. Locus control regions (LCRs) and gene-proximal elements, which affect gene transcription by recruiting coactivators, transcription complexes or RNA polymerase, play an important role in gene expression [30,31]. During iPSC induction, these LCRs and gene-proximal elements could be disrupted by retroviral integration.
However, our previous report showed that the iPSCs used in this research have a transcription pattern similar to that of ESCs, implying that integration of the retroviral vectors had no distinct effect on gene transcription [2]. In principle, intact LCRs and gene-proximal elements are essential for iPSCs to pass the tetraploid complementation assay and generate live pups. In this study, the five genes carrying retroviral integrations in their introns showed no obvious difference in expression among iPSCs, MEF and ESCs, suggesting that the LCRs and gene-proximal elements remained intact. In the future, mice homozygous for a retroviral integration in a given intron could be generated to study the effects of intronic integration. In addition, PCR is the common method for detecting integration sites; however, it is time-consuming and makes it hard to obtain a complete map of the insertion sites. Newer approaches, such as CRISPR/Cas9-based methods, have been developed to study structural variants in mammalian genomes [32]. Such technologies could also be used in the future to study the insertion sites and to further examine the stability of gene expression in iPSCs.

Tumorigenesis by iPSCs can probably be attributed to two factors. The first is the exogenous TFs Klf4 and c-Myc: these two proto-oncogenes can promote cell proliferation and transformation by regulating the expression and activity of downstream proteins, eventually leading to malignant proliferation [33]. The second is the integration sites of the exogenous TFs. Sadelain et al. [34] summarized the features of safe integration sites in the human genome, including a distance of more than 300 kb from cancer-related genes, beyond which an integrated transgene is considered unlikely to affect them. We therefore examined the genes within 500 kb upstream and downstream of each integration site, and found no tumor-related gene within this range. These results suggest that the integration sites probably have no effect on the safety of these iPSCs. This is consistent with a study showing that tetraploid complementation mice derived from the iPSCs in this study were similar to ESC-derived mice in both cognitive function and development, which supports the full pluripotency of these iPSCs [35]. Overall, we conclude that integration of the exogenous genes did not affect the expression of the flanking genes, and that the stable expression of the flanking genes provided a safe environment for these iPSCs.

eISSN: 1311-0160
Language: English
Publication timeframe: 2 times per year
Journal Subjects: Medicine, Basic Medical Science, other