Abstract
Genetic variance is vital for breeding programs and mutant screening, yet traditional mutagenesis methods struggle with genetic redundancy and lack specificity in gene targeting. CRISPR-Cas9 offers precise, site-specific gene editing, but its application in crop improvement has been limited by scalability challenges. In this study, we develop genome-wide multi-targeted CRISPR libraries in tomato, enhancing the scalability of CRISPR gene editing in crops and addressing gene functional redundancy while maintaining its precision. We design 15,804 unique single guide RNAs (sgRNAs), each targeting multiple genes within the same gene family. These sgRNAs are classified into 10 sub-libraries based on gene function. We generate approximately 1300 independent CRISPR lines and identify mutants with distinct phenotypes related to fruit development, fruit flavor, nutrient uptake, and pathogen response. Additionally, we develop CRISPR-GuideMap, a double-barcode tagging system that enables large-scale sgRNA tracking in generated plants. Our results demonstrate that multi-targeted CRISPR libraries are scalable and effective for large-scale gene editing and offer an approach to overcome gene functional redundancy in basic plant research and crop breeding.
Introduction
Genetic variance plays a pivotal role in both breeding programs and mutant screening for genetic research, whether the source of this variance is natural or artificially induced. Natural variance is rather limited, but there is a broad range of artificial methods to induce genetic mutations. Traditional screening and breeding programs rely heavily on random mutagenesis induced by chemicals or radiation to generate mutations across the genome1,2,3. Although these methods can occasionally produce desirable traits, they have several significant limitations. First, classic mutagenesis cannot target specific genes or regions of the genome and often produces unpredictable outcomes4. Second, random mutagenesis often introduces unintended mutations in multiple genes, causing off-target noise5,6,7. Third, classic mutagenesis cannot overcome genetic linkage, as genes located close together on a chromosome tend to stay together during crossover events. For example, in plants, approximately 12% of genes within each family are genetically linked8,9, so individual mutant lines cannot readily be crossed to obtain double mutants in closely related genes. This linkage is a critical concern for breeding programs that rely on crossing to develop desired crop varieties. Finally, perhaps the most overlooked shortcoming of these methods is genetic redundancy: genes with high sequence similarity often have overlapping or redundant functions10. Redundancy masks the effects of mutations in individual genes, buffering the resulting phenotypes and making it difficult to decipher the precise roles of genes and understand their functions11,12.
Over the past decade, CRISPR-Cas technology has emerged as a dominant and extremely versatile tool in genetic engineering. The system uses an endonuclease directed by a single guide RNA (sgRNA) to a complementary genomic sequence adjacent to a protospacer adjacent motif (PAM), enabling site-specific gene editing with high efficiency and few off-target effects13,14,15. Moreover, sgRNA multiplexing, in which two or more sgRNAs are introduced into the same plant, enables simultaneous targeting of multiple genes and can circumvent genetic redundancy16. For example, the BREEDIT tool has been used to facilitate multiplexed genome editing of several gene families in maize to improve complex traits such as yield and drought tolerance17. Similarly, multiplexing was applied to elite winter wheat varieties for simultaneous editing of multiple genomic loci in one generation18. More recently, a multiplex CRISPR approach was used in poplar to increase fiber production capacity19. sgRNA multiplexing has also enabled efficient genetic editing in tomato20,21. For instance, introducing six sgRNAs targeting three genes involved in fruit color regulation led to diverse inherited mutation combinations and a broad spectrum of fruit colors22. In another study, this approach resulted in a fivefold increase in lycopene content23. Although sgRNA multiplexing tackles functional redundancy and enables selection of desired phenotypes to improve agronomically important traits, it suffers from low scalability because each vector is cloned manually. Because gene lists must be filtered manually and only a handful of candidates can be targeted, the likelihood of success is relatively low. Owing to this low scalability, sgRNA multiplexing has had limited influence on crop improvement and breeding programs24.
Recent studies have generated small, medium, and even large-scale CRISPR libraries in several crop plants. The first such system was demonstrated in tomato, where 165 sgRNAs were designed to target all 54 genes within the LRR-RLK subfamily XII, achieving a reported mutation detection rate of 62.5%25. In another example, a CRISPR library was applied to rice with 25,604 sgRNAs targeting 12,802 genes expressed in the shoot26. A parallel study in rice generated a genome-scale CRISPR library of 88,541 sgRNAs targeting 34,234 genes, with an average coverage of 2.59 sgRNA vectors per gene27. In two other recent studies, a CRISPR library of 246 sgRNAs was used to screen cotton for insect-resistance genes28, and 4379 sgRNAs were designed to target 990 transcription factors in tomato29. These libraries demonstrate the potential of large-scale targeted genetics but do not bypass the functional redundancy that arises from large gene families with partially overlapping activities. Because, on average, 64.5% of plant genes belong to paralogous gene families30,31,32, phenotypic buffering caused by redundancy may limit the utility of these large CRISPR libraries.
In an attempt to utilize CRISPR’s pinpoint accuracy and high efficiency while maintaining the high scalability of traditional methods, we previously developed an approach for the design of libraries in which each vector harbors a single sgRNA that targets conserved sequences across multiple genes, allowing simultaneous editing of several gene family members33,34,35,36. Applying this approach in Arabidopsis generated a genome-wide multi-targeted CRISPR library, which included tens of thousands of unique sgRNAs targeting genes in both genome-wide and functional group-specific manners37. Given the relevance of this approach to agricultural crops and breeding programs, we here design and develop a genome-wide, multi-targeted CRISPR library for use in a crop plant, tomato. Using this library, we generated approximately 1300 independent CRISPR lines and identified over 100 independent lines that have a wide range of phenotypes, including those related to fruit shape and size, fruit flavor, pathogen response, and nutrient uptake. In addition, we developed an sgRNA mapping system, CRISPR-GuideMap, that uses a double barcode tagging system and deep sequencing to enhance the use of CRISPR libraries in breeding programs. By integrating precise gene targeting with advanced sequencing technology, we streamlined the identification of gene functions and broadened the scope of functional characterization obtainable from these libraries.
Results
Design of tomato genome-wide, multi-targeted CRISPR library
Given that high sequence similarity among members of plant gene families can lead to phenotypic buffering, classic forward genetic screening is limited in its ability to identify unknown phenotypes. Previously, we showed that by designing sgRNAs against target sites shared, identically or with few mismatches, by multiple members of a gene family, double or higher-order knockout mutants could be generated at the genome scale37. To date, this approach has only been demonstrated in the model plant Arabidopsis. Here, we develop and apply this strategy in tomato, a major crop species.
To this end, we first grouped all coding gene sequences of Solanum lycopersicum into gene families based on amino acid sequence similarity and used the CRISPys algorithm38 to design multiple sgRNAs for each family. For each gene family, a phylogenetic tree was reconstructed so that more closely related genes are placed closer to each other. CRISPys then designed multiple sgRNAs that could optimally target multiple members within each subgroup (represented by internal nodes in these trees). To maximize the likelihood of gene knockouts, sgRNA targets were confined to the first two-thirds of the coding sequence. Note that during this procedure, multiple sgRNAs could be designed for the same set of genes, allowing different types of mutations to be created. Based on the similarity between the 20-nucleotide sgRNA sequence and its target genes, an "on-target" score was calculated using the cutting frequency determination (CFD) scoring function39, and sgRNAs with an on-target score below 0.8 were discarded. Once an optimal sgRNA was generated, its specificity was verified by scanning the rest of the genome for sequences similar to the sgRNA. To ensure specificity, we filtered out sgRNAs with potential off-target effects, applying stricter thresholds to off-targets in exons (20% of the on-target score) than to other genomic regions (50% of the on-target score). These strict parameters were chosen to achieve high cleavage efficacy while maintaining the robustness required for a genome-scale tool. The design was then applied to all subgroups and gene families (Fig. 1). This strategy generated a library of 15,804 unique sgRNAs targeting 10,036 of the 34,075 genes in tomato (Fig. 2a, b). Approximately 95% of the sgRNAs target groups of two or three genes, and the remaining sgRNAs target groups of four to eight genes (Fig. 2c). On average, each sgRNA targets 2.23 genes.
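The filtering rules described above can be sketched as follows. This is a minimal illustration, not the CRISPys implementation: it assumes each candidate sgRNA already carries a CFD-based on-target score, the relative position of its cut site within each target coding sequence, and the scores of any predicted off-target sites, and it reads the 20% and 50% thresholds as upper limits on off-target scores relative to the on-target score. The `Candidate` class, the constant names, and the example sequences are hypothetical.

```python
from dataclasses import dataclass, field

# Thresholds taken from the text; how they are combined here is an assumption.
MIN_ON_TARGET = 0.8        # discard sgRNAs whose on-target (CFD) score is below 0.8
MAX_CDS_FRACTION = 2 / 3   # cut sites must fall within the first two-thirds of each CDS
EXON_OFF_RATIO = 0.2       # exonic off-targets must score below 20% of the on-target score
OTHER_OFF_RATIO = 0.5      # other off-targets must score below 50% of the on-target score


@dataclass
class Candidate:
    """Hypothetical record for one designed sgRNA and its precomputed scores."""
    spacer: str                 # 20-nt protospacer sequence
    on_target: float            # CFD-based on-target score for the targeted gene set
    cut_fractions: list         # cut-site position / CDS length, one value per target gene
    exon_off_targets: list = field(default_factory=list)   # scores of off-target sites in exons
    other_off_targets: list = field(default_factory=list)  # scores of off-target sites elsewhere


def passes_filters(c: Candidate) -> bool:
    """Return True if the candidate satisfies all design filters."""
    if c.on_target < MIN_ON_TARGET:
        return False
    if any(f > MAX_CDS_FRACTION for f in c.cut_fractions):
        return False
    if any(s >= EXON_OFF_RATIO * c.on_target for s in c.exon_off_targets):
        return False
    if any(s >= OTHER_OFF_RATIO * c.on_target for s in c.other_off_targets):
        return False
    return True


candidates = [
    Candidate("ACGTACGTACGTACGTACGT", 0.92, [0.10, 0.45], [0.05], [0.30]),  # kept
    Candidate("TTGCACGTACGTACGTACGA", 0.75, [0.20]),                       # on-target score too low
    Candidate("GGGCACGTACGTACGTACGA", 0.90, [0.80]),                       # cut site too far 3'
    Candidate("CCGCACGTACGTACGTACGA", 0.90, [0.30], [0.25]),               # exonic off-target too strong
]
kept = [c.spacer for c in candidates if passes_filters(c)]
print(kept)  # prints ['ACGTACGTACGTACGTACGT']
```

Applying the exon threshold relative to the on-target score (rather than as a fixed cutoff) means that a stronger on-target sgRNA tolerates proportionally stronger off-target sites; whether the original pipeline uses strict or non-strict inequalities is not stated, so the boundary handling here is a guess.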
Analysis of all matches between sgRNAs and their target genes showed that 25% had no mismatches, 33% had 1 mismatch, 32% had 2 mismatches, and 10% had 3 or more mismatches, with an average of 1.21 mismatches per gene (Fig. 2d).
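A minimal sketch of how such a mismatch tally could be computed, assuming each sgRNA-target match is available as a pair of aligned 20-nt sequences (the function names and example sequences are hypothetical). The mean is computed from the exact per-match counts, so it cannot be reconstructed exactly from the binned percentages, whose last category pools every match with three or more mismatches.

```python
from collections import Counter


def count_mismatches(spacer: str, target: str) -> int:
    """Number of positions at which an sgRNA spacer and its aligned target site differ."""
    if len(spacer) != len(target):
        raise ValueError("spacer and target must have the same length")
    return sum(a != b for a, b in zip(spacer.upper(), target.upper()))


def mismatch_summary(pairs):
    """Fraction of sgRNA-target matches with 0, 1, 2, or >=3 mismatches, plus the mean.

    pairs: iterable of (spacer, target_site) tuples, one per sgRNA-target gene match.
    """
    counts = [count_mismatches(s, t) for s, t in pairs]
    bins = Counter("3+" if n >= 3 else str(n) for n in counts)
    total = len(counts)
    fractions = {k: bins[k] / total for k in ("0", "1", "2", "3+")}
    mean = sum(counts) / total
    return fractions, mean


spacer = "ACGTACGTACGTACGTACGT"
matches = [
    (spacer, "ACGTACGTACGTACGTACGT"),  # perfect match
    (spacer, "TCGTACGTACGTACGTACGT"),  # 1 mismatch
    (spacer, "TTTTACGTACGTACGTACGT"),  # 3 mismatches
]
fractions, mean = mismatch_summary(matches)
print(fractions, round(mean, 2))
```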
Schematic overview illustrating the library workflow, from design to screening. All coding genes were grouped into gene families represented as phylogenetic trees, and the trees were classified by function. sgRNAs were designed to target multiple genes that are phylogenetically close to one another (indicated by colors). Each sgRNA was cloned into the Cas9 vector, creating a plasmid library. Transformed lines were screened by multi-targeted, large-scale forward genetics for specific traits of interest, revealing hidden phenotypes. *"Whole genome" refers to all coding genes excluding transporters, transcription factors, and enzymes.
a Schematic visualization of the 10 sub-libraries, detailing the main gene families targeted, the number of genes targeted, and the number of sgRNAs designed. MC, mitochondrial carrier. In the 'Genes' column, numbers in bold and numbers in brackets represent the number of genes targeted and the total number of genes in the corresponding groups, respectively. b Bar chart of the number of genes targeted (black) versus the total number of genes in each sub-library (gray). c Distribution of the number of target genes per sgRNA in the library. d Pie charts of the number of mismatches between the designed sgRNAs and their target genes for sub-library 1 (top) and the entire library (bottom). e Frequency distribution of sgRNA read counts in sub-library 1 as determined by deep sequencing. Coverage was compared to the number of sgRNAs theoretically present in the library designed in silico. f Skewness (left axis, orange) and coverage (right axis, blue) of each sub-library. Orange and blue dotted lines mark the high-quality thresholds for skewness (−1, 1) and coverage (95%), respectively. Skewness of the deep sequencing results was calculated as 3 × (mean − median)/SD. Source data are provided as a Source Data file.
Sub-library construction and transformation
To create a research-ready tool that can be used easily and flexibly, we split the sgRNA library into 10 sub-libraries based on the function of the targeted genes. Sub-libraries 1–9 target specific families within the transporter, transcription factor, and enzyme functional groups, and sub-library 10 targets all other genes (Fig. 2a). Each of these three functional groups is covered by three sub-libraries that focus on specific gene families. For example, in the transporter functional group, sub-library 1 includes sgRNAs targeting genes from the ABC, MFS, and DMT families, whereas sub-libraries 2 and 3 target genes from the APC family, channels and porins, mitochondrial carriers, and cation carriers, among others (Fig. 2a). The transcription factor sub-libraries (4–6) target notable groups such as ARF, AP2/ERF, ARR/GRF, MYB, WRKY, bHLH, bZIP, HSF, and NAC, and the enzyme sub-libraries (7–9) cover hydrolases, isomerases, oxidoreductases, lyases, ligases, and transferases (Fig. 2a). Sub-libraries 1–9 each contain between 179 and 511 sgRNAs targeting between 90 and 342 genes, and sub-library 10 contains 12,923 sgRNAs targeting 8156 genes (Fig. 2b). A detailed breakdown of functional groups, their associated smaller families, and the corresponding numbers of genes and sgRNAs is provided in Supplementary Table 1. The genome-scale library design and output are available in Supplementary Data File 1.
During synthesis, sub-library-specific adaptors of approximately 30 nucleotides were added to each side of every sgRNA according to the functional group of its target genes. These adaptors enabled amplification of specific sub-libraries using complementary primers. Once amplified, the sub-library sgRNAs were cloned in bulk into a binary Cas9 vector (pMR284) as previously described17, yielding a pool of vectors that each carry a different sgRNA. For each sub-library, we evaluated coverage (i.e., the fraction of designed sgRNAs that were cloned into the vector) and the distribution of the amplified sgRNAs. Deep sequencing of the 10 sub-libraries revealed essentially full coverage (>99%) of the possible sgRNAs and a narrow bell-shaped distribution (skew values ranging from 0.02 to 0.26) (Fig. 2e, f, Supplementary Fig. 1), indicating that the sgRNAs were evenly represented, with no gross overrepresentation of individual sgRNAs.
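The two quality metrics can be sketched from per-sgRNA read counts as follows. This is a hedged illustration: it assumes designed sgRNAs with no reads are counted as zeros in the distribution and uses the sample standard deviation, neither of which is specified above. The skewness is the 3 × (mean − median)/SD formula given in the Fig. 2 legend, and the thresholds follow Fig. 2f; the function names and read counts are hypothetical.

```python
import statistics


def library_qc(read_counts, designed_ids):
    """Coverage and skewness of a cloned sgRNA library from deep-sequencing read counts.

    read_counts: dict mapping sgRNA id -> number of reads assigned to it
    designed_ids: sgRNA ids in the in silico design (ids with no reads count as 0)
    """
    counts = [read_counts.get(i, 0) for i in designed_ids]
    coverage = sum(c > 0 for c in counts) / len(counts)
    sd = statistics.stdev(counts)  # sample SD; statistics.pstdev gives the population SD
    if sd == 0:
        return coverage, 0.0  # perfectly uniform library: no skew
    skew = 3 * (statistics.mean(counts) - statistics.median(counts)) / sd
    return coverage, skew


def is_high_quality(coverage, skew):
    """Thresholds marked in Fig. 2f: coverage of at least 95% and skewness between -1 and 1."""
    return coverage >= 0.95 and -1 < skew < 1


coverage, skew = library_qc({"g1": 120, "g2": 95, "g3": 130, "g4": 110},
                            ["g1", "g2", "g3", "g4", "g5"])
print(f"coverage={coverage:.0%}, skew={skew:.2f}, high quality={is_high_quality(coverage, skew)}")
```

In this toy example one designed sgRNA ("g5") is missing from the reads, which lowers coverage to 80% and, because its zero count pulls the mean below the median, also pushes the skewness below −1.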
To demonstrate the robustness and applicability of this tool for overcoming redundancy in forward genetic screens, we transformed sub-library 1 into the M82 sp- tomato background. The 502 cloned sgRNAs, targeting 199 genes belonging to three main transporter families (ABC, MFS, and DMT), were introduced into plants via Agrobacterium tumefaciens-mediated transformation. Transformation was conducted in bulk: the pool of vectors encoding the 502 sgRNAs was introduced into a single batch of Agrobacterium, and this heterogeneous Agrobacterium batch was then used for plant transformation. We generated ~250 independent transgenic lines, each harboring an sgRNA targeting multiple transporter genes, and DNA was extracted from individual T0 plants. Notably, not all lines yielded seeds. In addition, we generated ~1060 lines in the Ailsa Craig background from sub-libraries targeting genes encoding transporters (sub-libraries 2 and 3) and transcription factors (sub-libraries 4 and 5). In total, this large-scale transformation produced over 1300 independent transgenic lines in two tomato backgrounds, demonstrating the versatility of our CRISPR library system.
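Because transformation is done in bulk, which sgRNAs end up in plants is a sampling process. The sketch below gives a back-of-the-envelope estimate under an idealized model that is our assumption, not part of the study: each line receives exactly one sgRNA drawn uniformly at random from the pool. Real populations deviate from this model, since some plants carry several sgRNAs and representation is uneven, and uneven representation lowers the expected number of distinct sgRNAs below the uniform case. The function names are hypothetical.

```python
import math


def expected_distinct(n_sgrnas: int, n_lines: int) -> float:
    """Expected number of distinct sgRNAs among n_lines plants.

    Idealized model: each line receives exactly one sgRNA drawn uniformly at random
    from a pool of n_sgrnas, so P(a given sgRNA is never drawn) = (1 - 1/N)^k.
    """
    return n_sgrnas * (1 - (1 - 1 / n_sgrnas) ** n_lines)


def lines_needed(n_sgrnas: int, fraction: float) -> int:
    """Lines required for the expected recovered fraction of the pool to reach `fraction`."""
    return math.ceil(math.log(1 - fraction) / math.log(1 - 1 / n_sgrnas))


# Sub-library 1: 502 sgRNAs, ~250 T0 lines
print(round(expected_distinct(502, 250), 1))  # ≈ 197.1
print(lines_needed(502, 0.95))                # 1503
```

Under these assumptions, ~250 lines would be expected to sample roughly 40% of a 502-sgRNA pool, and recovering 95% of it would take about three times as many lines as there are sgRNAs.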
Identifying effective predictors of Cas9 cleavage efficiency
Precise sgRNA design is essential for efficient CRISPR editing, but to achieve multi-targeting in our library, we allowed mismatches between an sgRNA and its target genes. This strategy enables a single sgRNA to target multiple homologous genes, particularly when no sequence shared by all targeted genes lies adjacent to a PAM site or when off-target effects must be minimized. While allowing mismatches broadens targeting, it may reduce cleavage efficiency. To evaluate how mismatch count and CFD score influence Cas9 activity, we Sanger sequenced 146 target genes with varying mismatch counts and CFD scores, using both randomly selected plants and plants with observable phenotypes. We also included plants from two tomato backgrounds, M82 and Ailsa Craig, to assess whether these trends were background-dependent. Both mismatch count and CFD score were significant predictors of whether a gene would be cleaved (Fig. 3a, b), whereas neither the selection method (phenotype-based vs. random) nor the background (M82 or Ailsa Craig) significantly affected cleavage efficiency (Fig. 3c, d). Having established that mismatch count and CFD score are reliable predictors of cleavage efficiency in our sequenced samples, we extended the analysis to the entire library to verify whether similar trends held across all target genes. The data confirmed a strong correlation between CFD score and mismatch count across the full library (Fig. 3e). In summary, our findings indicate that the ability to induce mutations can be predicted from target gene-sgRNA parameters, most notably mismatch count, and is independent of other factors such as selection method or transformation background (Fig. 3f).
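The correlation between CFD score and mismatch count follows from how CFD-type scores are built: the score is a product of an activity value for each mismatched position (each at most 1) and a PAM term, so every additional mismatch can only keep the score the same or lower it. The sketch below illustrates that structure only. The per-position activity values are invented for illustration; the published CFD tables39 depend on both position and mismatch type and are not reproduced here, and the function name is hypothetical.

```python
# Hypothetical activity retained when a single mismatch falls at each spacer position
# (index 0 = PAM-distal 5' end, index 19 = PAM-proximal). Illustrative values only;
# real CFD tables are position- AND mismatch-type-specific.
POSITION_ACTIVITY = [0.9] * 8 + [0.7] * 6 + [0.4] * 6


def cfd_like_score(spacer: str, target: str, pam_activity: float = 1.0) -> float:
    """Multiply per-mismatch activity values (CFD-style) with a PAM activity term."""
    if not len(spacer) == len(target) == len(POSITION_ACTIVITY):
        raise ValueError("expected a 20-nt spacer and a 20-nt target site")
    score = pam_activity
    for pos, (s, t) in enumerate(zip(spacer.upper(), target.upper())):
        if s != t:
            score *= POSITION_ACTIVITY[pos]
    return score


spacer = "ACGTACGTACGTACGTACGT"
for target in ["ACGTACGTACGTACGTACGT",   # 0 mismatches
               "TCGTACGTACGTACGTACGT",   # 1 PAM-distal mismatch
               "TCGTACGTACGTACGTACGA",   # plus 1 PAM-proximal mismatch
               "TTGTACGTACGTACGTACCA"]:  # 4 mismatches
    print(target, round(cfd_like_score(spacer, target), 3))
```

With these made-up values the scores fall from 1.0 to 0.9, 0.36, and 0.13, mirroring the pattern in Fig. 3e without claiming to match the actual CFD numbers.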
a Number of mismatches in cut (n = 63) and uncut (n = 83) plants; mutations were verified by Sanger sequencing. Statistical significance was evaluated by Student's two-sided t-test assuming equal variances, ***p-value < 0.001. b CFD score in cut (n = 63) and uncut (n = 83) plants. Statistical significance was evaluated by Student's two-sided t-test assuming equal variances, *p-value < 0.05. Data are presented as mean values ± SD. c Cas9 edits (cleavage efficiency) in phenotype-based selection (n = 66) and randomly selected plants (n = 80). Not significant by Student's two-sided t-test (p-value = 0.064). Data are presented as mean values ± SD. d Cas9 cleavage efficiencies in two tomato backgrounds, M82 (n = 66) and Ailsa Craig (AC, n = 80). Data are presented as mean values ± SD. Not significant by Student's two-sided t-test (p-value = 0.134). e CFD score of all library targets, grouped by number of mismatches: 0, 1, 2, and 3 or more. n values are indicated in brackets below. Statistical significance was evaluated by one-way ANOVA and Tukey's multiple comparison test, p-value < 0.0001. Data are presented as box plots with the center line at the median, box limits at the 25th and 75th percentiles, and whiskers extending to the minimum and maximum values. f Cas9 cleavage efficiencies in two tomato backgrounds, M82 (n = 66) and Ailsa Craig (AC, n = 80), among target genes with 0, 1, or 2 mismatches. "Random" includes Cas9 cleavage efficiency data from independent, randomly selected plants. "Phenotype" indicates phenotype-based selection and includes Cas9 cleavage efficiency data from plants selected based on observable phenotypes. Source data are provided as a Source Data file.
CRISPR-GuideMap: a barcode tagging tool that enables whole-library sgRNA sequencing
CRISPR-based forward genetic screens are powerful for identifying gene functions and interactions in plants. However, due to the large number of generated plants, a major limitation of these screens is the inability to track all CRISPR constructs within the library. Since the transformation is carried out in bulk, the identity of sgRNAs in individual plants is unknown. To identify the sgRNAs, researchers typically sequence only plants showing phenotypes, leaving all other plants un-sequenced and unstudied. We envisioned that a double barcode tagging system would enable us to determine the identity of the inserted sgRNA in each plant at a library scale. Thus, we aimed to sequence sgRNAs present in all plants in the library, regardless of whether the plant shows a phenotype at first sight. Such an approach would enhance the utility of our library by enabling complete characterization of the population and maximizing the information obtained from each plant.
To establish this barcoding approach, DNA was extracted separately from 253 T0 M82 transgenic plants transformed with sub-library 1, and the sgRNA region of each plant was amplified with forward and reverse primers that each carry a unique 8-nucleotide overhang, giving each plant a unique "double barcode" (Fig. 4a). We pooled the PCR products of all samples and deep sequenced them using a paired-end 150 (PE150) strategy, in which 150 nucleotides are sequenced from each end of every amplicon, creating an overlapping region. Using n forward primers and n reverse primers yields \(n^{2}\) barcode combinations; for example, a library of 1024 samples requires only 32 forward and 32 reverse primers. The deep sequencing run produced millions of reads, many of which had to be filtered to remove irrelevant data and isolate meaningful information; the process is described in detail in the Methods section. Of the 253 samples, 17 (6.7%) had no sgRNA sequences. We speculate that this resulted from loss of DNA during extraction or unsuccessful PCR amplification rather than an actual absence of sgRNAs in these plants. We detected 146 unique sgRNAs among the 236 samples that contained an sgRNA. Most of these sgRNAs (133 of 146) were observed three or fewer times; however, three sgRNAs were markedly overrepresented, appearing 11, 24, and 37 times, respectively, across our sample set (Fig. 4b). In addition, 186 (78%) of the plants harbored a single sgRNA, 41 (17%) harbored 2 sgRNAs, and 9 (4%) harbored 3 or more sgRNAs (Fig. 4c). These findings are consistent with previously reported insertion rates40. Importantly, CRISPR-GuideMap results agreed with those obtained by Sanger sequencing in over 95% of cases.
Comprehensive datasets of these results, along with barcoding data for 441 plants from the transcription factor sub-libraries, are available in Supplementary Data File 2. Together, these results demonstrate the reliability and robustness of this system in providing a comprehensive overview of the entire sgRNA repertoire within the plant library. This enables a reverse-genetics approach, as data are available for every plant rather than just a selected few.
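The double-barcode logic can be sketched as a small demultiplexer. This is a minimal illustration under stated assumptions, not the filtering pipeline described in the Methods: it assumes paired-end reads have already been merged over their overlap, that the forward barcode occupies the first 8 nt and the reverse barcode appears reverse-complemented in the last 8 nt, that the 20-nt spacer lies immediately upstream of the start of the standard SpCas9 sgRNA scaffold, and that barcodes are matched exactly. All function names and the `min_reads` cutoff are hypothetical.

```python
from collections import Counter, defaultdict

BC_LEN = 8                       # 8-nt barcode overhang on each primer
SCAFFOLD_START = "GTTTTAGAGCTA"  # assumed start of the SpCas9 sgRNA scaffold in the amplicon
SPACER_LEN = 20


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_plate(fwd_barcodes, rev_barcodes):
    """Assign a sample index to every (forward, reverse) barcode pair: n x m samples."""
    return {(f, r): i * len(rev_barcodes) + j
            for i, f in enumerate(fwd_barcodes)
            for j, r in enumerate(rev_barcodes)}


def parse_amplicon(merged: str):
    """Split a merged paired-end read into (forward barcode, reverse barcode, spacer)."""
    k = merged.find(SCAFFOLD_START)
    if k < BC_LEN + SPACER_LEN:         # scaffold missing or read too short
        return None
    fwd_bc = merged[:BC_LEN]
    rev_bc = revcomp(merged[-BC_LEN:])  # reverse barcode appears reverse-complemented
    return fwd_bc, rev_bc, merged[k - SPACER_LEN:k]


def demultiplex(merged_reads, plate, min_reads=1):
    """Return {sample: set of spacers supported by >= min_reads reads}."""
    per_sample = defaultdict(Counter)
    for read in merged_reads:
        parsed = parse_amplicon(read)
        if parsed is None:
            continue
        fwd_bc, rev_bc, spacer = parsed
        sample = plate.get((fwd_bc, rev_bc))
        if sample is not None:          # drop reads with unexpected barcode pairs
            per_sample[sample][spacer] += 1
    return {s: {sp for sp, n in c.items() if n >= min_reads}
            for s, c in per_sample.items()}


def sgrnas_per_plant(calls):
    """Histogram of distinct sgRNAs per plant; the key 3 pools plants with 3 or more."""
    return Counter(min(len(spacers), 3) for spacers in calls.values())


# 32 forward x 32 reverse primers address 32**2 = 1024 samples
barcodes = [format(i, "08b").replace("0", "A").replace("1", "C") for i in range(32)]
print(len(build_plate(barcodes, barcodes)))  # 1024

plate = build_plate(["AAAAAAAA", "CCCCCCCC"], ["GGGGGGGG", "TTTTTTTT"])
read = "AAAAAAAA" + "ACGTACGTACGTACGTACGT" + SCAFFOLD_START + "CCC" + revcomp("TTTTTTTT")
calls = demultiplex([read, read], plate, min_reads=2)
print(calls)  # {1: {'ACGTACGTACGTACGTACGT'}}
```

Indexing samples by the barcode pair rather than by a single barcode is what lets 2n primers address n² plants; in practice, sequencing errors in barcodes would call for error-tolerant matching, which this sketch omits.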
a Schematic visualization of the CRISPR-GuideMap system. DNA from each plant is amplified using primers with unique 8-nucleotide overhangs, generating amplicons that carry a unique barcode pair and the sgRNA. b Number of sgRNAs versus the number of plants harboring a given sgRNA. c Number of sgRNAs inserted per plant. d Phylogenetic tree of the SlNPF1 clade showing the genes targeted by sgRNAs in the library (left) and the combinations detected in planta using CRISPR-GuideMap (right). e Bacterial population counts in 4-week-old M82 (n = 9) and 2 independent npf triple mutant lines (n = 3) at 0 and 3 DPI. Each point represents a biological replicate, averaged from three technical replicates. Statistical significance was evaluated by Student's two-sided t-test assuming equal variances, *p-value < 0.05 (p-value = 3.32 × 10−5 and 5.45 × 10−4 for alleles 1 and 2, respectively); NS, not significant. Data are presented as box plots with the center line at the median, box limits at the 25th and 75th percentiles, and whiskers extending to the minimum and maximum values. f Representative images of symptoms on leaflets of M82 and two alleles of the npf triple mutant at 4 DPI. Scale bar = 1 cm. g Phylogenetic tree of the SlPT clade, showing the targeted sgRNA combinations in the library design and the combinations detected in plants. Solyc09g073010 has not been previously named and is referred to as PT9. h–j Phenotypic data of 8-week-old M82 and pt2/pt6 mutant plants grown under normal and low-phosphate conditions. h n = 13 in M82 control conditions, and n = 10 in M82 low P and in pt2/pt6. i, j n = 14 in M82 control conditions, n = 10 in M82 low P, n = 10 in pt2/pt6 control conditions, and n = 9 in pt2/pt6 low P.
Data are presented as box plots with the center line at the median, box limits at the 25th and 75th percentiles, and whiskers extending to the minimum and maximum values: h shoot biomass dry weight, i root surface area, and j number of root tips. Statistical significance was evaluated by Student's two-sided t-test assuming equal variances, *p-value < 0.05, **p-value < 0.01; comparisons reported as not significant by Student's two-sided t-test had p-values of 0.992 (h), 0.872 (i), and 0.603 (j). k Representative images of the whole root system of M82 and the pt2/pt6 mutant under normal (30 ppm) and low (3 ppm) phosphate conditions. Scale bar = 2 cm. Source data are provided as a Source Data file.
Phenotypic screening of CRISPR library in tomato
Two distinct populations of tomato plants transformed with our CRISPR library were screened; each targeted different gene families and had a different phenotypic focus. The first population, generated by transforming sub-library 1 into the M82 background, targeted transporter genes from the ABC, MFS, and DMT families. We aimed to identify physiological and developmental abnormalities in transformed plants under standard and low fertilization conditions, reasoning that low fertilization might accentuate some phenotypic differences.
During initial screening of the mutant population, we identified a mutant line with apparently increased susceptibility to pathogens. Sequencing revealed that this line harbored an sgRNA targeting three adjacent genes within the NPF1 subfamily, NPF1.10, NPF1.11, and NPF1.12 (Solyc05g005990, Solyc05g006000, Solyc05g006010), all with zero mismatches. Whereas these genes were successfully amplified from wild-type plants, none of the target genes could be amplified from the mutant plant, raising the possibility that a large deletion spanning all three genes had occurred; this line is referred to as npf1.10/11/12-1 (allele 1) (Supplementary Fig. 2a). Leveraging our CRISPR-GuideMap system, we searched the library for another independent line targeting the same genes and identified an additional line harboring an sgRNA targeting NPF1.10, NPF1.11, and NPF1.12 (Fig. 4d). We amplified the product expected from a large deletion using the forward primer of NPF1.10 and the reverse primer of NPF1.12. Sequencing confirmed a 10,739-bp deletion between the target sites in NPF1.10 and NPF1.12 (Supplementary Fig. 2a, b): NPF1.10 was truncated within the first third of the gene, NPF1.11 was deleted entirely, and NPF1.12 carried a frameshift, resulting in a knockout of all three genes. This line is referred to as npf1.10/11/12-2 (allele 2). Data from the Tomato Functional Genomics Database indicated that NPF1.10 is upregulated in response to various pathogens, suggesting a role in the pathogen response. To test this hypothesis, we conducted a Xanthomonas euvesicatoria bacterial inoculation assay on wild-type M82 plants and both npf1.10/11/12 mutant alleles. Plant samples were taken at 0 and 3 days post inoculation (DPI). No differences were observed at 0 DPI, but both alleles showed increased bacterial growth at 3 DPI compared with wild-type plants, indicating that loss of NPF1.10/11/12 enhanced susceptibility to X. euvesicatoria (Fig. 4e, f). Although the exact mechanism remains unclear, several studies have suggested potential links between pathogen responses and the metabolism of nitrogen compounds, which are putative NPF substrates41,42,43.
In plants grown under low fertilization conditions, we identified a mutant line with heightened sensitivity to nutrient deficiency. Sequencing revealed mutations in PT2 and PT6 (Solyc03g005530, Solyc03g005560), which encode members of the PHT1 phosphate transporter family (Fig. 4g, Supplementary Fig. 3a). Although PT2 and PT6 knockouts have not previously been studied in tomato, the eight PHT1 family members play crucial roles in phosphate uptake and transport, and their evolutionary relationships and functional divergence within the Solanaceae family have been characterized44,45. The identical tandem duplicates PT2 and PT6 are expressed in roots and are upregulated under low-phosphate conditions45,46. We compared the growth of the pt2/pt6 double mutant with that of wild-type plants under both control and low-phosphate conditions. Although the double mutant had no noticeable defects under standard fertilization, under low-phosphate conditions it showed stunted growth and reduced shoot biomass accumulation (Fig. 4h). Because both genes are expressed in the root, we assessed root elongation on agar and found that pt2/pt6 exhibited reduced root length under low-P conditions but not under normal P fertilization (Supplementary Fig. 3b). To further investigate the phenotype under more natural growth conditions, we imaged the root system under varying phosphate conditions. Under normal phosphate conditions, there was no significant difference between wild-type and pt2/pt6 plants. However, under low-phosphate conditions, both root surface area and the number of root tips were reduced in pt2/pt6 plants compared with wild-type plants (Fig. 4i–k). Root system depth and width did not change significantly (Supplementary Fig. 3c, d). These results indicate that the two genes are important for phosphate uptake and the response to phosphate limitation.
In conclusion, our results demonstrate that multi-targeted CRISPR libraries, combined with CRISPR-GuideMap, are effective tools for generating higher-order mutants and uncovering previously unknown phenotypes.
Manipulation of tomato fruit-related traits
Having shown that the multi-targeted CRISPR approach applied with sub-library 1 can reveal hidden phenotypes, we next evaluated the universality of the approach at a larger scale. We therefore transformed sub-libraries 2 and 3, targeting transporters, and sub-libraries 4 and 5, targeting transcription factors (Fig. 2a and Supplementary Table 1), into the indeterminate Ailsa Craig background and generated an additional 1062 independent lines. To identify genes associated with tomato fruit quality traits such as fruit size, shape, and Brix content, we screened all 1062 lines at the T1 generation in two consecutive seasons (over 10,000 plants, with each line grown in 10 replicates). We identified 125 lines with putative fruit-related phenotypes. Among these, six mutant lines displayed significant variation in fruit size and shape (Fig. 5a, Supplementary Fig. 4). Measurements of fruit diameter, tip length, and Brix content showed significant differences compared with fruits from control plants (Fig. 5b). Sequencing of the sgRNAs and their target genes identified one single mutant and five double mutants associated with these phenotypes (Fig. 5c, d). For example, line 0815-105 carries an sgRNA targeting two Dof transcription factor genes, Solyc02g067230 (SlDof3) and Solyc02g088070 (SlDof8), and produces smaller fruits (Fig. 5). Although loss-of-function mutants of Dof3 and Dof8 have not been reported in tomato, a Dof9 single mutant was shown to affect fruit yield47. Additionally, the sgRNA in line 0815-122 targets two genes from the ERF family, several members of which are key transcriptional regulators in the ethylene response pathway48,49,50. Sequencing showed that both targeted family members, Solyc08g081650 and Solyc08g081670 (ERF118-like genes), were mutated, and the fruits had lower Brix levels (Fig. 5).
Mutant plants showed slightly lower fruit weight than Ailsa Craig, whereas seed number per fruit and seed germination rate were not affected (Supplementary Fig. 4). Another example of lower Brix content is line 0815-155, which targets Solyc03g043820, Solyc03g043830, and Solyc03g043840, three unstudied bZIP family members. Interestingly, in both cases (lines 0815-122 and 0815-155), the targeted gene family members are genetically linked to each other, highlighting the strength of the approach in uncovering the activities of redundant, genetically linked genes. Lines 0815-246 and 0815-350 showed significant changes in fruit shape and fruit weight, and reduced seed number per fruit (Fig. 5, Supplementary Fig. 4). Both lines target AP2 family genes using different sgRNA sequences, and Solyc02g093150 is targeted in both. Previous research indicates that AP2a, another member of the AP2 family, functions as a negative regulator of fruit ripening: inhibiting AP2a expression enhances ethylene synthesis and accelerates ripening in tomato51.
a Representative images of fruit-related phenotypes in 18-week-old wild-type (Ailsa Craig, AC) and the indicated mutant tomato plants. Scale bar = 1 cm. b Quantification of fruit-related phenotypes: Brix score, diameter, and tip length. For line 0815-105, n = 7, AC n = 13. For line 0815-122, n = 10, AC n = 10. For line 0815-155, n = 8, AC n = 12. For line 0815-246, n = 4, AC n = 4. For line 0815-350, n = 18, AC n = 6. For line 0815-456, n = 16, AC n = 5. Each data point represents a biological replicate. Statistical significance was evaluated by Student's two-sided t-test; **p-value < 0.01, ***p-value < 0.001. Data are presented as box plots; the center line indicates the median, box limits correspond to the 25th and 75th percentiles, and whiskers extend to the minimum and maximum values. c DNA alignments of sgRNAs and target genes. Letters in red indicate mismatches. d Sequencing chromatograms of the targeted genes. The sgRNA sequence is in blue, and deletions or substitutions are highlighted in red. e Phylogenetic trees of closely related homologs (targeted genes are highlighted in red). Source data are provided as a Source Data file.
Finally, line 0815-456 targets two ARF genes (ARF2B, Solyc12g042070, and ARF3, Solyc02g077560) and shows significant changes in fruit shape. Notably, the fruits of this line contain placenta but no seeds (Supplementary Fig. 4b). No previous reports have linked ARF2B or ARF3 to tomato fruit shape; however, ARF3 was shown to play a role in the development of epidermal cells and trichomes. In this respect, the homologous gene ARF9 has been implicated in regulating tomato cell division and expansion, affecting fruit size52,53. Together, these data demonstrate that gene discovery at large scales, such as across all transcription factor-encoding genes, combined with multigenic editing of gene families, can reveal hidden phenotypic variation required for crop improvement.
Discussion
Classic mutagenesis screening techniques, which are often imprecise and produce unpredictable outcomes, have significant limitations. These methods typically generate random mutations, resulting in off-target effects that complicate the identification and characterization of relevant genes. Additionally, genetic linkage, the tendency of genes located close together on a chromosome to be inherited together, further complicates traditional breeding efforts52,54,55. Although often overlooked, this issue presents a substantial challenge for breeding programs that rely on crossing to develop desirable traits. Previously employed CRISPR systems offer specificity but lack scalability. Recently, the first CRISPR library for tomato was generated; it targeted all annotated transcription factors, and its application demonstrated the scalability of CRISPR libraries29. Despite its scale, this system could not generate double or triple mutants, because only a single gene was targeted in each plant. Given that genetic redundancy buffers phenotypic plasticity, the inability to target multiple genes simultaneously posed a significant limitation. Our research introduces a multi-targeted CRISPR library designed to tackle genetic redundancy by targeting multiple genes within the same family. Unlike previous single-gene targeting approaches, this strategy enables simultaneous editing of multiple family members, providing a robust platform for functional genomics and agricultural innovation. One of the primary advantages of CRISPR-Cas9 technology is its ability to perform high-efficiency, site-specific gene editing with minimal off-target effects. Building on our previous work in Arabidopsis, which demonstrated the feasibility of genome-wide, multi-targeted CRISPR libraries37, we extended this approach to tomato, and its successful implementation demonstrates the feasibility of the approach in crop species. 
This strategy could be readily adapted to any crop with a sequenced genome and an established transformation protocol, such as maize, wheat, and rice, expanding its potential for functional genomics and agricultural improvement. The developed library, comprising 15,804 unique sgRNAs organized into sub-libraries that target genes within functional groups, represents a significant step forward in using CRISPR technology to enhance crop traits and improve agricultural productivity.
To simultaneously target multiple genes from the same family, we allowed mismatches between the sgRNAs and the targeted genes. This approach provides high coverage, but mismatches lower the efficiency of the system because of reduced complementarity between the sgRNA and its target genes. As expected, our results showed that a high number of mismatches significantly hindered the ability of Cas9 to induce double-stranded DNA breaks. Although all sgRNAs designed by the CRISPys algorithm passed the additional constraints of the CFD scoring function, such as mismatch position and substitution type, we still observed a clear trend among the approved sgRNAs: fewer mismatches consistently correlated with higher cleavage efficiency. This suggests that even among sgRNAs that meet the CFD scoring thresholds, mismatch count remains a dominant factor influencing Cas9 activity. This finding underscores the impact of mismatches on CRISPR efficiency and highlights the need for optimized sgRNA design. Our findings are in line with previous reports of the deleterious effect of mismatches on Cas9 cleavage efficiency in both bacterial and mammalian cells53,56. Several strategies could make future libraries more robust. First, multiplexing, in which each plant is transformed with multiple sgRNAs, could improve overall effectiveness by simultaneously targeting multiple genes without requiring mismatches. However, because of technical limitations in oligo synthesis and cloning, this has not been carried out at a large scale. Second, engineered versions of the Cas9 protein, such as an intronized Cas957, have increased cleavage efficiency in Arabidopsis and may offer similar benefits in other plant systems, including tomato. 
Third, while our current design relied on the CFD39 scoring function to predict mismatch effects, newer scoring functions are continuously being developed that leverage large-scale experimental data and deep learning to better predict off-target effects58,59,60. Such improved scoring methods could be incorporated into our algorithm to optimize sgRNA design in future libraries.
Our library features a barcoding-sequencing system, CRISPR-GuideMap, that enables tracking of sgRNAs present in all plants. Unlike previous CRISPR screens where only plants displaying phenotypes are typically sequenced and analyzed, this approach allows the identification of sgRNAs in all plants in the library, regardless of their phenotypes. This comprehensive characterization provides information about the full spectrum of potential genetic modifications in the population.
For the M82 background transformation of sub-library 1, a small fraction of transformed plants (17 of 253) had no detectable sgRNA sequences, likely because of technical issues with DNA extraction or PCR amplification. Of the remaining 236 plants, 78% harbored a single sgRNA, 17% harbored two, and 4% harbored three or more. This distribution aligns with expectations from the literature40 and indicates that our barcoding-sequencing system provides an accurate view of the sgRNA content of each plant. Notably, although most sgRNAs were evenly distributed, a few were significantly overrepresented, which warrants further investigation into the factors contributing to their prevalence. One way to reduce overrepresentation would be to perform transformation and tissue-culture regeneration in a trackable manner, labeling each regenerated callus.
The ability to identify the sgRNAs present in every plant offers several important advantages for both research and breeding programs. First, because it relies on next-generation sequencing, CRISPR-GuideMap identifies plants harboring multiple sgRNAs, information that cannot be obtained through Sanger sequencing. This is particularly valuable because it strengthens the reliability of genotype-phenotype causality inferences. Second, by identifying other plants in the transformed library that harbor the same or different sgRNAs targeting the same genes of interest, researchers can efficiently verify genotype-phenotype relationships. Additionally, the system supports homology-based research by identifying plants with sgRNAs targeting neighboring genes on the phylogenetic tree, enhancing our understanding of genetic relationships and evolutionary patterns. This information guides the choice of lines for continued research. Third, by revealing the distribution of sgRNAs across the entire population of transformed plants, CRISPR-GuideMap provides a comprehensive overview of the genetic variation generated, benefiting both basic research and breeding applications.
Lastly, CRISPR-GuideMap can facilitate a ‘reverse genetics’ approach, allowing researchers to identify plants carrying sgRNAs targeting specific genes or gene families even prior to screening. For example, studies focusing on genes encoding particular nutrient or hormone transporters can be conducted using only plants with relevant sgRNAs, increasing screening efficiency. Overall, CRISPR-GuideMap represents a significant advancement in genetic screening technology. It not only maximizes the amount of usable data from the library but also enhances the system’s overall functionality, making it a powerful tool for research and breeding alike.
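In data-structure terms, the reverse-genetics query described above is an inverted index: the plant → sgRNA → target-gene table, read backwards. The Python sketch below illustrates the idea under stated assumptions; the two input dictionaries, the plant numbers, the sgRNA IDs, and the gene names are all hypothetical and do not reflect the format of the published GuideMap tables.

```python
from collections import defaultdict


def build_gene_index(plant_to_sgrnas, sgrna_to_genes):
    """Invert plant -> sgRNAs -> target genes into gene -> set of plants.

    Both inputs are hypothetical stand-ins for a GuideMap-style call table
    and a library design table; they are not the published data formats.
    """
    index = defaultdict(set)
    for plant, sgrnas in plant_to_sgrnas.items():
        for sg in sgrnas:
            for gene in sgrna_to_genes.get(sg, ()):
                index[gene].add(plant)
    return index


def plants_for_genes(index, genes):
    """Return plants carrying an sgRNA that targets any of the given genes."""
    hits = set()
    for gene in genes:
        hits |= index.get(gene, set())  # .get does not create empty entries
    return sorted(hits)


# Illustrative data only (hypothetical plants, sgRNA IDs, and genes).
plant_to_sgrnas = {1: ["sg_A"], 2: ["sg_B", "sg_C"], 3: ["sg_C"]}
sgrna_to_genes = {
    "sg_A": ["gene1", "gene2"],
    "sg_B": ["gene3"],
    "sg_C": ["gene4", "gene5"],
}
index = build_gene_index(plant_to_sgrnas, sgrna_to_genes)
print(plants_for_genes(index, ["gene3", "gene5"]))
```

Building the index once makes each family-level query a few set unions, so a screen restricted to, say, one transporter family can be assembled before any plant is phenotyped.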
In conclusion, the developed CRISPR library in tomato represents a significant advancement in genetic engineering, extending the toolbox for functional genomics and crop improvement. By overcoming the limitations of genetic redundancy and implementing a barcoding-based sequencing strategy, our approach provides powerful multi-targeted tools for researchers and breeders. This innovative strategy has the potential to accelerate the development of various crops with enhanced traits, contributing to food security and sustainable agriculture.
Methods
Plant material and growth conditions
Tomato (S. lycopersicum) plants in M82 sp- and Ailsa Craig (Sl) backgrounds were used throughout this study. Plants were grown in a greenhouse or a growth room with long-day conditions (16 h light/8 h dark) at 20–30 °C.
Tomato Solanaceae Genomics Network (SGN) accession numbers: Solyc05g005990 (NPF1.10), Solyc05g006000 (NPF1.11), Solyc05g006010 (NPF1.12), Solyc03g005530 (PT2), Solyc03g005560 (PT6), Solyc02g067230 (SlDOF3), Solyc02g088070 (SlDOF8), Solyc08g081650 (ERF118-like), Solyc08g081670 (ERF118-like), Solyc03g043820, Solyc03g043830, Solyc03g043840, Solyc02g064960, Solyc02g093150, Solyc03g044300, Solyc12g042070 (ARF2b), Solyc02g077560 (ARF3).
Bacterial material and growth conditions
All bacteria were grown on LB agar medium: 20 g LB (Lennox) (Accumedia) and 15 g bacteriological agar (Accumedia) were added to 1 L doubly distilled water and autoclaved for 20 min at 121 °C. Antibiotics were added as appropriate at final concentrations of 50 μg/ml kanamycin, 100 μg/ml carbenicillin, 25 μg/ml gentamicin, 50 μg/ml spectinomycin, and 25 μg/ml rifampicin.
Plant DNA extraction and genotyping
To isolate genomic DNA from young tomato leaves as a PCR template for sequencing and genotyping, approximately 100 mg of leaf tissue was placed in a 2-ml round-bottom Eppendorf tube together with a metal bead and rapidly frozen in liquid nitrogen. The frozen tissue was then ground to a fine powder using a tissue lyser. The powdered tissue was homogenized with 400 μl of DNA extraction buffer containing 200 mM Tris-HCl (pH 7.5–8.0), 25 mM EDTA, 250 mM NaCl, and 0.5% SDS. After homogenization, the tubes were briefly vortexed for 5 s and centrifuged at 21,130 × g for 1 min in an Eppendorf mini centrifuge. The supernatant was transferred to a new tube, and DNA was precipitated by adding 300 μl of isopropanol. After a 5-min incubation at room temperature, the tube was centrifuged at 21,130 × g for 10 min at room temperature. The visible pellet was washed with 600 μl of 70% ethanol and centrifuged for 1 min at 21,130 × g at room temperature. Finally, the DNA pellet was resuspended in 50 μl doubly distilled water. DNA concentration was determined from absorbance measured with a NanoDrop spectrophotometer.
All lines shown in this study were screened in the T1 generation, and their phenotypes and genotypes were confirmed in the T2 generation. The presented mutants are either homozygous or biallelic. Plants were first genotyped for sgRNA and Cas9 insertion, followed by detailed genotyping for mutations in target genes using the primers listed in Supplementary Table 4. After homozygous or biallelic lines were validated, plants were genotyped again to test whether Cas9 was still present or had segregated out: lines npf1.10/11/12_2, 0815-105, 0815-246, and 0815-350 were homozygous for Cas9; lines npf1.10/11/12_1, 0815-122, and 0815-155 were heterozygous for Cas9; and lines pt2/6 and 0815-456 had no Cas9 insertion.
Construction of multi-targeted CRISPR libraries and tomato transformation
The 20-nucleotide sgRNA target sites were appended to specific adaptors containing sites for the type IIS restriction enzyme BsaI (Supplementary Table 4). The 15,804 DNA oligonucleotides (total yield: 500 ng) corresponding to the sgRNAs were synthesized by Twist Bioscience. Using adaptor-specific primers (Supplementary Table 4), the libraries were prepared as described by Hu et al.37. Briefly, vectors were assembled using the Golden Gate cloning system61, which generates specific overhangs that allow one-pot, directional assembly of multiple DNA fragments. Each fragment is flanked by unique 4-bp overhangs, enabling seamless, predefined assembly into a destination vector through simultaneous digestion and ligation. The final binary vectors, pMR284, were introduced into A. tumefaciens strain GV3101 by electroporation. The constructs were transformed into M82 sp- and Ailsa Craig cotyledons using the transformation and regeneration methods described by McCormick62. Briefly, cotyledons from 10-day-old M82 sp- or Ailsa Craig seedlings were excised and placed on regeneration medium for a 2-day pre-culture period to enhance their competence for transformation. The explants were then infected with A. tumefaciens carrying the desired construct and co-cultivated in the dark for 2 days to allow T-DNA transfer. After co-cultivation, the cotyledons were transferred to selective regeneration medium containing appropriate antibiotics to inhibit Agrobacterium growth and select for transformed cells, and they were subcultured every 14 days until shoot formation. Kanamycin-resistant T0 plants were moved to soil, and independent transgenic lines were numbered according to the order in which they were moved to soil.
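Because the library relies on BsaI-based Golden Gate assembly, a spacer that itself contains a BsaI site would be cut during digestion-ligation. The Python sketch below, which is not part of the published pipeline, scans a sequence for BsaI sites on either strand and reports the 4-nt overhang each site would generate. It assumes only the standard BsaI recognition and cut positions, GGTCTC(N)1/(N)5; the example sequences are made up.

```python
import re

BSAI_SITE = "GGTCTC"     # BsaI recognition sequence, top strand
BSAI_SITE_RC = "GAGACC"  # the same site read on the opposite strand


def find_bsai_sites(seq):
    """Return (position, strand, overhang) for every BsaI site in seq.

    BsaI cuts GGTCTC(N)1/(N)5, leaving a 4-nt 5' overhang. The overhang is
    reported as top-strand sequence, or None if the cut falls off the end.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(f"(?={BSAI_SITE})", seq):  # lookahead: overlapping hits
        i = m.start()
        ovh = seq[i + 7:i + 11] if i + 11 <= len(seq) else None
        hits.append((i, "+", ovh))
    for m in re.finditer(f"(?={BSAI_SITE_RC})", seq):
        j = m.start()
        ovh = seq[j - 5:j - 1] if j - 5 >= 0 else None  # cut lies upstream
        hits.append((j, "-", ovh))
    return sorted(hits)


def spacer_is_cloneable(spacer):
    """True if a 20-nt spacer contains no internal BsaI site."""
    return len(spacer) == 20 and not find_bsai_sites(spacer)


print(find_bsai_sites("AAGGTCTCAATTGCCC"))
print(spacer_is_cloneable("ACGGTCTCAACGTTGCAACG"))
```

Scanning the complete oligo (adaptor, spacer, adaptor) rather than the spacer alone would also catch sites that form across a spacer-adaptor junction; the adaptor sequences are listed in Supplementary Table 4 and are not reproduced here.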
Barcode primer design and amplification
Primers without overhangs were first tested to verify sufficient amplification from transgenic T0 plants of the library (forward primer, 5′-cacatcgcttagataagaaaacg-3′; reverse primer, 5′-cctaggtaatgccaactttgtac-3′). PCR was performed using the Vazyme X2 Rapid Taq Kit with annealing at 54 °C, elongation for 5 s, and 30 cycles. Next, 64 barcode sequences, each 8 nucleotides long, were retrieved from Hamady et al.63. These barcodes were designed to ensure efficient PCR amplification, following criteria such as 40–60% GC content, no runs of three identical nucleotides, and no self-complementarity. Thirty-two of the sequences were concatenated to the 5′ end of the forward primer and the remaining 32 to the 5′ end of the reverse primer, giving 32 × 32 = 1024 unique barcode combinations. A list of all final primers can be found in Supplementary Table 1, and a table of all combinations can be found in Supplementary Table 2.
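The barcode scheme combines two ideas: per-barcode sequence filters and combinatorial pairing of forward and reverse barcodes. The Python sketch below approximates the stated criteria; the self-complementarity test is a deliberately crude stand-in (a barcode must not equal its own reverse complement), and the sequential numbering of barcode pairs is hypothetical, since the actual assignment is given in Supplementary Table 2.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def passes_barcode_criteria(bc):
    """Simplified screen modeled on the stated criteria (an approximation):
    40-60% GC, no run of three identical bases, and not a perfect
    reverse-complement palindrome (crude self-complementarity check)."""
    gc = (bc.count("G") + bc.count("C")) / len(bc)
    has_triplet = any(base * 3 in bc for base in "ACGT")
    return 0.4 <= gc <= 0.6 and not has_triplet and bc != revcomp(bc)


def combination_table(fwd_barcodes, rev_barcodes):
    """Map each (forward, reverse) barcode pair to a sequential sample slot.

    32 forward x 32 reverse barcodes give 1024 unique pairs. The numbering
    here is hypothetical; the real assignment is in Supplementary Table 2.
    """
    pairs = product(fwd_barcodes, rev_barcodes)
    return {pair: n for n, pair in enumerate(pairs, start=1)}


print(passes_barcode_criteria("AACCGTTG"), passes_barcode_criteria("ACGTACGT"))
table = combination_table([f"F{i}" for i in range(32)], [f"R{i}" for i in range(32)])
print(len(table))
```

Pairing barcodes combinatorially is what lets 64 oligos label up to 1024 plants in a single sequencing run.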
Each sample was amplified using a unique combination of barcode primers. Each amplified fragment, including the two 8-bp barcodes, is expected to be 222 bp long. The individual PCR products were then pooled and separated by gel electrophoresis. DNA was purified from the gel using the NucleoSpin Gel and PCR Clean-up Kit (Macherey-Nagel), and the pooled sample was deep sequenced (paired-end 150 bp, PE150) by Novogene.
Analysis of the PE150 deep-sequencing data began with rigorous quality checks to ensure the reliability of the reads. Because each 222-bp amplicon is read from both ends with 150-bp reads, the two reads of a pair overlap by 78 bp (2 × 150 − 222), and this overlap encompasses the sgRNA. Each read pair was examined over this 78-bp overlapping region, and pairs with discrepancies in this region were discarded. Reads whose 5′-end barcode sequences did not match our predefined list were also removed. The constant (non-variable) regions of the amplicon were then checked, and reads with mismatches, insertions, or deletions in these regions were removed as well. The remaining high-quality read pairs were assigned to their respective plant numbers using the unique barcode combinations listed in Supplementary Table 2, and the corresponding sgRNA sequence was assigned to that plant.
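The overlap and barcode checks described above can be sketched as follows. This Python example assumes that read 1 starts at the forward-barcode end and read 2 at the reverse-barcode end, that each barcode occupies the first 8 bases of its read, and that barcode pairs map to plants through a lookup table like the one in Supplementary Table 2; the constant-region indel check is omitted for brevity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

AMPLICON_LEN = 222  # expected amplicon length, including both 8-bp barcodes
BARCODE_LEN = 8


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def process_pair(r1, r2, barcode_to_plant, amplicon_len=AMPLICON_LEN):
    """Check one read pair and assign it to a plant.

    Returns (plant, overlap_seq), or None if the pair fails a check.
    For 150-bp reads on a 222-bp amplicon the reads overlap by
    150 + 150 - 222 = 78 bp: r1[72:] should equal revcomp(r2)[:78].
    """
    overlap = len(r1) + len(r2) - amplicon_len
    if overlap <= 0:
        return None
    r1_part = r1[amplicon_len - len(r2):]
    r2_part = revcomp(r2)[:overlap]
    if r1_part != r2_part:  # any disagreement in the overlap: discard
        return None
    key = (r1[:BARCODE_LEN], r2[:BARCODE_LEN])
    plant = barcode_to_plant.get(key)
    if plant is None:  # barcode pair not in the predefined list: discard
        return None
    return plant, r1_part
```

The sgRNA spacer would then be sliced out of the returned overlap at its known offset within the amplicon.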
Despite the initial filtering, the data contained some “noise,” so further evaluation was needed to extract meaningful information. We ranked the reads of each plant by abundance and checked for adequate representation. Plants in which the most abundant read accounted for less than 20% of all reads, or which had few reads (<750), were classified as having insufficient sequencing results. Plants in which the most abundant read was at least three times more frequent than the next most abundant read were classified as containing a single sgRNA. For the remaining plants, we examined the reads in order of abundance; at the first read that was more than twice as abundant as the following read, the number of sgRNAs in the plant was set to the number of reads up to and including that read, resulting in 2, 3, or, in rare cases, 4 sgRNAs.
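The abundance rules above amount to a small decision procedure per plant. The Python sketch below is one reading of those rules, not the authors' code; in particular, it assumes that the 750-read cutoff applies to the plant's total read count and that the scan for a more-than-twofold drop starts at the second-ranked read.

```python
def call_sgrna_number(counts, min_reads=750, min_top_fraction=0.20,
                      single_ratio=3.0, drop_ratio=2.0):
    """Infer how many sgRNAs a plant carries from per-sequence read counts.

    counts: read counts of the distinct sgRNA sequences seen for one plant.
    Returns None for insufficient data, otherwise the inferred number.
    Assumption: min_reads is applied to the plant's total read count.
    """
    counts = sorted(counts, reverse=True)
    total = sum(counts)
    if total < min_reads or counts[0] / total < min_top_fraction:
        return None  # insufficient sequencing result
    if len(counts) == 1 or counts[0] >= single_ratio * counts[1]:
        return 1  # one clearly dominant sgRNA
    padded = counts + [0]  # a trailing zero guarantees the scan terminates
    for k in range(2, len(padded)):
        # first read that is more than double the abundance of the next one
        if padded[k - 1] > drop_ratio * padded[k]:
            return k
    return None


print(call_sgrna_number([1000, 900, 850, 20, 10]))
```

For example, read counts of [1000, 900, 850, 20, 10] pass both sufficiency checks, fail the single-sgRNA test, and show the first more-than-twofold drop after the third read, so the plant is called as carrying three sgRNAs.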
Gene family classification and multi-target sgRNA design
The set of protein-coding genes of S. lycopersicum was obtained from the PLAZA 4.5 Plant Comparative Genomics Database64. To efficiently design sgRNAs for each gene family, we used the CRISPys algorithm, taking into account homologous relationships within each family38. For each gene family, a multiple sequence alignment was computed using MAFFT version 765, and a phylogeny was reconstructed using a hierarchical clustering algorithm66. Large gene families were partitioned into subfamilies of at most eight genes.
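The text does not specify how large families were split. One straightforward reading, sketched below in Python as an assumption, descends from the root of a family's clustering tree and stops at the first clade that contains at most eight genes. Trees are represented as nested tuples of gene names, a hypothetical format chosen for brevity.

```python
def leaves(tree):
    """Gene names under a node of a nested-tuple tree (strings are leaves)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def partition(tree, max_size=8):
    """Split a gene tree into clades of at most max_size genes by
    descending from the root until each clade is small enough."""
    genes = leaves(tree)
    if len(genes) <= max_size:
        return [genes]
    return [group for child in tree for group in partition(child, max_size)]


# Hypothetical 10-gene family tree.
family = ((("g1", "g2"), ("g3", "g4")),
          (("g5", "g6"), (("g7", "g8"), ("g9", "g10"))))
print(partition(family))
```

Splitting along the tree keeps each subfamily a clade of close homologs, which is the set over which a single mismatch-tolerant sgRNA is most likely to retain activity.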
The sΩ design strategy of CRISPys was then applied recursively to each subgroup of each gene tree, determining the optimal sgRNAs for targeting specific subfamilies. The CRISPys algorithm utilized the CFD score39 as the scoring function, with a targeting efficacy threshold (Ω) of 0.8. The potential sgRNA targets were confined to the first two-thirds of the coding sequence.
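CFD-style scoring multiplies a penalty for each mismatch, so a site's score drops with every additional mismatch, consistent with the mismatch-count trend noted in the Discussion. The Python sketch below shows how such a score interacts with the targeting threshold Ω = 0.8. The per-position penalties are invented placeholders (the published CFD matrix depends on both position and substitution type and is not reproduced here), a canonical NGG PAM is assumed, and counting a gene as targeted when its score is at least Ω is a simplification of the CRISPys sΩ objective.

```python
def toy_penalty(position):
    """Hypothetical penalty per mismatch at spacer position 1 (PAM-distal)
    to 20 (PAM-proximal). NOT the published CFD values; it only mimics the
    trend that PAM-proximal mismatches are more damaging."""
    return 0.9 if position <= 12 else 0.5


def cfd_like_score(spacer, protospacer):
    """Multiply per-position penalties over all mismatches (CFD-style).
    Assumes a canonical NGG PAM, so no PAM term is applied."""
    if len(spacer) != 20 or len(protospacer) != 20:
        raise ValueError("expected 20-nt sequences")
    score = 1.0
    for pos, (a, b) in enumerate(zip(spacer, protospacer), start=1):
        if a != b:
            score *= toy_penalty(pos)
    return score


def targeted_genes(spacer, gene_sites, omega=0.8):
    """Genes whose site scores at or above the efficacy threshold omega."""
    return sorted(g for g, site in gene_sites.items()
                  if cfd_like_score(spacer, site) >= omega)


def substitute(seq, position, base):
    """Return seq with the base at 1-based position replaced (for examples)."""
    return seq[:position - 1] + base + seq[position:]


sp = "ACGTACGTACGTACGTACGT"
sites = {
    "g1": sp,                                              # perfect match
    "g2": substitute(sp, 3, "T"),                          # one distal mismatch
    "g3": substitute(sp, 18, "A"),                         # one proximal mismatch
    "g4": substitute(substitute(sp, 2, "A"), 5, "C"),      # two distal
    "g5": substitute(substitute(substitute(sp, 2, "A"), 5, "C"), 7, "A"),
}
print(targeted_genes(sp, sites))  # ['g1', 'g2', 'g4']
```

With these placeholder penalties, two PAM-distal mismatches (0.81) still clear Ω while a third (0.729) does not, and a single PAM-proximal mismatch (0.5) is enough to drop a gene.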
To avoid generating the same sgRNA twice, we accounted for cases in which the same sgRNA could be assigned to different subgroups of homologous genes, one of which is a subset of the other (e.g., candidate sgRNAs generated for both {gene1, gene2} and {gene1, gene2, gene3}). In such cases, only one occurrence of the sgRNA was retained. Additionally, to prevent multiple sgRNAs from targeting essentially the same genomic region, we allowed a maximum overlap of 2 bp between sgRNAs.
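The two de-duplication rules can be expressed as a greedy filter. In the Python sketch below, which rests on assumptions not stated in the text (candidates are processed from highest to lowest score, all sites are 20 bp, and strand is ignored), a candidate is skipped if its sequence was already kept or if it overlaps a kept site on the same chromosome by more than 2 bp. Coordinates and sequence labels are hypothetical.

```python
def overlap_bp(start_a, start_b, length=20):
    """Overlap in bp between two equal-length intervals."""
    return max(0, min(start_a, start_b) + length - max(start_a, start_b))


def select_sgrnas(candidates, max_overlap=2, length=20):
    """Greedy de-duplication of candidate sgRNAs.

    candidates: iterable of (sequence, chrom, start, score) tuples.
    A candidate is dropped if its sequence was already kept (e.g. produced
    for both {g1, g2} and {g1, g2, g3}) or if it overlaps a kept site on
    the same chromosome by more than max_overlap bp.
    """
    kept, seen = [], set()
    for seq, chrom, start, score in sorted(candidates, key=lambda c: -c[3]):
        if seq in seen:
            continue
        if any(chrom == k_chrom and overlap_bp(start, k_start, length) > max_overlap
               for _, k_chrom, k_start, _ in kept):
            continue
        kept.append((seq, chrom, start, score))
        seen.add(seq)
    return kept


candidates = [
    ("S1", "chr1", 100, 0.90),
    ("S1", "chr1", 100, 0.85),  # same sgRNA from a nested subgroup
    ("S2", "chr1", 118, 0.80),  # overlaps S1 by exactly 2 bp: allowed
    ("S3", "chr1", 110, 0.70),  # overlaps S1 by 10 bp: rejected
    ("S4", "chr2", 110, 0.60),  # different chromosome
]
print([c[0] for c in select_sgrnas(candidates)])
```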
After the sgRNAs were generated, a genome-wide off-target search was performed. We defined an off-target gene as a potential genomic target outside the specified gene family, whereas on-target genes were genomic targets within the family, even with some mismatches. The off-target threshold was set at one-fifth of the on-target score. For instance, if an sgRNA had an on-target score of 0.9, the off-target threshold would be 0.18 (0.9 × 0.2 = 0.18), and the sgRNA was discarded if any off-target sequence had a CFD score above 0.18. This filtering ensured the selection of specific sgRNA sequences. Following this filtering, we limited the number of designed sgRNAs to a maximum of 8 per subgroup (i.e., internal node in the tree).
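The off-target rule and the per-subgroup cap translate directly into code. The Python sketch below assumes that each candidate comes with a single on-target score and a list of CFD scores for its off-target sites; ranking the survivors by on-target score before applying the cap of eight is our assumption, since the text does not say how the eight were chosen.

```python
def passes_off_target(on_score, off_scores, ratio=0.2):
    """Keep an sgRNA only if every off-target CFD score is at most
    ratio * on_score (one-fifth of the on-target score)."""
    threshold = ratio * on_score
    return all(s <= threshold for s in off_scores)


def filter_subgroup(sgrnas, max_per_subgroup=8):
    """sgrnas: list of (name, on_score, off_scores) for one subgroup.
    Apply the off-target filter, then keep at most max_per_subgroup.
    Ranking survivors by on-target score is an assumption of this sketch."""
    survivors = [s for s in sgrnas if passes_off_target(s[1], s[2])]
    survivors.sort(key=lambda s: s[1], reverse=True)
    return [name for name, _, _ in survivors[:max_per_subgroup]]


print(passes_off_target(0.9, [0.17]), passes_off_target(0.9, [0.19]))
```

Scaling the off-target cutoff with the on-target score means that an sgRNA with a weaker on-target score must also be correspondingly cleaner off-target to be kept.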
Phylogenetic tree building
Gene families were taken from PLAZA Dicots 4.5. Amino acid sequences of the proteins in each family of interest were aligned with MAFFT version 7 using the BLOSUM62 scoring matrix, a gap opening penalty of 1.53, and an offset value of 0. Results were reformatted to PAUP/NEXUS format and downloaded from the MAFFT website65,67. NEXUS files were uploaded to phylogeny.fr and analyzed in “one-click” mode with the Gblocks program enabled. Completed trees were downloaded in Newick format and rendered with FigTree v1.4.4.
Xanthomonas euvesicatoria bacterial inoculation assay
X. euvesicatoria strain 85-10 (Xe 85-10) was vacuum infiltrated into the leaves of 4-week-old wild-type M82 sp- and npf1 mutant plants. The bacteria were inoculated at 2 × 10⁵ colony-forming units (CFU)/ml (OD600 = 0.0004) in a solution containing 10 mM MgCl2 and 0.08% Silwet L-77. Following inoculation, 1-cm² leaf discs were punched out of the third leaf at 0 DPI (i.e., 3 h after inoculation) and at 3 DPI to quantify bacteria. Three leaf discs from the inoculated leaves were crushed in 2-ml Eppendorf tubes with 10 mM MgCl2. The leaf extracts containing the bacteria were then serially diluted and spotted onto LB agar plates, and the resulting bacterial colonies were counted. Leaves were photographed at 4 DPI.
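Colony counts are converted to bacterial density in leaf tissue with a standard back-calculation. The Python sketch below shows the arithmetic; the homogenate volume and the spotted volume are not given in the text, so the values used here (1 ml and 10 μl) are hypothetical placeholders.

```python
import math


def cfu_per_cm2(colonies, dilution_exponent, spot_volume_ml,
                homogenate_volume_ml, leaf_area_cm2):
    """Back-calculate bacterial density in leaf tissue from a spot count.

    colonies: colonies counted in one spot
    dilution_exponent: n for a 10^-n dilution of the leaf homogenate
    spot_volume_ml: volume spotted on the LB plate (hypothetical below)
    homogenate_volume_ml: volume of 10 mM MgCl2 used to grind the discs
        (not stated in the text; hypothetical below)
    leaf_area_cm2: total leaf area ground (three 1-cm2 discs = 3 cm2)
    """
    cfu_per_ml = colonies * 10 ** dilution_exponent / spot_volume_ml
    return cfu_per_ml * homogenate_volume_ml / leaf_area_cm2


# Hypothetical example: 25 colonies in a 10-ul spot of the 10^-3 dilution,
# three discs ground in 1 ml.
density = cfu_per_cm2(25, 3, 0.01, 1.0, 3.0)
print(f"{density:.3g} CFU/cm2, log10 = {math.log10(density):.2f}")
```

Densities are usually compared on a log10 scale between 0 and 3 DPI, because bacterial growth in planta spans several orders of magnitude.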
Shoot and root phenotyping
Seeds were germinated in soil with slow-release fertilizer (vendor). Plants received regular fertilization and drip irrigation (twice daily, 5 min each). After 28 days, plants were moved to 12-L sand pots. Control conditions contained 3% phosphate (70 ppm N, 30 ppm P2O5, 70 ppm K2O, micronutrients - Koratin), and low-phosphate conditions contained 0.3% phosphate (70 ppm N, 3 ppm P2O5, 70 ppm K2O, micronutrients - Koratin). At 2 months, plants were scanned and analyzed using the Phenoroot imaging system (phenoroot.com). Shoots were cut at soil level to separate them from the roots, dried in brown paper bags at 60 °C for 72 h, and weighed to determine shoot biomass. List of fertilizers and suppliers: Multi-K (13-0-46) (Haifa Group), Phosphoric Acid 85% (Haifa Group), Liquid Ammonium Sulfate 21% (Deshen Gat), Koratin (ICL).
For root length measurements on ½MS plates with varying phosphate (P) levels, standard ½MS medium was used for 100% phosphate plates. For 0% phosphate plates, ½MS −P (phosphate-free) plates were prepared using MS −P powder (catalog number MSP11, Caisson Labs). To create plates containing 10% phosphate, standard ½MS medium was mixed with ½MS −P medium at a 1:9 ratio, giving a final phosphate concentration of 10% relative to standard ½MS. Sterilized M82 and pt2/6 seeds were germinated, and root length was measured after 6 days using ImageJ.
Fruit phenotyping
Tomato CRISPR lines were grown under standard conditions in the greenhouse (25–30 °C), and naturally ripened fruits were harvested for phenotypic characterization. Total soluble solids (Brix) were measured in juice extracted from ripe fruits using a digital Brix refractometer (ATAGO PAL-BX/ACID3). The diameter of ripe fruits was measured using a vernier caliper. To measure the length of the pointed fruit tip, mature fruits were photographed longitudinally and analyzed using ImageJ. For single-fruit weight, fruits of relatively uniform size were selected from 18-week-old plants. The seeds of each fruit were then scooped out and counted. For germination assays, each biological replicate included 30 seeds germinated in a culture dish, and the germination rate was scored after 2 weeks as (number of germinated seeds / 30) × 100%. For each mutant line, measurements were taken from at least three biological replicates.
Statistics and reproducibility
Statistical analysis was performed using Microsoft Excel 2019 and GraphPad Prism v.8.0 (GraphPad software). Statistical significance was determined by using unpaired two-tailed Student’s t-test for two-group comparisons and one-way ANOVA for multiple comparisons. Asterisks indicate significant differences (*P < 0.05, **P < 0.01, ***P < 0.001). Different lowercase letters indicate significant differences (P < 0.05). All experiments were repeated independently three times with consistent results. No statistical method was used to predetermine sample size, no data were excluded from the analyses, the experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All source data are provided with this paper, including the genome-scale library design and output and the CRISPR-GuideMap data. Genomic data are open access and available at PLAZA 4.5 (https://bioinformatics.psb.ugent.be/plaza/versions/plaza_v4_5_dicots) and the Solanaceae Genomics Network (SGN; https://solgenomics.net/ftp/genomes/Solanum_pangenomics/). The CRISPys algorithm is available as an open-source package at https://github.com/HyamsG/CRISPys.
References
Mullins, E. et al. In vivo and in vitro random mutagenesis techniques in plants. EFSA J. 19, e06611 (2021).
Sage, E. & Shikazono, N. Radiation-induced clustered DNA lesions: Repair and mutagenesis. Free Radic. Biol. Med. 107, 125–135 (2017).
Roots, R. & Okada, S. Estimation of life times and diffusion distances of radicals involved in X-ray-induced DNA strand breaks or killing of mammalian cells. Radiat. Res. 64, 306–320 (1975).
Sikora, P. et al. Mutagenesis as a tool in plant genetics, functional genomics, and breeding. Int. J. Plant Genomics 2011, 314829 (2011).
Li, S. et al. Frequency and type of inheritable mutations induced by γ rays in rice as revealed by whole genome sequencing. J. Zhejiang Univ. Sci. B 17, 905–915 (2016).
Li, F. et al. Comparison and characterization of mutations induced by gamma-ray and carbon-ion irradiation in rice (Oryza sativa L.) using whole-genome resequencing. G3: Genes, Genomes, Genetics 9, 3743–3751 (2019).
Du, Y. et al. Frequency and spectrum of mutations induced by gamma rays revealed by phenotype screening and whole-genome re-sequencing in Arabidopsis thaliana. Int. J. Mol. Sci. 23, 654 (2022).
Rizzon, C. et al. Striking similarities in the genomic distribution of tandemly arrayed genes in Arabidopsis and rice. PLoS Comput. Biol. 2, e115 (2006).
Tuskan, G. A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).
Noh, B. et al. Multidrug resistance-like genes of Arabidopsis required for auxin transport and auxin-mediated development. Plant Cell 13, 2441–2454 (2001).
Hauser, F. et al. A genomic-scale artificial microRNA library as a tool to investigate the functionally redundant gene space in Arabidopsis. Plant Cell 25, 2848–2863 (2013).
Hanada, K. et al. Evolutionary persistence of functional compensation by duplicate genes in Arabidopsis. Genome Biol. Evol. 1, 409–414 (2009).
Rukavtsova, E. B. et al. CRISPR-Cas genome editing for horticultural crops improvement: advantages and prospects. Horticulturae 9, 38 (2023).
Jaganathan, D. et al. CRISPR for crop improvement: an update review. Front. Plant Sci. 9, 985 (2018).
Kleinstiver, B. P. et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).
Armario Najera, V. et al. Applications of multiplex genome editing in higher plants. Curr. Opin. Biotechnol. 59, 93–102 (2019).
Lorenzo, C. D. et al. BREEDIT: a multiplex genome editing strategy to improve complex quantitative traits in maize. Plant Cell 35, 218–238 (2023).
Luo, J. et al. Pyramiding favorable alleles in an elite wheat variety in one generation by CRISPR-Cas9-mediated multiplex gene editing. Mol. Plant 14, 847–850 (2021).
Sulis, D. B. et al. Multiplex CRISPR editing of wood for sustainable fiber production. Science 381, 216–221 (2023).
Hashimoto, R. et al. Efficient multiplex genome editing induces precise, and self-ligated type mutations in tomato plants. Front. Plant Sci. 9, 916 (2018).
Kumar, A. et al. Gene Editing in Plants: CRISPR-Cas and Its Applications (Springer, 2024).
Yang, T. et al. Recoloring tomato fruit by CRISPR/Cas9-mediated multiplex gene editing. Hortic. Res. 10, uhac214 (2023).
Li, X. et al. Lycopene is enriched in tomato fruit by CRISPR/Cas9-mediated multiplex genome editing. Front. Plant Sci. 9, 559 (2018).
Vazquez-Vilar, M. et al. The GB4.0 platform, an all-in-one tool for CRISPR/Cas-based multiplex genome engineering in plants. Front. Plant Sci. 12, 689937 (2021).
Jacobs, T. B. et al. Generation of a collection of mutant tomato lines using pooled CRISPR libraries. Plant Physiol. 174, 2023–2037 (2017).
Meng, X. et al. Construction of a genome-wide mutant library in rice using CRISPR/Cas9. Mol. Plant 10, 1238–1241 (2017).
Lu, Y. et al. Genome-wide targeted mutagenesis in rice using the CRISPR/Cas9 system. Mol. Plant 10, 1242–1245 (2017).
Wang, F. et al. CRISPR/Cas9-mediated generation of a mutant library of cotton CDPK gene family for identifying insect-resistant genes. Plant Commun. 5, 101047 (2024).
Bi, M. et al. Construction of transcription factor mutagenesis population in tomato using a pooled CRISPR/Cas9 plasmid library. Plant Physiol. Biochem. 205, 108094 (2023).
Panchy, N. et al. Evolution of gene duplication in plants. Plant Physiol. 171, 2294–2316 (2016).
Li, Z. et al. Gene duplicability of core genes is highly consistent across all angiosperms. Plant Cell 28, 326–344 (2015).
Rensing, S. A. Gene duplication as a driver of plant morphogenetic evolution. Curr. Opin. Plant Biol. 17, 43–48 (2014).
Marin-Gonzalez, A. et al. Multiplexing Cas9 activity with multi-target CRISPR. Biophys. J. 122, S1–556 (2023).
Endo, M. et al. Multigene knockout utilizing off-target mutations of the CRISPR/Cas9 system in rice. Plant Cell Physiol. 56, 41–47 (2015).
Prykhozhij, S. V. et al. CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences. PLoS ONE 10, e0138634 (2015).
Bai, M. et al. Generation of a multiplex mutagenesis population via pooled CRISPR-Cas9 in soya bean. Plant Biotechnol. J. 18, 721–731 (2020).
Hu, Y. et al. Multi-Knock—a multi-targeted genome-scale CRISPR toolbox to overcome functional redundancy in plants. Nat. Plants 9, 572–587 (2023).
Hyams, G. et al. CRISPys: optimal sgRNA design for editing multiple members of a gene family using the CRISPR system. J. Mol. Biol. 430, 2184–2195 (2018).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
De Buck, S. et al. The T-DNA integration pattern in Arabidopsis transformants is highly determined by the transformed target cell. Plant J. 60, 134–145 (2009).
Ding, S. et al. Nitrogen forms and metabolism affect plant defence to foliar and root pathogens in tomato. Plant Cell Environ. 44, 1596–1610 (2021).
Wang, M. et al. Plant primary metabolism regulated by nitrogen contributes to plant-pathogen interactions. Plant Cell Physiol. 60, 329–342 (2019).
Shoji, T. & Saito, K. Downregulation of a cluster of genes encoding nitrate transporter 1/peptide transporter family proteins in tomato with a mutated JRE4 transcription factor. Plant Biotechnol. 40, 71–76 (2023).
Rui, W. et al. Genome-wide analysis of the PHT gene family and its response to mycorrhizal symbiosis in tomatoes under phosphate starvation conditions. Int. J. Mol. Sci. 24, 10246 (2023).
Chen, A. et al. Genome-wide investigation and expression analysis suggest diverse roles and genetic redundancy of Pht1 family genes in response to Pi deficiency in tomato. BMC Plant Biol. 14, 61 (2014).
Waese, J. et al. ePlant: Visualizing and exploring multiple levels of data for hypothesis generation in plant biology. Plant Cell 29, 1806–1821 (2017).
Hu, G. et al. The auxin-responsive transcription factor SlDOF9 regulates inflorescence and flower development in tomato. Nat. Plants 8, 419–433 (2022).
Fan, B. et al. Calcium-dependent activation of CPK12 facilitates its cytoplasm-to-nucleus translocation to potentiate plant hypoxia sensing by phosphorylating ERF-VII transcription factors. Mol. Plant 16, 979–998 (2023).
Deng, H. et al. Ethylene-MPK8-ERF.C1-PR module confers resistance against Botrytis cinerea in tomato fruit without compromising ripening. New Phytol. 242, 592–609 (2024).
Hu, K. et al. The H2S-responsive transcription factor ERF.D3 regulates tomato abscisic acid metabolism, leaf senescence, and fruit ripening. Plant Physiol. 197, kiae560 (2024).
Chung, M. Y. et al. A tomato (Solanum lycopersicum) APETALA2/ERF gene, SlAP2a, is a negative regulator of fruit ripening. Plant J. 64, 936–947 (2010).
Van Rengs, W. M. J. et al. A chromosome scale tomato genome built from complementary PacBio and Nanopore sequences alone reveals extensive linkage drag during breeding. Plant J. 110, 572–588 (2022).
Hu, W. X. et al. ExsgRNA: reduce off-target efficiency by on-target mismatched sgRNA. Brief. Bioinform. 23, bbac183 (2022).
Haggard, J. E. et al. Linkage relationships among multiple QTL for horticultural traits and late blight (P. infestans) resistance on chromosome 5 introgressed from wild tomato Solanum habrochaites. G3: Genes, Genomes, Genetics 3, 2131–2146 (2013).
Fuentes, R. R. et al. A catalogue of recombination coldspots in interspecific tomato hybrids. PLoS Genet. 20, e1011336 (2024).
Feng, H., Guo, J., Wang, T., Zhang, C. & Xing, X. H. Guide-target mismatch effects on dCas9-sgRNA binding activity in living bacterial cells. Nucleic Acids Res. 49, 1263–1277 (2021).
Grützner, R. et al. High-efficiency genome editing in plants mediated by a Cas9 gene containing multiple introns. Plant Commun. 2, 100135 (2021).
Fu, R. et al. Systematic decomposition of sequence determinants governing CRISPR/Cas9 specificity. Nat. Commun. 13, 474 (2022).
Yaish, O. et al. A systematic evaluation of data processing and problem formulation of CRISPR off-target site prediction. Brief. Bioinform. 23, bbac157 (2022).
Zhang, D. et al. Unified energetics analysis unravels SpCas9 cleavage activity for optimal gRNA design. Proc. Natl Acad. Sci. USA 116, 8693–8698 (2019).
Weber, E. et al. A modular cloning system for standardized assembly of multigene constructs. PLoS ONE 6, e16765 (2011).
McCormick, S. et al. Transformation of tomato with Agrobacterium tumefaciens. In Plant Tissue Culture Manual B6 (Kluwer Academic Publishers, 1991).
Hamady, M. et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat. Methods 5, 235–237 (2008).
Proost, S. et al. PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res. 43, D974–D981 (2015).
Katoh, K. et al. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief. Bioinform. 20, 1160–1166 (2018).
Gronau, I. & Moran, S. Optimal implementations of UPGMA and other common clustering algorithms. Inf. Process. Lett. 104, 205–210 (2007).
Kuraku, S. et al. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity. Nucleic Acids Res. 41, W22–W28 (2013).
Acknowledgements
A.B.’s scholarship was partially funded by the ADAMA Center Fellowship and the Israeli Ministry of Environmental Protection fellowship. This work was supported by the Israel Science Foundation (3419/20 and 1462/24 to E.S.), the European Research Council (757683-RobustHormoneTrans, 101118769-HYDROSENSING and 101113412-Multi-Crop to E.S.), the Chinese Academy of Sciences BR plan (E3ER6401A2 to Y.Z.), the Zimin Institute (to E.S. and I.M.), and the Self-Deployment Program of University of Chinese Academy of Sciences (E3E46401X2 to Y.Z.).
Author information
Contributions
A.B. and Y.Z. cloned the libraries. A.B. performed M82 tissue culture, genetic screening, Xanthomonas assays, root experiments, and wrote the manuscript. N.S. led the AC tomato fruit genetic screen and phenotypic characterization and contributed to writing. Z.L. participated in the AC tomato fruit genetic screen and phenotypic analyses. U.L. designed the CRISPR libraries. J.C. carried out Xanthomonas infection assays. N.G. contributed to tissue culture work. J.L., Y.Q., B.Y., and W.W. contributed to the AC tomato fruit genetic screen. O.Y. conducted field experiments. I.M. co-conceptualized the CRISPR library design and contributed to manuscript writing. Y.Z. and E.S. led the conceptual development of the study and contributed to writing the manuscript.
Ethics declarations
Competing interests
E.S. and I.M. are co-founders of NetaGenomiX, a biotech company associated with this study. The other authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Ana Laxalt and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Berman, A., Su, N., Li, Z. et al. Construction of multi-targeted CRISPR libraries in tomato to overcome functional redundancy at genome-scale level. Nat Commun 16, 4111 (2025). https://doi.org/10.1038/s41467-025-59280-6
DOI: https://doi.org/10.1038/s41467-025-59280-6