About FuzzyClusTeR

FuzzyClusTeR analyzes the distribution of specified sequence patterns in nucleotide sequences. It supports the study of both classical tandem repeats and diffuse repeat clusters, in which related motifs occur close to one another without forming continuous arrays. The algorithm identifies occurrences of a given motif and of its reverse complement separately in linear sequences, determines the inter-repeat intervals, and detects repeat clusters.
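The steps above can be sketched in Python as follows. This is not FuzzyClusTeR's implementation: it assumes plain A/C/G/T motifs, exact matching, non-overlapping hits and 0-based, end-exclusive coordinates. The helper names (reverse_complement, find_hits, intervals) are hypothetical.

```python
# Minimal sketch, not FuzzyClusTeR's code: exact, non-overlapping matching of
# a plain A/C/G/T motif and of its reverse complement, scanned separately.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    """Reverse complement of an A/C/G/T motif (e.g. TTAGGG -> CCCTAA)."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def find_hits(seq: str, motif: str) -> list[tuple[int, int]]:
    """0-based, end-exclusive (start, end) positions of non-overlapping hits."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq, motif = seq.upper(), motif.upper()
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append((pos, pos + len(motif)))
        pos = seq.find(motif, pos + len(motif))
    return hits


def intervals(hits: list[tuple[int, int]]) -> list[int]:
    """Inter-repeat interval: bases between the end of a hit and the next start."""
    return [nxt[0] - cur[1] for cur, nxt in zip(hits, hits[1:])]


if __name__ == "__main__":
    seq = "TTAGGGTTAGGGACGTTAGGGAAACCCTAA"
    motif = "TTAGGG"
    fwd = find_hits(seq, motif)
    rev = find_hits(seq, reverse_complement(motif))
    print("forward:", fwd, "loops:", intervals(fwd))
    print("reverse:", rev, "loops:", intervals(rev))
```

Running it prints `forward: [(0, 6), (6, 12), (15, 21)] loops: [0, 3]` and `reverse: [(24, 30)] loops: []`. Because each strand is scanned on its own, the forward and reverse-complement hit lists, and the loops computed from them, never mix.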

Different parameter settings allow the identification of diffuse (fuzzy) clusters as well as clusters of perfect or interrupted tandem repeats.

Patterns

Custom or predefined (telomeric) patterns can be searched for in the input sequences.

TTAGGG – perfect TTAGGG telomeric repeats and their reverse complement CCCTAA.

FuzzyTel – a composite pattern covering the most common telomeric repeat variants (first line) and their reverse complements (second line):

T{1,2}A{0,1}G{3,5}|T{1,2}\D{1}A{1}G{3,5}|T{2}A{1}G{2}
C{3,5}T{0,1}A{1,2}|C{3,5}T{1}\D{1}A{1,2}|C{2}T{1}A{2}
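The FuzzyTel patterns are written in regular-expression syntax, so they can be tried out locally with Python's re module. This sketch assumes that \D{1} carries its usual regex meaning (one non-digit character, which in a nucleotide sequence is any letter). The server's own matching engine and its handling of overlapping hits may differ, and the scan helper is hypothetical.

```python
import re

# FuzzyTel patterns copied from the text above (forward and reverse strands).
# Assumption: "\D{1}" follows standard regex semantics (one non-digit
# character, i.e. any sequence letter); the server's matcher may differ.
FUZZYTEL_F = re.compile(r"T{1,2}A{0,1}G{3,5}|T{1,2}\D{1}A{1}G{3,5}|T{2}A{1}G{2}")
FUZZYTEL_R = re.compile(r"C{3,5}T{0,1}A{1,2}|C{3,5}T{1}\D{1}A{1,2}|C{2}T{1}A{2}")


def scan(seq: str, pattern: re.Pattern) -> list[tuple[int, int, str]]:
    """Non-overlapping (start, end, matched_text) hits; end is exclusive."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    fwd_seq = "TTAGGGTAGGGTTCAGGG"  # perfect, shortened and interrupted variants
    print(scan(fwd_seq, FUZZYTEL_F))
    # [(0, 6, 'TTAGGG'), (6, 11, 'TAGGG'), (11, 18, 'TTCAGGG')]
    rev_seq = "CCCTGAACCCTACCCTAA"  # reverse complement of fwd_seq
    print(scan(rev_seq, FUZZYTEL_R))
    # [(0, 7, 'CCCTGAA'), (7, 12, 'CCCTA'), (12, 18, 'CCCTAA')]
```

Python's alternation takes the first branch that matches at each position rather than the longest one, so branch order decides which variant is reported when several could match at the same start.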

Custom pattern – can be specified using the “Custom pattern” option. Up to three motifs can be combined by separating them with the “|” symbol.

Examples: TCACCC (single motif), TCAGG|TCACG (multiple motifs). We recommend that motifs in a multiple-motif query be of similar length. Custom motifs must be 3 to 30 nucleotides long.
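A hypothetical client-side check for the custom-pattern limits stated above (one to three “|”-separated motifs, each 3 to 30 nt) might look like this. Restricting motifs to A/C/G/T is an assumption, based on degenerate letters not being used for pattern matching; parse_custom_pattern is an illustrative name, not part of the server.

```python
# Hypothetical pre-check mirroring the stated limits: 1 to 3 motifs separated
# by "|", each 3 to 30 nt. The A/C/G/T-only alphabet is an assumption.
VALID_BASES = set("ACGT")


def parse_custom_pattern(pattern: str, max_motifs: int = 3,
                         min_len: int = 3, max_len: int = 30) -> list[str]:
    """Split a custom pattern into upper-case motifs, raising on violations."""
    motifs = [m.strip().upper() for m in pattern.split("|")]
    if len(motifs) > max_motifs:
        raise ValueError(f"at most {max_motifs} motifs allowed, got {len(motifs)}")
    for motif in motifs:
        if not min_len <= len(motif) <= max_len:
            raise ValueError(f"motif {motif!r} must be {min_len}-{max_len} nt long")
        if not set(motif) <= VALID_BASES:
            raise ValueError(f"motif {motif!r} contains non-ACGT characters")
    return motifs


if __name__ == "__main__":
    print(parse_custom_pattern("TCACCC"))       # ['TCACCC']
    print(parse_custom_pattern("tcagg|TCACG"))  # ['TCAGG', 'TCACG']
```

An empty string splits into one empty motif and is rejected by the length check, so no separate emptiness test is needed.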

Sequences and files

FASTA files up to 250 Mb can be uploaded. Alternatively, a nucleotide sequence of up to 10,000 characters can be entered directly in the “Sequence” field. Standard IUPAC nucleotide codes are supported. Degenerate letters are counted when loop lengths are determined but are not used for pattern matching.
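The sketch below illustrates, under stated assumptions, how pasted input could be normalized and checked against the IUPAC alphabet and the 10,000-character limit, and how a degenerate letter behaves: N cannot match a literal motif, but it still counts toward the loop between two hits. clean_sequence and the exact accepted alphabet (no gap symbols) are assumptions, not the server's code.

```python
# Assumption-based sketch: normalize pasted input, accept only IUPAC
# nucleotide letters (gap symbols such as "-" are rejected here) and enforce
# the 10,000-character limit for the "Sequence" field.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
MAX_PASTED_LENGTH = 10_000


def clean_sequence(text: str, max_len: int = MAX_PASTED_LENGTH) -> str:
    """Remove whitespace, upper-case, and validate a pasted sequence."""
    seq = "".join(text.split()).upper()
    if len(seq) > max_len:
        raise ValueError(f"sequence has {len(seq)} characters; limit is {max_len}")
    invalid = set(seq) - IUPAC_NUCLEOTIDES
    if invalid:
        raise ValueError(f"non-IUPAC characters: {''.join(sorted(invalid))}")
    return seq


if __name__ == "__main__":
    seq = clean_sequence("ttagggNNNN\nTTAGGG")  # "TTAGGGNNNNTTAGGG"
    motif = "TTAGGG"
    first = seq.find(motif)
    second = seq.find(motif, first + len(motif))
    # The four N letters cannot match the motif, but they still count
    # toward the loop between the two hits.
    print(second - (first + len(motif)))  # 4
```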

Supported parameters

FuzzyClusTeR supports parameterized identification of repeat clusters defined by different sequence patterns. Users can adjust the Loop size, Cluster Score, Score Significance Ratio (SSR), and the minimum number of repeat units per cluster.

Cluster Score – reflects the density of repeats within an identified cluster.

Loop – the inter-repeat interval used in the clustering algorithm and for visualization.

SSR – the ratio between the theoretical Cluster Score and the observed Cluster Score.

Repeats – the minimum number of repeat units required to define a cluster.
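How Loop and Repeats interact can be illustrated with a simple single-linkage grouping of hits on one strand: consecutive hits stay in the same cluster while the gap between them is at most max_loop, and groups with fewer than min_repeats hits are discarded. This page does not give the Cluster Score formula, so the density field below (repeat bases divided by cluster span) is a hypothetical stand-in, and SSR is not computed. Names such as cluster_hits are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    start: int      # 0-based start of the first hit
    end: int        # end-exclusive position of the last hit
    repeats: int    # number of hits in the cluster
    density: float  # HYPOTHETICAL stand-in for Cluster Score, not the real formula


def cluster_hits(hits: list[tuple[int, int]], max_loop: int,
                 min_repeats: int) -> list[Cluster]:
    """Single-linkage grouping of sorted, non-overlapping hits on one strand."""
    groups: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    for hit in hits:
        # Start a new group when the loop to the previous hit is too long.
        if current and hit[0] - current[-1][1] > max_loop:
            groups.append(current)
            current = []
        current.append(hit)
    if current:
        groups.append(current)

    clusters = []
    for group in groups:
        if len(group) < min_repeats:
            continue  # too few repeat units to call a cluster
        start, end = group[0][0], group[-1][1]
        covered = sum(e - s for s, e in group)
        clusters.append(Cluster(start, end, len(group), covered / (end - start)))
    return clusters
```

With the hits [(0, 6), (6, 12), (15, 21), (100, 106), (110, 116)], max_loop=5 and min_repeats=3 keep only the tight cluster from 0 to 21 (3 repeats), whereas max_loop=100 merges all five hits into one diffuse cluster from 0 to 116. This is the kind of switch between tandem and fuzzy clusters that the parameter settings control.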

Genomes

FuzzyClusTeR currently supports two versions of the human genome: GRCh38.p14 and T2T-CHM13v2.0. Analyses are performed at the chromosome level.

Smaller linear genomes (files smaller than 250 Mb containing no more than 17 sequences) can be uploaded using the “Upload File” option.
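Before uploading, a FASTA file can be checked locally against the stated limits. This sketch assumes that “250 Mb” refers to file size and treats it as 250 × 10^6 bytes; adjust MAX_BYTES if the server measures it differently. All function names are hypothetical.

```python
import os
from typing import Iterable

MAX_SEQUENCES = 17
# Assumption: "250 Mb" is read as a file-size limit of 250 * 10**6 bytes.
MAX_BYTES = 250 * 10**6


def summarize_fasta(lines: Iterable[str]) -> tuple[int, int]:
    """Return (number of FASTA records, total sequence characters)."""
    records = residues = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            records += 1
        elif line:
            residues += len(line)
    return records, residues


def upload_problems(size_bytes: int, records: int) -> list[str]:
    """List reasons a file would break the stated limits (empty if none)."""
    problems = []
    if size_bytes >= MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; must be under {MAX_BYTES}")
    if records == 0:
        problems.append("no FASTA records found")
    if records > MAX_SEQUENCES:
        problems.append(f"{records} sequences; at most {MAX_SEQUENCES} allowed")
    return problems


def check_upload(path: str) -> list[str]:
    """Read a FASTA file from disk and apply both checks."""
    with open(path) as fh:
        records, _ = summarize_fasta(fh)
    return upload_problems(os.path.getsize(path), records)
```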

Please contact us if you would like to submit a batch job for a larger genome, analyze files larger than 250 Mb, or search for more complex patterns.

Output

FuzzyClusTeR generates CSV files containing individual pattern hits (repeatsF.csv and repeatsR.csv) and identified repeat clusters (clustersF.csv and clustersR.csv).
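The column layout of these CSV files is not described on this page, so the following sketch makes no assumptions about it. It reads any of the four files with Python's csv module and reports the header and the number of data rows. The function name, the comma delimiter and the files being in the current directory are assumptions.

```python
import csv
import os
from typing import Iterable, Optional

OUTPUT_FILES = ("repeatsF.csv", "repeatsR.csv", "clustersF.csv", "clustersR.csv")


def summarize_csv(lines: Iterable[str]) -> tuple[Optional[list[str]], int]:
    """Return (header fields, number of data rows) for one CSV stream."""
    reader = csv.DictReader(lines)
    n_rows = sum(1 for _ in reader)
    return reader.fieldnames, n_rows


if __name__ == "__main__":
    # Assumes the downloaded result files sit in the current directory.
    for name in OUTPUT_FILES:
        if os.path.exists(name):
            with open(name, newline="") as fh:
                print(name, summarize_csv(fh))
```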

In addition, the server produces graphical outputs, including loop-length distribution maps (Loops_map), loop-length histograms (Loops_hist), and cluster distribution maps (Clusters_map).
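A loop-length histogram similar in spirit to Loops_hist can be approximated from hit coordinates by binning the gaps between consecutive hits. The bin width, the (start, end) hit representation and the text rendering are assumptions; the server's own binning and plotting are not described here.

```python
from collections import Counter


def loop_lengths(hits: list[tuple[int, int]]) -> list[int]:
    """Gaps between consecutive (start, end) hits of one strand."""
    return [nxt[0] - cur[1] for cur, nxt in zip(hits, hits[1:])]


def loop_histogram(hits: list[tuple[int, int]], bin_width: int = 10) -> dict[int, int]:
    """Count loops per bin; each key is the lower edge of a bin."""
    counts = Counter((loop // bin_width) * bin_width for loop in loop_lengths(hits))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    width = 10
    hits = [(0, 6), (6, 12), (15, 21), (100, 106), (110, 116)]
    # Simple text histogram: one "#" per loop in each bin.
    for edge, n in loop_histogram(hits, width).items():
        print(f"{edge:>4}-{edge + width - 1:<4} {'#' * n}")
```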