Start an Analysis

Gene Set Plot visualizes gene set enrichment results for genes you provide. Begin on the Input page: choose an input style, pick a directional hypothesis if your input is scored or ranked, choose a gene set dataset, select one or more gene set collections, and then click Submit.

1. Choose an input style

  • Scored Genes: Upload a .tsv or .txt file with exactly two columns and no header: gene name and numeric score. Use this when each gene has a statistic such as log fold change, correlation, or model score.
  • Thresholded Genes: Paste significant genes and insignificant genes into separate boxes. Genes may be separated by commas, spaces, or new lines.
  • Ranked Genes: Paste one ordered gene list. Genes may be separated by commas, spaces, or new lines.
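If you prepare gene lists in a script before pasting them, the separator rule above can be mimicked with a regular expression. This is a minimal sketch: `split_genes` is a hypothetical helper, not part of Gene Set Plot, and it assumes that runs of mixed separators count as a single break, that tabs behave like spaces, and that empty tokens (for example from a trailing comma) are dropped.

```python
import re


def split_genes(text):
    """Split pasted gene text on commas and whitespace (spaces, tabs, new lines).

    Hypothetical helper: consecutive separators count as one break and
    empty tokens are dropped. Order and duplicates are left unchanged.
    """
    return [token for token in re.split(r"[,\s]+", text) if token]


significant = split_genes("TP53, MYC\nSTAT1")
insignificant = split_genes("GAPDH ACTB,\n\nRPL13,")
print(significant)    # ['TP53', 'MYC', 'STAT1']
print(insignificant)  # ['GAPDH', 'ACTB', 'RPL13']
```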

2. Pick a directional hypothesis

  • Scored and ranked inputs support Positive, Negative, and Two-Sided tests.
  • Positive tests look for gene sets enriched near the positive or top end of the input. Negative tests look near the negative or bottom end. Two-sided tests keep both directions and color positive and negative enrichment separately.
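This guide does not describe the exact test statistic. A widely used approach for scored or ranked lists, and the one that defines "leading-edge genes", is the GSEA-style running-sum enrichment score. The sketch below assumes that approach only to illustrate what the top and bottom ends of the input mean. It computes the score and leading edge but not significance, and `enrichment_score` is a hypothetical name.

```python
def enrichment_score(ranked_genes, gene_set, scores=None):
    """GSEA-style running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from the top (most positive) end of the input
                  to the bottom (most negative) end.
    scores:       optional scores aligned with ranked_genes; hits are then
                  weighted by |score|. With None, every hit counts equally,
                  as for plain ranked input.
    Returns (es, leading_edge_genes).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must match some, but not all, ranked genes")

    weights = [abs(s) for s in scores] if scores is not None else [1.0] * len(ranked_genes)
    hit_total = sum(w for w, hit in zip(weights, hits) if hit)
    if hit_total == 0:
        raise ValueError("matched genes all have a score of zero")
    miss_step = 1.0 / n_misses

    running, es, peak = 0.0, 0.0, 0
    for i, hit in enumerate(hits):
        # Step up at each set member, step down at each non-member.
        running += weights[i] / hit_total if hit else -miss_step
        if abs(running) > abs(es):
            es, peak = running, i

    if es >= 0:
        # Positive score: leading edge is the members from the top down to the peak.
        edge = [g for g, hit in zip(ranked_genes[: peak + 1], hits) if hit]
    else:
        # Negative score: leading edge is the members from the peak to the bottom.
        edge = [g for g, hit in zip(ranked_genes[peak:], hits[peak:]) if hit]
    return es, edge


ranked = ["TP53", "MYC", "EGFR", "KRAS", "STAT1", "CXCL10"]
print(enrichment_score(ranked, {"TP53", "MYC"}))      # (1.0, ['TP53', 'MYC'])
print(enrichment_score(ranked, {"STAT1", "CXCL10"}))  # (-1.0, ['STAT1', 'CXCL10'])
```

Under this assumption, a Positive test would keep sets with a positive score, a Negative test sets with a negative score, and a Two-Sided test both, colored by sign.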

3. Choose gene set data

  • Use Human (MSigDB) or Mouse (MSigDB) for built-in collections.
  • Choose Upload your own to provide a custom gene set JSON file. MSigDB-like JSON files are shown as a selectable collection tree. Flat custom files are loaded as one custom dataset.

4. Select collections

  • Click a collection node to expand or collapse it. Hold Ctrl on Windows/Linux or Cmd on macOS while clicking to select collections.
  • Use Select All to select all top-level collections, or Clear All to reset your selections.
  • Set the minimum number of relevant members per gene set. The default and recommended minimum is 5.
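The guide does not define "relevant members" precisely. A minimal sketch, assuming a relevant member is a gene in the set that also appears in your input; `filter_gene_sets` and the set names are hypothetical, and the app's exact rule may differ.

```python
def filter_gene_sets(gene_sets, input_genes, min_members=5):
    """Keep gene sets with at least `min_members` genes that occur in the input.

    Assumption: a "relevant member" is a gene-set gene that also appears in
    your input; Gene Set Plot's exact definition may differ.
    """
    universe = set(input_genes)
    return {
        name: genes
        for name, genes in gene_sets.items()
        if len(universe.intersection(genes)) >= min_members
    }


gene_sets = {  # hypothetical collections
    "SET_A": ["G1", "G2", "G3", "G4", "G5", "G6"],
    "SET_B": ["G1", "G2", "X1"],
}
input_genes = ["G1", "G2", "G3", "G4", "G5"]
print(sorted(filter_gene_sets(gene_sets, input_genes)))  # ['SET_A']
```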

Scored gene file example

TP53	2.84
MYC	1.91
STAT1	-1.32
CXCL10	-2.20
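Before uploading, you can check a file against these rules in a script. This is a hypothetical validator (`read_scored_genes` is not part of the app). It assumes tab-separated columns as in the example above, and its handling of blank lines and repeated genes is an assumption. Note that a header row fails the numeric-score check, which matches the "no header" rule.

```python
import csv
import io
import math


def read_scored_genes(handle):
    """Parse a headerless, tab-separated file of gene name and numeric score.

    Returns {gene: score}. Raises ValueError if a row does not have exactly
    two fields or its score is not a finite number.
    """
    scores = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # assumption: blank lines are ignored
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, found {len(row)}")
        gene, raw = row[0].strip(), row[1].strip()
        try:
            score = float(raw)
        except ValueError:
            raise ValueError(f"line {line_no}: score {raw!r} is not numeric") from None
        if not math.isfinite(score):
            raise ValueError(f"line {line_no}: score {raw!r} is not finite")
        scores[gene] = score  # assumption: a repeated gene keeps its last score
    return scores


example = "TP53\t2.84\nMYC\t1.91\nSTAT1\t-1.32\nCXCL10\t-2.20\n"
scores = read_scored_genes(io.StringIO(example))
print(sorted(scores, key=scores.get, reverse=True))  # ['TP53', 'MYC', 'STAT1', 'CXCL10']
```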

Graph workflow

  • After Submit, each point represents a gene set that passed the current threshold. Points that lie close together represent gene sets with more similar matched genes, as measured by the selected distance metric and laid out by the selected dimensionality reduction algorithm.
  • Hover over a point to see its gene set name and q-value.
  • Click the gear button in the top navigation to open or close the graph settings panel.
  • Use Input in the navigation bar to clear the current graph and start a new analysis.
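This guide does not say how q-values are computed. Assuming they are Benjamini-Hochberg false discovery rate adjustments of the p-values (a common choice, not confirmed here), the sketch below shows how q-values are derived and how an FDR threshold decides which gene sets become points. The function name `bh_qvalues` and the example p-values are hypothetical.

```python
def bh_qvalues(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values never
    # decrease as p increases (the step-up part of the procedure).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


names = ["SET_A", "SET_B", "SET_C", "SET_D", "SET_E"]  # hypothetical gene sets
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = bh_qvalues(p_values)

fdr_threshold = 0.05
passing = [name for name, q in zip(names, q_values) if q <= fdr_threshold]
print(passing)  # ['SET_A', 'SET_B']

# One way to read a "corresponding" p-value cutoff: the largest passing p-value.
p_cutoff = max(p for p, q in zip(p_values, q_values) if q <= fdr_threshold)
print(p_cutoff)  # 0.008
```

SET_C and SET_D end up with the same q-value (about 0.051) because the procedure forces q-values to be non-decreasing in p. Whether the app derives the inactive threshold field this way is an assumption.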

Select and compare points

  • Click a point to add it to Selected Points. Click it again to remove it.
  • Use box select or lasso select to select multiple points at once.
  • Selected Points shows the gene set name, p-value, q-value, enrichment direction, and matched molecules for each selected point.
  • When multiple points are selected, genes shared by selected gene sets are bolded in the Selected Points tables.
  • Clear Selected removes all current point selections.
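The guide does not define "shared" precisely. A minimal sketch of the bolding rule, assuming a gene counts as shared when it appears in at least two of the selected gene sets; set `min_sets` to the number of selected sets to require presence in all of them. `shared_genes` and the set names are hypothetical.

```python
from collections import Counter


def shared_genes(selected, min_sets=2):
    """Genes appearing in at least `min_sets` of the selected gene sets.

    selected: dict mapping gene set name -> iterable of matched genes.
    Each set contributes a gene at most once, even if it is listed twice.
    """
    counts = Counter(gene for genes in selected.values() for gene in set(genes))
    return {gene for gene, count in counts.items() if count >= min_sets}


selected = {  # hypothetical selected points
    "SET_A": ["TP53", "MYC", "STAT1"],
    "SET_B": ["MYC", "STAT1", "IRF1"],
    "SET_C": ["STAT1", "CXCL10"],
}
print(sorted(shared_genes(selected)))              # ['MYC', 'STAT1']
print(sorted(shared_genes(selected, min_sets=3)))  # ['STAT1']
```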

Labels and clustering

  • Toggle Labels shows or hides point labels for the currently displayed gene sets.
  • If Cluster Labeling is enabled, cluster labels are added automatically at the centers of detected clusters.
  • Point labels can be edited and moved on the graph. Cluster labels stay tied to cluster mode and are regenerated when the graph is recomputed.

Export results

  • Use the camera icon in the Plotly mode bar to download the current graph as a PNG image.
  • Click Download Result below the graph to export analysis_results.tsv. The file includes gene set name, p-value, q-value, set size, and matched genes, sorted by p-value.
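To work with the exported file in a script, a loader like the following may help. It is a sketch under stated assumptions, not a documented format: `read_results` is hypothetical, and it assumes tab-separated columns in the order listed above, an optional header row (skipped when its fields are not numeric), and matched genes kept as the raw text of the fifth column, since their separator is not documented.

```python
import csv


def read_results(path):
    """Load analysis_results.tsv as a list of dicts.

    Assumptions, not confirmed by this guide: columns are tab-separated in
    the order gene set name, p-value, q-value, set size, matched genes, and
    the first row may be a header, which is skipped if it is not numeric.
    """
    rows = []
    with open(path, newline="") as handle:
        for row_no, fields in enumerate(csv.reader(handle, delimiter="\t")):
            if not fields:
                continue  # ignore blank lines
            try:
                record = {
                    "name": fields[0],
                    "p_value": float(fields[1]),
                    "q_value": float(fields[2]),
                    "set_size": int(float(fields[3])),
                    "matched_genes": fields[4],  # raw text; separator unknown
                }
            except (IndexError, ValueError):
                if row_no == 0:
                    continue  # header row
                raise
            rows.append(record)
    return rows


# Example use (assumes the downloaded file is in the working directory):
# results = read_results("analysis_results.tsv")
# top = [r["name"] for r in results if r["q_value"] <= 0.05]
```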

Graph and Analysis Settings

Submit and Apply

  • Submit validates the current input, selected collections, threshold, and settings, runs any necessary analysis, and then generates the graph.
  • Apply saves settings changes. If a graph already exists, Apply updates it immediately when possible.
  • Color, size, selected-gene display, and label-related changes are fast visual updates. Changes to the threshold, distance metric, dimensionality reduction, or cluster computation may rerun the analysis.

Colors

  • In standard mode, points are colored by q-value: the least significant color is used for larger q-values and the most significant color for smaller q-values.
  • In two-sided scored or ranked analyses, the most positive and most negative colors show enrichment direction, while the least significant color is the neutral midpoint.
  • The selected color sets the outline color around selected points.

Point size

  • Fixed: every point uses the same marker diameter.
  • Dynamic: point size is based on the number of matched molecules in the gene set. Increase the scalar to make size differences more visible.

Selected gene display

  • For thresholded input, Show significant genes only limits the Selected Points tables to genes from the significant gene box.
  • For ranked input, the same control lists matched genes in enrichment order when a single point is selected.
  • For scored input, the control shows leading-edge genes when they are available.
  • Multi-select overlap comparisons still use the full matched gene set.

Threshold

  • Choose one threshold type, FDR or P-Value, and enter a value only in the active field.
  • Thresholds determine which gene sets are included in the graph. Stricter thresholds usually produce fewer points.
  • After an analysis, the inactive threshold field may be filled with the corresponding computed value.

Dimensionality reduction

  • UMAP: the default layout method. Number of Neighbors balances local versus global structure, Minimum Distance controls how tightly points can pack, and Random State sets the random seed for reproducible layouts. A seed of 0 or blank uses random initialization.
  • t-SNE: emphasizes local neighborhood structure. Perplexity controls the neighborhood scale, Early Exaggeration affects how strongly clusters separate early in optimization, and Max Iterations sets the optimization length.
  • Isomap: preserves geodesic distances along a nearest-neighbor graph. Number of Neighbors controls the neighbor graph used for the embedding.

Distance metric

  • Jaccard Distance: compares overlap relative to the union of two gene sets.
  • Overlap Coeff: compares overlap relative to the smaller gene set, which can make subset-like relationships appear closer.
  • Weighted metrics also use available rank or score information. Plain metrics use membership overlap only.
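The plain metrics can be written down directly, which makes the subset effect easy to see. In this sketch the overlap coefficient is turned into a distance as 1 minus the coefficient, and the weighted variant uses the weighted (Ruzicka) Jaccard form, sum of minima over sum of maxima. Both conversions, the weights (for example |score| or a rank-derived value), and the function names are assumptions rather than the app's documented formulas.

```python
def jaccard_distance(a, b):
    """1 - (shared genes) / (genes in either set), for non-empty gene sets."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("gene sets must be non-empty")
    return 1.0 - len(a & b) / len(a | b)


def overlap_distance(a, b):
    """1 - (shared genes) / (size of the smaller set); a subset scores 0."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("gene sets must be non-empty")
    return 1.0 - len(a & b) / min(len(a), len(b))


def weighted_jaccard_distance(wa, wb):
    """1 - sum(min) / sum(max) over gene -> non-negative weight mappings."""
    genes = set(wa) | set(wb)
    top = sum(min(wa.get(g, 0.0), wb.get(g, 0.0)) for g in genes)
    bottom = sum(max(wa.get(g, 0.0), wb.get(g, 0.0)) for g in genes)
    if bottom == 0:
        raise ValueError("weights must not all be zero")
    return 1.0 - top / bottom


big = {"TP53", "MYC", "STAT1", "IRF1"}
small = {"MYC", "STAT1"}  # a subset of big
print(jaccard_distance(big, small))  # 0.5
print(overlap_distance(big, small))  # 0.0
```

Because `small` is contained in `big`, the overlap distance treats the two sets as identical while Jaccard still separates them.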

Cluster mode

Turn on Cluster mode if you want the graph to color points by detected clusters instead of by q-value. This also shows cluster labels on the graph. Turn Cluster mode off to go back to q-value coloring.

HDBSCAN settings

Min Cluster Size

This sets the smallest number of nearby points that can be treated as a cluster.

  • Increase it if the graph shows too many tiny clusters or messy labels.
  • Decrease it if the algorithm is missing small groups that look meaningful to you.

Min Samples

This controls how strict the algorithm is when deciding whether points belong in a cluster.

  • Leave it blank to use the default behavior.
  • Increase it if you want only denser, more reliable clusters to be colored, with more isolated points shown as noise.
  • Decrease it if too many points are being marked as noise instead of joining a cluster.

Good starting approach for HDBSCAN

Start with Min Cluster Size close to the smallest group you would actually want to interpret.

  • If the graph looks too broken up, increase Min Cluster Size.
  • If clusters disappear too easily, lower Min Samples or leave it blank.

OPTICS settings

Min Samples

This sets how many nearby points are needed before an area is considered dense enough to form a cluster.

  • Increase it for fewer, larger, and more stable clusters.
  • Decrease it if you want to detect smaller or looser groups.

Min Cluster Size

This sets the smallest cluster size OPTICS is allowed to keep.

  • Leave it blank if you want OPTICS to use Min Samples automatically.
  • Increase it to hide very small clusters.
  • Decrease it to keep smaller visible groups.

Xi

This controls how easily OPTICS splits one region into separate clusters.

  • Lower Xi gives more clusters by splitting regions more easily.
  • Higher Xi gives fewer clusters by merging nearby structure together.

Good starting approach for OPTICS

A common starting point is:

  • Min Samples = 5
  • Xi = 0.05

Then adjust based on what you want to see:

  • If there are too many clusters, increase Min Samples or Xi.
  • If large regions are being merged too much, decrease Xi.
  • If small groups are missing, decrease Min Samples or Min Cluster Size.
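scikit-learn's `sklearn.cluster.OPTICS` uses the same three parameter names, and its defaults (`min_samples=5`, `xi=0.05`, and `min_cluster_size=None`, which falls back to `min_samples`) match the starting point above. Treating it as the app's implementation is an assumption; the sketch only shows how the settings map onto a library call, using toy 2-D points in place of graph coordinates.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Toy stand-in for 2-D graph coordinates: two 4 x 5 grids far apart.
grid = np.array([(x, y) for x in range(4) for y in range(5)], dtype=float)
points = np.vstack([grid, grid + 50.0])

# These values are also scikit-learn's defaults; min_cluster_size=None
# falls back to min_samples.
model = OPTICS(min_samples=5, xi=0.05, min_cluster_size=None).fit(points)
labels = model.labels_  # -1 marks noise

print(sorted(set(labels.tolist())))
```

Lowering `xi` or `min_samples` and re-running shows how readily OPTICS splits regions on your own coordinates.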

Noise points

Points that are not assigned to any cluster are shown in gray.
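HDBSCAN and OPTICS implementations such as scikit-learn's conventionally label noise points as -1. A minimal sketch of the coloring rule, assuming that convention; the palette, the gray value, and wrapping around when there are more clusters than colors are all hypothetical choices, not the app's documented behavior.

```python
# Hypothetical colors: the app's actual palette and gray shade are not documented.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]
NOISE_GRAY = "#9e9e9e"


def cluster_colors(labels, palette=PALETTE, noise_color=NOISE_GRAY):
    """Return one color per point: gray for noise (-1), else a palette color.

    Cluster ids beyond the palette length wrap around (an assumption).
    """
    return [noise_color if label == -1 else palette[label % len(palette)] for label in labels]


print(cluster_colors([0, 1, -1, 0, 6]))
# ['#1f77b4', '#ff7f0e', '#9e9e9e', '#1f77b4', '#ff7f0e']
```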