OmicsAnalyst

1. Introduction

This tutorial covers Upload Mode 1 (Data Tables) for Statistical Integration in OmicsAnalyst 2.0. After uploading your data tables and metadata file, you will use the Multi-omics Data Harmonization page to review, edit, and prepare your data for downstream analysis.

Tutorial Scope: This tutorial is specifically for Upload Mode 1 (Data Tables). If you want to upload pre-identified feature lists for Network Integration (Upload Mode 2), please refer to the Network Integration Tutorial.

Note: All figures in this tutorial were generated using the default settings with the Example 1 dataset. Your results may differ depending on your data and chosen parameters.

2. Data Format Requirements

Required Files

Metadata file: A single file describing sample information and experimental factors
Data tables: At least two omics data tables (expression/abundance matrices)

Supported File Formats

Format	Extension	Max Size
CSV	.csv	50 MB (Genes/mRNAs), 25 MB (others)
TXT	.txt	50 MB (Genes/mRNAs), 25 MB (others)
TSV	.tsv	50 MB (Genes/mRNAs), 25 MB (others)
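The limits in the table above can be checked before uploading. The following is a minimal sketch, not part of OmicsAnalyst: the function name `check_upload` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the tutorial does not say which definition applies.

```python
import os

# Extensions and size limits taken from the table above.
ALLOWED_EXTENSIONS = {".csv", ".txt", ".tsv"}
MAX_MB_GENES = 50   # Genes/mRNAs tables
MAX_MB_OTHER = 25   # all other tables

def check_upload(filename, size_bytes, is_gene_table):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    limit_mb = MAX_MB_GENES if is_gene_table else MAX_MB_OTHER
    # Assumption: 1 MB = 1024 * 1024 bytes.
    if size_bytes > limit_mb * 1024 * 1024:
        problems.append(f"file exceeds {limit_mb} MB")
    return problems
```

For a file on disk, `os.path.getsize(path)` would supply `size_bytes`.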

Data Table Structure

Each omics data table should follow this structure:

First column: Feature identifiers (gene symbols, protein IDs, metabolite names, etc.)
Subsequent columns: Sample measurements (one column per sample)
First row: Header with sample names matching the metadata file

Figure 1: Example data table structure
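To make the layout concrete, here is a minimal sketch of reading a features-by-samples table and checking its structure before upload. The function name `read_omics_table` and the example values are hypothetical and made up for illustration. The sketch assumes every cell is numeric with no missing values.

```python
import csv
import io

def read_omics_table(text, delimiter=","):
    """Parse a table whose first row holds sample names and whose first
    column holds feature identifiers; all other cells must be numeric."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header, body = rows[0], rows[1:]
    samples = header[1:]
    if len(set(samples)) != len(samples):
        raise ValueError("duplicate sample names in header")
    table = {}
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"row {row[0]!r} has {len(row) - 1} values, "
                             f"expected {len(samples)}")
        if row[0] in table:
            raise ValueError(f"duplicate feature ID {row[0]!r}")
        # float() raises ValueError on non-numeric cells such as "NA".
        table[row[0]] = [float(v) for v in row[1:]]
    return samples, table

# Made-up example content, for illustration only.
example = "Feature,S1,S2,S3\nGeneA,5.1,4.8,6.0\nGeneB,2.2,2.5,2.1\n"
samples, table = read_omics_table(example)
```

For a tab-separated file, pass `delimiter="\t"`.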

Metadata File Structure

A metadata file describing sample information is required:

First column: Sample names (must match column headers in data files)
Second column: Primary study factor (group/condition) - no missing values allowed
Additional columns: Other sample attributes (batch, clinical variables, etc.)

Figure 2: Example metadata file structure
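The two rules that most often trip up an upload are the sample-name match and the complete primary factor column. The sketch below checks both under stated assumptions: the function name `compare_samples` is hypothetical, the example metadata is made up, and treating an empty cell or the string "NA" as missing is an assumption.

```python
import csv
import io

def compare_samples(meta_text, data_samples, delimiter=","):
    """Compare metadata sample names (first column) with data table headers
    and list samples whose primary factor (second column) is missing."""
    rows = [r for r in csv.reader(io.StringIO(meta_text), delimiter=delimiter) if r]
    body = rows[1:]  # skip the header row
    meta_samples = [r[0] for r in body]
    return {
        "not_in_metadata": sorted(set(data_samples) - set(meta_samples)),
        "not_in_data": sorted(set(meta_samples) - set(data_samples)),
        # Assumption: empty cells and "NA" both count as missing.
        "missing_primary": [r[0] for r in body
                            if len(r) < 2 or r[1].strip() in ("", "NA")],
    }

# Made-up metadata, for illustration only.
meta_example = "Sample,Group,Batch\nS1,control,1\nS2,,1\nS3,diseased,2\n"
report = compare_samples(meta_example, ["S1", "S2", "S4"])
```

Here `report` flags S4 as absent from the metadata, S3 as absent from the data table, and S2 as lacking a primary factor value.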

Supported Omics Types and ID Formats

Omics Type	Supported ID Formats
Genes/mRNAs	Entrez ID, Ensembl Gene ID, Official Gene Symbol, RefSeq ID
Proteins	UniProt Protein ID, Entrez ID, Ensembl Gene ID, Official Gene Symbol
miRNAs	miRBase mature ID, miRBase accession, miRBase ID (e.g., hsa-miR-21)
Metabolites	KEGG ID, PubChem ID, HMDB ID, Common Name
Microbiome	Taxonomy label, OTU ID (Phylum to Strain level)

3. Metadata Overview

After uploading your files, you will see the Metadata Overview tab where you can review and edit your metadata before proceeding to analysis.

Metadata Type Settings

Ensure the type (discrete or continuous) for each metadata column is correct:

Discrete (Categorical): For experimental groups (e.g., control vs. diseased). Requires at least two groups, each with at least three replicates.
Continuous: For numerical measures. All values must be numerical.
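These two requirements can be expressed as simple checks. The following is a sketch only: the function names are hypothetical, and it reads "three replicates per group" as a minimum of three samples in every group.

```python
import math
from collections import Counter

def check_discrete(values, min_groups=2, min_replicates=3):
    """True if there are enough groups and every group has enough samples."""
    counts = Counter(values)
    too_small = [g for g, n in counts.items() if n < min_replicates]
    return len(counts) >= min_groups and not too_small

def check_continuous(values):
    """True if every value parses as a finite number."""
    try:
        return all(math.isfinite(float(v)) for v in values)
    except ValueError:
        # Raised by float() for non-numeric strings such as "NA".
        return False
```

A column that fails `check_continuous` may be better declared as discrete, or may contain missing values that need attention first.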

Metadata Editor

The metadata editor provides several functions to prepare your metadata:

Edit Metadata Button

Click the "Edit metadata" button to access the following options:

Include/Exclude columns: Select which metadata columns to include in your analysis
Primary metadata: Set the main metadata of interest (only categorical type accepted)
Order (factor-level): Reorder the levels of categorical metadata for display and analysis

Figure 4: Edit Metadata dialog showing the Include/Exclude, Primary metadata, and Order tabs

Row-Level Editing

Edit values: Click the edit icon on any row to modify sample metadata values
Remove samples: Click the trash icon to exclude samples that don't meet requirements
Reset: Click "Reset" to restore the original metadata

Missing Values: Samples with missing values (NA) in the metadata are filtered out during differential expression analysis. You can fill in missing values manually using the row editor; otherwise, the affected samples are excluded automatically.
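To preview which samples would be lost, the exclusion rule can be sketched as follows. This is an illustration, not OmicsAnalyst's implementation: the function name `split_complete` is hypothetical, the metadata values are made up, and the set of tokens treated as missing is an assumption.

```python
MISSING = {"", "NA"}  # tokens treated as missing (an assumption)

def split_complete(metadata, columns):
    """metadata maps sample name -> {column: value}.
    Returns (kept, dropped): samples with all `columns` present, and the rest."""
    kept, dropped = {}, []
    for sample, attrs in metadata.items():
        if any(str(attrs.get(c, "")).strip() in MISSING for c in columns):
            dropped.append(sample)
        else:
            kept[sample] = attrs
    return kept, dropped

# Made-up metadata, for illustration only.
metadata = {
    "S1": {"Group": "control", "Age": "54"},
    "S2": {"Group": "diseased", "Age": "NA"},
    "S3": {"Group": "diseased", "Age": "61"},
}
kept, dropped = split_complete(metadata, ["Group", "Age"])
```

Restricting `columns` to the metadata actually used in the analysis keeps more samples; in this example, checking only "Group" drops nothing.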

4. Omics Data Overview

The Omics Data Overview tab allows you to apply data processing and visualize quality control plots.

Why Data Harmonization? Different omics data types often differ greatly in their number of features and in their variance. Some multi-omics integration methods are sensitive to this imbalance: an omics layer with many more features or much larger variance can dominate the analysis. Data filtering and scaling help make the data more comparable across layers.

Data Filtering

Filter out low-quality or uninformative features:

Option	Description
Dataset	Select which dataset to filter, or "Apply to all" for all datasets
Method	Filtering method (e.g., variance-based, mean-based)
Percentage to filter out	Percentage of features to remove (0-100%)
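As an illustration of the variance-based option, the sketch below removes the given percentage of features with the lowest variance. It is a sketch under assumptions: the function name `filter_by_variance` is hypothetical, and the tutorial does not state how OmicsAnalyst ranks features or rounds the number to remove (this version rounds down).

```python
import statistics

def filter_by_variance(table, percent):
    """table maps feature ID -> list of sample values.
    Drops the `percent` of features with the lowest sample variance."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    n_remove = int(len(table) * percent / 100)  # rounds down (an assumption)
    ranked = sorted(table, key=lambda f: statistics.variance(table[f]))
    removed = set(ranked[:n_remove])
    return {f: v for f, v in table.items() if f not in removed}

# Made-up values, for illustration only.
demo = {
    "A": [1.0, 1.0, 1.0],   # variance 0
    "B": [1.0, 2.0, 3.0],   # variance 1
    "C": [0.0, 5.0, 10.0],  # variance 25
    "D": [2.0, 2.0, 3.0],   # variance 1/3
}
filtered = filter_by_variance(demo, 50)
```

With 50% removed, the two least variable features (A and D) are dropped and B and C remain.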

Data Scaling

Scale your data to make different omics layers comparable:

Method	Description	Best For
None	No scaling applied	Data already normalized
Auto Scaling	Mean-center and divide by standard deviation	General purpose, most multi-omics analyses
Pareto Scaling	Mean-center and divide by square root of standard deviation	Metabolomics data
Range Scaling	Scale to 0-1 range	When absolute ranges matter

Click the "Update" button to apply your filtering and scaling choices.
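The scaling methods in the table can be written out for a single feature as follows. This is a minimal sketch, assuming scaling is applied per feature across samples and that the sample standard deviation is used; the function name `scale` is hypothetical, and range scaling is implemented as min-max scaling to match the table's "0-1 range" description.

```python
import math
import statistics

def scale(values, method="auto"):
    """Scale one feature's values across samples."""
    if method == "none":
        return list(values)
    if method == "range":
        lo, hi = min(values), max(values)
        if hi == lo:
            raise ValueError("constant feature cannot be range scaled")
        return [(v - lo) / (hi - lo) for v in values]
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    if sd == 0:
        raise ValueError("constant feature cannot be scaled")
    if method == "auto":
        divisor = sd
    elif method == "pareto":
        divisor = math.sqrt(sd)
    else:
        raise ValueError(f"unknown method: {method}")
    return [(v - mean) / divisor for v in values]

auto = scale([2.0, 4.0, 6.0], "auto")      # mean 4, sd 2
pareto = scale([2.0, 4.0, 6.0], "pareto")  # divisor sqrt(2)
ranged = scale([2.0, 4.0, 6.0], "range")
```

Pareto scaling divides by a smaller number than auto scaling whenever the standard deviation exceeds 1, so large values are shrunk less and more of the original data structure is kept.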

Figure 6: Data Filtering and Scaling options panel

Quality Control Plots

Two key visualizations help you assess data quality:

Density Plot

The aggregated density plot compares the value distributions of the different omics layers. If the overall distributions fall in very different ranges, scaling is advised.
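The same comparison can be made numerically by summarizing each layer's values. The sketch below is only a rough stand-in for inspecting the plot: the function names are hypothetical, the data are made up, and the factor of 10 used to flag "very different ranges" is an arbitrary assumption, not an OmicsAnalyst threshold.

```python
import statistics

def layer_summary(table):
    """Return (min, median, max) over all values in one omics layer."""
    values = [v for row in table.values() for v in row]
    return min(values), statistics.median(values), max(values)

def ranges_differ(summaries, factor=10.0):
    """True if the widest layer span exceeds `factor` times the narrowest."""
    spans = [hi - lo for lo, _, hi in summaries.values()]
    return max(spans) > factor * min(spans)

# Made-up values, for illustration only.
layers = {
    "transcriptomics": {"GeneA": [5.1, 4.8, 6.0], "GeneB": [2.2, 2.5, 2.1]},
    "metabolomics": {"M1": [1200.0, 900.0, 15000.0], "M2": [300.0, 450.0, 800.0]},
}
summaries = {name: layer_summary(t) for name, t in layers.items()}
needs_scaling = ranges_differ(summaries)
```

In this example the metabolomics values span thousands of units while the transcriptomics values span fewer than four, so the check suggests scaling.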

PCA Plot

PCA plots are generated separately for each omics dataset to show patterns within each layer. Use these to visually check for:

Sample clustering by biological condition (expected)
Batch effects (samples cluster by batch instead of condition)
Outlier samples (samples far from others)

Batch Effect Warning: If significant batch effects are present in individual omics layers (samples cluster by batch rather than biological condition), try accounting for batch effects prior to uploading your data to OmicsAnalyst, or include batch as a covariate in downstream analyses.
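To show what the first axis of such a plot represents, the sketch below computes first principal component (PC1) scores for a small samples-by-features matrix using power iteration on the sample Gram matrix. This is an illustrative technique chosen to keep the example dependency-free; it is not necessarily how OmicsAnalyst computes PCA. The function name `pc1_scores` and the data are hypothetical.

```python
import math

def pc1_scores(samples, n_iter=200):
    """samples: list of per-sample feature vectors. Returns one PC1 score per sample."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Gram matrix G = X X^T; its leading eigenvector is proportional to the PC1 scores.
    gram = [[sum(a * b for a, b in zip(r, s)) for s in centered] for r in centered]
    # Non-constant start vector: a constant vector lies in the null space of G.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data have no variance along the start direction")
        v = [x / norm for x in w]
    gv = [sum(g * x for g, x in zip(row, v)) for row in gram]
    eigenvalue = max(sum(a * b for a, b in zip(v, gv)), 0.0)
    return [x * math.sqrt(eigenvalue) for x in v]

# Made-up data: three control samples followed by three diseased samples.
data = [
    [1.0, 2.0, 0.5], [1.2, 1.8, 0.4], [0.9, 2.1, 0.6],
    [5.0, 6.0, 0.5], [5.2, 5.9, 0.6], [4.8, 6.2, 0.4],
]
scores = pc1_scores(data)
```

In this example the control and diseased samples receive PC1 scores of opposite sign, which corresponds to the expected clustering by biological condition. Coloring the same scores by batch instead would reveal a batch effect if samples separated by batch.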

5. Next Steps

After completing data harmonization, click "Proceed" to continue to Statistical Integration for single-omics characterization, pairwise omics analysis, and multi-omics integration methods.

Data Upload and Quality Control
