Microarray Expression

With more than a decade of commercial availability behind them, microarrays remain a popular method of gene expression profiling despite the growing popularity of RNA-Seq. Whether you are working with microarrays, looking to re-analyze pre-existing results, or trying to perform concordance studies with RNA-Seq, CodeLinker is the tool to meet your needs. CodeLinker uses the metaphor of the experiment for each analysis or step. This means that you can experiment with the settings and analyses until you find the ones that best illustrate your data and results.

Importing data, while not strictly an analysis step, is one that can trip up many scientists as they struggle to get their data into the correct format. This is not a problem with CodeLinker, since it uses Import Templates to do the hard work for you. Missing data is one of the banes of microarray analysis algorithms, so you will be pleased to know that a step for handling it is there, too: locating and removing missing values is as straightforward as clicking a few buttons. Data Normalization is a hugely important step, correcting for technical variations between samples or unintentional differences between microarray batches, and there is plenty of choice: Linear Regression, Division by Central Tendency, Logarithm, Divide by Maximum, and Scaling Between 0 and 1.
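To make the simpler normalization options concrete, here is a minimal Python sketch of four of them applied to the intensities of a single sample. It illustrates the general techniques only, not CodeLinker's own implementation: the function names and the sample values are made up, the median is assumed as the measure of central tendency, and all intensities are assumed to be positive and not all identical.

```python
import math
from statistics import median


def divide_by_central_tendency(values):
    """Divide each intensity by the sample median (assumed central tendency)."""
    centre = median(values)
    return [v / centre for v in values]


def log2_transform(values):
    """Take the base-2 logarithm of each intensity (values must be positive)."""
    return [math.log2(v) for v in values]


def divide_by_maximum(values):
    """Divide each intensity by the largest intensity in the sample."""
    peak = max(values)
    return [v / peak for v in values]


def scale_zero_to_one(values):
    """Rescale so the smallest value maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Hypothetical raw intensities for one sample.
sample = [120.0, 480.0, 240.0, 960.0]

print(divide_by_central_tendency(sample))
print(log2_transform(sample))
print(divide_by_maximum(sample))
print(scale_zero_to_one(sample))
```

Applying the same transformation to every sample puts them on a comparable scale before clustering or PCA.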

When it comes to analysis, you would expect to find methods like K-Means, Hierarchical Clustering, and Self-Organizing Maps (SOMs). These are great methods for grouping patterns in your data based on their statistical behavior. The algorithms look for similarity in the expression profiles of your data and group the genes or sample sets accordingly.
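As an illustration of how clustering groups similar expression profiles, the following is a minimal K-Means sketch in plain Python. It is not CodeLinker's implementation: the function names and the four toy profiles are hypothetical, squared Euclidean distance is assumed as the similarity measure, and the starting centroids are supplied by hand to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Element-wise mean of a list of equally long profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def k_means(profiles, initial_centroids, max_iterations=100):
    """Assign each profile to its nearest centroid until assignments settle."""
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(profile, centroids[k]))
            for profile in profiles
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        for k in range(len(centroids)):
            members = [p for p, a in zip(profiles, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_profile(members)
    return assignments, centroids


# Hypothetical profiles: each row is one gene measured across three samples.
profiles = [
    [1.0, 2.0, 3.0],  # rising
    [1.1, 2.1, 2.9],  # rising
    [3.0, 2.0, 1.0],  # falling
    [2.9, 1.9, 1.2],  # falling
]

labels, centroids = k_means(profiles, [profiles[0], profiles[2]])
print(labels)
```

In this toy case, the two rising profiles end up in one cluster and the two falling profiles in the other.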

When working with gene expression profiles, you are dealing with thousands of data points. Depending on the size of your dataset, this can create overly dense plots from which it is impossible to derive meaning. This is where PCA (Principal Component Analysis) comes in. The role of PCA is to reduce the complexity of your dataset to a smaller set of uncorrelated variables, the principal components. The analysis seeks to explain the maximum amount of variance with the fewest principal components. Thus, if you had 30 samples in your dataset, it would be impossible to plot the data points on 30 orthogonal axes (2-3 is the most we can achieve on 2D displays), but with PCA, this can be reduced to something that can be displayed and is meaningful.
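The idea can be sketched in a few lines of plain Python. The example below is an illustration under stated assumptions rather than CodeLinker's algorithm: it centers the data, builds the covariance matrix, and extracts the leading components by power iteration with deflation (one of several ways to compute PCA). All function names and the toy data are hypothetical.

```python
import math


def centre_columns(rows):
    """Subtract each column's mean so every variable has mean zero."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centred):
    """Sample covariance matrix of already centred data."""
    n, dims = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def leading_eigenpair(matrix, iterations=1000):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    vector = [float(i + 1) for i in range(len(matrix))]
    norm = math.sqrt(sum(v * v for v in vector))
    vector = [v / norm for v in vector]
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            return 0.0, vector  # no variance left to explain
        vector = [p / norm for p in product]
    eigenvalue = sum(v * sum(m * w for m, w in zip(row, vector))
                     for v, row in zip(vector, matrix))
    return eigenvalue, vector


def pca(rows, n_components=2):
    """Return per-row scores and the fraction of variance each component explains."""
    centred = centre_columns(rows)
    matrix = covariance_matrix(centred)
    dims = len(matrix)
    total_variance = sum(matrix[i][i] for i in range(dims))
    components, explained = [], []
    for _ in range(n_components):
        eigenvalue, vector = leading_eigenpair(matrix)
        components.append(vector)
        explained.append(eigenvalue / total_variance)
        # Deflation: remove the component just found before looking for the next.
        matrix = [[matrix[i][j] - eigenvalue * vector[i] * vector[j]
                   for j in range(dims)] for i in range(dims)]
    scores = [[sum(v * c for v, c in zip(row, component))
               for component in components] for row in centred]
    return scores, explained


# Hypothetical data: five genes (rows) measured in three samples (columns).
data = [
    [3.0, 1.1, 2.0],
    [6.1, 2.0, 3.9],
    [9.0, 2.9, 6.2],
    [12.0, 4.1, 8.0],
    [1.5, 0.4, 1.1],
]

scores, explained = pca(data, n_components=2)
print(scores)
print(explained)
```

Each gene is now described by two scores instead of three measurements, and the explained fractions show how much of the original variance those two axes retain. The same reduction is what makes a 30-sample dataset plottable.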

There are also some methods you may not have heard of before, such as the Integrated Bayesian Inference System (IBIS) and SubLinear Association Mining (SLAM). IBIS is able to predict class membership from gene expression. It will help you find a candidate set of genes on which to base a classifier for your dataset. Once you have your list, you can then perform predictive modeling with artificial neural networks (ANNs), first using a training set of data where the outcomes are known, prior to analyzing test data. SLAM is a form of association mining which searches for associations in your gene expression data. It detects items which occur together at a rate greater than expected by chance. This means that you can then use these associations to find genes which are connected to specific sample classes.
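The notion of items occurring together "at a rate greater than by chance" can be illustrated with lift, a standard association-mining measure. The sketch below is not the SLAM algorithm, whose details are not described here. It only shows the underlying idea, using hypothetical item labels and made-up samples.

```python
def lift(transactions, item_a, item_b):
    """Observed co-occurrence rate divided by the rate expected by chance.

    A value above 1 means the two items appear together more often than
    they would if they were independent.
    """
    n = len(transactions)
    count_a = sum(1 for t in transactions if item_a in t)
    count_b = sum(1 for t in transactions if item_b in t)
    count_both = sum(1 for t in transactions if item_a in t and item_b in t)
    if count_a == 0 or count_b == 0:
        return 0.0  # an item that never occurs cannot be associated
    return (count_both / n) / ((count_a / n) * (count_b / n))


# Hypothetical samples: each set lists discretized expression states
# together with the sample's class label.
samples = [
    {"GENE_A_high", "GENE_B_high", "tumor"},
    {"GENE_A_high", "GENE_B_high", "tumor"},
    {"GENE_A_high", "tumor"},
    {"GENE_B_high", "normal"},
    {"normal"},
    {"normal"},
]

print(lift(samples, "GENE_A_high", "tumor"))
print(lift(samples, "GENE_B_high", "normal"))
```

In this made-up data, high expression of GENE_A co-occurs with the tumor class twice as often as chance would predict, which is the kind of gene-to-class link described above.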

Each of the methods mentioned has its own set of visualizations which can be exported in one of a number of common graphic formats. This means that not only can you share your results with colleagues but you can produce outstanding graphics for presentations and publications.