Class DataUpdaterImpl
- java.lang.Object
  - ubic.gemma.core.loader.expression.DataUpdaterImpl
- All Implemented Interfaces:
DataUpdater
@Service public class DataUpdaterImpl extends Object implements DataUpdater
Update or fill in the data associated with an experiment. Cases include reprocessing data from CEL files (Affymetrix, GEO only), inserting data for RNA-seq data sets, and generic cases where the data didn't come from GEO and we need to add or replace data. For loading experiments from flat files, see SimpleExpressionDataLoaderService.
- Author:
paul
Constructor Summary
DataUpdaterImpl()
Method Summary
void addAffyDataFromAPTOutput(ExpressionExperiment ee, String pathToAptOutputFile)
Affymetrix: Use to bypass the automated running of apt-probeset-summarize.
void addCountData(ExpressionExperiment ee, ArrayDesign targetArrayDesign, DoubleMatrix<String,String> countMatrix, DoubleMatrix<String,String> rpkmMatrix, Integer readLength, Boolean isPairedReads, boolean allowMissingSamples)
RNA-seq: Replaces data.
void addData(ExpressionExperiment ee, ArrayDesign targetPlatform, ExpressionDataDoubleMatrix data)
Generic but in practice used for RNA-seq.
void log2cpmFromCounts(ExpressionExperiment ee, QuantitationType qt)
RNA-seq: For backfilling log2cpm when only counts are available.
void replaceData(ExpressionExperiment ee, ArrayDesign targetPlatform, ExpressionDataDoubleMatrix data)
Replace the data associated with the experiment (or add it if there is none).
void replaceData(ExpressionExperiment ee, ArrayDesign targetPlatform, QuantitationType qt, DoubleMatrix<String,String> data)
Replace the data associated with the experiment (or add it if there is none).
void reprocessAffyDataFromCel(ExpressionExperiment ee)
Affymetrix only: Provide or replace data for an Affymetrix-based experiment, using CEL files.
Method Detail
-
addAffyDataFromAPTOutput
@Transactional(propagation=NEVER) public void addAffyDataFromAPTOutput(ExpressionExperiment ee, String pathToAptOutputFile) throws IOException
Affymetrix: Use to bypass the automated running of apt-probeset-summarize, for example if GEO doesn't have the files and we ran apt-probeset-summarize ourselves, or if some GEO files were corrupted (in which case the file used here must have blank columns added, with headers, for the unused samples). The experiment must be single-platform. The data set will be switched to use the "right" platform when the one originally used was an alt CDF or exon-level, so be sure never to use an alt CDF for processing raw data.
- Specified by:
addAffyDataFromAPTOutput in interface DataUpdater
- Parameters:
ee - the experiment
pathToAptOutputFile - file, presumed to be analyzed using the "right" platform (not an alt CDF or exon-level)
- Throws:
IOException - when IO problems occur.
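The requirement that corrupted or unused samples still appear as blank columns with headers can be prepared ahead of time. The following is a minimal sketch, not part of Gemma: the class and method names are hypothetical, and it assumes a tab-delimited file whose comment lines start with "#" and whose first non-comment line is the header naming one column per sample. How header names must correspond to samples in the experiment is not specified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Gemma: pads a tab-delimited summary file
// with blank columns for samples that have no data.
final class AptPaddingSketch {

    static List<String> padMissingSamples(List<String> lines, List<String> expectedSamples) {
        List<String> out = new ArrayList<>();
        List<String> missing = null; // filled in once the header line is seen
        for (String line : lines) {
            if (line.startsWith("#")) {
                // Assumption: comment/metadata lines start with '#'; copy them through.
                out.add(line);
                continue;
            }
            StringBuilder sb = new StringBuilder(line);
            if (missing == null) {
                // First non-comment line is treated as the header.
                List<String> present = Arrays.asList(line.split("\t", -1));
                missing = new ArrayList<>();
                for (String sample : expectedSamples) {
                    if (!present.contains(sample)) {
                        missing.add(sample);
                    }
                }
                for (String sample : missing) {
                    sb.append('\t').append(sample);
                }
            } else {
                // Data row: one empty cell per missing sample.
                for (int i = 0; i < missing.size(); i++) {
                    sb.append('\t');
                }
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("probeset_id\tA.CEL\tB.CEL", "p1\t7.1\t6.9");
        for (String line : padMissingSamples(lines, Arrays.asList("A.CEL", "B.CEL", "C.CEL"))) {
            System.out.println(line.replace("\t", " | "));
        }
    }
}
```

The padded file, once written to disk, would be the one whose path is passed as pathToAptOutputFile.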
-
addCountData
@Transactional(propagation=NEVER) public void addCountData(ExpressionExperiment ee, ArrayDesign targetArrayDesign, DoubleMatrix<String,String> countMatrix, DoubleMatrix<String,String> rpkmMatrix, @Nullable Integer readLength, @Nullable Boolean isPairedReads, boolean allowMissingSamples)
RNA-seq: Replaces data. Starting with the count data, we compute the log2cpm, which is the preferred quantitation type we use internally. Counts and RPKM/FPKM (if provided) are stored in addition. Rows (genes) that have all zero counts are ignored entirely.
- Specified by:
addCountData in interface DataUpdater
- Parameters:
ee - the experiment
targetArrayDesign - this should be one of the "Generic" gene-based platforms. The data set will be switched to use it.
countMatrix - representing 'raw' counts (added after rpkm, if provided).
rpkmMatrix - representing per-gene normalized data, optional (RPKM or FPKM)
allowMissingSamples - if true, samples that are missing data will be deleted from the experiment.
isPairedReads - whether the reads are paired (may be null)
readLength - read length (may be null)
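To make the count-to-log2cpm step concrete, here is a minimal sketch on a plain array. It is not the Gemma implementation: the class name is hypothetical, and the formula is an assumption, since the exact one is not stated here. The sketch uses the common voom-style definition, log2((count + 0.5) / (library size + 1) * 10^6), where the library size is the column sum. Rows whose counts are all zero are dropped, as described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Gemma implementation.
final class Log2CpmSketch {

    private static final double PRIOR_COUNT = 0.5; // assumption: voom-style offset

    /** counts[gene][sample]; returns log2 CPM for the rows that are not all zero. */
    static double[][] log2cpm(double[][] counts) {
        int nSamples = counts.length == 0 ? 0 : counts[0].length;

        // Library size of each sample is the sum of its column.
        double[] libSize = new double[nSamples];
        for (double[] row : counts) {
            for (int j = 0; j < nSamples; j++) {
                libSize[j] += row[j];
            }
        }

        List<double[]> kept = new ArrayList<>();
        for (double[] row : counts) {
            boolean allZero = true;
            for (double v : row) {
                if (v != 0.0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                continue; // genes with no reads in any sample are ignored
            }
            double[] out = new double[nSamples];
            for (int j = 0; j < nSamples; j++) {
                out[j] = Math.log((row[j] + PRIOR_COUNT) / (libSize[j] + 1.0) * 1e6) / Math.log(2.0);
            }
            kept.add(out);
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] result = log2cpm(new double[][] {{10, 0}, {0, 0}, {90, 100}});
        System.out.println("rows kept: " + result.length);
    }
}
```

The offset keeps zero counts finite after the log transform; a different prior would shift low-count values but not the overall approach.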
-
log2cpmFromCounts
@Transactional(propagation=NEVER) public void log2cpmFromCounts(ExpressionExperiment ee, QuantitationType qt)
RNA-seq: For backfilling log2cpm when only counts are available. This wouldn't be used routinely, because new experiments get log2cpm computed when loaded.
- Specified by:
log2cpmFromCounts in interface DataUpdater
- Parameters:
ee - the experiment
qt - the quantitation type
-
replaceData
@Transactional(propagation=NEVER) public void replaceData(ExpressionExperiment ee, ArrayDesign targetPlatform, QuantitationType qt, DoubleMatrix<String,String> data)
Replace the data associated with the experiment (or add it if there is none). These data become the 'preferred' quantitation type. Note that this replaces the "raw" data. Similar to AffyPowerToolsProbesetSummarize.convertDesignElementDataVectors and code in SimpleExpressionDataLoaderService. This method exists in addition to the other replaceData to allow more direct reading of data from files, so that sample- and element-matching happen here.
- Specified by:
replaceData in interface DataUpdater
- Parameters:
ee - the experiment
targetPlatform - the platform for the new data (this only works for a single-platform data set)
qt - the quantitation type
data - the data
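As an illustration of what sample matching on a named matrix can look like, the sketch below reorders the columns of a matrix read from a file so that they follow the experiment's sample order. It is a simplified stand-in using plain arrays and lists, not the Gemma code: the class name is hypothetical, and matching by exact column name is an assumption, since the actual matching rules are not described here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Gemma implementation.
final class SampleMatchingSketch {

    /**
     * Returns a copy of data whose columns follow sampleOrder.
     * Assumption: columns are matched to samples by exact name.
     */
    static double[][] reorderColumns(double[][] data, List<String> colNames, List<String> sampleOrder) {
        Map<String, Integer> index = new HashMap<>();
        for (int j = 0; j < colNames.size(); j++) {
            index.put(colNames.get(j), j);
        }
        double[][] out = new double[data.length][sampleOrder.size()];
        for (int k = 0; k < sampleOrder.size(); k++) {
            Integer j = index.get(sampleOrder.get(k));
            if (j == null) {
                throw new IllegalArgumentException("No column for sample " + sampleOrder.get(k));
            }
            for (int i = 0; i < data.length; i++) {
                out[i][k] = data[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] reordered = reorderColumns(
                new double[][] {{1, 2}, {3, 4}},
                Arrays.asList("B", "A"),
                Arrays.asList("A", "B"));
        System.out.println(Arrays.deepToString(reordered));
    }
}
```

Element (row) matching against the target platform could follow the same pattern, with row names looked up against the platform's elements.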
-
reprocessAffyDataFromCel
@Transactional(propagation=NEVER) public void reprocessAffyDataFromCel(ExpressionExperiment ee)
Affymetrix only: Provide or replace data for an Affymetrix-based experiment, using CEL files. CEL files are downloaded from GEO, apt-probeset-summarize is executed to get the data, and then the experiment is updated. One side effect is that the data set may end up on a different platform than it was on originally. A complication is that the CEL file type may not match the platform we want the experiment to end up on. A further complication arises when this is re-run on a data set, or if the data set is on a merged platform. Therefore, the steps include inspecting the CEL files to determine the chip type used, so that apt-probeset-summarize can be run correctly, and replacing the vectors. Exceptions will be thrown if CEL files can't be located, or if the experiment is set up in a way we can't support.
- Specified by:
reprocessAffyDataFromCel in interface DataUpdater
- Parameters:
ee - the experiment (already lightly thawed)
-
addData
@Transactional(propagation=NEVER) public void addData(ExpressionExperiment ee, ArrayDesign targetPlatform, ExpressionDataDoubleMatrix data)
Generic but in practice used for RNA-seq. Adds additional data (with an associated quantitation type) to the selected experiment. Postprocessing is done if the data's quantitation type is 'preferred', but if there is already a preferred quantitation type, an error will be thrown.
- Specified by:
addData in interface DataUpdater
- Parameters:
ee - the experiment
targetPlatform - optional; if null, uses the platform already used (if there is just one; you can't use this for a multi-platform data set)
data - the data to slot in
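The rule about the 'preferred' quantitation type can be pictured with a small guard. This sketch uses hypothetical stand-in types rather than Gemma's QuantitationType, and it encodes one reading of the description: an error is raised only when the incoming data are marked preferred and the experiment already has a preferred quantitation type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch with stand-in types, not the Gemma implementation.
final class PreferredGuardSketch {

    /** Simplified stand-in for a quantitation type. */
    static final class Qt {
        final String name;
        final boolean preferred;

        Qt(String name, boolean preferred) {
            this.name = name;
            this.preferred = preferred;
        }
    }

    /**
     * Adds incoming to existing. Returns true if postprocessing would be
     * triggered, that is, if the incoming quantitation type is preferred.
     */
    static boolean add(List<Qt> existing, Qt incoming) {
        if (incoming.preferred) {
            for (Qt q : existing) {
                if (q.preferred) {
                    throw new IllegalStateException(
                            "Experiment already has a preferred quantitation type: " + q.name);
                }
            }
        }
        existing.add(incoming);
        return incoming.preferred;
    }

    public static void main(String[] args) {
        List<Qt> qts = new ArrayList<>();
        System.out.println(add(qts, new Qt("counts", false)));
        System.out.println(add(qts, new Qt("log2cpm", true)));
    }
}
```

When the intent is to swap out existing preferred data rather than add alongside it, replaceData is the method described for that purpose.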
-
replaceData
@Transactional(propagation=NEVER) public void replaceData(ExpressionExperiment ee, ArrayDesign targetPlatform, ExpressionDataDoubleMatrix data)
Replace the data associated with the experiment (or add it if there is none). These data become the 'preferred' quantitation type. Note that this replaces the "raw" data. Similar to AffyPowerToolsProbesetSummarize.convertDesignElementDataVectors and code in SimpleExpressionDataLoaderService.
- Specified by:
replaceData in interface DataUpdater
- Parameters:
ee - the experiment to be modified
targetPlatform - the platform for the new data (this can only be used for single-platform data sets). The experiment will be switched to it if necessary.
data - the data to be used
-
-