@Component public class ArrayDesignSequenceProcessingServiceImpl extends Object implements ArrayDesignSequenceProcessingService
Modifier and Type | Field | Description |
---|---|---|
static String | DUPLICATE_PROBE_NAME_MUNGE_SEPARATOR | When we encounter two probes with the same name, we add this string along with a unique identifier to the end of the name. |
Constructor and Description |
---|
ArrayDesignSequenceProcessingServiceImpl(ArrayDesignReportService arrayDesignReportService, ArrayDesignService arrayDesignService, BioSequenceService bioSequenceService, ExternalDatabaseService externalDatabaseService, Persister persisterHelper) |
Modifier and Type | Method | Description |
---|---|---|
void | assignSequencesToDesignElements(Collection<CompositeSequence> designElements, Collection<BioSequence> sequences) | Associate sequences with an array design. |
void | assignSequencesToDesignElements(Collection<CompositeSequence> designElements, File fastaFile) | Associate sequences with an array design. |
void | assignSequencesToDesignElements(Collection<CompositeSequence> designElements, InputStream fastaFile) | Associate sequences with an array design. |
Collection<BioSequence> | processAffymetrixDesign(ArrayDesign arrayDesign, InputStream probeSequenceFile, Taxon taxon) | Use this to add sequences to an existing Affymetrix design. |
Collection<BioSequence> | processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceFile, InputStream sequenceIdentifierFile, SequenceType sequenceType, Taxon taxon) | Read from a FASTA file when the sequence file lacks any way to link the sequences back to the probes. |
Collection<BioSequence> | processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceFile, SequenceType sequenceType) | The sequence file must provide an unambiguous way to associate the sequences with design elements on the array. |
Collection<BioSequence> | processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceFile, SequenceType sequenceType, Taxon taxon) | The sequence file must provide an unambiguous way to associate the sequences with design elements on the array. |
Collection<BioSequence> | processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceIdentifierFile, String[] databaseNames, String blastDbHome, Taxon taxon, boolean force) | Intended for use with array designs that use sequences that are in GenBank, but the accessions need to be assigned after the array is already in the system. |
Collection<BioSequence> | processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceIdentifierFile, String[] databaseNames, String blastDbHome, Taxon taxon, boolean force, FastaCmd fc) | |
Collection<BioSequence> | processArrayDesign(ArrayDesign arrayDesign, String[] databaseNames, boolean force) | |
Collection<BioSequence> | processArrayDesign(ArrayDesign arrayDesign, String[] databaseNames, String blastDbHome, boolean force) | For the case where the sequences are retrieved simply by the GenBank accession. |
Collection<BioSequence> | processArrayDesign(ArrayDesign arrayDesign, String[] databaseNames, String blastDbHome, boolean force, FastaCmd fc) | Provided primarily for testing. |
BioSequence | processSingleAccession(String sequenceId, String[] databaseNames, String blastDbHome, boolean force) | Update a single sequence in the system. |
Taxon | validateTaxon(Taxon taxon, ArrayDesign arrayDesign) | If taxon is null (that is, it was not provided on the command line), deduce the taxon from the arrayDesign. |
public static final String DUPLICATE_PROBE_NAME_MUNGE_SEPARATOR
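
The role of this separator can be illustrated with a small, self-contained sketch. It is not the service's implementation: the separator value `___`, the class name, and the use of an occurrence counter as the "unique identifier" are all assumptions made for illustration, since this page does not show the constant's value or how the identifier is chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProbeNameMungeSketch {
    // Hypothetical placeholder; the real value of DUPLICATE_PROBE_NAME_MUNGE_SEPARATOR is not shown on this page.
    static final String SEPARATOR = "___";

    /** Returns the names in input order; a repeated name gets SEPARATOR plus its occurrence number appended. */
    static List<String> makeUnique(List<String> probeNames) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String name : probeNames) {
            // merge() stores and returns the updated occurrence count for this name.
            int count = seen.merge(name, 1, Integer::sum);
            // The first occurrence keeps its name; later ones are suffixed (assumed scheme).
            result.add(count == 1 ? name : name + SEPARATOR + count);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(makeUnique(Arrays.asList("probe_a", "probe_b", "probe_a")));
    }
}
```

With the assumed separator, the second `probe_a` becomes `probe_a___2`. A counter-based suffix can still collide with a probe that already carries such a name, so a real implementation may need a stronger uniqueness guarantee.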
@Autowired public ArrayDesignSequenceProcessingServiceImpl(ArrayDesignReportService arrayDesignReportService, ArrayDesignService arrayDesignService, BioSequenceService bioSequenceService, ExternalDatabaseService externalDatabaseService, Persister persisterHelper)
public void assignSequencesToDesignElements(Collection<CompositeSequence> designElements, Collection<BioSequence> sequences)
Specified by: assignSequencesToDesignElements in interface ArrayDesignSequenceProcessingService
Parameters:
designElements - design elements

public void assignSequencesToDesignElements(Collection<CompositeSequence> designElements, File fastaFile) throws IOException
Specified by: assignSequencesToDesignElements in interface ArrayDesignSequenceProcessingService
Parameters:
designElements - design elements
fastaFile - fasta file
Throws:
IOException - when IO problems occur.

public void assignSequencesToDesignElements(Collection<CompositeSequence> designElements, InputStream fastaFile) throws IOException
Specified by: assignSequencesToDesignElements in interface ArrayDesignSequenceProcessingService
Throws:
IOException
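
To make the FASTA input to the overloads above more concrete, here is a minimal sketch of reading FASTA records from an InputStream into a map of identifier to sequence. It does not use Gemma's FastaParser; the class name is hypothetical, and treating the first whitespace-delimited token of each header as the identifier is an assumption. How the service actually matches records to design elements is not described on this page.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

class FastaReadSketch {
    /** Reads FASTA records into a map from identifier (first header token, an assumption) to sequence. */
    static Map<String, String> readFasta(InputStream in) {
        Map<String, StringBuilder> records = new LinkedHashMap<>();
        StringBuilder current = null;
        // Scanner is used here to keep the sketch free of checked exceptions;
        // the documented methods declare IOException instead.
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.startsWith(">")) {
                    // A header line starts a new record.
                    String id = line.substring(1).trim().split("\\s+")[0];
                    current = new StringBuilder();
                    records.put(id, current);
                } else if (current != null) {
                    // Sequence lines are concatenated until the next header.
                    current.append(line);
                }
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : records.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }
}
```

A caller could then look up each design element's name in the returned map, although exact name equality is only one possible matching rule.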
public Collection<BioSequence> processAffymetrixDesign(ArrayDesign arrayDesign, InputStream probeSequenceFile, Taxon taxon) throws IOException
Specified by: processAffymetrixDesign in interface ArrayDesignSequenceProcessingService
Parameters:
arrayDesign - An existing ArrayDesign that already has compositeSequences filled in.
probeSequenceFile - InputStream from a tab-delimited probe sequence file.
taxon - validated taxon
Throws:
IOException - when IO problems occur.

public Collection<BioSequence> processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceFile, SequenceType sequenceType) throws IOException
Specified by: processArrayDesign in interface ArrayDesignSequenceProcessingService
Parameters:
arrayDesign - platform
sequenceFile - FASTA format
sequenceType - e.g., SequenceType.DNA (generic), SequenceType.AFFY_PROBE, or SequenceType.OLIGO.
Throws:
IOException - when IO problems occur.
See Also:
FastaParser
public Collection<BioSequence> processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceFile, SequenceType sequenceType, Taxon taxon) throws IOException
Specified by: processArrayDesign in interface ArrayDesignSequenceProcessingService
Parameters:
arrayDesign - platform
sequenceFile - FASTA, Affymetrix or tabbed format (depending on the type)
sequenceType - e.g., SequenceType.DNA (generic), SequenceType.AFFY_PROBE, or SequenceType.OLIGO.
taxon - if null, attempt to determine it from the array design.
Throws:
IOException - when IO problems occur.
See Also:
FastaParser
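
The null-taxon fallback described for this overload (and for validateTaxon) can be sketched as follows. This is an illustration under assumptions, not Gemma's code: plain strings stand in for Taxon, a set of strings stands in for the taxa found on the ArrayDesign, the class and method names are hypothetical, and returning a non-null taxon unchanged is assumed rather than documented.

```java
import java.util.Set;

class TaxonFallbackSketch {
    /**
     * Returns the supplied taxon, or the platform's single taxon when none was supplied.
     * Strings are stand-ins for Taxon; platformTaxa stands in for the taxa of the array design.
     */
    static String resolveTaxon(String taxon, Set<String> platformTaxa) {
        if (taxon != null) {
            // Assumption: an explicitly provided taxon is used as-is.
            return taxon;
        }
        if (platformTaxa.size() != 1) {
            // Mirrors the documented rule: fail when there is not exactly 1 taxon.
            throw new IllegalArgumentException(
                    "Expected exactly 1 taxon on the platform, found " + platformTaxa.size());
        }
        return platformTaxa.iterator().next();
    }
}
```

Failing fast on zero or several taxa avoids silently attaching sequences to the wrong species when the platform is ambiguous.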
public Collection<BioSequence> processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceFile, InputStream sequenceIdentifierFile, SequenceType sequenceType, Taxon taxon) throws IOException
Specified by: processArrayDesign in interface ArrayDesignSequenceProcessingService
Parameters:
arrayDesign - platform
sequenceFile - FASTA
sequenceIdentifierFile - two columns of probe ids and sequence IDs (the same ones in the sequenceFile)
taxon - if null, attempt to determine it from the array design
Throws:
IOException
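
The two-column identifier file can be pictured with a short parsing sketch. It assumes tab-delimited columns (the delimiter is stated only for the GenBank-based overload), and the class name and error handling are hypothetical rather than taken from Gemma.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

class IdentifierFileSketch {
    /** Parses lines of "probeId<TAB>sequenceId" into an ordered map from probe id to sequence id. */
    static Map<String, String> parse(InputStream in) {
        Map<String, String> probeToSequence = new LinkedHashMap<>();
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split("\t");
                if (fields.length < 2) {
                    // Assumed policy: reject rows that do not have both columns.
                    throw new IllegalArgumentException("Expected two tab-delimited columns: " + line);
                }
                probeToSequence.put(fields[0].trim(), fields[1].trim());
            }
        }
        return probeToSequence;
    }
}
```

The resulting map gives the link between probes and FASTA records that the sequence file alone does not provide.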
public Collection<BioSequence> processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceIdentifierFile, String[] databaseNames, String blastDbHome, Taxon taxon, boolean force) throws IOException
Specified by: processArrayDesign in interface ArrayDesignSequenceProcessingService
Parameters:
arrayDesign - platform
sequenceIdentifierFile - Sequence file has two columns: column 1 is a probe id, column 2 is a GenBank accession or sequence name, delimited by tab. Sequences will be fetched from BLAST databases if possible; ones missing will be sought directly in Gemma.
databaseNames - database names
blastDbHome - blast db home
taxon - taxon
force - If true and an existing BioSequence that matches is found in the system, any existing sequence information in the BioSequence will be overwritten.
Throws:
IOException - when IO problems occur.

public Collection<BioSequence> processArrayDesign(ArrayDesign arrayDesign, InputStream sequenceIdentifierFile, String[] databaseNames, String blastDbHome, Taxon taxon, boolean force, FastaCmd fc) throws IOException
Specified by: processArrayDesign in interface ArrayDesignSequenceProcessingService
Throws:
IOException
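
The effect of the force flag documented above can be summarized in a small decision sketch. Strings stand in for the sequence held by a BioSequence, and the class and method names are hypothetical. Keeping the existing value when nothing was fetched, and filling an empty value even without force, are assumptions rather than documented behaviour.

```java
class ForceOverwriteSketch {
    /**
     * Chooses the sequence to store, given the existing value, a freshly fetched value,
     * and the force flag. Strings are stand-ins for BioSequence sequence data.
     */
    static String chooseSequence(String existing, String fetched, boolean force) {
        if (fetched == null) {
            // Assumption: with nothing retrieved, the existing value is left alone.
            return existing;
        }
        boolean hasExisting = existing != null && !existing.isEmpty();
        if (hasExisting && !force) {
            // Without force, existing sequence information is preserved.
            return existing;
        }
        // Either there was nothing to preserve, or force requests an overwrite.
        return fetched;
    }
}
```

For example, `chooseSequence("AAAA", "CCCC", false)` keeps `AAAA`, while the same call with `force` set to true returns `CCCC`.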

public Collection<BioSequence> processArrayDesign(ArrayDesign arrayDesign, String[] databaseNames, boolean force)
Specified by: processArrayDesign in interface ArrayDesignSequenceProcessingService

public Collection<BioSequence> processArrayDesign(ArrayDesign arrayDesign, String[] databaseNames, String blastDbHome, boolean force)
Specified by: processArrayDesign in interface ArrayDesignSequenceProcessingService
Parameters:
arrayDesign - platform
databaseNames - the names of the BLAST-formatted databases to search (e.g., nt, est_mouse)
blastDbHome - where to find the blast databases for sequence retrieval
force - If true, then when an existing BioSequence contains a non-empty sequence value, it will be overwritten with a new one.

public Collection<BioSequence> processArrayDesign(ArrayDesign arrayDesign, String[] databaseNames, String blastDbHome, boolean force, FastaCmd fc)
Specified by: processArrayDesign in interface ArrayDesignSequenceProcessingService
Parameters:
arrayDesign - platform
databaseNames - the names of the BLAST-formatted databases to search (e.g., nt, est_mouse)
blastDbHome - where to find the blast databases for sequence retrieval
force - If true, then when an existing BioSequence contains a non-empty sequence value, it will be overwritten with a new one.
fc - fasta command

public BioSequence processSingleAccession(String sequenceId, String[] databaseNames, String blastDbHome, boolean force)
Specified by: processSingleAccession in interface ArrayDesignSequenceProcessingService
Parameters:
force - If true and an existing BioSequence that matches is found in the system, any existing sequence information in the BioSequence will be overwritten.
sequenceId - sequence id
databaseNames - database names
blastDbHome - blast db home

public Taxon validateTaxon(Taxon taxon, ArrayDesign arrayDesign) throws IllegalArgumentException
Specified by: validateTaxon in interface ArrayDesignSequenceProcessingService
Parameters:
taxon - Taxon as passed in on the command line
arrayDesign - Array design to process
Throws:
IllegalArgumentException - Thrown when there is not exactly 1 taxon.

Copyright © 2005–2023 Pavlidis lab, Michael Smith Laboratories and Department of Psychiatry, University of British Columbia. All rights reserved.