Class BiomartEnsemblNcbiFetcher
- java.lang.Object
-
- ubic.gemma.core.loader.util.biomart.BiomartEnsemblNcbiFetcher
-
public class BiomartEnsemblNcbiFetcher extends Object
BioMart is a query-oriented data management system. Here it is used to map Ensembl, NCBI and HGNC ids. To construct the query we pass the taxon and the attributes we wish to query for. Note that BioMart formats the taxon as the abbreviated Latin name without the period, e.g. 'hsapiens'. For more information visit the BioMart website. Note that Gemma now includes Ensembl ids imported for NCBI genes, using the gene2ensembl file provided by NCBI.
- Author:
- ldonnison
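The description above says a query is built from a taxon and a list of attributes. A minimal sketch of what such a query could look like follows, assuming the XML query format accepted by BioMart's public martservice endpoint. The class name `BiomartQuerySketch`, the method `buildQueryXml`, the `_gene_ensembl` dataset suffix and the attribute names are illustrative assumptions, not taken from this class.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Gemma: builds a BioMart XML query string.
class BiomartQuerySketch {

    // Assumes the dataset is named "<taxon>_gene_ensembl", as in Ensembl's BioMart.
    static String buildQueryXml(String biomartTaxonName, List<String> attributes) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xml.append("<!DOCTYPE Query>");
        xml.append("<Query virtualSchemaName=\"default\" formatter=\"TSV\" header=\"1\" uniqueRows=\"1\">");
        xml.append("<Dataset name=\"").append(biomartTaxonName)
           .append("_gene_ensembl\" interface=\"default\">");
        // One Attribute element per column requested from BioMart.
        for (String attribute : attributes) {
            xml.append("<Attribute name=\"").append(attribute).append("\"/>");
        }
        xml.append("</Dataset></Query>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Attribute names here are assumed examples.
        System.out.println(buildQueryXml("hsapiens", Arrays.asList("ensembl_gene_id", "hgnc_id")));
    }
}
```

The resulting string would typically be sent as the `query` parameter of an HTTP request to a martservice URL; how this class actually sends it is not documented on this page.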
-
-
Field Summary
Modifier and Type, Field:
static String BIOMARTPATH
-
Constructor Summary
Constructor:
BiomartEnsemblNcbiFetcher()
-
Method Summary
Modifier and Type, Method, Description:
String[] attributesToRetrieveFromBioMartForProteinQuery(String biomartTaxonName)
Constructs the array of attributes that can be queried for the supplied taxon.
Map<Taxon,File> fetch(Collection<Taxon> taxa)
Main method: iterates over the supplied taxa and calls the per-taxon fetch method for each one.
File fetchFileForProteinQuery(String bioMartTaxonName)
Given a BioMart-formatted taxon name, fetches the file from BioMart and saves it as a local file.
String getBiomartTaxonName(Taxon gemmaTaxon)
BioMart taxon names are the scientific name in all lowercase, with the genus shortened to one letter and prepended to the species name, e.g. Homo sapiens > hsapiens.
-
-
-
Field Detail
-
BIOMARTPATH
public static final String BIOMARTPATH
- See Also:
- Constant Field Values
-
-
Method Detail
-
attributesToRetrieveFromBioMartForProteinQuery
public String[] attributesToRetrieveFromBioMartForProteinQuery(String biomartTaxonName)
Constructs the array of attributes that can be queried for the supplied taxon. For example, if hsapiens is supplied, then hgnc_id can also be supplied as a query attribute.
- Parameters:
biomartTaxonName
- BioMart-formatted taxon name
- Returns:
- An array of strings representing the attributes that can be used to query BioMart.
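The taxon-dependent behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the class `BiomartAttributesSketch`, the method `attributesFor` and the base attribute names are hypothetical; the only detail taken from the description is that `hgnc_id` becomes available when the taxon is `hsapiens`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Gemma implementation.
class BiomartAttributesSketch {

    static String[] attributesFor(String biomartTaxonName) {
        List<String> attributes = new ArrayList<>();
        // Assumed base attributes, requested for every taxon.
        attributes.add("ensembl_gene_id");
        attributes.add("ensembl_peptide_id");
        // HGNC ids exist only for human genes, so add the attribute for hsapiens only.
        if ("hsapiens".equals(biomartTaxonName)) {
            attributes.add("hgnc_id");
        }
        return attributes.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", attributesFor("hsapiens")));
        System.out.println(String.join(",", attributesFor("mmusculus")));
    }
}
```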
-
fetch
public Map<Taxon,File> fetch(Collection<Taxon> taxa) throws IOException
Main method: iterates over the supplied taxa and calls the per-taxon fetch method for each one, which returns a BioMart file for that taxon.
- Parameters:
taxa
- Collection of taxa to retrieve BioMart files for.
- Returns:
- A map of BioMart files, as stored on the local file system, keyed by taxon.
- Throws:
IOException
- if there is a problem while manipulating the file
-
fetchFileForProteinQuery
public File fetchFileForProteinQuery(String bioMartTaxonName) throws IOException
Given a BioMart-formatted taxon name, fetches the file from BioMart and saves it as a local file.
- Parameters:
bioMartTaxonName
- BioMart-formatted taxon name
- Returns:
- The BioMart file saved locally.
- Throws:
IOException
- when there is a problem while manipulating the file
-
getBiomartTaxonName
public String getBiomartTaxonName(Taxon gemmaTaxon)
BioMart taxon names are the scientific name in all lowercase, with the genus shortened to one letter and prepended to the species name, e.g. Homo sapiens > hsapiens.
- Parameters:
gemmaTaxon
- taxon object
- Returns:
- BioMart-formatted taxon name.
- Throws:
RuntimeException
- if the taxon does not contain a valid scientific name
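The naming rule above can be sketched on a plain string. This is a minimal illustration, not the Gemma method: it takes the scientific name directly instead of a Taxon object, and it assumes that a "valid" scientific name means exactly two whitespace-separated words (genus and species). The class `BiomartTaxonNameSketch` and the method `toBiomartName` are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of the name conversion, working on a String rather than a Taxon.
class BiomartTaxonNameSketch {

    static String toBiomartName(String scientificName) {
        if (scientificName == null) {
            throw new RuntimeException("Scientific name is missing");
        }
        // Lowercase first, then split into genus and species.
        String[] parts = scientificName.trim().toLowerCase(Locale.ROOT).split("\\s+");
        // Assumption: only "Genus species" names are treated as valid.
        if (parts.length != 2 || parts[0].isEmpty()) {
            throw new RuntimeException("Not a valid scientific name: " + scientificName);
        }
        // First letter of the genus followed by the full species name.
        return parts[0].charAt(0) + parts[1];
    }

    public static void main(String[] args) {
        System.out.println(toBiomartName("Homo sapiens")); // prints "hsapiens"
    }
}
```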
-
-