MaizeSeq Dataset Description.
The data accessed through the MaizeSeq web interface, hosted by the Donald Danforth Plant Science Center, represent a large collection of industry-generated maize (corn) transcribed sequences (EST singlets and assemblies, many representing full-length cDNAs). These data were generated by DuPont, Pioneer, Monsanto and Ceres, and are being made available to the general scientific community for use in not-for-profit research, subject to the terms and conditions of the User Agreement obtained through the National Corn Growers Association.
The top-level data objects are consensus sequence representatives of the clustered corn cDNA sequences. While this repository contains all data provided by the industry collaborators, and the MaizeSeq website provides a single point of access to the complete industry dataset, the consensus sequences have NOT been created by merging the contributed data together. Rather, Monsanto/Ceres and DuPont/Pioneer have each independently constructed consensus sequences from their respective maize EST data collections, and these have not been further merged. The contributing source is clearly indicated in the definition line of each consensus (and EST component) sequence.
The best predicted protein translation of each DNA consensus sequence has been provided.
Each consensus sequence has been annotated with BLAST hits (BLASTX vs. nr.aa) and/or Pfam matches (found using the predicted protein sequences). For the BLASTX annotations, best hits (or preferred hits) better than 1e-10 are reported; for the Pfam annotations, best matches better than 1e-5 are reported.
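The reporting rule above can be illustrated with a minimal Python sketch. It assumes the two cutoffs are E-values (so "better" means strictly lower) and that hits are available as simple (query, subject, E-value) tuples. The function name, the tuple layout and the sample identifiers are hypothetical and are not part of the MaizeSeq data or interface, and the "preferred hit" selection mentioned above is not modeled.

```python
# Cutoffs taken from the description above; treating them as E-values is an assumption.
BLASTX_MAX_EVALUE = 1e-10
PFAM_MAX_EVALUE = 1e-5


def best_hits(rows, max_evalue):
    """Return {query: (subject, evalue)} keeping, per query, the hit with the
    lowest E-value among those strictly below max_evalue."""
    best = {}
    for query, subject, evalue in rows:
        if evalue >= max_evalue:
            continue  # not better than the cutoff, so it would not be reported
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical example hits: (consensus id, matched entry, E-value).
hits = [
    ("consensus_1", "protA", 1e-50),
    ("consensus_1", "protB", 1e-12),
    ("consensus_2", "protC", 1e-8),
]

blast_best = best_hits(hits, BLASTX_MAX_EVALUE)
pfam_style_best = best_hits(hits, PFAM_MAX_EVALUE)
```

With the stricter BLASTX cutoff, only consensus_1 keeps an annotation. The same rows filtered at the looser Pfam cutoff would also retain the consensus_2 entry.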
Associated EST component sequences, the component quality files (phred scores), and library information (genotype and tissue source) have been provided where available.
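For readers unfamiliar with phred scores, the following Python sketch shows how a quality value Q maps to an estimated base-call error probability via the standard relation P = 10^(-Q/10). It assumes a quality line holds whitespace-separated integer scores, as in typical phred output. The exact layout of the MaizeSeq quality files is not specified here, and the helper names are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a phred quality score to the estimated probability that the
    base call is wrong: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def parse_qual_line(line):
    """Parse one line of whitespace-separated integer phred scores
    (assumed layout; the real files may differ)."""
    return [int(token) for token in line.split()]


# Hypothetical quality line for four bases.
scores = parse_qual_line("10 20 30 40")
probs = [phred_to_error_prob(q) for q in scores]
```

A score of 20 thus corresponds to roughly a 1 in 100 chance of a miscalled base, and a score of 30 to roughly 1 in 1000.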