QC Home Page
The QC home page lists all workflows run by the user. The user can click on any previous run to view the results from that QC run.
- Google Chrome
- Mozilla Firefox
- Apple Safari
Start a QC run
- On the QC home page, click the Start QC button.
- Fill in the form fields.
Upload a ZIP file containing the submission files.
ZIP File Structure
All files in the ZIP file should have unique filenames, i.e. no two files (even ones in different directories) should have the same name.
- Choose the disorder for which data is being submitted.
- Specify one or more email addresses where a notification email should be sent when the QC process finishes.
- Click the Submit button.
File names should not contain any spaces.
Column names are case sensitive, i.e. ind_id is not the same as Ind_id.
Columns in the files should match the order in which they are defined in their corresponding data dictionaries.
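The file name and column rules above can be pre-checked locally before uploading. The following is a minimal sketch, not the QC system's own implementation: the function names `check_zip_names` and `check_header` are hypothetical, and the expected column list is assumed to have been read from the relevant data dictionary beforehand.

```python
import posixpath
from collections import defaultdict


def check_zip_names(names):
    """Return a list of problems found in ZIP member names.

    `names` is a list of member paths, e.g. zipfile.ZipFile(path).namelist().
    """
    problems = []
    by_basename = defaultdict(list)
    for name in names:
        if name.endswith("/"):  # directory entry, not a file
            continue
        base = posixpath.basename(name)
        by_basename[base].append(name)
        if " " in base:
            problems.append(f"space in file name: {name}")
    # The same base name in two directories still counts as a duplicate.
    for base, paths in sorted(by_basename.items()):
        if len(paths) > 1:
            problems.append(f"duplicate file name {base}: {', '.join(paths)}")
    return problems


def check_header(header, expected):
    """Compare a file header with the data dictionary column order.

    The comparison is case sensitive, so 'Ind_id' does not match 'ind_id'.
    """
    problems = []
    for position, (got, want) in enumerate(zip(header, expected), start=1):
        if got != want:
            problems.append(f"column {position}: expected {want!r}, found {got!r}")
    if len(header) != len(expected):
        problems.append(f"expected {len(expected)} columns, found {len(header)}")
    return problems
```

An empty list from either function means that no problem was detected by this sketch; the QC system itself remains the authority on whether a submission passes.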
A submission to the QC system consists of a ZIP file that includes the following files. Files can be comma- or tab-separated. Comma-separated files should have a .csv extension. Tab-separated files should have a .txt extension.
Click on the links to view their formats.
|file name||required||description||required filename suffix||sample||template|
|Submission File||required||This file contains each individual in the submission with basic demographic, pedigree, and final diagnosis information.||FILENAME_sub|
|Investigator-defined main diagnosis File||required||This file is a dictionary defining the diagnoses used by the investigator in the "dx_study" column of the Submission file.||FILENAME_dx||sample_dx.csv||template_dx.csv|
|Alternate ID File||optional||The file contains alternate IDs for individuals, like internal ID, NDAR GUID, etc.||FILENAME_id||sample_id.csv||template_id.csv|
|Extended Diagnosis File||optional||This file records all the different diagnostic codes that an individual may have. Typically, DSM codes are expected.||FILENAME_edx|
|Race/Ethnicity File||optional||This file records investigator-defined race/ethnicity for the individual, parents and grandparents.||FILENAME_re||sample_re.csv||template_re.csv|
|Investigator-defined Phenotypic Files||optional||Any number of these files may be submitted. These files include clinical assessments or any other type of investigator-defined phenotypic data files. The Center does not prescribe the content of these files. However, we require a data dictionary for each type of phenotypic data file to record the meaning of the data and to perform basic quality control on submissions.|
|sample_phen.csv||defined by user, see sample|
|Phenotypic Data Dictionary File||optional||For every phenotypic file submitted, exactly one phenotypic data dictionary is required. Note that the name preceding the '_phen_dd' suffix must match exactly between the phenotypic file and the phenotypic data dictionary.|
Note: First column of every phenotypic file must be ind_id.
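The extension, suffix, and first-column rules above can be sketched as a few small helpers. This is an illustrative sketch only. It assumes that phenotypic files end in `_phen` (inferred from the sample name sample_phen.csv, not stated as a rule), that extensions are matched case sensitively, and that a dictionary named X_phen_dd pairs with a phenotypic file named X_phen. All function names are hypothetical.

```python
import posixpath

# Delimiter implied by the file extension, as described above.
DELIMITERS = {".csv": ",", ".txt": "\t"}


def delimiter_for(filename):
    """Return the delimiter for a file name, or None for other extensions."""
    _, ext = posixpath.splitext(filename)
    return DELIMITERS.get(ext)  # assumption: extension match is case sensitive


def unpaired_phenotypic_files(filenames):
    """Return (phenotypic files without a dictionary, dictionaries without a file).

    Assumption: phenotypic files end in '_phen' and dictionaries in '_phen_dd'.
    This only checks presence; it does not detect two dictionaries for one file.
    """
    phen, dicts = set(), set()
    for filename in filenames:
        stem, _ = posixpath.splitext(posixpath.basename(filename))
        if stem.endswith("_phen_dd"):
            dicts.add(stem[: -len("_dd")])  # 'iq_phen_dd' -> 'iq_phen'
        elif stem.endswith("_phen"):
            phen.add(stem)
    return sorted(phen - dicts), sorted(dicts - phen)


def first_column_is_ind_id(header):
    """True if the first column of a phenotypic file header is exactly 'ind_id'."""
    return bool(header) and header[0] == "ind_id"
```

The value returned by `delimiter_for` can be passed as the `delimiter` argument of `csv.reader` from the Python standard library when reading a file's header row.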
The QC system performs the following checks
The QC system checks the submission file to see whether its data matches the specification described in Submission File.
The QC system also performs pedigree checks on the submission file, such as:
- Records exist in the submission file for every individual identified as a parent.
- Individuals identified as fathers are male.
- Individuals identified as mothers are female.
- An individual's age is less than the age of his/her parents.
- An individual is not the same as his/her father or mother.
- Every family has at least one individual who is a proband (unless the submission disorder is indicated as "Control").
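The pedigree checks listed above can be illustrated with a short sketch. Apart from `ind_id`, every field name used here (`fam_id`, `father_id`, `mother_id`, `sex`, `age`, `is_proband`), the sex codes "M" and "F", and the use of an empty string for an unspecified parent are assumptions made for illustration; the actual column names and codes are defined in the Submission File specification.

```python
def pedigree_problems(records, control=False):
    """Return a list of pedigree problems for a list of row dicts.

    All field names except 'ind_id' are hypothetical placeholders.
    Set control=True when the submission disorder is "Control".
    """
    by_id = {r["ind_id"]: r for r in records}
    problems = []
    families, families_with_proband = set(), set()
    for r in records:
        ind = r["ind_id"]
        families.add(r["fam_id"])
        if r["is_proband"]:
            families_with_proband.add(r["fam_id"])
        for role, expected_sex in (("father_id", "M"), ("mother_id", "F")):
            parent_id = r[role]
            if not parent_id:  # assumption: empty value means no parent given
                continue
            if parent_id == ind:
                problems.append(f"{ind}: listed as own {role}")
                continue
            parent = by_id.get(parent_id)
            if parent is None:
                problems.append(f"{ind}: {role} {parent_id} has no record")
                continue
            if parent["sex"] != expected_sex:
                problems.append(f"{ind}: {role} {parent_id} has sex {parent['sex']}")
            if (r["age"] is not None and parent["age"] is not None
                    and r["age"] >= parent["age"]):
                problems.append(f"{ind}: age is not less than age of {role} {parent_id}")
    if not control:  # the proband requirement is waived for "Control"
        for fam in sorted(families - families_with_proband):
            problems.append(f"family {fam}: no proband")
    return problems
```

Collecting every problem instead of stopping at the first one mirrors the way the QC report lists all issues it encountered.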
Phenotypic Data Files
The QC system checks to see if the data contained in each phenotypic data file matches the information contained in the corresponding data dictionary file.
The QC system expects the first column of every submitted phenotypic file to be ind_id, matching the individual ID (ind_id column) in the submission file.
The QC system performs checks such as:
- All individuals mentioned in the phenotypic files have records in the submission file.
- Data is of the correct data type.
- Data falls within the valid range of values.
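The phenotypic checks above (known individual, data type, valid range) can be sketched as follows. The in-memory layout of the data dictionary used here, a mapping from column name to a type and optional `min`/`max` bounds, is purely hypothetical; the real data dictionary format is defined by its template. Treating empty cells as missing values is also an assumption.

```python
def check_phenotypic_rows(rows, dictionary, known_ids):
    """Return (line number, column, message) tuples for invalid data items.

    rows:       list of dicts keyed by column name, values as strings
    dictionary: {column: {"type": int or float or str, "min": ..., "max": ...}}
                (hypothetical layout; min/max only make sense for numbers)
    known_ids:  set of ind_id values present in the submission file
    """
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if row["ind_id"] not in known_ids:
            problems.append((line_no, "ind_id", "individual not in submission file"))
        for column, spec in dictionary.items():
            raw = row.get(column) or ""
            if raw == "":
                continue  # assumption: empty cells are treated as missing
            try:
                value = spec["type"](raw)
            except ValueError:
                problems.append((line_no, column, f"not a valid {spec['type'].__name__}"))
                continue
            low, high = spec.get("min"), spec.get("max")
            if (low is not None and value < low) or (high is not None and value > high):
                problems.append((line_no, column, "value out of range"))
    return problems
```

Rows in this shape can be produced with `csv.DictReader` from the Python standard library, using a comma or tab delimiter according to the file extension.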
The QC system checks that:
- Pairs of individual IDs and cell line IDs (Rutgers IDs) in the data submission match the Center records.
- DSM codes in the extended diagnosis file (column code) are valid codes for the declared diagnostic system (dx_system).
The QC system will perform a variety of checks on the submitted files. If there are serious issues with the files, the QC submission will fail, producing a detailed report of every issue the QC system encountered. The submitter should address these issues and resubmit. The report includes instructions on how to contact the NIMH RGR Data Curation team to help resolve any problems with the submission.
For some types of files, the QC system suggests corrections to likely mistakes, such as typos. The Corrected File link provides the original input file with the detected typos automatically corrected. If the QC system did not make any corrections, the Corrected File is essentially the same as the input file.
Even though the QC system automatically generates a file incorporating the suggested values, the submission is still marked as a failure. The submitter is responsible for ensuring that the automatically suggested values are indeed valid.
The submitter will have to rerun the submission through the QC system with a modified file containing the correct values.
The Corrections Log link lists all the corrections (such as typos) that the QC system made automatically.
The Log file link lists all errors that the QC system was unable to correct automatically. For each invalid data item, the Log file lists the error message, column name, and line number. The investigator must address these issues and resubmit.
The Log link in the Pedigree Validation column lists all errors related to the pedigree checks.