
QC Home Page

CGSMD Automated QC

The QC home page lists all workflows run by the user. The user can click any previous run to view the results of that QC run.

Supported Browsers

  1. Google Chrome
  2. Mozilla Firefox
  3. Apple Safari

Start a QC run

  1. On the QC home page, click the Start QC button.
  2. Fill in the form fields:
    1. Upload a ZIP file containing the submission files. 

      ZIP File Structure

      All files in the ZIP file must have unique filenames, i.e., no two files (even ones in different directories) may have the same name.

    2. Choose the disorder for which data is being submitted.
    3. Specify one or more email addresses to which a notification email should be sent when the QC process finishes.
  3. Click the Submit button.
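You can check the unique-filename rule for the ZIP file locally before uploading. The Python sketch below is only an illustration and is not the QC system's own code. It makes three assumptions:

- Two entries count as duplicates only when their names match exactly (case-sensitive) after directory paths are stripped.
- The function name find_duplicate_basenames is hypothetical.
- The archive contents are hypothetical.

```python
import io
import posixpath
import zipfile
from collections import defaultdict


def find_duplicate_basenames(zip_bytes: bytes) -> dict[str, list[str]]:
    """Map each filename that appears more than once (ignoring directories)
    to every path where it occurs. An empty result means all names are unique."""
    paths_by_name: dict[str, list[str]] = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries are not files
            # ZIP member names use "/" as the path separator
            paths_by_name[posixpath.basename(info.filename)].append(info.filename)
    return {name: paths for name, paths in paths_by_name.items() if len(paths) > 1}


# Hypothetical archive: the same filename appears in two directories
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("study_sub.csv", "ind_id\n")
    archive.writestr("phenotypes/study_sub.csv", "ind_id\n")
    archive.writestr("study_dx.csv", "dx_study\n")

print(find_duplicate_basenames(buffer.getvalue()))
# {'study_sub.csv': ['study_sub.csv', 'phenotypes/study_sub.csv']}
```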

Input Files

Filenames must not contain spaces.

Column names are case-sensitive, i.e., ind_id is not the same as Ind_id.

Columns in each file must appear in the same order as they are defined in the corresponding data dictionary.

A submission to the QC system consists of a ZIP file containing the following files. Files can be comma- or tab-separated. Comma-separated files must have a .csv extension; tab-separated files must have a .txt extension.
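The filename, delimiter and column rules above can be pre-checked with a short Python sketch. It is not part of the QC system. The following names are hypothetical:

- the function check_data_file;
- the example column sex;
- expected_columns, which stands in for the column list taken, in order, from the file's data dictionary.

```python
import csv
import io
import os

# Delimiter implied by each allowed extension
DELIMITERS = {".csv": ",", ".txt": "\t"}


def check_data_file(filename: str, text: str, expected_columns: list[str]) -> list[str]:
    """Return the problems found in one tabular input file (empty if none)."""
    problems = []
    if " " in filename:
        problems.append(f"{filename}: filename contains a space")
    ext = os.path.splitext(filename)[1]
    if ext not in DELIMITERS:
        problems.append(f"{filename}: extension must be .csv or .txt")
        return problems
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[ext])
    header = next(reader, [])
    # Exact comparison: case-sensitive and order-sensitive
    if header != expected_columns:
        problems.append(f"{filename}: header {header} does not match {expected_columns}")
    return problems


print(check_data_file("study_sub.csv", "ind_id,sex\n1,M\n", ["ind_id", "sex"]))  # []
print(check_data_file("study sub.txt", "Ind_id\tsex\n", ["ind_id", "sex"]))
```

The sketch chooses the delimiter from the extension. A tab-separated file saved with a .csv extension is therefore read as a single column and fails the header comparison, so the mismatch is reported rather than hidden.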

Click on the links to view their formats.

  1. Submission File (required)
     Description: This file contains each individual in the submission with basic demographic, pedigree, and final diagnosis information.
     Required filename suffix: FILENAME_sub
  2. Investigator-defined Main Diagnosis File (required)
     Description: This file is a dictionary defining the diagnoses used by the investigator in the "dx_study" column of the Submission File.
     Required filename suffix: FILENAME_dx
     Sample: sample_dx.csv
     Template: template_dx.csv
  3. Alternate ID File (optional)
     Description: This file contains alternate IDs for individuals, such as internal IDs, NDAR GUIDs, etc.
     Note: This file is required for studies also submitting to NDA and dbGaP.
  4. Extended Diagnosis File (optional)
     Description: This file records all the diagnostic codes that an individual may have. Typically, DSM codes are expected.
     Required filename suffix: FILENAME_edx
  5. Race/Ethnicity File (optional)
     Description: This file records investigator-defined race/ethnicity for the individual, parents, and grandparents.
     Required filename suffix: FILENAME_re
     Sample: sample_re.csv
     Template: template_re.csv
  6. Investigator-defined Phenotypic Files (optional)
     Description: Any number of these files may be submitted. These files include clinical assessments or any other type of investigator-defined phenotypic data. The Center does not prescribe the content of these files. However, we require a data dictionary for each type of phenotypic data file to record the meaning of the data and to perform basic quality control on submissions.
     Sample: sample_phen.csv
     Template: defined by user, see sample
  7. Phenotypic Data Dictionary File (optional; required for each phenotypic file submitted)
     Description: For every phenotypic file submitted, exactly one phenotypic data dictionary is required. The name preceding the '_phen_dd' suffix must match exactly between the phenotypic file and its phenotypic data dictionary.
     Note: The first column of every phenotypic file must be ind_id.
  8. Other Files (optional)
     Description: In addition to the above files, you may include additional information, such as study acknowledgements, as PDF (.pdf), OpenDocument Text (.odt), or Text (.txt) files.
     Required extension: .pdf, .odt, or .txt
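A local check that the required files are present, and that each phenotypic file has exactly one data dictionary, might look like the Python sketch below. It rests on these assumptions:

- Filenames have the form <name><suffix>.<ext>, for example a hypothetical study_sub.csv.
- Phenotypic data files end in _phen, as the sample name sample_phen.csv suggests. The page states only the _phen_dd suffix for dictionaries explicitly.
- The function name check_file_set is hypothetical.

```python
import os
from collections import Counter

REQUIRED_SUFFIXES = ("_sub", "_dx")  # Submission File and main diagnosis file


def check_file_set(filenames: list[str]) -> list[str]:
    """Check required files and phenotypic file / dictionary pairing."""
    problems = []
    # "dir/study_sub.csv" -> "study_sub"
    stems = [os.path.splitext(os.path.basename(f))[0] for f in filenames]
    for suffix in REQUIRED_SUFFIXES:
        if not any(stem.endswith(suffix) for stem in stems):
            problems.append(f"missing required file ending in {suffix}")
    # Name preceding the suffix, e.g. "cognitive_phen_dd" -> "cognitive"
    data_names = [s[: -len("_phen")] for s in stems if s.endswith("_phen")]
    dict_counts = Counter(s[: -len("_phen_dd")] for s in stems if s.endswith("_phen_dd"))
    for name in data_names:
        if dict_counts[name] != 1:
            problems.append(f"{name}_phen has {dict_counts[name]} data dictionaries, expected 1")
    return problems


files = ["study_sub.csv", "study_dx.csv", "cognitive_phen.csv",
         "cognitive_phen_dd.csv", "clinical_phen.txt", "acknowledgements.pdf"]
print(check_file_set(files))
# ['clinical_phen has 0 data dictionaries, expected 1']
```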

Validation Checks

The QC system performs the following checks:

Submission File

The QC system checks whether the data in the submission file matches the specification described in the Submission File section.

The QC system also performs pedigree checks on the submission file, such as:

  1. Every individual identified as a parent has a record in the submission file.
  2. Individuals identified as fathers are male.
  3. Individuals identified as mothers are female.
  4. An individual's age is less than the age of each of his/her parents.
  5. An individual is not listed as his/her own father or mother.
  6. Every family has at least one individual who is a proband (unless the submission disorder is indicated as "Control").
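The pedigree checks above can be sketched in Python as follows. This is a hypothetical illustration, not the QC system's implementation. Everything below is an assumption, so consult the Submission File specification for the actual columns and codes:

- the column names fam_id, father_id, mother_id, sex, age and proband;
- the sex codes M/F;
- the use of "0" or an empty value for an unknown parent;
- numeric ages;
- the disorder value "Control".

```python
from collections import defaultdict

PARENT_COLUMNS = (("father_id", "M"), ("mother_id", "F"))  # column, expected sex
UNKNOWN_PARENT = ("", "0")  # hypothetical codes for "parent not specified"


def pedigree_errors(rows: list[dict], disorder: str) -> list[str]:
    errors = []
    by_id = {row["ind_id"]: row for row in rows}
    families = defaultdict(list)
    for row in rows:
        families[row["fam_id"]].append(row)
        for column, expected_sex in PARENT_COLUMNS:
            parent_id = row[column]
            if parent_id in UNKNOWN_PARENT:
                continue
            if parent_id == row["ind_id"]:
                errors.append(f"{parent_id}: listed as own {column}")
                continue
            parent = by_id.get(parent_id)
            if parent is None:  # parent has no record in the submission file
                errors.append(f"{row['ind_id']}: {column} {parent_id} has no record")
                continue
            if parent["sex"] != expected_sex:
                errors.append(f"{parent_id}: used as {column} but sex is {parent['sex']}")
            if row["age"] >= parent["age"]:
                errors.append(f"{row['ind_id']}: not younger than {column} {parent_id}")
    # The proband requirement is waived for "Control" submissions
    if disorder != "Control":
        for fam_id, members in families.items():
            if not any(member["proband"] for member in members):
                errors.append(f"family {fam_id}: no proband")
    return errors


rows = [
    {"ind_id": "F1", "fam_id": "1", "father_id": "0", "mother_id": "0", "sex": "M", "age": 40, "proband": False},
    {"ind_id": "M1", "fam_id": "1", "father_id": "0", "mother_id": "0", "sex": "F", "age": 38, "proband": False},
    {"ind_id": "C1", "fam_id": "1", "father_id": "F1", "mother_id": "M1", "sex": "F", "age": 10, "proband": True},
]
print(pedigree_errors(rows, "Autism"))  # []
```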

Phenotypic Data Files  

The QC system checks to see if the data contained in each phenotypic data file matches the information contained in the corresponding data dictionary file.


The QC system expects the first column of every submitted phenotypic file to be ind_id, with values matching the individual IDs (ind_id column) in the submission file.

The QC system performs checks such as:

  1. All individuals listed in the phenotypic files have records in the submission file.
  2. Data is of the correct data type.
  3. Data falls within the valid range of values.
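The Python sketch below shows how rows of a phenotypic file could be checked against a data dictionary for the three checks listed above. Its assumptions are:

- The dictionary layout (a type of integer, float or string, plus optional min and max per column) is a hypothetical stand-in for the real data dictionary format.
- The example columns iq and group are hypothetical.
- Missing-value codes are not handled.

```python
import csv
import io

CASTS = {"integer": int, "float": float, "string": str}


def check_phenotypic_rows(rows, dictionary, known_ids):
    """rows: dicts from csv.DictReader; dictionary: column -> spec (hypothetical
    layout); known_ids: ind_id values present in the submission file."""
    errors = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if row["ind_id"] not in known_ids:
            errors.append(f"line {line_no}: ind_id {row['ind_id']} not in submission file")
        for column, spec in dictionary.items():
            raw = row[column]
            try:
                value = CASTS[spec["type"]](raw)  # data type check
            except ValueError:
                errors.append(f"line {line_no}, {column}: {raw!r} is not {spec['type']}")
                continue
            low, high = spec.get("min"), spec.get("max")  # range check
            if (low is not None and value < low) or (high is not None and value > high):
                errors.append(f"line {line_no}, {column}: {value} outside [{low}, {high}]")
    return errors


text = "ind_id,iq,group\nC1,105,A\nC2,abc,B\nX9,250,A\n"
dictionary = {"iq": {"type": "integer", "min": 40, "max": 160}, "group": {"type": "string"}}
for error in check_phenotypic_rows(csv.DictReader(io.StringIO(text)), dictionary, {"C1", "C2"}):
    print(error)
# line 3, iq: 'abc' is not integer
# line 4: ind_id X9 not in submission file
# line 4, iq: 250 outside [40, 160]
```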

Content Checks

The QC system checks that:

  1. Pairs of individual IDs and cell line IDs (Rutgers IDs) in the data submission match the Center's records.
  2. DSM codes in the extended diagnosis file (column code) are valid codes for the declared diagnostic system (dx_system).
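Both content checks reduce to lookups against reference data held by the Center. The Python sketch below is hypothetical throughout:

- center_pairs stands in for the Center's (ind_id, cell line ID) records.
- cell_line_id is an assumed column name.
- The extended diagnosis rows are assumed to carry ind_id.
- valid_codes is an illustrative mapping from dx_system values to code sets.

The real reference data and column names may differ.

```python
def content_errors(sub_rows, edx_rows, center_pairs, valid_codes):
    """center_pairs: set of (ind_id, cell_line_id) tuples on file at the Center.
    valid_codes: dict mapping a dx_system value to its set of valid codes."""
    errors = []
    for row in sub_rows:
        cell_line = row["cell_line_id"]
        # Only rows that declare a cell line ID are compared
        if cell_line and (row["ind_id"], cell_line) not in center_pairs:
            errors.append(f"{row['ind_id']}: cell line {cell_line} does not match Center records")
    for row in edx_rows:
        codes = valid_codes.get(row["dx_system"])
        if codes is None:
            errors.append(f"{row['ind_id']}: unknown dx_system {row['dx_system']!r}")
        elif row["code"] not in codes:
            errors.append(f"{row['ind_id']}: {row['code']} is not a valid {row['dx_system']} code")
    return errors


center_pairs = {("C1", "RU0001"), ("C2", "RU0002")}
valid_codes = {"DSM-IV": {"299.00", "299.80"}}  # illustrative subset only
sub_rows = [{"ind_id": "C1", "cell_line_id": "RU0001"},
            {"ind_id": "C2", "cell_line_id": "RU0009"}]
edx_rows = [{"ind_id": "C1", "dx_system": "DSM-IV", "code": "299.00"},
            {"ind_id": "C2", "dx_system": "DSM-IV", "code": "299.9"}]
print(content_errors(sub_rows, edx_rows, center_pairs, valid_codes))
# ['C2: cell line RU0009 does not match Center records', 'C2: 299.9 is not a valid DSM-IV code']
```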

Output Files

The QC system performs a variety of checks on the submitted files. If there are serious issues with the files, the QC submission fails, producing a detailed report of every issue the QC system encountered. The submitter should address these issues and resubmit. The report includes instructions on how to contact the NIMH RGR Data Curation team for help resolving any problems with the submission.

For some types of files, the QC system suggests corrections to likely mistakes, such as typos. The Corrected File link provides the original input file with the detected typos automatically corrected. If the QC system made no corrections, the Corrected File is essentially the same as the input file.


Even though the QC system automatically generates a file incorporating the suggested values, the submission is still marked as a failure. The submitter is responsible for ensuring that the automatically suggested values are indeed valid.

The submitter must rerun the submission through the QC system with a modified file containing the correct values.

The Corrections Log link lists all corrections (typos) that the QC system made automatically.

The Log file link lists all errors that the QC system was unable to correct automatically. For each invalid data item, the Log file lists the error message, column name, and line number. The investigator must address these issues and resubmit.
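This page does not describe how the QC system detects or corrects typos. As one plausible illustration only, the Python sketch below uses the standard library's difflib.get_close_matches to map a misspelled value to the closest allowed value. It produces three things:

- a corrected column;
- a corrections log;
- a list of values it could not fix, analogous to the Log file.

The following are hypothetical:

- the allowed values;
- the 0.8 similarity cutoff;
- the function name suggest_corrections;
- the assumption that line 1 is the header.

```python
import difflib


def suggest_corrections(values, allowed, cutoff=0.8):
    """Return (corrected_values, corrections_log, unresolved_errors)."""
    corrected, log, unresolved = [], [], []
    for line_no, value in enumerate(values, start=2):  # line 1 is the header
        if value in allowed:
            corrected.append(value)
            continue
        match = difflib.get_close_matches(value, allowed, n=1, cutoff=cutoff)
        if match:
            corrected.append(match[0])
            log.append(f"line {line_no}: {value!r} -> {match[0]!r}")
        else:
            corrected.append(value)  # left unchanged for the submitter to fix
            unresolved.append(f"line {line_no}: {value!r} is not an allowed value")
    return corrected, log, unresolved


allowed = ["Autism", "Control", "Schizophrenia"]  # hypothetical allowed values
corrected, log, unresolved = suggest_corrections(["Autism", "Autsm", "Contrl", "xyz"], allowed)
print(log)         # ["line 3: 'Autsm' -> 'Autism'", "line 4: 'Contrl' -> 'Control'"]
print(unresolved)  # ["line 5: 'xyz' is not an allowed value"]
```

As on the QC system, an automatically corrected value is only a suggestion. The submitter should still verify each entry in the corrections log before resubmitting.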

The Log link in the Pedigree Validation column lists all errors related to the pedigree checks.

