Output files HATSEQ


This section describes all the output files that can be generated by HATSEQ.

General

  Once HATSEQ has finished analyzing your data, it creates a number of folders and files in the output directory.
  The number of output files and folders depends on the analyses that you have chosen to run.
  For example, if you choose to do pathway analysis, a folder called "PATHWAYS" will be generated. If you choose to do motif analysis, a folder called "MOTIFS" will be generated. The same applies to the "CIRCOS", "FASTA", "FIG" and "UCSC track" folders.
  
  All user-defined settings are stored in a separate file named "HATSEQ_settings.txt".
01 output files.png
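
  As a quick way to see which optional folders a given run produced, a minimal sketch is shown here. The folder names are taken from this section; the function name list_optional_outputs and the idea of checking the output directory from Python are assumptions made for illustration and are not part of HATSEQ.

```python
from pathlib import Path

# Optional folder names as listed in this section; each one is expected to
# exist only if the corresponding analysis was selected in HATSEQ.
OPTIONAL_DIRS = ["PATHWAYS", "MOTIFS", "CIRCOS", "FASTA", "FIG", "UCSC track"]


def list_optional_outputs(output_dir):
    """Return a dict mapping each optional folder name to True if it exists.

    list_optional_outputs is a hypothetical helper, not a HATSEQ function.
    """
    root = Path(output_dir)
    return {name: (root / name).is_dir() for name in OPTIONAL_DIRS}
```

  Calling list_optional_outputs on the output directory returns, for example, True for "MOTIFS" only when motif analysis was selected.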

Save file

  The save file "HATSEQ_PARAMETERS.mat" can be used to re-analyze your data with different settings, for example in the "pathway" 
  analysis. This file is located in the root of the output directory. Loading this file in HATSEQ restores the most recent settings. 
  HATSEQ also uses the "temp_matlab" directory, which contains temporary files that are created during the analysis.
03 output files.png


Main output file

  The main output file is called "HATSEQ_REGIONS.csv". It is located in the root of the output directory and is a semicolon-delimited file. 
  Open the file in your favorite program and split the columns on the semicolon.
  A small example is as follows:
02 output files.png
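
  The semicolon-delimited files can also be read programmatically. The following is a minimal sketch using the Python standard library; the function name read_semicolon_csv is hypothetical, and both the UTF-8 encoding and the presence of a header row are assumptions, since neither is specified here.

```python
import csv


def read_semicolon_csv(path):
    """Read a semicolon-delimited file into a list of rows.

    Each row is a list of strings. Whether the first row is a header is
    not stated in this documentation, so no row is treated specially.
    The UTF-8 encoding is an assumption.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=";"))
```

  For example, read_semicolon_csv("HATSEQ_REGIONS.csv") returns all rows of the main output file, which can then be inspected or filtered.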

CIRCOS directory

  The CIRCOS directory is only created when the option is selected in HATSEQ.
  The folder contains many different Circos plots in .png format and in a vector-based .svg format.
  Note that Perl must be installed; please see: Installation and configuration of Perl for Circos.
04 output files.png

DETECTED_REGIONS directory

  The DETECTED_REGIONS directory contains four additional files with more detailed results for the detected regions.
  All files are semicolon delimited:
  
  * HATSEQ_GENERAL.csv
  * HATSEQ_GENEMAPPING.csv
  * HATSEQ_REGIONS_CIS.csv
  * HATSEQ_REGIONS_UNIQUE.csv
  
  HATSEQ_GENERAL.csv
  This file describes, for each experiment, the number of detected regions and the number of regions that overlap across experiments.
  
  HATSEQ_GENEMAPPING.csv
  Please see the figure below.
Output files HATSEQ GENEMAPPING.jpg
  HATSEQ_REGIONS_CIS.csv
  This file contains information about the commonly detected regions. Please see the figure below.
Output files HATSEQ REGIONS CIS.jpg
  HATSEQ_REGIONS_UNIQUE.csv
  The HATSEQ_REGIONS_UNIQUE.csv file is similar to "HATSEQ_REGIONS.csv", except that each row is a unique region across the experiments.

FASTA directory

  The "FASTA" directory contains only FASTA files of the detected regions. 
  FASTA files are created for: 
  
  * HATSEQ_CIS.fa
  The commonly detected regions. The FASTA file contains the regions listed in "HATSEQ_REGIONS_CIS.csv".
  
  * HATSEQ_total.fa
  All the detected regions. The FASTA file contains the regions listed in "HATSEQ_REGIONS.csv".
  
  * HATSEQ_unique.fa
  All uniquely detected regions. The FASTA file contains the regions listed in "HATSEQ_REGIONS_UNIQUE.csv".
  
  In addition, FASTA files are created for the regions that are specified with a common background.
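
  To inspect the sequences in one of these files, a minimal FASTA reader could look like the sketch below. It assumes the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name read_fasta is hypothetical, and the exact header format written by HATSEQ is not described here.

```python
def read_fasta(path):
    """Return a dict mapping each FASTA header (without '>') to its sequence.

    Assumes the standard FASTA layout. If two records share the same
    header, the later record replaces the earlier one.
    """
    records = {}
    header = None
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line)
    # Join multi-line sequences into a single string per record.
    return {name: "".join(parts) for name, parts in records.items()}
```

  For example, len(read_fasta("HATSEQ_CIS.fa")) gives the number of distinct headers in the file of commonly detected regions.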

FIG directory

  The FIG directory contains many different figures that are generated to support the results.
  * dataDistribution_*.png
    This figure illustrates the distribution of the reads/probe intensity and the thresholds that are used to find candidate 
    regions.
  * Pvalue_vs_region.png
    This figure illustrates the detected regions on the x-axis versus the log P-value on the y-axis. 
  * ROI_distance_TSS_2000.png
    This figure illustrates the frequency of detected regions with respect to the absolute distance to the transcription start site of the 
    neighboring gene.
  * VENN_DIAGRAM.png
    This figure shows a Venn diagram of the overlapping regions between the experiments. It is limited to the first three experiments that 
    are selected by the user.


MOTIFS directory

  The "MOTIFS" directory contains semicolon-delimited files.
  Each file lists the (de novo) motifs that are enriched for known transcription factor binding sites as described by TRANSFAC and JASPAR.
  
  * CIS_results.csv
  Motifs that are detected among the commonly detected regions. These motifs are determined by using the "HATSEQ_CIS.fa" file.
  
  * REGION_TOTAL_results.csv
  Motifs that are detected among all detected regions. These motifs are determined by using the "HATSEQ_total.fa" file.
  
  In addition, files are created that describe the motifs for the regions that are specified with a common background.
Output files MOTIFS.jpg

PATHWAYS directory

  The "PATHWAYS" directory contains semicolon-delimited files.
  Each file lists the pathways that are enriched using gene sets from the Molecular Signatures Database (MSigDB).
  
  * MSIGDB_common_results.csv
  Pathways that are detected among the commonly detected regions. These pathways are determined by using the "HATSEQ_REGIONS_CIS.csv" file.
  
  * MSIGDB_total_results.csv
  Pathways that are detected among all detected regions. These pathways are determined by using the closest neighboring gene from the "HATSEQ_REGIONS.csv" file.
  
  In addition, files are created that describe the pathways for the regions that are specified with a common background.
Output files PATHWAYS.jpg

UCSC track directory

  The "UCSC track" directory contains the ".wig" files that can be loaded into the UCSC Genome Browser.
  To load these files, please go to: http://genome.ucsc.edu/index.html, press "genome browser", and then press "add custom tracks".
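
  Before uploading, it can be useful to check which chromosomes a ".wig" file covers. The sketch below assumes that the files follow the standard UCSC wiggle layout, in which each data block starts with a "fixedStep" or "variableStep" declaration line carrying a chrom=... field. Whether HATSEQ writes exactly this layout is an assumption, and the function name wig_chromosomes is hypothetical.

```python
def wig_chromosomes(path):
    """Return the chromosome names declared in a wiggle file, in file order.

    Only 'fixedStep' and 'variableStep' declaration lines are inspected;
    track lines and data lines are ignored.
    """
    chroms = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields and fields[0] in ("fixedStep", "variableStep"):
                for field in fields[1:]:
                    key, _, value = field.partition("=")
                    if key == "chrom" and value not in chroms:
                        chroms.append(value)
    return chroms
```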