Output files HATSEQ
This section describes all the output files that can be generated by HATSEQ.
Once HATSEQ has finished analyzing your data, it creates a number of folders and files in the output directory. Which folders and files are generated depends on the analyses you have chosen. For example, if you choose pathway analysis, a folder called "PATHWAYS" is generated, and if you choose motif analysis, a folder called "MOTIFS" is generated. The same applies to the "CIRCOS", "FASTA", "FIG" and "UCSC track" folders. All user-defined settings are stored in a separate file named "HATSEQ_settings.txt".
The save file "HATSEQ_PARAMETERS.mat" can be used to re-analyze your data with different settings, for example in the "pathway" analysis. This file is located in the root of the output directory. Loading this file in HATSEQ restores the most recent settings. HATSEQ also uses the "temp_matlab" directory, which contains temporary files that are created during the analysis.
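To check which of these folders and files a particular run has produced, a short script can be used. The following is a minimal sketch in Python and is not part of HATSEQ. The function name `inventory` is hypothetical, and the sketch assumes that the folders and "HATSEQ_settings.txt" sit directly in the root of the output directory, which this section only states explicitly for "HATSEQ_PARAMETERS.mat" and "HATSEQ_REGIONS.csv".

```python
from pathlib import Path

# Folder names as listed in this section; each exists only if the
# corresponding analysis was selected.
OPTIONAL_DIRS = ["PATHWAYS", "MOTIFS", "CIRCOS", "FASTA", "FIG", "UCSC track"]

# File names as listed in this section. Assumption: all of them are in the
# root of the output directory.
ROOT_FILES = ["HATSEQ_settings.txt", "HATSEQ_PARAMETERS.mat", "HATSEQ_REGIONS.csv"]


def inventory(output_dir):
    """Return a dict mapping each expected name to True if it is present."""
    root = Path(output_dir)
    found = {}
    for name in OPTIONAL_DIRS:
        found[name] = (root / name).is_dir()
    for name in ROOT_FILES:
        found[name] = (root / name).is_file()
    return found
```

Calling `inventory("path/to/output")` with your own output directory returns, for example, `False` for "MOTIFS" when motif analysis was not selected.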
Main output file
The main output file is called "HATSEQ_REGIONS.csv". It is located in the root of the output directory and is a semicolon-delimited file. Open the file in your favorite program and split the columns on the semicolon. A small example is as follows:
CIRCOS directory
The CIRCOS directory is only created when the option is selected in HATSEQ. The folder contains many different Circos plots in .png format and in vector-based .svg format. Note that Perl must be installed; see "Installation and configuration of Perl for Circos".
DETECTED_REGIONS directory
The DETECTED_REGIONS directory contains 4 additional files with more detailed results for the detected regions. All files are semicolon delimited:
* HATSEQ_GENERAL.csv
* HATSEQ_GENEMAPPING.csv
* HATSEQ_REGIONS_CIS.csv
* HATSEQ_REGIONS_UNIQUE.csv
HATSEQ_GENERAL.csv: This file describes, for each experiment, the number of detected regions and the number of regions that overlap across the different experiments.
HATSEQ_GENEMAPPING.csv: Please see figure below.
HATSEQ_REGIONS_CIS.csv: This file contains information about the commonly detected regions. Please see figure below.
HATSEQ_REGIONS_UNIQUE.csv: This file is similar to "HATSEQ_REGIONS.csv", except that each row is a unique region across the experiments.
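Because "HATSEQ_REGIONS.csv" and the files in the DETECTED_REGIONS directory are all semicolon delimited, they can also be read with a script instead of a spreadsheet program. The following is a minimal sketch in Python using the standard `csv` module. The function names are hypothetical, the UTF-8 encoding is an assumption, and rows are returned as plain lists because this section does not state whether the first row is a header.

```python
import csv


def read_semicolon_rows(handle):
    """Split every line of an open text file on semicolons.

    Returns a list of rows, each row being a list of strings.
    """
    return list(csv.reader(handle, delimiter=";"))


def read_semicolon_file(path):
    """Open a semicolon-delimited file and return its rows.

    Assumption: the file is UTF-8 encoded text.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return read_semicolon_rows(handle)
```

For example, `read_semicolon_file("HATSEQ_REGIONS.csv")` returns one list per line of the main output file. Inspect the first row to see whether it holds column names before processing the rest.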
FASTA directory
The "FASTA" directory contains only FASTA files of the detected regions. FASTA files are created for:
* HATSEQ_CIS.fa The commonly detected regions. This file contains the regions listed in "HATSEQ_REGIONS_CIS.csv".
* HATSEQ_total.fa All detected regions. This file contains the regions listed in "HATSEQ_REGIONS.csv".
* HATSEQ_unique.fa All uniquely detected regions. This file contains the regions listed in "HATSEQ_REGIONS_UNIQUE.csv".
In addition, FASTA files are created for the regions that are specified with a common background.
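To verify that a FASTA file matches its corresponding .csv file, it can help to count the sequences it contains. The following is a minimal sketch of a FASTA reader in Python and is not part of HATSEQ. The function name `read_fasta` is hypothetical, and the sketch only relies on the general FASTA convention that each record starts with a ">" header line, followed by one or more sequence lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of one record are joined into a single string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

A possible use is `with open("FASTA/HATSEQ_total.fa") as fh: n = sum(1 for _ in read_fasta(fh))`, where the path is relative to your own output directory.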
FIG directory
The FIG directory contains many different figures that are generated to support the results.
* dataDistribution_*.png This figure illustrates the distribution of the reads/probe intensity and the thresholds that are used to find candidate regions.
* Pvalue_vs_region.png This figure illustrates the "detected regions" on the x-axis versus the log P-value on the y-axis.
* ROI_distance_TSS_2000.png This figure illustrates the frequency of detected regions with respect to the absolute distance to the transcription start site of the neighboring gene.
* VENN_DIAGRAM.png This figure illustrates the Venn diagram of the overlapping regions between the experiments. It is limited to the first three experiments that are selected by the user.
MOTIFS directory
The "MOTIFS" directory contains semicolon-delimited files. Each file lists the (de novo) motifs that are enriched for known transcription factor binding sites as described by TRANSFAC and JASPAR.
* CIS_results.csv Motifs that are detected among the commonly detected regions. These motifs are determined by using the "HATSEQ_CIS.fa" file.
* REGION_TOTAL_results.csv Motifs that are detected among all detected regions. These motifs are determined by using the "HATSEQ_total.fa" file.
In addition, motif files are created for the regions that are specified with a common background.
PATHWAYS directory
The "PATHWAYS" directory contains semicolon-delimited files. Each file lists the pathways that are enriched, using gene sets from the Molecular Signatures Database (MSigDB).
* MSIGDB_common_results.csv Pathways that are detected among the commonly detected regions. These pathways are determined by using the "HATSEQ_REGIONS_CIS.csv" file.
* MSIGDB_total_results.csv Pathways that are detected among all detected regions. These pathways are determined by using the closest neighboring gene from the "HATSEQ_REGIONS.csv" file.
In addition, pathway files are created for the regions that are specified with a common background.
UCSC track directory
The "UCSC track" directory contains the ".wig" files that can be loaded into the UCSC Genome Browser. To load these files, please go to: http://genome.ucsc.edu/index.html, press "genome browser", and then press "add custom tracks".
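Before uploading, a ".wig" file can be given a quick sanity check. The following is a minimal sketch in Python and is not part of HATSEQ. The function name `summarize_wig` is hypothetical. The sketch relies only on the general WIG format, in which optional "browser" and "track" lines are followed by "fixedStep" or "variableStep" declarations and data lines. Whether HATSEQ writes track lines, and which step type it uses, is not stated in this section.

```python
def summarize_wig(lines):
    """Count declaration lines and data lines in WIG-formatted text."""
    summary = {"browser": 0, "track": 0, "fixedStep": 0,
               "variableStep": 0, "data": 0}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        keyword = line.split(None, 1)[0]
        if keyword in ("browser", "track", "fixedStep", "variableStep"):
            summary[keyword] += 1
        else:
            # Anything else is treated as a data line.
            summary["data"] += 1
    return summary
```

A file that reports zero "fixedStep" and zero "variableStep" declarations is worth inspecting by hand before it is added as a custom track.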