Here I will show some of the most frequently asked questions. As an example, I use the data set that is available in the download section.
What kind of data does HATSEQ require?
1. Normalize your data using your favorite program.
For ChIP-on-chip, the input-file must be tab-delimited and in the following format:
CHROM	POS	VALUE
chr1	115365024	-0.023634
chr1	115365060	-0.663943
chr1	115365096	-0.784316
chr1	115365132	1.026958
chr14	97692830	0.753593
chr14	97692866	1.263455
chr14	97693010	-0.884203
etc.
2. For ChIP-Seq, the input-file must be a .BAM file or .PILUP file.
3. Select your files and your background-files and press <OK>.
4. If your files are tab-delimited as shown above, your data is automatically treated as ChIP-on-chip data. If your files are in the PILUP format (see the download section for examples) or are BAM files, your data is treated as ChIP-Seq data.
5. If you don't need to change any other settings, press "Save and Analyze" in the main screen to detect candidate regions.
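Before loading a ChIP-on-chip file, it can help to check that it really follows the tab-delimited CHROM / POS / VALUE layout described above. The following is a minimal sketch in Python and is not part of HATSEQ; the function name `read_chip_on_chip` is hypothetical, and treating the header line as optional is an assumption.

```python
import csv
import io

def read_chip_on_chip(handle):
    """Parse tab-delimited CHROM/POS/VALUE rows into (chrom, pos, value) tuples.

    Hypothetical helper, not part of HATSEQ. A leading header line is
    accepted but not required (an assumption of this sketch).
    """
    rows = []
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip blank lines
        if line_no == 1 and [f.strip().upper() for f in fields] == ["CHROM", "POS", "VALUE"]:
            continue  # skip the header line
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        chrom, pos, value = fields
        # int() and float() raise ValueError on malformed numbers
        rows.append((chrom, int(pos), float(value)))
    return rows

example = (
    "CHROM\tPOS\tVALUE\n"
    "chr1\t115365024\t-0.023634\n"
    "chr14\t97692830\t0.753593\n"
)
rows = read_chip_on_chip(io.StringIO(example))
print(rows)
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object, for example `open("my_experiment.txt", newline="")`, where the file name is a placeholder.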
Can I analyze experiments that are normalized with different genome builds?
Yes, you can select files that are normalized with different genome builds, such as experiment 1 with hg18 and experiment 2 with hg19. However, it is not recommended to search for common-regions or to do any "additional analysis" in that case, because genomic coordinates differ between genome builds.
What are the de novo motifs in my data?
Load your data as described in step 1.
Click on "Additional analysis" in the main screen.
Toggle "FASTA files" to "Yes", and select the species of interest. If you don't have your species stored on the hard-drive, click on "UCSC". This will automatically download all the necessary files from UCSC and store them in the installation path of HATSEQ.
All sequences for the candidate regions will be stored in so-called FASTA-files. In addition, it is possible to gather the upstream and downstream sequences of the flanking genes for further analysis. Set both values to 0 if you are not interested.
Toggle "MOTIF analysis" to "Yes" in the "Additional analysis" screen. The motif-analysis screen will open.
To detect de novo motifs, you must specify the expected length in "Expected motif length". I know, for example, that STAT has a motif of length 9. Therefore I specified the lengths "8,9,10". Note that the motif results of length 8 can be very similar to the motifs of length 9, and likely also to the motifs of length 10.
If you use a ChIP-Seq experiment and you have directed reads (strand based), you can choose to analyze only the strand on which the peak has been found. For ChIP-on-chip data, and in most cases for ChIP-Seq, it is not known whether the candidate regions belong to the positive or negative strand. It is therefore recommended to set "Analyze both strands" to "Yes". Both the sequence and its complementary sequence will then be analyzed.
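To illustrate what analyzing both strands means in practice: the second strand is read as the reverse complement of the stored sequence. The sketch below is Python and not HATSEQ code; the names `reverse_complement` and `strands_to_scan` are hypothetical and only show the idea.

```python
# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def strands_to_scan(seq, both_strands=True):
    """Return the sequences a motif search would look at (illustrative only)."""
    return [seq, reverse_complement(seq)] if both_strands else [seq]

print(strands_to_scan("ATGCGT"))         # ['ATGCGT', 'ACGCAT']
print(strands_to_scan("ATGCGT", False))  # ['ATGCGT']
```

A motif that lies on the negative strand appears only in the second sequence, which is why scanning a single strand can miss it.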
Can the motifs be annotated with known transcription factors?
Yes. Load your data as described in step 1. Continue as described in step 2. All detected motifs are automatically annotated with known TFBS using the publicly available databases JASPAR and TRANSFAC.
Can I specify a motif-sequence to detect in the candidate regions and select the best scoring ones?
Yes. Load your data as described in step 1. Continue as described in step 2. If you specify your sequence in "Sequence-of-interest", HATSEQ will score each candidate region against it. In the output-file, you can see the score and the best-matching sequence.
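How HATSEQ computes this score is not described here. As a rough illustration of the idea only, the Python sketch below slides the sequence of interest along a candidate region and reports the window with the most matching positions. The plain match count and the name `best_match` are assumptions, not HATSEQ's actual scoring.

```python
def best_match(region, query):
    """Return (score, window) for the best ungapped match of query in region.

    The score is the number of identical positions; this simple match count
    is an assumption made for illustration, not HATSEQ's scoring scheme.
    """
    region, query = region.upper(), query.upper()
    if not query or len(query) > len(region):
        return 0, ""
    best_score, best_seq = -1, ""
    for start in range(len(region) - len(query) + 1):
        window = region[start:start + len(query)]
        score = sum(a == b for a, b in zip(window, query))
        if score > best_score:  # on ties, keep the first (leftmost) window
            best_score, best_seq = score, window
    return best_score, best_seq

print(best_match("GGTTCCAGGAACC", "TTCCAGGAA"))  # (9, 'TTCCAGGAA')
```

To pick the best scoring candidate regions, sort the regions by the returned score in descending order.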
Can I change the parameters after analyzing the data and re-run it again?
Yes. If you completed a run but realized that other settings might be better, you can click on "Load recent settings" in the main-screen, browse to your save-directory and open the "HATSEQ_PARAMETERS.mat" file.
After loading the file, you are not allowed to make any changes to the input-files but you can make changes in the "Advanced options" or "Additional analysis".
If you now decide to store the data in the same directory as your previous results, you need to answer the question: "Do you want to re-analyze all the files for regions-of-interest?"
If you press "No", the peak-detection will be skipped and the previous results are loaded. Note that you should press "Yes" if you made changes in the "Advanced options" that will affect the candidate regions (such as the p-value or read-depth). If you only made changes in the "Additional analysis", you can press "No".
Can I detect the overlap between my ChIP-on-chip and ChIP-Seq data?
Yes. Load your data as described in step 1. Include both your ChIP-on-chip and ChIP-Seq data and press <OK>. Set other parameters as you wish and run your analysis.
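Conceptually, the overlap is the set of genomic intervals shared by the candidate regions of both experiments. The Python sketch below shows one simple way to compute such intersections outside HATSEQ. The (chrom, start, end) tuples with inclusive ends and the function name `common_regions` are assumptions, and HATSEQ's own definition of a common region may differ. Both region lists must use the same genome build.

```python
from collections import defaultdict

def common_regions(regions_a, regions_b):
    """Return the intersections of two lists of (chrom, start, end) regions.

    Ends are treated as inclusive. Illustrative sketch, not HATSEQ code.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions_b:
        by_chrom[chrom].append((start, end))

    common = []
    for chrom, start, end in regions_a:
        for b_start, b_end in by_chrom.get(chrom, []):
            lo, hi = max(start, b_start), min(end, b_end)
            if lo <= hi:  # the two intervals share at least one position
                common.append((chrom, lo, hi))
    return sorted(common)

# Made-up example regions, not taken from a real experiment.
chip_on_chip = [("chr1", 100, 200), ("chr2", 50, 80)]
chip_seq = [("chr1", 150, 300), ("chr3", 10, 20)]
print(common_regions(chip_on_chip, chip_seq))  # [('chr1', 150, 200)]
```

This pairwise comparison is fine for a few thousand regions; for very large region sets, sorting both lists and sweeping through them once would be faster.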
I don't have a background set for my ChIP-Seq data, can I use the background files that I have for my ChIP-on-chip?
No, because the technologies differ: ChIP-on-chip measures probe-intensity values, whereas ChIP-Seq measures sequence reads. It is, however, possible to select both files and set them as an "Experiment". Both files are then analyzed independently of each other and the common candidate regions are detected. You could consider using these common regions in a following step. However, a ChIP-on-chip experiment measures only a fraction of the data compared to ChIP-Seq, so such an approach is not recommended. It is recommended to always have a proper background set of the same data type.
I did not find any candidate regions, how do I continue?
If the analysis finds no significantly enriched candidate regions, this indicates that the probe-intensity values (ChIP-on-chip) or sequence reads (ChIP-Seq) showed no significant differences compared to the background data-file. This can happen even when the hybridization on chip was performed successfully (i.e. DNA-fragments were immunoprecipitated) and the background data-file was correctly provided to the model. Note that analysing experimental data-files without a background data-file, or with an incorrectly used one, can lead either to the absence of significantly enriched candidate regions or to the detection of false-positive candidate regions. If no significantly enriched candidate regions are detected, you should consider that no DNA-binding took place. Alternatively, you could increase the significance level alpha and re-run the analysis. Note that the false-positive rate increases when using alpha>0.05. It is therefore highly recommended to validate the candidate regions by qPCR.