Differences

This shows you the differences between two versions of the page.

courses:bin:tutorials:tutorial12 [2024/02/09 10:17] 127.0.0.1 external edit
courses:bin:tutorials:tutorial12 [2024/03/18 16:24] klema [A simple problem]
Line 18:
   - What is the motif we are searching for? Are there any other plausible motifs in the sequence set? /* ACT; another candidate could be CTG, but its parameters are much worse (it matches CTG, CTC and CGG) */
   - What does its information content logo look like? What is its average bit-score? /* ACT has 2 bits in every position (log2(4)+1*log2(1)=2+0=2; the three absent symbols contribute 0), so its average bit-score is 2; CTG has an average bit-score of 1.04: the first position has 2 bits and each of the other two positions has 0.55 bits (log2(4)+0.66*log2(0.66)+2*0.33*log2(0.33)=2-0.39-2*0.53=0.55); a random motif with a uniform probability of all symbols in all positions has an average bit-score of 0 */
-  - Could the occurrence of such a motif in this sequence set be random? What is its E-value? /* We need the probability of finding a trigram in all three sequences; it is difficult to compute exactly, but obviously small. In the first sequence we draw 4 trigrams out of the 64 that exist (with probability 1); in the second we have to hit one of them in 4 trials (1-(60/64)^4=0.23); in the third we have to hit the previous hit again in 4 trials (assuming only one previous hit, 1-(63/64)^4=0.061). The product is 0.0139. However, this ignores resampling (some trigrams can be drawn repeatedly, so fewer distinct trigrams are generated) and the dependence between overlapping trigrams; see the experimental derivation in motif_eval_calc.R, which gave 0.014, a figure similar to the approximate estimate. MEME reports an E-value of 0.25 */
+  - Could the occurrence of such a motif in this sequence set be random? What is its E-value? /* We need the probability of finding a trigram in all three sequences; it is difficult to compute exactly, but obviously small. In the first sequence we draw 4 trigrams out of the 64 that exist (with probability 1); in the second we have to hit one of them in 4 trials (1-(60/64)^4=0.23); in the third we have to hit the previous hit again in 4 trials (assuming only one previous hit, 1-(63/64)^4=0.061). The product is 0.0139. However, this ignores resampling (some trigrams can be drawn repeatedly, so fewer distinct trigrams are generated) and the dependence between overlapping trigrams; see the experimental derivation in motif_eval_calc.R, which gave 0.014, a figure similar to the approximate estimate. MEME reports an E-value of 0.25. What would it be for bigrams? (1-(11/16)^5)(1-(15/16)^5)=0.23 */
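The calculations in the answers above can be checked with a short Python sketch. It is not a reproduction of motif_eval_calc.R or of MEME's own statistics. It assumes three random sequences of length 6 over ACGT with uniform, independent bases (the length is inferred from the 4 trigram and 5 bigram windows used in the answers) and computes information content from raw frequencies without pseudocounts. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical helper names; a sketch, not the tutorial's own code.
# Assumption: DNA alphabet ACGT with uniform, independent bases.
ALPHABET = "ACGT"


def shared_kmers(seqs, k):
    """Return the set of k-mers that occur in every sequence of seqs."""
    common = None
    for s in seqs:
        kmers = {s[i:i + k] for i in range(len(s) - k + 1)}
        common = kmers if common is None else common & kmers
    return common if common is not None else set()


def information_content(sites, alphabet=ALPHABET):
    """Bits per motif position: log2(|alphabet|) minus the column's Shannon entropy.

    Uses raw frequencies (no pseudocounts), so tools that smooth counts may differ.
    """
    n = len(sites)
    ic = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        # Symbols with zero count contribute 0*log2(0) = 0, so they are skipped.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(math.log2(len(alphabet)) - entropy)
    return ic


def approx_shared_prob(k, length=6):
    """Hand approximation from the answer, for three sequences (valid when n <= m):
    sequence 1 supplies n k-mers (assumed distinct), sequence 2 must hit one of them
    in n trials, sequence 3 must hit that single shared k-mer in n trials."""
    n = length - k + 1       # k-mer windows per sequence
    m = len(ALPHABET) ** k   # number of possible k-mers
    p_second = 1 - ((m - n) / m) ** n
    p_third = 1 - ((m - 1) / m) ** n
    return p_second * p_third


def simulate_shared_prob(k, length=6, n_seqs=3, trials=100_000, seed=1):
    """Monte Carlo estimate of P(some k-mer occurs in all n_seqs random sequences).

    Unlike the approximation, it accounts for repeated and overlapping k-mers."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seqs = ["".join(rng.choices(ALPHABET, k=length)) for _ in range(n_seqs)]
        if shared_kmers(seqs, k):
            hits += 1
    return hits / trials
```

For example, on the hypothetical toy sequences `["AACTG", "CACTT", "ACTGG"]`, `shared_kmers(..., 3)` returns `{"ACT"}`; `information_content(["ACT", "ACT", "ACT"])` gives 2 bits per position, and a column containing A, C, G and T once each gives 0 bits. `approx_shared_prob(3)` is about 0.0139 and `approx_shared_prob(2)` about 0.23, matching the hand calculations, while `simulate_shared_prob(3)` can be compared with the experimental 0.014. None of these is MEME's E-value, which MEME computes with its own statistics.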
  
  
courses/bin/tutorials/tutorial12.txt · Last modified: 2024/03/18 16:28 by klema