=head1 NAME

iPE::Model::DurationDistribution - Base class for duration distribution models.

=head1 DESCRIPTION

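Derive from this class to create a new duration distribution type.  It stores the attributes common to every distribution (region, length range, type, prior counts, length unit) and defines the methods a derived model overrides to count, smooth, normalize, and score its data.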

=cut

package iPE::Model::DurationDistribution;
use iPE;
use iPE::Globals;
use base ("iPE::Model");
use strict;

=head1 CONSTANTS

=over 8 

=item FIXED (), ESTIMATED ()

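One of these values is stored in the type member of a duration distribution object.  new () sets FIXED when the element tag contains "fixed" and ESTIMATED otherwise.  A FIXED distribution takes its parameters from the data given in the template; an ESTIMATED distribution is estimated by counting features.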

=back

=cut
sub FIXED     { 0 }
sub ESTIMATED { 1 }

=head1 FUNCTIONS

=over 8

=item getClassname(modelName)

Returns the class name iPE::Model::DurationDistribution::modelName for a model type after loading it with require.  If the class cannot be loaded, it dies with the error reported by require.  This is a plain function, not an object method.
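
The lookup can be tried outside iPE with the following minimal sketch.  load_class is a hypothetical helper, not part of this module; it takes the base package as an argument so that it can be run against the core List::Util module instead of an iPE model class.

    use strict;
    use warnings;

    # Hypothetical stand-in for getClassname with the base package passed in.
    sub load_class {
        my ($base, $model) = @_;
        my $classname = "${base}::$model";

        # "require $var" would treat the string as a file name, so the
        # class name is required through a string eval instead.
        eval "require $classname";
        die "$base: The model $model does not exist.\nError msg: $@\n" if $@;

        return $classname;
    }

    print load_class("List", "Util"), "\n";    # prints List::Util

    eval { load_class("List", "NoSuchModel") };
    print "NoSuchModel was rejected\n" if $@;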

=cut
sub getClassname {
    my $model = shift;
    my $classname = __PACKAGE__."::$model";
    
    eval "require $classname";
    die __PACKAGE__.": The model $model does not exist in iPE.\n".
        "Error msg: $@\n" if ($@); 

    return $classname;
}


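=item new (tag, att, data, element)

This creates a new duration distribution.  The attribute hash att supplies region, model, min, max (default "L"), priorcounts (default 0), length_unit (default 1) and normalizing (default "normalize").  The type is FIXED if tag contains "fixed" and ESTIMATED otherwise.  The data is stored as the fixed data and passed to setParamString, and element is passed to handle_submodels.  The samples member starts at priorCounts, density and cumDensity start at 0, and initialProb and finalProb start at -1.  new dies if the requested normalizer is not a method of the class.  If your derived class needs further initialization, override the init method, which new calls last.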
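
For reference, the defaults described above can be sketched as a plain function.  duration_defaults and the tag and attribute values in the example are hypothetical; the type is returned as a string here only to keep the sketch self-contained.

    use strict;
    use warnings;

    # Hypothetical helper mirroring the defaults new() applies.
    sub duration_defaults {
        my ($tag, $att) = @_;
        return {
            min         => $att->{min},
            max         => defined $att->{max}         ? $att->{max}         : "L",
            priorCounts => defined $att->{priorcounts} ? $att->{priorcounts} : 0,
            lengthUnit  => defined $att->{length_unit} ? $att->{length_unit} : 1,
            normalizing => defined $att->{normalizing} ? $att->{normalizing}
                                                       : "normalize",
            type        => ($tag =~ m/fixed/) ? "FIXED" : "ESTIMATED",
        };
    }

    # Tag and attribute values are made up for illustration.
    my $d = duration_defaults("estimated_duration", { min => 1, length_unit => 3 });
    print join(" ", @{$d}{qw(min max priorCounts lengthUnit normalizing type)}), "\n";
    # prints: 1 L 0 3 normalize ESTIMATED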

=cut

sub new {
    my $class = shift;
    my $this = $class->SUPER::new(@_);
    my ($tag, $att, $data, $element) = @_;

    $this->{region_} = $att->{region};
    $this->{model_} = $att->{model};
    $this->{min_} = $att->{min};
    # optional attributes and their defaults
    $this->{priorCounts_} =
        defined($att->{priorcounts}) ? $att->{priorcounts} : 0;
    $this->{max_}  = defined($att->{max}) ? $att->{max} : "L";
    $this->{type_} = ($tag =~ m/fixed/) ? FIXED() : ESTIMATED();
    $this->{lengthUnit_} =
        defined($att->{length_unit}) ? $att->{length_unit} : 1;
    $this->{fixedData_} = $data;
    $this->setParamString($data);

    #the density of this part of the distribution 
    $this->{density_} = 0;
    #the total number of samples in this piece of the distribution
    $this->{samples_} = $this->{priorCounts_};
    #the cumulative density up to the start point of the distribution
    $this->{cumDensity_} = 0;

    unless(defined($att->{normalizing})) { $att->{normalizing} = "normalize" }

    # set the normalizer of this instance to be the one requested in the input.
    # this will be called by the containing class when normalizing
    $this->{normalizer_} = $this->can($att->{normalizing}); 
    die "$this->{region_}: No such normalizing $att->{normalizing} for ".
        "duration type $att->{model}\n" unless defined($this->{normalizer_});

    $this->{initialProb_} = -1;
    $this->{finalProb_}   = -1;

    $this->handle_submodels($element);
    $this->init;

    return $this;
}

=item init ()

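Called at the end of new (), after the duration distribution is completely instantiated.  The base implementation does nothing; override it in a derived class that needs further initialization.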
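
A derived class that needs extra setup might override init as in the sketch below.  StubDistribution stands in for this class and HistogramDuration is a hypothetical model; the sketch assumes a numeric max rather than "L".

    use strict;
    use warnings;

    # Stand-in for this base class: new() sets up the members and
    # calls init() last; init() does nothing here.
    package StubDistribution;
    sub new {
        my ($class, %att) = @_;
        my $this = bless { min_ => $att{min}, max_ => $att{max} }, $class;
        $this->init;
        return $this;
    }
    sub init { }

    # Hypothetical derived model that allocates one counter per length.
    package HistogramDuration;
    our @ISA = ("StubDistribution");
    sub init {
        my ($this) = @_;
        $this->{counts_} = [ (0) x ($this->{max_} - $this->{min_} + 1) ];
    }

    package main;
    my $h = HistogramDuration->new(min => 1, max => 5);
    print scalar(@{ $h->{counts_} }), "\n";    # prints 5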

=cut
sub init { }

=item region (), model (), type (), priorCounts (), min (), max (), lengthUnit (), fixedData ()

These are the characteristics of the distribution as given on instantiation.  region is the name of the region.  model is the model type (redundant, since the class of the object also identifies it).  type is FIXED or ESTIMATED and determines whether this distribution is estimated by counting.  min is the smallest length this distribution covers, and max is the largest, either a number or "L" (meaning unbounded).  min () and max () also act as setters when given an argument.

priorCounts is the number of counts added to the distribution as a whole (not to each bucket, as pseudocounts are).  It is added to the samples member on instantiation.

lengthUnit returns the resolution at which lengths are counted.  For example, exons are generally counted in codons rather than bases.  This makes smoothing the distribution more meaningful, since the evolutionary model depends on the number of codons, not the number of bases.
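
As an illustration, assuming a derived model converts a feature's length in bases to units by dividing by lengthUnit and truncating (the rounding is an assumption), a 300-base exon counts as 100 codons:

    use strict;
    use warnings;

    # Hypothetical helper: converts a length in bases to the counting unit.
    # Truncation with int() is an assumption; a model may round differently.
    sub length_in_units {
        my ($lengthBases, $lengthUnit) = @_;
        return int($lengthBases / $lengthUnit);
    }

    print length_in_units(300, 3), "\n";    # prints 100 (codons)
    print length_in_units(300, 1), "\n";    # prints 300 (default unit of 1)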

fixedData returns the fixed data given on instantiation.  It is only relevant if the type is FIXED.

=cut
sub region      { shift->{region_}      }
sub model       { shift->{model_}       }
sub type        { shift->{type_}        }
sub priorCounts { shift->{priorCounts_} }
sub min         { scalar(@_) > 1 && ($_[0]->{min_} = $_[1]); $_[0]->{min_}; }
sub max         { scalar(@_) > 1 && ($_[0]->{max_} = $_[1]); $_[0]->{max_}; }
sub lengthUnit  { shift->{lengthUnit_}  }
sub fixedData   { shift->{fixedData_}   }

=item samples (), density (), cumDensity ()

The containing submodel of this distribution keeps track of the number of samples each distribution receives.  This equals the number of observations whose length falls within [min, max], plus priorCounts.  The density is the fraction of all observations that fall in this length range, and thus the total probability mass this part of the probability density function should hold.  Take it into account when normalizing your distribution, since the densities of all the distributions must sum to 1.  cumDensity () is the cumulative density up to the start of this distribution, excluding the distribution itself.  density () also acts as a setter when given an argument; setDensity () and setCumDensity () set the two densities, and incSamples () adds one sample, or n samples if given an argument.
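
The following sketch shows how the three values relate, assuming the containing submodel sets each piece's density to its share of all observations and its cumDensity to the sum of the densities of the pieces before it.  The sample counts are made up.

    use strict;
    use warnings;

    # Made-up sample counts for two pieces of one duration model.
    my @samples = (30, 70);
    my $total = 0;
    $total += $_ for @samples;

    my $cum = 0;
    my (@density, @cumDensity);
    for my $n (@samples) {
        push @cumDensity, $cum;        # density before this piece starts
        push @density, $n / $total;    # share of all observations
        $cum += $n / $total;
    }
    print "@density | @cumDensity\n";    # prints 0.3 0.7 | 0 0.3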

=item normalizer ()

normalizer () returns a reference to the method requested as the normalizing method on instantiation (the normalizing attribute, "normalize" by default); the containing class calls it when normalizing.  Generally there is a standard normalizer (e.g. "normalize") for the case of a single distribution, and another method (e.g. "match_with_density") for the case where more than one distribution is used and this one is the second part of the total distribution.
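
The normalizer is a code reference obtained with the core can method, as new () does, and it can be invoked as a method through that reference.  ToyDistribution and its two normalizers are hypothetical stand-ins:

    use strict;
    use warnings;

    # Hypothetical distribution class with two normalizers.
    package ToyDistribution;
    sub new {
        my ($class, $normalizing) = @_;
        my $this = bless {}, $class;
        # can() returns a reference to the named method, or undef.
        $this->{normalizer_} = $this->can($normalizing);
        die "No such normalizing $normalizing\n"
            unless defined $this->{normalizer_};
        return $this;
    }
    sub normalizer         { shift->{normalizer_} }
    sub normalize          { "normalize called" }
    sub match_with_density { "match_with_density called" }

    package main;
    my $dist = ToyDistribution->new("match_with_density");
    my $norm = $dist->normalizer;
    print $dist->$norm(), "\n";    # prints match_with_density called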

=item initialProb (), finalProb ()

initialProb () holds the probability of length min-1, taken from the previous distribution, and finalProb () holds the probability of length max in the current distribution.  Use them when matching the current distribution to a previous one.  initialProb () is set for you before normalize is called; set finalProb () yourself at the end of normalization.  Both also act as setters when given an argument.

Note that initialProb () may be -1 (its initial value); in that case, ignore it during normalization.
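
A sketch of the hand-off between consecutive pieces, assuming the container passes each piece's finalProb on as the next piece's initialProb.  The pieces are plain hashes and the normalization is a simple division by the total, both chosen only for illustration:

    use strict;
    use warnings;

    # Hypothetical pieces covering consecutive length ranges.
    my @pieces = ({ counts => [1, 3] }, { counts => [2, 2] });

    my $prev = -1;    # the first piece has no predecessor
    for my $p (@pieces) {
        $p->{initialProb} = $prev;    # set before normalizing
        my $total = 0;
        $total += $_ for @{ $p->{counts} };
        $p->{probs} = [ map { $_ / $total } @{ $p->{counts} } ];
        # A matching normalizer would use initialProb here unless it is -1.
        $p->{finalProb} = $p->{probs}[-1];    # set at the end
        $prev = $p->{finalProb};
    }
    print "$pieces[1]{initialProb}\n";    # prints 0.75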

=cut

sub samples     { shift->{samples_}     }
sub density     { (defined($_[1]) && ($_[0]->{density_} = $_[1])); 
                  $_[0]->{density_} }
sub cumDensity  { shift->{cumDensity_}  }
sub normalizer  { shift->{normalizer_}  }
sub initialProb { (defined($_[1]) && ($_[0]->{initialProb_} = $_[1])); 
                  $_[0]->{initialProb_} }
sub finalProb   { (defined($_[1]) && ($_[0]->{finalProb_} = $_[1])); 
                  $_[0]->{finalProb_} }

sub incSamples {
    my ($this, $n) = @_;
    # add $n samples if given, otherwise add one sample
    $this->{samples_} += defined($n) ? $n : 1;
}
sub setDensity        { $_[0]->{density_} = $_[1]    }
sub setCumDensity     { $_[0]->{cumDensity_} = $_[1] }

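=item getZoeHeader (), outputZoe (out, mode)

getZoeHeader () returns the header line for this distribution in Zoe output: the model, min and max separated by spaces, with a max of "L" written as -1.  outputZoe () prints that header at the current indentation of the output object out, then prints the parameter string from getParamString () indented one level deeper.  The mode argument is not used by this base class.

=cut
sub getZoeHeader {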
    my ($this) = @_;

    my $max = $this->max;
    $max = -1 if $this->max eq "L";

    return ($this->model." ".$this->min." ".$max."\n");
}

sub outputZoe {
    my ($this, $out, $mode) = @_;

    $out->print($out->indent.$this->getZoeHeader);
    $out->increaseIndent;
    $out->printPCData($this->getParamString);
    $out->decreaseIndent;
}


# cache these values for speed.
our $negInf = undef;
our $scale = undef;

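=item logScore (pr)

Returns the log score of a probability (or ratio of frequencies) pr: scaleFactor * log2(pr), or durationNegInf when pr is zero.  Both option values are read from the global options on the first call and cached.  This is intended as a utility function for all duration models.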
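
The formula can be checked with this sketch.  The values 10 and -9999 are made-up stand-ins for the scaleFactor and durationNegInf options, which the module reads from iPE::Globals:

    use strict;
    use warnings;

    # Made-up stand-ins for the scaleFactor and durationNegInf options.
    my $scale  = 10;
    my $negInf = -9999;

    # Same formula as logScore: scaled log base 2, or negInf for zero.
    sub log_score {
        my ($pr) = @_;
        return $pr ? $scale * log($pr) / log(2) : $negInf;
    }

    printf "%.2f\n", log_score(0.5);    # prints -10.00
    print log_score(0), "\n";           # prints -9999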

=cut
sub logScore { 
    my ($this, $pr) = @_;
    my $score;

    # look up the options only once; later calls use the cached values
    if(!defined($negInf)) {
        my $g = iPE::Globals->new();
        $negInf = $g->options->durationNegInf;
        $scale  = $g->options->scaleFactor;
    }

    if($pr) {
        $score = $scale*log($pr)/log(2);
    }
    else {
        $score = $negInf;
    }

    return $score;
}

=item countFeature (feature)

Count the length of a feature and add it to the model.  The base implementation does nothing; each specific model overrides it.
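
What counting might look like in a histogram-style model is sketched below.  count_length is hypothetical: it assumes the feature's length has already been converted with lengthUnit, and it only counts lengths within [min, max], treating a max of "L" as unbounded.

    use strict;
    use warnings;

    # Hypothetical histogram counter; returns 1 if the length was counted.
    sub count_length {
        my ($hist, $min, $max, $units) = @_;
        return 0 if $units < $min;
        return 0 if $max ne "L" && $units > $max;
        $hist->{$units}++;
        return 1;
    }

    my %hist;
    count_length(\%hist, 1, 10, $_) for (3, 3, 12);
    print join(",", map { "$_=$hist{$_}" } sort { $a <=> $b } keys %hist), "\n";
    # prints 3=2 (12 is above max and is not counted)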

=cut
sub countFeature { }

=item smooth ()

Smooth the distribution.
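
This class does not prescribe a smoothing technique.  As one possible example, not necessarily what any iPE model uses, a centered moving average over neighboring length buckets looks like this (moving_average is a hypothetical helper):

    use strict;
    use warnings;

    # Centered moving average over 2*$half+1 buckets, truncated at the ends.
    sub moving_average {
        my ($counts, $half) = @_;
        my @smoothed;
        for my $i (0 .. $#$counts) {
            my $lo = $i - $half < 0 ? 0 : $i - $half;
            my $hi = $i + $half > $#$counts ? $#$counts : $i + $half;
            my $sum = 0;
            $sum += $counts->[$_] for $lo .. $hi;
            push @smoothed, $sum / ($hi - $lo + 1);
        }
        return \@smoothed;
    }

    print "@{ moving_average([0, 6, 0, 3], 1) }\n";    # prints 3 2 3 1.5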

=cut
sub smooth  { }

=item normalize ()

Normalize the smoothed counts to probabilities.
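
One way a derived model might normalize, taking density into account as described under samples (), density (), cumDensity (), is to scale each piece so that its probabilities sum to its density.  normalize_counts is a hypothetical helper:

    use strict;
    use warnings;

    # Turns smoothed counts into probabilities summing to $density.
    sub normalize_counts {
        my ($counts, $density) = @_;
        my $total = 0;
        $total += $_ for @$counts;
        return [ (0) x @$counts ] unless $total;    # nothing was counted
        return [ map { $density * $_ / $total } @$counts ];
    }

    my $probs = normalize_counts([1, 3], 0.4);
    print "@$probs\n";    # prints 0.1 0.3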

=cut
sub normalize { }

=item score ()

Convert the normalized counts to scores.

=cut
sub score { }

=back

=head1 SEE ALSO

L<iPE::Model::Duration>, L<iPE::Model::DurationSubmodel>, L<iPE::Model>

=head1 AUTHOR

Bob Zimmermann (rpz@cse.wustl.edu)

=cut

1;
