% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fitsmm.R
\name{fitsmm}
\alias{fitsmm}
\title{Maximum Likelihood Estimation (MLE) of a semi-Markov chain}
\usage{
fitsmm(
  sequences,
  states,
  type.sojourn = c("fij", "fi", "fj", "f"),
  distr = "nonparametric",
  init.estim = "mle",
  cens.beg = FALSE,
  cens.end = FALSE
)
}
\arguments{
\item{sequences}{A list of vectors representing the sequences.}

\item{states}{Vector of state space (of length s).}

\item{type.sojourn}{Type of sojourn time (for more explanations, see Details).}

\item{distr}{By default \code{"nonparametric"} for the non-parametric estimation
case.

If the parametric estimation case is desired, \code{distr} should be:
\itemize{
\item A matrix of distributions of dimension \eqn{(s, s)} if \code{type.sojourn = "fij"};
\item A vector of distributions of length \eqn{s} if \code{type.sojourn = "fi"} or \code{"fj"};
\item A distribution if \code{type.sojourn = "f"}.
}

The distributions to be used in \code{distr} must be one of \code{"unif"}, \code{"geom"},
\code{"pois"}, \code{"dweibull"}, \code{"nbinom"}.}

\item{init.estim}{Optional. \code{init.estim} gives the method used to estimate
the initial distribution. The following methods are proposed:
\itemize{
\item \code{init.estim = "mle"}: the classical Maximum Likelihood Estimator
is used to estimate the initial distribution \code{init};
\item \code{init.estim = "limit"}: the initial distribution is replaced by
the limit (stationary) distribution of the semi-Markov chain;
\item \code{init.estim = "freq"}: the initial distribution is replaced by
the frequencies of each state in the sequences;
\item \code{init.estim = "unif"}: the initial probability of each state is
equal to \eqn{1 / s}, with \eqn{s} the number of states.
}}

\item{cens.beg}{Optional. A logical value indicating whether or not
sequences are censored at the beginning.}

\item{cens.end}{Optional. A logical value indicating whether or not
sequences are censored at the end.}
}
\value{
Returns an object of S3 class \code{smmfit} (inheriting from the S3
class \code{smm} and from \link{smmnonparametric} if \code{distr = "nonparametric"},
or from \link{smmparametric} otherwise). The S3 class \code{smmfit} contains:
\itemize{
\item All the attributes of the S3 class \link{smmnonparametric} or
\link{smmparametric} depending on the type of estimation;
\item An attribute \code{M} which is an integer giving the total length of
the set of sequences \code{sequences} (sum of all the lengths of the list
\code{sequences});
\item An attribute \code{logLik} which is a numeric value giving the value
of the log-likelihood of the specified semi-Markov model based on the
\code{sequences};
\item An attribute \code{sequences} which is equal to the parameter
\code{sequences} of the function \code{fitsmm} (i.e. a list of sequences used to
estimate the semi-Markov model).
}
}
\description{
Maximum Likelihood Estimation of a semi-Markov chain starting
from one or several sequences. This estimation can be parametric or
non-parametric, non-censored, censored at the beginning and/or at the end
of the sequence, with one or several trajectories. Several parametric
distributions are considered (Uniform, Geometric, Poisson, Discrete
Weibull and Negative Binomial).
}
\details{
This function estimates a semi-Markov model in the parametric or
non-parametric case, taking into account the type of sojourn time and the
censoring described in the references. The non-parametric estimation concerns
sojourn time distributions defined by the user. For the parametric
estimation, several discrete distributions are considered (see below).

The difference between the Markov model and the semi-Markov model concerns
the modeling of the sojourn time. With a Markov chain, the sojourn time
distribution is modeled by a Geometric distribution (in discrete time).
With a semi-Markov chain, the sojourn time can be any arbitrary distribution.
In this package, the available distributions for a semi-Markov model are:
\itemize{
\item Uniform: \eqn{f(x) = \frac{1}{n}} for \eqn{1 \le x \le n}. \eqn{n} is the parameter;
\item Geometric: \eqn{f(x) = \theta (1-\theta)^{x - 1}} for \eqn{x = 1, 2,\ldots,n}, \eqn{0 < \theta < 1}, \eqn{\theta} is the probability of success.
\eqn{\theta} is the parameter;
\item Poisson: \eqn{f(x) = \frac{\lambda^x \exp(-\lambda)}{x!}} for \eqn{x = 0, 1, 2,\ldots,n}, with \eqn{\lambda > 0}.
\eqn{\lambda} is the parameter;
\item Discrete Weibull of type 1: \eqn{f(x)=q^{(x-1)^{\beta}}-q^{x^{\beta}}}, \eqn{x = 1, 2,\ldots,n},
with \eqn{0 < q < 1}, the first parameter and \eqn{\beta > 0} the second parameter.
\eqn{(q, \beta)} are the parameters;
\item Negative binomial: \eqn{f(x)=\frac{\Gamma(x+\alpha)}{\Gamma(\alpha) x!} p^{\alpha} (1 - p)^x},
for \eqn{x = 0, 1, 2,\ldots,n}, \eqn{\Gamma} is the Gamma function,
\eqn{\alpha} is the parameter of overdispersion and \eqn{p} is the
probability of success, \eqn{0 < p < 1}. \eqn{(\alpha, p)} are the parameters;
\item Non-parametric.
}

We define:
\itemize{
\item the semi-Markov kernel \eqn{q_{ij}(k) = P( J_{m+1} = j, T_{m+1} - T_{m} = k | J_{m} = i )};
\item the transition matrix \eqn{(p_{trans}(i,j))_{i,j \in states}} of the
embedded Markov chain \eqn{J = (J_m)_m}, \eqn{p_{trans}(i,j) = P( J_{m+1} = j | J_m = i )};
\item the initial distribution \eqn{\mu_i = P(J_1 = i) = P(Z_1 = i)}, \eqn{i \in 1, 2, \dots, s};
\item the conditional sojourn time distributions \eqn{(f_{ij}(k))_{i,j \in states,\ k \in N}}, with \eqn{f_{ij}(k) = P(T_{m+1} - T_m = k | J_m = i, J_{m+1} = j )}.
In \code{fitsmm}, the family of \eqn{f} is chosen through the argument \code{distr}
(a distribution name in the parametric case, \code{"nonparametric"} otherwise)
and \eqn{f} itself is estimated from \code{sequences}.
}

The maximum likelihood estimation of the transition matrix of the embedded
Markov chain is \eqn{\widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}}.
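
As an illustration, the counts behind this estimator can be sketched in base R.
The sketch assumes the usual convention that \eqn{N_{ij}} is the number of
transitions from state \eqn{i} to state \eqn{j} of the embedded chain and that
\eqn{N_{i.} = \sum_j N_{ij}}. The toy sequence \code{z} and all object names are
hypothetical, censoring is ignored, and this is not necessarily how
\code{fitsmm} computes the estimate internally:
\preformatted{
states <- c("a", "c", "g", "t")
# Hypothetical toy sequence, one value per time unit
z <- c("a", "a", "c", "c", "c", "g", "t", "a", "a", "c")
runs <- rle(z)                               # visited states and sojourn times
j <- factor(runs$values, levels = states)    # embedded chain: a, c, g, t, a, c
n <- length(j)
Nij <- table(from = j[-n], to = j[-1])       # transition counts N_ij
ptrans.hat <- prop.table(Nij, margin = 1)    # N_ij / N_i., row by row
}
Each row of \code{ptrans.hat} sums to 1 as long as the corresponding state is
left at least once in the data.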

Four methods are proposed for the estimation of the initial distribution:
\describe{
\item{Maximum Likelihood Estimator: }{The Maximum Likelihood Estimator
for the initial distribution. The formula is:
\eqn{\widehat{\mu_i} = \frac{Nstart_i}{L}}, where \eqn{Nstart_i} is
the number of sequences that start in state \eqn{i} and \eqn{L} is the
number of sequences.
This estimator is reliable when the number of sequences \eqn{L} is high.}
\item{Limit (stationary) distribution: }{The limit (stationary)
distribution of the semi-Markov chain is used as a surrogate of the
initial distribution.}
\item{Frequencies of each state: }{The initial distribution is replaced
by taking the frequencies of each state in the sequences.}
\item{Uniform distribution: }{The initial probability of each state is
equal to \eqn{1 / s}, with \eqn{s} the number of states.}
}

Note that \eqn{q_{ij}(k) = p_{trans}(i,j) \ f_{ij}(k)} in the general case
(depending on the present state and on the next state). For particular cases,
we replace \eqn{f_{ij}(k)} by \eqn{f_{i.}(k)} (depending on the present
state \eqn{i}), \eqn{f_{.j}(k)} (depending on the next state \eqn{j}) and
\eqn{f_{..}(k)} (depending neither on the present state nor on the next
state).

Four types of sojourn time are available in this package:
\itemize{
\item depending on the present state and on the next state (\code{fij});
\item depending only on the present state (\code{fi});
\item depending only on the next state (\code{fj});
\item depending neither on the current, nor on the next state (\code{f}).
}

If \code{type.sojourn = "fij"}, \code{distr} is a matrix of dimension \eqn{(s, s)}
(e.g., if the element in the 1st row and 2nd column is \code{"pois"}, the sojourn
time when going from the first state to the second state follows a Poisson
distribution).
If \code{type.sojourn = "fi"} or \code{"fj"}, \code{distr} must be a vector of
length \eqn{s} (e.g., if the first element of the vector is \code{"geom"}, the
sojourn time when going from (or to) the first state to (or from) any state
follows a Geometric distribution).
If \code{type.sojourn = "f"}, \code{distr} must be one of \code{"unif"}, \code{"geom"}, \code{"pois"},
\code{"dweibull"}, \code{"nbinom"} (e.g., if \code{distr} is equal to \code{"nbinom"}, the
sojourn time when going from any state to any other state follows a Negative
Binomial distribution).
For the non-parametric case, \code{distr} is equal to \code{"nonparametric"} whatever
the type of sojourn time.
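
For instance, the three parametric shapes of \code{distr} could be built as
follows. This is only a sketch for \eqn{s = 4} states: the object names are
hypothetical, and the \code{NA} diagonal simply mirrors the matrix used in the
Examples section:
\preformatted{
s <- 4
# type.sojourn = "fij": (s, s) matrix, row = present state, column = next state
distr.fij <- matrix("geom", nrow = s, ncol = s)
distr.fij[1, 2] <- "pois"    # from the 1st state to the 2nd state: Poisson
diag(distr.fij) <- NA        # no distribution on the diagonal, as in Examples
# type.sojourn = "fi" or "fj": one distribution per state
distr.fi <- c("geom", "pois", "pois", "dweibull")
# type.sojourn = "f": a single distribution
distr.f <- "nbinom"
}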

If the sequences are censored at the beginning and/or at the end, \code{cens.beg}
and/or \code{cens.end} must be set to \code{TRUE}.
All the sequences must be censored in the same way.
}
\examples{
states <- c("a", "c", "g", "t")
s <- length(states)

# Creation of the initial distribution
vect.init <- c(1 / 4, 1 / 4, 1 / 4, 1 / 4)

# Creation of the transition matrix
pij <- matrix(c(0, 0.2, 0.5, 0.3, 
                0.2, 0, 0.3, 0.5, 
                0.3, 0.5, 0, 0.2, 
                0.4, 0.2, 0.4, 0), 
              ncol = s, byrow = TRUE)

# Creation of the distribution matrix

distr.matrix <- matrix(c(NA, "pois", "geom", "nbinom", 
                         "geom", NA, "pois", "dweibull",
                         "pois", "pois", NA, "geom", 
                         "pois", "geom", "geom", NA), 
                       nrow = s, ncol = s, byrow = TRUE)

# Creation of an array containing the parameters
param1.matrix <- matrix(c(NA, 2, 0.4, 4, 
                          0.7, NA, 5, 0.6, 
                          2, 3, NA, 0.6, 
                          4, 0.3, 0.4, NA), 
                        nrow = s, ncol = s, byrow = TRUE)

param2.matrix <- matrix(c(NA, NA, NA, 0.6, 
                          NA, NA, NA, 0.8, 
                          NA, NA, NA, NA, 
                          NA, NA, NA, NA), 
                        nrow = s, ncol = s, byrow = TRUE)

param.array <- array(c(param1.matrix, param2.matrix), c(s, s, 2))

# Specify the semi-Markov model
semimarkov <- smmparametric(states = states, init = vect.init, ptrans = pij, 
                            type.sojourn = "fij", distr = distr.matrix, 
                            param = param.array)

seqs <- simulate(object = semimarkov, nsim = c(1000, 10000, 2000), seed = 100)

# Estimation of simulated sequences
est <- fitsmm(sequences = seqs, states = states, type.sojourn = "fij", 
              distr = distr.matrix)
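
# A hedged sketch of a first look at the result. It assumes that the fitted
# object is a list whose components use the names documented in the Value
# section; inspect names(est) if in doubt.
class(est)
names(est)

# Non-parametric estimation from the same sequences (distr is set explicitly
# to the value documented as its default)
est.np <- fitsmm(sequences = seqs, states = states, type.sojourn = "fij",
                 distr = "nonparametric")
class(est.np)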

}
\references{
V. S. Barbu, N. Limnios. (2008). Semi-Markov Chains and Hidden Semi-Markov
Models Toward Applications - Their Use in Reliability and DNA Analysis.
New York: Lecture Notes in Statistics, vol. 191, Springer.
}
\seealso{
\link{smmnonparametric}, \link{smmparametric}, \link{simulate.smm},
\link{simulate.smmfit}, \link{plot.smm}, \link{plot.smmfit}
}
