% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rem_fourcyles.R
\name{computeFourCycles}
\alias{computeFourCycles}
\title{Compute the Four-Cycles Network Statistic for Event Dyads in a Relational Event Sequence}
\usage{
computeFourCycles(
  observed_time,
  observed_sender,
  observed_receiver,
  processed_time,
  processed_sender,
  processed_receiver,
  sliding_windows = FALSE,
  processed_seqIDs = NULL,
  counts = FALSE,
  halflife = 2,
  dyadic_weight = 0,
  window_size = NA,
  Lerneretal_2013 = FALSE,
  priorStats = FALSE,
  sender_OutDeg = NULL,
  receiver_InDeg = NULL
)
}
\arguments{
\item{observed_time}{The vector of event times from the pre-processing event sequence.}

\item{observed_sender}{The vector of event senders from the pre-processing event sequence.}

\item{observed_receiver}{The vector of event receivers from the pre-processing event sequence.}

\item{processed_time}{The vector of event times from the post-processing event sequence (i.e., the event sequence that contains the observed and null events).}

\item{processed_sender}{The vector of event senders from the post-processing event sequence (i.e., the event sequence that contains the observed and null events).}

\item{processed_receiver}{The vector of event receivers from the post-processing event sequence (i.e., the event sequence that contains the observed and null events).}

\item{sliding_windows}{TRUE/FALSE. TRUE indicates that the sliding windows computational approach will
be used to compute the network statistic, while FALSE indicates that the approach will not be used. Set
to FALSE by default. The sliding windows framework should only be used
when the pre-processed event sequence is ‘big’, such as the pre-processed sequence of 360 million events
used in Lerner and Lomi (2020), as it aims to reduce the computational burden of sorting ‘big’ datasets. In general,
most pre-processed event sequences will not need the sliding windows
approach. There is no strict cutoff for what counts as a ‘big’ dataset; it depends on both the
size of the observed event sequence and the size of the post-processing sampled dataset. For instance,
according to our internal tests, when the event sequence is relatively large (i.e., 100,000
observed events), with the probability of sampling from the observed event sequence set to 0.05
and 10 controls per sampled event, the sliding windows framework for computing repetition
is about 11\% faster than the non-sliding windows framework. Yet, in a smaller dataset
(i.e., 10,000 observed events), the sliding windows framework is about 25\% slower than the
non-sliding windows framework under the same conditions.}

\item{processed_seqIDs}{If sliding_windows is set to TRUE, the vector of event sequence IDs from the post-processing event sequence. The event sequence IDs represent the index for when the event occurred in the observed event sequence (e.g., the 5th event in the sequence will have a value of 5 in this vector).}

\item{counts}{TRUE/FALSE. TRUE indicates that the counts of past events should be computed (see the details section). FALSE indicates that the temporal exponential weighting function should be used to downweigh past events (see the details section). Set to FALSE by default.}

\item{halflife}{A numerical value that is the halflife value to be used in the exponential weighting function (see the details section). Preset to 2 (should be updated by user).}

\item{dyadic_weight}{A numerical value giving the dyadic cutoff weight, that is, the cutoff for temporal relevancy based on the exponential weighting function. For example, a value of 0.01 indicates that an exponential weight less than 0.01 will become 0 and will not be included in the sum of the past event weights (see the details section). Set to 0 by default.}

\item{window_size}{If sliding_windows is set to TRUE, the sizes of the windows that are used for the sliding windows computational framework. If NA, the function internally divides the dataset into ten slices (may not be optimal).}

\item{Lerneretal_2013}{TRUE/FALSE. TRUE indicates that the Lerner et al. (2013) exponential weighting function will be used (see the details section). FALSE indicates that the Lerner and Lomi (2020) exponential weighting function will be used (see the details section). Set to FALSE by default.}

\item{priorStats}{TRUE/FALSE. TRUE indicates that the user has previously computed the sender outdegree and target indegree network statistics. Set to FALSE by default. The four-cycles network statistic is computationally burdensome. If priorStats = TRUE, the function reduces the runtime by setting the four-cycles statistic for an event dyad to 0 if either a) the current event sender was not a sender in a past event or b) the current event receiver was not a receiver in a past event.}

\item{sender_OutDeg}{If priorStats = TRUE, the vector of previously computed sender outdegree scores.}

\item{receiver_InDeg}{If priorStats = TRUE, the vector of previously computed receiver indegree scores.}
}
\value{
The vector of four-cycle statistics for the two-mode relational event sequence.
}
\description{
The function computes the four-cycles network sufficient statistic for a two-mode relational
sequence with the exponential weighting function (Lerner and Lomi 2020). In essence, the
four-cycles measure captures the tendency for clustering to occur in the network of past
events, whereby an event is more likely to occur between a sender node \emph{a} and receiver
node \emph{b} given that \emph{a} has interacted with other receivers in past events who have
received events from other senders that interacted with \emph{b} (e.g., Duxbury and Haynie 2021, Lerner and Lomi 2020). The function
allows users to use two different weighting functions, reduce computational runtime, employ a
sliding windows framework for large relational sequences, and specify a dyadic cutoff for relational relevancy.
}
\details{
The function calculates the four-cycles network statistic for two-mode relational event models
based on the exponential weighting function used in either Lerner and Lomi
(2020) or Lerner et al. (2013).

Following Lerner and Lomi (2020), the exponential weighting function in
relational event models is:
\deqn{w(s, r, t) = e^{-(t-t') \cdot \frac{\ln(2)}{T_{1/2}} }}

Following Lerner et al. (2013), the exponential weighting function in
relational event models is:
\deqn{w(s, r, t) = e^{-(t-t') \cdot \frac{\ln(2)}{T_{1/2}} } \cdot \frac{\ln(2)}{T_{1/2}}}

In both of the above equations, \emph{s} is the current event sender, \emph{r} is the
current event receiver (target), \emph{t} is the current event time, \emph{t'} denotes the
times of the past events that meet the weight subset (in this case, all events that
have the same sender and receiver), and \eqn{T_{1/2}} is the halflife parameter.
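
As a hedged numerical illustration (the values are chosen for convenience and are not
taken from the cited papers), suppose \eqn{T_{1/2} = 2} and a single past event occurred
\eqn{t - t' = 2} time units ago. Under the Lerner and Lomi (2020) function, the event contributes
\deqn{w(s, r, t) = e^{-2 \cdot \frac{\ln(2)}{2}} = e^{-\ln(2)} = 0.5}
while under the Lerner et al. (2013) function, it contributes
\deqn{w(s, r, t) = 0.5 \cdot \frac{\ln(2)}{2} \approx 0.173}
In both cases, an event that is exactly one halflife old carries half the weight of an
event that has just occurred; the Lerner et al. (2013) version additionally rescales
every weight by \eqn{\ln(2) / T_{1/2}}.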

The formula for four-cycles for event \eqn{e_i} is:
\deqn{four cycles_{e_{i}} = \sqrt[3]{\sum_{s', r'} w(s', r, t) \cdot w(s, r', t) \cdot w(s', r', t)}}

That is, the four-cycle measure captures all the past event structures in which the
current event pair, sender \emph{s} and target \emph{r} close a four-cycle. In particular, it
finds all events in which: a past sender \emph{s'} had a relational event with
target \emph{r}, a past target \emph{r'} had a relational event with current sender \emph{s}, and finally,
a relational event occurred between sender \emph{s'} and target \emph{r'}.
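
As a hedged worked example with hypothetical weights, suppose exactly one pair
\eqn{(s', r')} closes a four-cycle with the current dyad and each of the three past
ties has a weight of 0.5. Then
\deqn{four cycles_{e_{i}} = \sqrt[3]{0.5 \cdot 0.5 \cdot 0.5} = \sqrt[3]{0.125} = 0.5}
If a second pair with the same three weights also closed a four-cycle, the sum under
the root would be 0.25 and the statistic would be \eqn{\sqrt[3]{0.25} \approx 0.63}.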

Four-cycles are computationally expensive, especially for large relational event
sequences (see Lerner and Lomi 2020 for a discussion on this), therefore this
function allows the user to input previously computed target indegree and sender
outdegree scores to reduce the runtime. Relational events where
either the event target or the event sender was not involved in any prior relational
events (i.e., a target indegree or sender outdegree score of 0) will close no
four-cycles. This function exploits this feature.

Moreover, researchers interested in modeling temporal relevancy (see Quintane,
Wood, Dunn, and Falzon 2022; Lerner and Lomi 2020) can specify the dyadic
weight cutoff, that is, the minimum value for which the weight is considered
relationally relevant. Users who do not know which dyadic cutoff value to use can use the
\code{\link{computeRemDyadCut}} function.
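
As an illustrative derivation, assuming the cutoff is applied directly to the Lerner
and Lomi (2020) weight of a single past event, a cutoff value \eqn{c} with
\eqn{0 < c < 1} sets to 0 the weight of every past event for which
\deqn{t - t' > T_{1/2} \cdot \frac{\ln(1/c)}{\ln(2)}}
For instance, \eqn{c = 0.01} corresponds to \eqn{\ln(100) / \ln(2) \approx 6.64}
halflives, so under that assumption past events older than roughly 6.64 halflives
would no longer contribute to the statistic.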

Following Lerner and Lomi (2020), if the counts of the past events are requested, the formula for four-cycles formation for
event \eqn{e_i} is:
\deqn{four cycles_{e_{i}} = \sum_{i=1}^{|S'|} \sum_{j=1}^{|R'|} \min\left[d(s'_{i}, r, t),\ d(s, r'_{j}, t),\ d(s'_{i}, r'_{j}, t)\right]}
where \eqn{d()} is the number of past events that meet the specific set operations, \eqn{d(s'_{i},r,t)} is the number
of past events where the current event receiver received a tie from another sender \eqn{s'_{i}}, \eqn{d(s,r'_{j},t)} is the number
of past events where the current event sender sent a tie to another receiver \eqn{r'_{j}}, and \eqn{d(s'_{i},r'_{j},t)} is the
number of past events where the sender \eqn{s'_{i}} sent a tie to the receiver \eqn{r'_{j}}. Moreover, the counting
equation can leverage relational relevancy by specifying the halflife parameter, the exponential
weighting function, and the dyadic cutoff weight (see the above sections for help with this). If the user is not interested in modeling
relational relevancy, then those values should be left at their defaults.
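
As a hedged worked example with hypothetical counts, suppose there is one other sender
\eqn{s'_{1}} and one other receiver \eqn{r'_{1}}, with \eqn{d(s'_{1}, r, t) = 3},
\eqn{d(s, r'_{1}, t) = 1}, and \eqn{d(s'_{1}, r'_{1}, t) = 2}. Then
\deqn{four cycles_{e_{i}} = \min\left[3,\ 1,\ 2\right] = 1}
That is, the least frequent of the three past ties determines the contribution of
this pair to the statistic.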
}
\examples{
data("WikiEvent2018.first100k")
WikiEvent2018 <- WikiEvent2018.first100k[1:1000,] #the first one thousand events
WikiEvent2018$time <- as.numeric(WikiEvent2018$time) #making the variable numeric
### Creating the EventSet By Employing Case-Control Sampling With M = 8 and
### Sampling from the Observed Event Sequence with P = 0.01
EventSet <- processTMEventSeq(
 data = WikiEvent2018, # The Event Dataset
 time = WikiEvent2018$time, # The Time Variable
 eventID = WikiEvent2018$eventID, # The Event Sequence Variable
 sender = WikiEvent2018$user, # The Sender Variable
 receiver = WikiEvent2018$article, # The Receiver Variable
 p_samplingobserved = 0.01, # The Probability of Selection
 n_controls = 8, # The Number of Controls to Sample from the Full Risk Set
 seed = 9999) # The Seed for Replication

#### Estimating the Four-Cycle Statistic Without the Sliding Windows Framework
EventSet$fourcycle <- computeFourCycles(
   observed_time = WikiEvent2018$time,
   observed_sender = WikiEvent2018$user,
   observed_receiver = WikiEvent2018$article,
   processed_time = EventSet$time,
   processed_sender = EventSet$sender,
   processed_receiver = EventSet$receiver,
   halflife = 2.592e+09, #halflife parameter
   dyadic_weight = 0,
   Lerneretal_2013 = FALSE)

#### Estimating the Four-Cycle Statistic With the Sliding Windows Framework
EventSet$cycle4SW <- computeFourCycles(
   observed_time = WikiEvent2018$time,
   observed_sender = WikiEvent2018$user,
   observed_receiver = WikiEvent2018$article,
   processed_time = EventSet$time,
   processed_sender = EventSet$sender,
   processed_receiver = EventSet$receiver,
   processed_seqIDs = EventSet$sequenceID,
   halflife = 2.592e+09, #halflife parameter
   dyadic_weight = 0,
   sliding_windows = TRUE,
   Lerneretal_2013 = FALSE)

#The results with and without the sliding windows are the same (see correlation
#below). Using the sliding windows method is recommended when the data are
#'big' so that memory allotment is more efficient.
cor(EventSet$fourcycle, EventSet$cycle4SW)

#### Estimating the Four-Cycle Statistic with the Counts of Events Returned
EventSet$cycle4C <- computeFourCycles(
   observed_time = WikiEvent2018$time,
   observed_sender = WikiEvent2018$user,
   observed_receiver = WikiEvent2018$article,
   processed_time = EventSet$time,
   processed_sender = EventSet$sender,
   processed_receiver = EventSet$receiver,
   processed_seqIDs = EventSet$sequenceID,
   halflife = 2.592e+09, #halflife parameter
   dyadic_weight = 0,
   sliding_windows = FALSE,
   counts = TRUE,
   Lerneretal_2013 = FALSE)

cbind(EventSet$fourcycle,
     EventSet$cycle4SW,
     EventSet$cycle4C)

}
\references{
Duxbury, Scott and Dana Haynie. 2021. "Shining a Light on the Shadows: Endogenous Trade
Structure and the Growth of an Online Illegal Market." \emph{American Journal of Sociology} 127(3): 787-827.

Quintane, Eric, Martin Wood, John Dunn, and Lucia Falzon. 2022. “Temporal
Brokering: A Measure of Brokerage as a Behavioral Process.” \emph{Organizational Research Methods}
25(3): 459-489.

Lerner, Jürgen and Alessandro Lomi. 2020. “Reliability of relational event
model estimates under sampling: How to fit a relational event model to 360
million dyadic events.” \emph{Network Science} 8(1): 97-135.

Lerner, Jürgen, Margit Bussmann, Tom A.B. Snijders, and Ulrik Brandes. 2013. "Modeling
Frequency and Type of Interaction in Event Networks." \emph{The Corvinus Journal of Sociology and Social Policy} 4(1): 3-32.
}
\author{
Kevin A. Carson \href{mailto:kacarson@arizona.edu}{kacarson@arizona.edu}, Diego F. Leal \href{mailto:dflc@arizona.edu}{dflc@arizona.edu}
}
