---
title: Z(log) Transformation for Laboratory Measurements
output:
    rmarkdown::html_vignette:
        toc_float: true
vignette: >
    %\VignetteIndexEntry{Z(log) Transformation for Laboratory Measurements}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
bibliography: bibliography.bib
---

# Introduction

The `zlog` package offers functions to transform laboratory measurements into
standardised $z$ or $z(log)$-values as suggested in @hoffmann2017.
Therefore the lower and upper reference limits are needed. If these are not
known they could estimated from a given sample.

# Z or Z(log) Transformation

@hoffmann2017 define $z$ as follows:

$$z = (x - (limits_1 + limits_2 )/2) * 3.92/(limits_2 - limits_1)$$

Consequently the $z(log)$ is defined as:

$$zlog = (\log(x) - (\log(limits_1) + \log(limits_2))/2) * 3.92/(\log(limits_2) - \log(limits_1))$$

Where $x$ is the measured laboratory value and $limits_1$ and $limits_2$ are
the lower and upper reference limit, respectively.

Example data and reference limits are taken from @hoffmann2017, Table 2.

```{r zlog}
library("zlog")

albumin <- c(42, 34, 38, 43, 50, 42, 27, 31, 24)
z(albumin, limits = c(35, 52))
zlog(albumin, limits = c(35, 52))
```

# Inverse Z or Z(log) Transformation/Undo Transformation

```{r izlog}
izlog(zlog(albumin, limits = c(35, 52)), limits = c(35, 52))
```

# Z(log) Dependent Colour Gradient

@hoffmann2017 suggested a colour gradient to visualise laboratory measurements
for the user.

```{r zcol, echo = FALSE, out.width = "95%", fig.width = 10, fig.height = 1, fig.align = "center"}
z <- -10:10
oldpar <- par(mar = c(0, 0, 0, 0), oma = c(0, 0, 0, 0))
image(matrix(z, ncol = 1), col = zcol(z), axes = FALSE)
text(seq(0, 1, length.out=length(z)), 0, label = z)
par(oldpar)
```

It could be used to highlight the values in a table:

```{r zcoltable, echo = FALSE, result = "asis"}
bilirubin <- c(11, 9, 2, 5, 22, 42, 37, 200, 20)
zloga <- zlog(albumin, limits = c(35, 52))
zlogb <- zlog(bilirubin, limits = c(2, 21))
d <- data.frame(
    Category = c(
        rep(c(
            "blood donor",
            "hepatitis without cirrhosis",
            "hepatitis with cirrhosis"
            ),
            each = 3
        )
    ),
    albumin = albumin,
    zloga = zloga,
    bilirubin = bilirubin,
    zlogb = zlogb
)
d$albumin <- kableExtra::cell_spec(
    d$albumin, background = zcol(zloga), align = "right"
)
d$bilirubin <- kableExtra::cell_spec(
    d$bilirubin, background = zcol(zlogb), align = "right"
)
kableExtra::kable_classic(
    kableExtra::kbl(
        d,
        col.names = c(
            "Category",
            "albumin", "zlog(albumin)", "bilirubin", "zlog(bilirubin)"
        ),
        digits = 2,
        escape = FALSE,
        caption = paste0(
            "Table reproduced from @hoffmann2017, Table 2, limits used: ",
            "albumin 35-52 g/l, bilirubin 2-21 µmol/l."
        )
    ),
    "basic"
)
```

# Estimate Reference Limits

The `reference_limits` functions calculates the lower and upper 2.5 or 97.5
(or a user given probability) quantiles:

```{r reference_limits}
reference_limits(albumin)
reference_limits(albumin, probs = c(0.05, 0.95))

exp(reference_limits(log(albumin)))
```

# Working with Reference Tables

Most laboratories use their own age- and sex-specific reference limits.
The `lookup_limits` function could be used to find the correct reference limit.

```{r reference_table}
# toy example
reference <- data.frame(
    param = c("albumin", rep("bilirubin", 4)),
    age = c(0, 1, 2, 3, 7),     # days
    sex = "both",
    units = c("g/l", rep("µmol/l", 4)), # ignored
    lower = c(35, rep(NA, 4)),  # no real reference values
    upper = c(52, 5, 8, 13, 18) # no real reference values
)
knitr::kable(reference)

# lookup albumin reference values for 18 year old woman
lookup_limits(
    age = 18 * 365.25,
    sex = "female",
    table = reference[reference$param == "albumin",]
)

# lookup albumin and bilirubin values for 18 year old woman
lookup_limits(
    age = 18 * 365.25,
    sex = "female",
    table = reference
)

# lookup bilirubin reference values for infants
lookup_limits(
    age = 0:8,
    sex = rep(c("female", "male"), 5:4),
    table = reference[reference$param == "bilirubin",]
)
```

# Missing Reference Limits

Sometimes reference limits are not specified. That is often the case for
biomarkers that are related to infection or cancer. Using zero as lower
boundary results in skewed distributions [@hoffmann2017, fig. 7].
@haeckel2015 suggested to set the lower reference limit to 15 % of
the upper one.

```{r missing_reference}
# use default fractions
set_missing_limits(reference)

# set fractions manually
set_missing_limits(reference, fraction = c(0.2, 5))
```

# Impute Missing Laboratory Measurements

If laboratory measurements are missing they could be imputed using *"normal"*
values from the reference table. Using the `"logmean"` (default) or `"mean"`
reference value (default) will result in a $zlog$ or $z$-value of zero,
respectively.

```{r impute_missing_values}
x <- data.frame(
    age = c(40, 50),
    sex = c("female", "male"),
    albumin = c(42, NA)
)
x
z_df(impute_df(x, reference, method = "mean"), reference)
zlog_df(impute_df(x, reference), reference)
```

# `PBC` Example

For demonstration we choose the `pbc` dataset from the `survival` package
and exclude all non-laboratory measurements except *age* and *sex*:

```{r pbc_load}
library("survival")
data("pbc")
labs <- c(
    "bili", "chol", "albumin", "copper", "alk.phos", "ast", "trig",
    "platelet", "protime"
)
pbc <- pbc[, c("age", "sex", labs)]
knitr::kable(head(pbc), digits = 1)
```

Next we estimate all reference limits from the data. We want to use sex-specific
values for copper and aspartate aminotransferase (`"ast"`).

```{r pbc_reference_limits}
## replicate copper and ast 2 times, use the others just once
param <- rep(labs, ifelse(labs %in% c("copper", "ast"), 2, 1))
sex <- rep_len("both", length(param))

## replace sex == both with female and male for copper and ast
sex[param %in% c("copper", "ast")] <- c("f", "m")

## create data.frame, we ignore age-specific values for now and set age to zero
## (means applicable for all ages)
reference <- data.frame(
    param = param, age = 0, sex = sex, lower = NA, upper = NA
)

## estimate reference limits from sample data
for (i in seq_len(nrow(reference))) {
    reference[i, c("lower", "upper")] <-
        if (reference$sex[i] == "both")
            reference_limits(pbc[reference$param[i]])
        else
            reference_limits(pbc[pbc$sex == reference$sex[i], reference$param[i]])
}
knitr::kable(reference)
```

The `pbc` dataset contains a few missing values. We impute the with the
corresponding mean reference value (which is in this example just the sample
mean but would be in real life the mean of a e.g. healthy subpopulation).

```{r pbc_impute}
pbc[c(6, 14),]
pbc <- impute_df(pbc, reference)
pbc[c(6, 14),]
```

Subsequently we can convert the laboratory measurements into $z(log)$-values
using the `zlog_df` function that applies the `zlog` for every `numeric` column
in a `data.frame` (except the `"age"` column):

```{r pbc_zlog}
pbc <- zlog_df(pbc, reference)
```

```{r pbc_table, echo = FALSE}
pbctbl <- head(pbc, n = 25)
pbctbl[labs] <- lapply(labs, function(l) {
    kableExtra::cell_spec(
        sprintf("%.1f", unlist(pbctbl[l])),
        background = zcol(unlist(pbctbl[l])),
        align = "right"
    )
})

kableExtra::kable_classic(
    kableExtra::kbl(pbctbl, digits = 1, escape = FALSE),
    "basic"
)
```

# Session information

```{r si}
sessionInfo()
```

# References