The zlog package offers functions to transform laboratory measurements into standardised z or z(log)-values as suggested in Hoffmann et al. (2017). This requires the lower and upper reference limits. If these are not known, they can be estimated from a given sample.
Hoffmann et al. (2017) define z as follows:

$$z = \left(x - \frac{\mathrm{limits}_1 + \mathrm{limits}_2}{2}\right) \cdot \frac{3.92}{\mathrm{limits}_2 - \mathrm{limits}_1}$$

Consequently, z(log) is defined as:

$$zlog = \left(\log(x) - \frac{\log(\mathrm{limits}_1) + \log(\mathrm{limits}_2)}{2}\right) \cdot \frac{3.92}{\log(\mathrm{limits}_2) - \log(\mathrm{limits}_1)}$$

where $x$ is the measured laboratory value and $\mathrm{limits}_1$ and $\mathrm{limits}_2$ are the lower and upper reference limits, respectively.
Example data and reference limits are taken from Hoffmann et al. (2017), Table 2.
## [1] -0.345876 -2.190548 -1.268212 -0.115292 1.498796 -0.345876 -3.804636
## [8] -2.882300 -4.496388
## [1] -0.15472223 -2.24698167 -1.14569028 0.07826303 1.57162335 -0.15472223
## [7] -4.52949253 -3.16160843 -5.69571148
## [1] 42 34 38 43 50 42 27 31 24
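The three outputs above are consistent with applying the two formulas to the albumin values and then inverting the z(log) transformation. The following is a minimal base-R sketch, not the package's implementation. The helper names `z_manual`, `zlog_manual` and `izlog_manual` are hypothetical, the reference limits 35 and 52 are an assumption chosen because they reproduce the printed numbers, and the constant 3.92 is written as `2 * qnorm(0.975)` (about 3.919928), which agrees with the printed digits more closely than the rounded value.

```r
# albumin values as printed in the last output above
albumin <- c(42, 34, 38, 43, 50, 42, 27, 31, 24)
# assumed lower and upper reference limits (not stated next to the output)
limits <- c(35, 52)

# unrounded version of the constant 3.92
k <- 2 * qnorm(0.975)

# hypothetical helpers written directly from the two formulas
z_manual <- function(x, limits) {
    (x - mean(limits)) * k / diff(limits)
}
zlog_manual <- function(x, limits) {
    z_manual(log(x), log(limits))
}
# inverse of zlog_manual: the formula solved for x
izlog_manual <- function(z, limits) {
    exp(z * diff(log(limits)) / k + mean(log(limits)))
}

z_values <- z_manual(albumin, limits)
zlog_values <- zlog_manual(albumin, limits)
recovered <- izlog_manual(zlog_values, limits)

print(z_values)
print(zlog_values)
print(recovered)
```

Because z(log) is simply z applied to the logarithms of both the value and the limits, the second helper reuses the first.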
Hoffmann et al. (2017) suggested a colour gradient to visualise laboratory measurements for the user.
It can be used to highlight the values in a table:
Category | albumin | zlog(albumin) | bilirubin | zlog(bilirubin) |
---|---|---|---|---|
blood donor | 42 | -0.15 | 11 | 0.88 |
blood donor | 34 | -2.25 | 9 | 0.55 |
blood donor | 38 | -1.15 | 2 | -1.96 |
hepatitis without cirrhosis | 43 | 0.08 | 5 | -0.43 |
hepatitis without cirrhosis | 50 | 1.57 | 22 | 2.04 |
hepatitis without cirrhosis | 42 | -0.15 | 42 | 3.12 |
hepatitis with cirrhosis | 27 | -4.53 | 37 | 2.90 |
hepatitis with cirrhosis | 31 | -3.16 | 200 | 5.72 |
hepatitis with cirrhosis | 24 | -5.70 | 20 | 1.88 |
The reference_limits function calculates the lower and upper reference limits as the 2.5 % and 97.5 % quantiles (or as the quantiles at user-supplied probabilities):
## lower upper
## 24.6 48.6
## lower upper
## 25.2 47.2
## lower upper
## 24.57207 48.51429
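The three results above can be reproduced with base R's `quantile`. This is a sketch under assumptions rather than the package code: `limits_manual` is a hypothetical helper, the albumin vector is the one from the example data, and the reading of the three outputs (default probabilities, probabilities 0.05 and 0.95, and quantiles computed on the log scale and transformed back) is inferred from the printed numbers.

```r
albumin <- c(42, 34, 38, 43, 50, 42, 27, 31, 24)

# hypothetical helper: empirical quantiles used as reference limits
limits_manual <- function(x, probs = c(0.025, 0.975)) {
    q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
    c(lower = q[1], upper = q[2])
}

# central 95 % interval
central95 <- limits_manual(albumin)
# central 90 % interval
central90 <- limits_manual(albumin, probs = c(0.05, 0.95))
# quantiles on the log scale, transformed back to the original scale
log_scale <- exp(limits_manual(log(albumin)))

print(central95)
print(central90)
print(log_scale)
```

With only nine observations these quantiles are interpolated between the most extreme values, so limits estimated this way are illustrative only.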
Most laboratories use their own age- and sex-specific reference limits. The lookup_limits function can be used to find the correct reference limits.
# toy example
reference <- data.frame(
    param = c("albumin", rep("bilirubin", 4)),
    age = c(0, 1, 2, 3, 7),                     # days
    sex = "both",
    units = c("g/l", rep("µmol/l", 4)),         # ignored
    lower = c(35, rep(NA, 4)),                  # no real reference values
    upper = c(52, 5, 8, 13, 18)                 # no real reference values
)
knitr::kable(reference)
param | age | sex | units | lower | upper |
---|---|---|---|---|---|
albumin | 0 | both | g/l | 35 | 52 |
bilirubin | 1 | both | µmol/l | NA | 5 |
bilirubin | 2 | both | µmol/l | NA | 8 |
bilirubin | 3 | both | µmol/l | NA | 13 |
bilirubin | 7 | both | µmol/l | NA | 18 |
# look up albumin reference values for an 18 year old woman
lookup_limits(
    age = 18 * 365.25,
    sex = "female",
    table = reference[reference$param == "albumin",]
)
## lower upper
## albumin 35 52
# look up albumin and bilirubin values for an 18 year old woman
lookup_limits(
    age = 18 * 365.25,
    sex = "female",
    table = reference
)
## lower upper
## albumin 35 52
## bilirubin NA 18
# look up bilirubin reference values for infants
lookup_limits(
    age = 0:8,
    sex = rep(c("female", "male"), 5:4),
    table = reference[reference$param == "bilirubin",]
)
## lower upper
## bilirubin NA NA
## bilirubin NA 5
## bilirubin NA 8
## bilirubin NA 13
## bilirubin NA 13
## bilirubin NA 13
## bilirubin NA 13
## bilirubin NA 18
## bilirubin NA 18
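The output above suggests the matching rule: for each queried age, the row with the largest starting age that does not exceed the query is used, and an age below the first entry yields NA. The sketch below imitates that rule in base R for the upper limit only. `lookup_upper` is a hypothetical helper and the rule itself is inferred from the printed results, not taken from the package source; the units column is left out because it is ignored.

```r
# toy reference table from above, without the ignored units column
reference <- data.frame(
    param = c("albumin", rep("bilirubin", 4)),
    age = c(0, 1, 2, 3, 7),          # days
    sex = "both",
    lower = c(35, rep(NA, 4)),
    upper = c(52, 5, 8, 13, 18)
)

# hypothetical helper: upper limit for a single age and sex
lookup_upper <- function(age, sex, table) {
    # rows that apply to this sex and start at or before the queried age
    ok <- table$age <= age & table$sex %in% c(sex, "both")
    if (!any(ok)) {
        return(NA_real_)
    }
    candidates <- table[ok, ]
    # the most specific row is the one with the largest starting age
    candidates$upper[which.max(candidates$age)]
}

bilirubin <- reference[reference$param == "bilirubin", ]
infant_upper <- vapply(
    0:8, lookup_upper, numeric(1), sex = "female", table = bilirubin
)
print(infant_upper)

albumin_upper <- lookup_upper(
    18 * 365.25, "female", reference[reference$param == "albumin", ]
)
print(albumin_upper)
```

Storing only the starting age of each interval keeps the table compact, at the cost of requiring an explicit row whenever the limits change.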
Sometimes reference limits are not specified. That is often the case for biomarkers that are related to infection or cancer. Using zero as the lower boundary results in skewed distributions (Hoffmann et al. 2017, fig. 7). Haeckel et al. (2015) suggested setting the lower reference limit to 15 % of the upper one.
## param age sex units lower upper
## 1 albumin 0 both g/l 35.00 52
## 2 bilirubin 1 both µmol/l 0.75 5
## 3 bilirubin 2 both µmol/l 1.20 8
## 4 bilirubin 3 both µmol/l 1.95 13
## 5 bilirubin 7 both µmol/l 2.70 18
## param age sex units lower upper
## 1 albumin 0 both g/l 35.0 52
## 2 bilirubin 1 both µmol/l 1.0 5
## 3 bilirubin 2 both µmol/l 1.6 8
## 4 bilirubin 3 both µmol/l 2.6 13
## 5 bilirubin 7 both µmol/l 3.6 18
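The two tables above correspond to filling each missing lower limit with 15 % and 20 % of the upper limit, respectively (for example 0.15 * 5 = 0.75 and 0.2 * 5 = 1.0). A minimal base-R sketch of that rule follows; `fill_lower` and its `fraction` argument are hypothetical names used for illustration, and the units column is omitted.

```r
reference <- data.frame(
    param = c("albumin", rep("bilirubin", 4)),
    age = c(0, 1, 2, 3, 7),
    sex = "both",
    lower = c(35, rep(NA, 4)),
    upper = c(52, 5, 8, 13, 18)
)

# hypothetical helper: replace missing lower limits by a fraction of the upper
fill_lower <- function(table, fraction = 0.15) {
    missing_lower <- is.na(table$lower)
    table$lower[missing_lower] <- fraction * table$upper[missing_lower]
    table
}

filled15 <- fill_lower(reference)
filled20 <- fill_lower(reference, fraction = 0.2)

print(filled15)
print(filled20)
```

Existing lower limits, such as the one for albumin, are left untouched.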
If laboratory measurements are missing, they can be imputed with "normal" values from the reference table. Imputing with the "logmean" (default) or "mean" reference value results in a zlog or z-value of zero, respectively.
## age sex albumin
## 1 40 female 42
## 2 50 male NA
## age sex albumin
## 1 40 female -0.345876
## 2 50 male 0.000000
## age sex albumin
## 1 40 female -0.1547222
## 2 50 male 0.0000000
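Why the imputed measurement maps to exactly zero follows from the formulas: the mean of the limits cancels the centring term of z, and the exponential of the mean log-limits (the geometric mean of the limits) cancels the centring term of z(log). The base-R sketch below checks this. The helper names are hypothetical and the albumin limits 35 and 52 are an assumption that reproduces the value printed for the first patient.

```r
limits <- c(35, 52)          # assumed albumin reference limits
k <- 2 * qnorm(0.975)        # unrounded version of 3.92

# hypothetical helpers written from the z and z(log) formulas
z_manual <- function(x, limits) (x - mean(limits)) * k / diff(limits)
zlog_manual <- function(x, limits) z_manual(log(x), log(limits))

mean_value <- mean(limits)                 # arithmetic mean of the limits
logmean_value <- exp(mean(log(limits)))    # geometric mean of the limits

albumin <- c(42, NA)
imputed_mean <- ifelse(is.na(albumin), mean_value, albumin)
imputed_logmean <- ifelse(is.na(albumin), logmean_value, albumin)

z_imputed <- z_manual(imputed_mean, limits)
zlog_imputed <- zlog_manual(imputed_logmean, limits)

print(z_imputed)
print(zlog_imputed)
```

The observed value of 42 is transformed as usual, while the imputed entry lands at the centre of the reference interval on the respective scale.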
PBC Example

For demonstration we use the pbc dataset from the survival package and keep only age, sex and the laboratory measurements:
library("survival")
data("pbc")
labs <- c(
    "bili", "chol", "albumin", "copper", "alk.phos", "ast", "trig",
    "platelet", "protime"
)
pbc <- pbc[, c("age", "sex", labs)]
knitr::kable(head(pbc), digits = 1)
age | sex | bili | chol | albumin | copper | alk.phos | ast | trig | platelet | protime |
---|---|---|---|---|---|---|---|---|---|---|
58.8 | f | 14.5 | 261 | 2.6 | 156 | 1718.0 | 137.9 | 172 | 190 | 12.2 |
56.4 | f | 1.1 | 302 | 4.1 | 54 | 7394.8 | 113.5 | 88 | 221 | 10.6 |
70.1 | m | 1.4 | 176 | 3.5 | 210 | 516.0 | 96.1 | 55 | 151 | 12.0 |
54.7 | f | 1.8 | 244 | 2.5 | 64 | 6121.8 | 60.6 | 92 | 183 | 10.3 |
38.1 | f | 3.4 | 279 | 3.5 | 143 | 671.0 | 113.2 | 72 | 136 | 10.9 |
66.3 | f | 0.8 | 248 | 4.0 | 50 | 944.0 | 93.0 | 63 | NA | 11.0 |
Next we estimate all reference limits from the data. We want to use sex-specific values for copper and aspartate aminotransferase ("ast").
## replicate copper and ast 2 times, use the others just once
param <- rep(labs, ifelse(labs %in% c("copper", "ast"), 2, 1))
sex <- rep_len("both", length(param))
## replace sex == both with female and male for copper and ast
## (c("f", "m") is recycled across the duplicated rows)
sex[param %in% c("copper", "ast")] <- c("f", "m")
## create data.frame, we ignore age-specific values for now and set age to zero
## (means applicable for all ages)
reference <- data.frame(
    param = param, age = 0, sex = sex, lower = NA, upper = NA
)
## estimate reference limits from sample data
for (i in seq_len(nrow(reference))) {
    reference[i, c("lower", "upper")] <-
        if (reference$sex[i] == "both")
            ## [[ ]] extracts the column as a numeric vector
            reference_limits(pbc[[reference$param[i]]])
        else
            reference_limits(pbc[pbc$sex == reference$sex[i], reference$param[i]])
}
knitr::kable(reference)
param | age | sex | lower | upper |
---|---|---|---|---|
bili | 0 | both | 0.4000 | 17.3150 |
chol | 0 | both | 174.0750 | 1086.2250 |
albumin | 0 | both | 2.5400 | 4.2200 |
copper | 0 | f | 12.8250 | 269.2750 |
copper | 0 | m | 23.5000 | 388.0000 |
alk.phos | 0 | both | 504.7500 | 9261.7400 |
ast | 0 | f | 49.6000 | 249.7438 |
ast | 0 | m | 55.4775 | 208.3350 |
trig | 0 | both | 52.0250 | 279.8000 |
platelet | 0 | both | 95.0000 | 470.4000 |
protime | 0 | both | 9.5000 | 13.1625 |
The pbc dataset contains a few missing values. We impute them with the corresponding log-mean reference value. In this example the reference limits are estimated from the whole sample; in real life they would come from, e.g., a healthy subpopulation.
## age sex bili chol albumin copper alk.phos ast trig platelet protime
## 6 66.25873 f 0.8 248 3.98 50 944 93 63 NA 11
## 14 56.22177 m 0.8 NA 2.27 43 728 71 NA 156 11
## age sex bili chol albumin copper alk.phos ast trig platelet
## 6 66.25873 f 0.8 248.0000 3.98 50 944 93 63.0000 211.3954
## 14 56.22177 m 0.8 434.8386 2.27 43 728 71 120.6507 156.0000
## protime
## 6 11
## 14 11
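The imputed values printed above agree with the geometric mean of the estimated lower and upper limits, i.e. the exponential of the mean of the log-limits. The short check below uses the limits from the reference table; the data frame and its column names are only for illustration.

```r
# limits copied from the estimated reference table above
limits <- data.frame(
    param = c("chol", "trig", "platelet"),
    lower = c(174.075, 52.025, 95),
    upper = c(1086.225, 279.8, 470.4)
)

# log-mean reference value: exponential of the mean of the log-limits
limits$logmean <- exp((log(limits$lower) + log(limits$upper)) / 2)

print(limits)
```

The resulting values match the imputed cholesterol and triglyceride values of patient 14 and the imputed platelet count of patient 6.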
Subsequently we can convert the laboratory measurements into z(log)-values using the zlog_df function, which applies zlog to every numeric column in a data.frame (except the "age" column):
age | sex | bili | chol | albumin | copper | alk.phos | ast | trig | platelet | protime |
---|---|---|---|---|---|---|---|---|---|---|
58.8 | f | 1.8 | -1.1 | -1.8 | 1.3 | -0.3 | 0.5 | 0.8 | -0.3 | 1.0 |
56.4 | f | -0.9 | -0.8 | 1.8 | -0.1 | 1.7 | 0.0 | -0.7 | 0.1 | -0.6 |
70.1 | m | -0.7 | -1.9 | 0.5 | 1.1 | -1.9 | -0.3 | -1.8 | -0.8 | 0.8 |
54.7 | f | -0.4 | -1.2 | -2.0 | 0.1 | 1.4 | -1.5 | -0.6 | -0.4 | -1.0 |
38.1 | f | 0.3 | -1.0 | 0.6 | 1.1 | -1.6 | 0.0 | -1.2 | -1.1 | -0.3 |
66.3 | f | -1.2 | -1.2 | 1.5 | -0.2 | -1.1 | -0.4 | -1.5 | 0.0 | -0.2 |
55.5 | f | -1.0 | -0.6 | 1.7 | -0.2 | -1.3 | -1.5 | 1.3 | -0.1 | -1.7 |
53.1 | f | -2.3 | -0.9 | 1.5 | -0.2 | 1.0 | -3.3 | 1.0 | 1.4 | -0.2 |
42.5 | f | 0.2 | 0.5 | -0.5 | 0.4 | 0.1 | 0.6 | -0.7 | 0.4 | -0.2 |
70.6 | f | 1.6 | -1.7 | -1.4 | 1.1 | -1.2 | 0.7 | 0.4 | 0.9 | 0.3 |
53.7 | f | -0.7 | -1.1 | 1.8 | -0.3 | -0.9 | -0.8 | -1.0 | 0.5 | 0.8 |
59.1 | f | 0.3 | -1.3 | 0.6 | 0.6 | -1.7 | -0.7 | -0.6 | -2.7 | 2.4 |
45.7 | f | -1.4 | -0.9 | 1.3 | -0.5 | -0.8 | -0.6 | 0.2 | 0.4 | -0.6 |
56.2 | m | -1.2 | 0.0 | -2.8 | -1.1 | -1.5 | -1.2 | 0.0 | -0.7 | -0.2 |
64.6 | f | -1.2 | -1.4 | 1.3 | 1.4 | 1.9 | 0.3 | -0.5 | 0.8 | -0.2 |
40.4 | f | -1.4 | -1.6 | 0.9 | -1.0 | -1.5 | -1.0 | -1.7 | -0.2 | -0.4 |
52.2 | f | 0.0 | -1.0 | -0.3 | 1.3 | -0.5 | 0.1 | 0.1 | 0.1 | -0.8 |
53.9 | f | 1.5 | -1.9 | -1.2 | 3.0 | -1.1 | 2.2 | 1.2 | 0.7 | 1.2 |
49.6 | f | -1.4 | -1.3 | 0.6 | -0.5 | -0.2 | -0.4 | 0.0 | -0.0 | -0.2 |
60.0 | f | 0.7 | -0.3 | 0.5 | 1.1 | -0.2 | 0.2 | 0.3 | 1.0 | 1.8 |
64.2 | m | -1.5 | -1.2 | 1.2 | -1.2 | -1.3 | -1.5 | -0.9 | 1.1 | 0.2 |
56.3 | f | 0.3 | -1.0 | 0.8 | 2.7 | -0.6 | 0.2 | -1.8 | -0.5 | 0.4 |
56.0 | f | 2.0 | -0.2 | -0.8 | 2.9 | 1.4 | 1.7 | 1.1 | 0.0 | 0.5 |
44.5 | m | -0.2 | 0.1 | 1.5 | 0.4 | 1.3 | 2.1 | 1.5 | -2.7 | -1.5 |
45.1 | f | -1.4 | -0.8 | 1.7 | -0.5 | -1.6 | -0.1 | -1.4 | 1.0 | 0.1 |
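Individual entries of this table can be checked by hand. The sketch below recomputes three values for the first patient (female) from her measurements and the matching rows of the estimated reference table, using the female-specific limits for copper. `zlog_manual` is a hypothetical helper written from the z(log) formula, with 3.92 replaced by the unrounded `2 * qnorm(0.975)`.

```r
k <- 2 * qnorm(0.975)   # unrounded version of 3.92

# hypothetical helper written from the z(log) formula
zlog_manual <- function(x, lower, upper) {
    (log(x) - (log(lower) + log(upper)) / 2) * k / (log(upper) - log(lower))
}

# measurements of the first patient and the limits from the reference table
first <- data.frame(
    param = c("bili", "albumin", "copper"),
    value = c(14.5, 2.6, 156),
    lower = c(0.4, 2.54, 12.825),      # copper: limits for sex "f"
    upper = c(17.315, 4.22, 269.275)
)

first$zlog <- round(zlog_manual(first$value, first$lower, first$upper), 1)
print(first)
```

The rounded results agree with the first row of the table for bili, albumin and copper.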
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] survival_3.7-0 zlog_1.0.2 rmarkdown_2.29
##
## loaded via a namespace (and not attached):
## [1] vctrs_0.6.5 svglite_2.1.3 cli_3.6.3 knitr_1.49
## [5] rlang_1.1.4 xfun_0.49 stringi_1.8.4 jsonlite_1.8.9
## [9] glue_1.8.0 colorspace_2.1-1 buildtools_1.0.0 htmltools_0.5.8.1
## [13] maketools_1.3.1 sys_3.4.3 sass_0.4.9 scales_1.3.0
## [17] grid_4.4.2 evaluate_1.0.1 munsell_0.5.1 jquerylib_0.1.4
## [21] kableExtra_1.4.0 fastmap_1.2.0 yaml_2.3.10 lifecycle_1.0.4
## [25] stringr_1.5.1 compiler_4.4.2 rstudioapi_0.17.1 lattice_0.22-6
## [29] systemfonts_1.1.0 digest_0.6.37 viridisLite_0.4.2 R6_2.5.1
## [33] splines_4.4.2 magrittr_2.0.3 Matrix_1.7-1 bslib_0.8.0
## [37] tools_4.4.2 xml2_1.3.6 cachem_1.1.0