```{r setup, include=FALSE} library(methods) library(ggplot2) # library(pander) library(knitr) knitr::opts_chunk$set(echo = TRUE, comment = "", cache=TRUE, warning = FALSE) ``` ## Overall Pipeline ## Background - Obtaining manual lesion segmentations is often resource intensive. - "Gold standard": Inter- and Intra-rater variability. - Accurate and efficient methods for automatic segmentation are necessary for scalability and research progress. - In this tutorial, we will learn how to train and apply OASIS [@sweeney2013oasis], an automatic lesion segmentation model, to obtain predicted lesion probability maps. - Relies on intensity-normalized data. ```{r loading, echo=FALSE, message=FALSE, cache = FALSE} library(ms.lesion) library(neurobase) library(fslr) library(scales) library(oasis) library(dplyr) tr_files = get_image_filenames_list_by_subject( group = "training", type = "coregistered") ts_files = get_image_filenames_list_by_subject( group = "test", type = "coregistered") #tr_t1s = lapply(tr_files, function(x) readnii(x["T1"])) #tr_t2s = lapply(tr_files, function(x) readnii(x["T2"])) #tr_flairs = lapply(tr_files, function(x) readnii(x["FLAIR"])) #tr_masks = lapply(tr_files, function(x) readnii(x["Brain_Mask"])) #tr_golds = lapply(tr_files, function(x) readnii(x["mask"])) #tr_oasis = lapply(tr_files, function(x) readnii(x["Default_OASIS"])) #ts_t1s = lapply(ts_files, function(x) readnii(x["T1"])) #ts_t2s = lapply(ts_files, function(x) readnii(x["T2"])) #tr_pds = ts_pds = NULL #ts_flairs = lapply(ts_files, function(x) readnii(x["FLAIR"])) #ts_masks = lapply(ts_files, function(x) readnii(x["Brain_Mask"])) #ts_golds = lapply(ts_files, function(x) readnii(x["mask"])) #ts_oasis = lapply(ts_files, function(x) readnii(x["Default_OASIS"])) ``` ## MS Lesion Segmentation with OASIS - **O**ASIS is **A**utomated **S**tatistical **I**nference for **S**egmentation [@sweeney2013oasis]. - OASIS takes FLAIR, T1, T2, and PD (optional) images. - Produces OASIS probability maps of MS lesion presence. - These can be thresholded into a binary lesion segmentation. - The OASIS model is based on a logistic regression. - Regress binary manual segmentation labels on the images, smoothed versions of the images, and some interaction terms (e.g., supervised learning). - Performs well compared to common machine learning models [@sweeney2014comparison] ## Default OASIS Model - The OASIS library comes with default parameters that can be used to generate probability maps for new test subjects. - The default model was trained on approximately 100 MS subjects and 30 healthy subjects with manual segmentations. - Here we apply `oasis_predict` with the default model to obtain OASIS probability maps for the test subjects. ```{r default_predict_ts_show, eval=FALSE} library(oasis) default_predict_ts = function(x){ res = oasis_predict( flair=ts_flairs[[x]], t1=ts_t1s[[x]], t2=ts_t2s[[x]], pd=ts_pds[[x]], brain_mask = ts_masks[[x]], preproc=FALSE, normalize=TRUE, model=oasis::oasis_model) return(res) } default_probs_ts = lapply(1:3, default_predict_ts) ``` ## Vizualization of probability map - Here's the probability map for test subject 01. ```{r viz_01, echo=FALSE} tsflair = readnii(ts_files$test01["FLAIR"]) les_mask = readnii(ts_files$test01["Default_OASIS"]) les_mask[les_mask < .05] = 0 vx = c(122,257,341) ortho2(tsflair, les_mask, xyz = vx) ``` ## Thresholding: Getting a binary map - We must choose a cutoff to binarize the OASIS probability maps. - The `binary` argument in the `oasis_predict` function is FALSE by default, resulting in the output being the probability map. - Setting `binary=TRUE` will return the thresholded version, using the input to the `threshold` argument (default = 0.16). - 0.16 was obtained via a validation set allowing for a 0.5% false positive rate. - In practice, we might want to use a grid search over thresholds and cross validation to choose the cutoff. ## Vizualization of binary map - Here's the binary mask for test subject 01, using the default 0.16 threshold: ```{r viz_02, echo=FALSE} les_mask[les_mask > 0.16] = 1 les_mask[les_mask < 1] = 0 ortho2(tsflair, les_mask, xyz=vx, col.y=alpha("red", 0.5)) ``` ## Default OASIS Model - To evaluate how the default model with default threshold performs, we'll compare the predictions to our manual segmentations. ```{r default_predict_tr_run, eval=FALSE, echo=FALSE} ts_oasis = lapply(ts_files, function(x) readnii(x["Default_OASIS"])) default_ts = lapply(ts_oasis, function(x){ x = x > 0.16 return(x) }) ``` - Sorensen–Dice coefficient: - Similarity measure between two samples. - Ranges from 0 to 1. - (TP) - true positive, (FP) - false positive, (FN) - false negative. $$D = \frac{2TP}{2TP + FP + FN}$$ ## Default OASIS Model Results Dice coeffients for the test subjects ```{r table1, echo=FALSE, eval=FALSE} dice = function(x){ return((2*x[2,2])/(2*x[2,2] + x[1,2] + x[2,1])) } tbls_df = lapply(1:3, function(x) table( c(round(ts_golds[[x]], 0)), c(round(default_ts[[x]], 0)) )) dfDice = sapply(tbls_df, dice) diceDF = data.frame(Subject=factor(rep(1:3)), Dice=c(dfDice)) write.csv(diceDF, file='testDice.csv', row.names=FALSE) ``` ```{r table1_run, echo=FALSE} diceDF = read.csv('testDice.csv') g = ggplot(diceDF, aes(x=Subject, y=Dice)) + geom_histogram(position="dodge", stat="identity") print(g) ``` ## Improving Results - We might be able to improve the results by adjusting the threshold. - Let's optimize the threshold on the training data using a grid search (in practice, we might do cross-validation). ```{r seq_show, echo=FALSE, eval=FALSE} tr_golds = lapply(tr_files, function(x) readnii(x["mask"])) tr_oasis = lapply(tr_files, function(x) readnii(x["Default_OASIS"])) th = seq(0.05, 0.3, by=0.05) dice_tr = lapply(1:5, function(x){ img = tr_oasis[[x]] gold = tr_golds[[x]] diceTh = unlist(lapply(th, function(y){ binimg = img > y tbl = table(c(binimg), c(gold)) return(dice(tbl)) })) return(diceTh) }) avgDice = Reduce('+', dice_tr)/length(dice_tr[[1]]) write.csv(rbind(th, avgDice), file='avgDice.csv', row.names=FALSE) ``` ```{r seq_run, echo=FALSE} diceDf = read.csv('avgDice.csv') colnames(diceDf) = NULL rownames(diceDf) = c("Threshold", "Average Dice") print(round(diceDf, 3)) ``` ## Improving Results - Turns out a coarse grid search chose a threshold of 0.15, so the results are nearly identical. ```{r seq_show_15, echo=FALSE, eval=FALSE} default_ts15 = lapply(ts_oasis, function(x){ x = x > 0.15 return(x) }) tbls_df15 = lapply(1:3, function(x) table( c(round(ts_golds[[x]], 0)), c(round(default_ts15[[x]], 0)))) dfDice15 = sapply(tbls_df15, dice) diceDF15 = data.frame(Subject=factor(rep(1:3)), Dice=c(dfDice15)) write.csv(diceDF15, file='testDice15.csv', row.names=FALSE) ``` ```{r seq_run_15, echo=FALSE} diceDF15 = read.csv('testDice15.csv') g = ggplot(diceDF15, aes(x=Subject, y=Dice)) + geom_histogram(position="dodge", stat="identity") print(g) ``` ## Improving Results - We might be able to further improve the results by re-training the OASIS model using our five training subjects. - To re-train using new data, binary masks of gold standard lesion segmentations are needed and should be in T1 space. ## Making OASIS data frames - OASIS requires a particular data frame format, which we create using the function `oasis_train_dataframe`. - Includes an option to preprocess your data (`preproc`), which does (1) inhomogeneity correction using `fsl_biascorrect` and (2) rigid coregistration using `flirt` to the T1 space. - Includes an option to whole-brain intensity normalize (`normalize`). - `make_df()` below is a helper function. ```{r oasis_df_show, eval=FALSE} make_df = function(x){ res = oasis_train_dataframe( flair=tr_flairs[[x]], t1=tr_t1s[[x]], t2=tr_t2s[[x]], pd=tr_pds[[x]], gold_standard=tr_golds[[x]], brain_mask=tr_masks[[x]], preproc=FALSE, normalize=TRUE, return_preproc=FALSE) return(res$oasis_dataframe) } oasis_dfs = lapply(1:5, make_df) ``` ## Training OASIS - The function `oasis_training` takes the data frames we made and fits a logistic regression using labels and features from a subset of voxels in each subject's brain mask (top 15\% in FLAIR intensity). - The function `do.call` is a useful R function that applies the function named in the first argument to all elements of the list specified in the second argument. ```{r oasis_model_show, eval=FALSE} ms_model = do.call("oasis_training", oasis_dfs) ``` ## OASIS model object ```{r oasis_model_show2} print(ms.lesion::ms_model) ``` ## Trained OASIS Model Results ```{r trained_predict_tr_show, eval=FALSE, echo=FALSE} tr_golds = lapply(tr_files, function(x) readnii(x["mask"])) tr_oasis = lapply(tr_files, function(x) readnii(x["Trained_OASIS"])) th = seq(0.05, 0.3, by=0.05) dice_tr = lapply(1:5, function(x){ img = tr_oasis[[x]] gold = tr_golds[[x]] diceTh = unlist(lapply(th, function(y){ binimg = img > y tbl = table(c(binimg), c(gold)) return(dice(tbl)) })) return(diceTh) }) avgDice = Reduce('+', dice_tr)/length(dice_tr[[1]]) write.csv(rbind(th, avgDice), file='avgDiceReTrain.csv', row.names=FALSE) ``` ```{r trained_predict_tr_dice, eval=TRUE, echo=FALSE} diceDf = read.csv('avgDiceReTrain.csv') colnames(diceDf) = NULL rownames(diceDf) = c("Threshold", "Average Dice") print(round(diceDf, 3)) ``` ```{r trained_predict_tr_run, eval=TRUE, echo=FALSE} trained_ts = lapply(ts_files, function(x){ img = readnii(x["Trained_OASIS"]) img = img > 0.15 return(img) }) ``` - Using a threshold of 0.15. - Dice coeffients for default vs. re-trained OASIS model. ```{r table3, echo=FALSE} ts_golds = lapply(ts_files, function(x) readnii(x["mask"])) tbls_ts = lapply(1:3, function(x) table(c(ts_golds[[x]]), c(trained_ts[[x]]))) tsDice = sapply(tbls_ts, dice) diceTS = data.frame('Subject'=factor(1:3), 'Dice'=c(tsDice)) diceDF$Model = "Default" diceTS$Model = "Trained" diceAll = rbind(diceDF, diceTS) diceAll$Model = factor(diceAll$Model) g = ggplot(diceAll, aes(x=Subject, y=Dice)) + geom_histogram(position="dodge", stat="identity") + facet_wrap(~Model) print(g) ``` ## Improvement ```{r dice_mat, echo = FALSE} df = cbind(id = sprintf("%02.0f", 1:3), r1 = round((tsDice - diceDF$Dice) / diceDF$Dice * 100, 1)) df = data.frame(df, stringsAsFactors = FALSE) colnames(df) = c("ID", "Dice") ``` - Percent improvement in Dice over the default model: ```{r, echo = FALSE} knitr::kable(df) ``` ## Website [http://johnmuschelli.com/imaging_in_r](../index.html) ## References {.smaller}