SCCAN questions

dorianps edited this page Mar 15, 2019 · 46 revisions

SCCAN stands for Sparse Canonical Correlation Analysis for Neuroimaging. Strictly speaking, SCCAN is not a statistical method but an optimization routine that follows machine learning principles. Many users expect SCCAN to behave like a voxel-based statistical method, apply the same concepts, and end up confused by the results. Below are some frequently asked questions, with answers that should help you understand the method.

How does SCCAN work?

SCCAN finds a set of voxels that together contribute to explaining the behavioral score (it is a multivariate method). These voxels are found gradually over a number of iterations, by assigning a weight to each voxel. At each iteration, weights are smoothed and isolated voxels are set back to zero (the lesion of a single voxel does not cause a deficit, remember?). The extent of the final solution depends largely on the sparseness value: when SCCAN is called, it is basically being asked to find a solution of a certain extent (aka sparseness). Since you don't know in advance how extensive the results should be, LESYMAP runs an internal 4-fold cross-validation to find the best sparseness. Your subjects are split into 4 groups; 3/4 of the subjects are used to identify the voxel weights, and the remaining 1/4 is used to test how well those weights predict behavioral scores. The best sparseness is the one with the most accurate predictions for new patients (the held-out 1/4 chunks). Once the best sparseness is found, a final SCCAN is run on all subjects using that optimal value. The map you see at the end is derived from this final SCCAN run on all subjects.
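The cross-validation loop described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not LESYMAP's code: `toy_sparse_weights` is a hypothetical stand-in for SCCAN (it weights each voxel by its correlation with behavior and keeps a fraction of voxels equal to the sparseness), the fixed grid of candidate values is an assumption, and the data are synthetic.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def toy_sparse_weights(lesions, scores, sparseness):
    """Hypothetical stand-in for SCCAN, NOT the real algorithm.

    Weights each voxel by its correlation with behavior, then keeps only
    the strongest `sparseness` fraction of voxels and zeroes the rest.
    """
    n_vox = len(lesions[0])
    w = [pearson([row[v] for row in lesions], scores) for v in range(n_vox)]
    keep = max(1, round(sparseness * n_vox))
    cutoff = sorted((abs(x) for x in w), reverse=True)[keep - 1]
    return [x if abs(x) >= cutoff else 0.0 for x in w]


def cv_correlation(lesions, scores, sparseness, k=4, seed=0):
    """Correlation between true scores and scores predicted for held-out subjects."""
    idx = list(range(len(scores)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal groups of subjects
    predicted = [0.0] * len(scores)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Weights are estimated on 3/4 of the subjects only.
        w = toy_sparse_weights(
            [lesions[i] for i in train], [scores[i] for i in train], sparseness
        )
        # The remaining 1/4 is predicted as a weighted sum of lesioned voxels.
        for i in test:
            predicted[i] = sum(l * wv for l, wv in zip(lesions[i], w))
    return pearson(scores, predicted)


def best_sparseness(lesions, scores, candidates):
    """Pick the candidate sparseness with the highest cross-validated correlation."""
    return max(candidates, key=lambda s: cv_correlation(lesions, scores, s))


# Synthetic data: 40 subjects, 20 voxels; only the first 4 voxels affect behavior.
rng = random.Random(1)
lesions = [[rng.randint(0, 1) for _ in range(20)] for _ in range(40)]
scores = [-sum(row[:4]) + rng.gauss(0, 0.3) for row in lesions]

candidates = [0.05, 0.2, 0.5, 0.9]
best = best_sparseness(lesions, scores, candidates)
final_weights = toy_sparse_weights(lesions, scores, best)  # final run on all subjects
print("best sparseness:", best)
```

Note that the final weights come from a run on all subjects; the cross-validation folds are used only to choose the sparseness.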

Where is the map of p-values?

There is none, because SCCAN is not a voxel-wise method and does not produce p-values for individual voxels. The only map you should care about is stats.img, which contains the voxel weights - the stronger the weight, the more important that voxel is in relation to behavior.

How do I know that results are not random, how can I assign a p-value to SCCAN?

There is a single p-value you should look at when running SCCAN. That is the p-value of the correlation between true and predicted behavioral scores at the best sparseness value, and it is called CVcorrelation.pval. This is the p-value of the solution as a whole, not of individual voxels. This global p-value is used for one thing only: to decide whether the solution is random or not. For example, if the relationship between lesions and deficit is too weak, the solutions found during cross-validation will be almost random and will not be able to predict new patients. As a result, CVcorrelation.pval might be 0.23, which indicates that brain-behavior relationships are too weak to identify voxels that can predict new patients.

Why don't the results change when I change pThreshold?

Because pThreshold is used only to decide whether the global solution is significant or not. It has no effect on the solution itself: you will keep getting the same solution for as long as CVcorrelation.pval is below pThreshold. If you want more extensive or less extensive results, you should change sparseness instead.
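As a minimal sketch of this logic, pThreshold acts as a gate on the whole map rather than as a voxel-wise threshold. The function name `gate_solution` is hypothetical, and returning `None` for a non-significant solution is an assumption made for illustration; what LESYMAP actually returns in that case is not described here.

```python
def gate_solution(weights, cv_pval, p_threshold=0.05):
    """Return the weight map untouched if the global p-value passes, else None.

    Hypothetical helper: the threshold never modifies individual weights,
    it only decides whether the solution as a whole is reported.
    """
    if cv_pval < p_threshold:
        return list(weights)
    return None  # assumption: no map is reported for a non-significant solution


weights = [0.0, 0.4, -1.0, 0.0, 0.7]
cv_pval = 0.01

# Any threshold above 0.01 yields exactly the same map.
map_a = gate_solution(weights, cv_pval, p_threshold=0.05)
map_b = gate_solution(weights, cv_pval, p_threshold=0.02)
print(map_a == map_b)
```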

How do I change the extent of the results?

The only way to get different results is to change sparseness. Increasing sparseness towards 1 or -1 will produce larger maps, while decreasing sparseness towards 0 will produce very focal maps. Changing sparseness is not advised, however, because the optimal sparseness should be found empirically. A manual choice is an arbitrary decision. One of the benefits of our implementation of SCCAN is that the researcher does not have much room to tweak the results (unless you know what you are doing).

What happens if I increase sparseness? Will I simply get more extensive results?

No. The effect of sparseness is different from the effect of thresholding a map. A larger sparseness tells the algorithm that it has more freedom to retain voxels with smaller weights. However, this also has an impact on the other voxels. There will probably be fewer iterations (a faster solution), but since voxel weights are refined at each iteration, the final result will have different weights around the peaks as well. Typically, the peaks identified at low sparseness start to look like valleys at high sparseness. The overall pattern of results may look similar - the most relevant region is still the same - but the relationship of weights with nearby voxels will be different.

Is SCCAN 100% precise?

No. SCCAN is usually more precise than voxel-wise methods, but is not a magic method. Simulations show that slight displacements in peaks can be observed with SCCAN, too (see the comparison video). These displacements may occur for various reasons, one of which might be that the noise introduced in simulations may randomly push the behavioral score to relate better with a neighboring area than the area that produced it. Another one might be that the spatial smoothing applied iteratively to SCCAN weights may slightly push findings outside the brain-air border. Although these effects deserve more investigation, for now you can assume that some displacement may exist with SCCAN, too, but it is not something easy to identify or fix in today's standard of analyses (it would probably require simulations for your specific dataset).

I ran SCCAN outside LESYMAP and got different results.

The voxel weights obtained from SCCAN are typically very small (e.g., 0.0000003). LESYMAP normalizes these weights to the range -1 to 1 and removes voxels with weights close to zero (-0.1 < weight < 0.1). In other words, it removes every voxel whose weight is smaller than 10% of the maximal value, which is often a lot of voxels. The 10% threshold is an arbitrary choice; it was chosen simply because it seemed to produce the most accurate map in various scenarios. Further investigation is needed to see if this value is indeed the best one. The developer of the SCCAN method has now included a special option to use a 10% threshold in the internal SCCAN iterations (sparseDecom2 function in ANTsR). This new implementation makes SCCAN much faster while apparently producing the same results. However, we still use the traditional post-hoc thresholding in LESYMAP because more rigorous tests are needed to make sure the new method produces equivalent results. In conclusion, you should know that the true output of SCCAN is more extensive than what you see in the LESYMAP results. If you are curious about the original non-truncated weights, check the output called rawWeights.img.
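The post-hoc step described above can be illustrated with a small Python sketch. It assumes that normalization means dividing by the largest absolute weight, which the text does not state explicitly; the function name and the example values are made up for illustration.

```python
def normalize_and_threshold(raw_weights, fraction=0.1):
    """Scale weights to [-1, 1], then zero out weights below `fraction` in magnitude.

    Assumption: scaling divides by the largest absolute raw weight.
    """
    peak = max(abs(w) for w in raw_weights)
    if peak == 0:
        return [0.0 for _ in raw_weights]
    scaled = [w / peak for w in raw_weights]
    # Keep a voxel only if its magnitude is at least 10% of the maximum.
    return [w if abs(w) >= fraction else 0.0 for w in scaled]


# Made-up raw weights on the tiny scale SCCAN typically returns.
raw = [3e-7, -1.5e-7, 2e-8, -1e-9, 0.0]
thresholded = normalize_and_threshold(raw)
print(thresholded)
```

Here the third and fourth voxels have nonzero raw weights but fall below 10% of the maximum, so they disappear from the thresholded map while remaining visible in the raw weights.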