Friday, 19 May 2017

Why not to do EEG+fMRI

Recording simultaneous EEG and fMRI data sounds like it might be fruitful. But I will argue that it 1) gains nothing over recording in separate sessions, and 2) encourages poor design and analysis.

1) What do you gain from simultaneous EEG+fMRI?

The main gain is the ability to examine trial-to-trial correlations. But how might this be useful?

Let us assume that we have two experimental conditions, A and B. We might find that the contrast (A-B) gives a region of increased BOLD signal, and also leads to a particular temporal profile of increased Gamma oscillatory power. This data could have been obtained by performing two separate studies. It yields two causal relationships: the manipulation A/B increases BOLD, and A/B changes Gamma power.  What next?

The usual approach is to then examine trial-to-trial correlations. So we might show that, for example, on trials where Gamma power is higher, BOLD signal is also higher. This gains us nothing, since it’s simply caused by A/B. This fact would be known even from the separate sessions. But let us examine just A trials, or just B trials. We might find a correlation here too, and this truly harnesses the simultaneity. What do we conclude from these correlations? We conclude that, even without a manipulation, from trial-to-trial Gamma correlates with BOLD in a region. This superficially seems interesting, but in truth we have no knowledge of why Gamma varies from trial-to-trial, nor why BOLD varies from trial-to-trial. So the fact that they correlate tells us little. There is no experimental manipulation, so it’s not a good test of any hypothesis.

What if we find different correlations in the different conditions? What if A trials show a correlation between BOLD and Gamma, but B trials do not? Prima facie, this condition-by-correlation interaction seems more interesting, since we have caused the correlation. But in fact this could arise for many reasons, not least because there might be greater variability on A trials, in BOLD, Gamma, or both. But let’s assume for argument that there is equal variability in conditions A and B. This trial-to-trial variability is of course uncontrolled, so we must model it as noise. To explain our data we require two sources of uncontrolled variability (noise “1” and “2”). What we must say is this: “A results in noise 1 influencing both BOLD and Gamma, whereas B results in noise 1 influencing BOLD and noise 2 influencing Gamma”. This is the only inference we can make (see footnote). Interpreting this is naturally condemned to be rather vague, because we are trying to examine uncontrolled noise correlations.

The correct way to examine such a relation between BOLD and Gamma would be to specifically manipulate Gamma or BOLD in another way. For example we might introduce conditions C+D which are similar to A+B, but include a task feature that is known to increase Gamma, say, by some aspect of task-relevant processing. This gives a conclusive result without resorting to the hack of simultaneity.

2) Simultaneous recording methods encourage us to do bad analysis

It should be clear now that we buy little from simultaneity. The only way to harness simultaneity is to examine correlations across trials, within single conditions. These correlations arise from natural variability between trials. In other words, this is fundamentally the study of unexplained noise, not the study of manipulations -- which puts it in the same game as resting state analysis.

The variability is, sadly, not well characterised; in particular it is usually non-normally distributed (consider an RT distribution), and different participants will have different means, variances and skewness. Correlations must be done within subjects, and care must be taken regarding assumptions. It is not clear in this case whether it is valid to use mixed linear models to combine data from different subjects. These problems all arise because we are trying to correlate two signals whose noise sources we failed to control for.

More importantly, what hypotheses do we actually have about the origin of trial-to-trial variability? We might include history / sequential effects, learning, fatigue and motivational changes. Surely these should be manipulated, instead of digging into trial-to-trial correlation. There is also true noise, such as thermal / synaptic channel noise, and unmodellable “attentional lapses” -- and perhaps we might argue that these things are best ignored for now!

Far better would be to actually compare manipulations that are known to give rise to different Gamma or BOLD signals, than to tap into uncontrolled trial-to-trial variability. This surmounts many problems of correlating noise. If you are not keen to perform these manipulations, it’s probably because you don’t have appropriate hypotheses: you probably should not be doing this trial-to-trial analysis in the first place! Stick to separate EEG and fMRI experiments.




FOOTNOTE

Numerically, if BOLD has independent noise ε and Gamma has independent noise η, then for each trial i, the simplest expression of the effect of condition is
BOLDᵢ = β₀ + β₁ Aᵢ + εᵢ
Gammaᵢ = γ₀ + γ₁ Aᵢ + ηᵢ
where the βs and γs are parameter estimates. To model correlations across trials due to natural variability, we can include additional terms in Gamma:
Gammaᵢ = γ₀ + γ₁ Aᵢ + ηᵢ + γ₂ εᵢ + γ₃ Aᵢ εᵢ
where γ₂ is the correlation of Gamma with the BOLD fluctuations, and γ₃ is the interaction of those fluctuations with condition. My argument is that we cannot attach any causal significance to γ₃ (even though it seems like there is some causality at work), because we have no idea what the distinction between fluctuations and means really is.
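
For concreteness, here is a minimal MATLAB sketch of this model fitted to simulated data. Everything here (sample size, effect sizes, variable names) is invented purely for illustration; with real data you would not observe the BOLD fluctuation directly, and would instead use the residual BOLD after removing the condition effect.

  n = 200;
  A      = double(rand(n,1) > 0.5);       % condition indicator (1 = A, 0 = B)
  e_bold = randn(n,1);                    % trial-to-trial BOLD fluctuation (epsilon)
  e_gam  = randn(n,1);                    % independent Gamma noise (eta)
  BOLD   = 0.2 + 1.0*A + e_bold;
  Gamma  = 0.1 + 0.8*A + e_gam + 0.5*e_bold + 0.3*A.*e_bold;  % gamma2 = 0.5, gamma3 = 0.3
  X = [ones(n,1), A, e_bold, A.*e_bold];  % design matrix for the extended Gamma model
  g = X \ Gamma                           % least-squares estimates of gamma0..gamma3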

Tuesday, 31 January 2017

I wish I knew MATLAB had... kron


    • The function kron creates the Kronecker (tensor) product of two arrays. Why on earth would this ever be useful?
    • You can use it instead of repmat for certain tasks. Let's say you have an array 
      x = [1,3,5,7]
    • What if you now wanted to triplicate each element, to get [1,1,1,3,3,3,5,5,5,7,7,7]? You might try repmat:
      y = repmat(x,3,1);   y=y(:);
    • which simply creates three rows and then "flattens" the matrix to a vector. But using kron can be simpler:
      y = kron(x,[1 1 1])
    • This replaces each element of x with x(i) * [1 1 1], giving the desired 12 elements.
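    • If it helps, here are a couple of other quick kron tricks (the example values are my own): swapping the arguments repeats the whole vector rather than each element, and kron with an identity matrix builds a block-diagonal matrix.
      kron(x, [1 1 1])          % [1 1 1 3 3 3 5 5 5 7 7 7]  -- repeat each element
      kron([1 1 1], x)          % [1 3 5 7 1 3 5 7 1 3 5 7]  -- repeat the whole vector
      kron(eye(2), [1 2; 3 4])  % a 4x4 block-diagonal matrix with two copies of the block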

    Friday, 22 August 2014

    Contra Oscillations


    I recently wrote a criticism of the use of network theory in neuroscience.
    Unfortunately I have similar problems with oscillations.

    1) Is anything oscillating? 

    In physics, oscillations are defined by the presence of simple harmonic motion, in which a restoring force (proportional to the deviation from a set-point) causes a circular trajectory of the system in the complex plane. The projections onto measurables generally result in sinusoidally varying output. Do such things exist in the brain? It is not clear. Alpha waves, for example, do not really look sinusoidal. Well, this could be due to additional resonances in the system -- for example if there were two oscillating elements A and B, in which A is kept in phase with B, but A oscillates at an exact whole-number multiple of B's frequency. For an alpha wave, perhaps A has three times the frequency of B. This is the natural explanation for an oscillation theorist, as it explains the two spectral peaks, but will it do for a neurobiologist? A pair of synchronized harmonic oscillators?

    A more sophisticated version could be that, like a violin string, there are a large number of coupled oscillators, which can be out of phase in different subpopulations. Due to the coupling, perhaps they can be excited at different frequencies, leading to superimposed harmonic modes that can give the unusual shaped waveform seen in alpha.

    But notice also that the oscillations described have varying phases. If you look at alpha, it changes phase very often! The presence of a phase-change in real oscillators is evidence for a sudden perturbation of the system. But if those phase-changes are basically part-and-parcel of your steady-state signal, as perhaps is the case most of the time in EEG, what does the fourier phase component represent? I think it is unwise to call this kind of behaviour an 'oscillation'. It cannot be described by what we normally call 'oscillation'. Perhaps we need a better model of how this unusual kind of signal can be generated in the brain, for example using synaptic biophysical or neuronal mass-models. Anyhow, even if alpha-band power can metaphorically be tied to an oscillation, I think calling gamma an 'oscillation' is probably hugely overstepping the mark, from what we currently know.

    It may be more parsimonious to dump the oscillation theory of EEG fourier components altogether.

    2) Masked fallacies

    We may be making the same error that the network theorists made: we found a number to quantify, and this number correlates with lots of behaviour. Therefore it must explain how the brain operates. Just because we can examine different frequencies by performing an FFT, this doesn't mean that these frequency bands are really relevant to brain function. Any signal whatsoever is amenable to spectral analysis, even Morse code or digital signals; but this on its own does not mean that those frequency components bear any relation to what is being signalled.

    More importantly, the correlation of frequencies to behaviours does not carry any explanatory weight. For example, let us imagine that an overexcited scientist discovers that occipital alpha waves are found both when the eyes are closed, and when people are not paying attention to a visual scene (but not when attending). What is she to conclude from this? Is it justified to conclude that alpha "does" anything? Syllogism:
       P1: When performing well,  alpha is always absent.
       P2: When performing badly, alpha is always present.
    Now which conclusion do you think follows logically from the premises?
       C1: Absent  alpha is necessary  for performing well.
       C2: Absent  alpha is sufficient for performing well.
       C3: Present alpha is necessary  for performing badly.
       C4: Present alpha is sufficient for performing badly.
    Well, if the premises are truly universal, all of them do: C1 and C3 simply restate the premises, and C2 and C4 follow by contraposition (given that performing well and performing badly are the only options). Most scientists want to conclude C1. Some distraction theorists may want to conclude C3. Nobody really glances at C2 and C4. The dangerous consequence is that it encourages us to attribute causality when there is none, or even in the wrong direction.

    We have developed a new language for frequency domain analysis - a language that may be significantly more opaque than we have realised, when attempting to describe events at the neuronal level. We may just be re-describing things that are better seen on standard ERP, i.e. averaged event-locked time-domain traces.

    People have started to realise this, and now commonly mix frequency- and time-domain analyses freely. Oscillations tend to be relatively transient -- if that's not an oxymoron! And although fourier spectra are entirely capable of representing transients, as well as the temporal evolution of envelopes, thought has shifted towards wavelet transforms. But natural physical systems generating transient wavelet components are much less common and more contrived -- which makes interpreting them a recipe for disaster. This is clear enough for transients and envelopes. But, I argue, we have no real reason to believe this problem should vanish for non-transient EEG data.

    As a minimum, you must be clear in your mind how your signal processing pathway will represent a step or gradual change in signal. Moreover the signal processing pathway must generate a representation that matches some theory of how signals are generated at the source.
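
    For example (a sketch of my own, with arbitrary parameters), a plain step "contains" power at every frequency once you push it through a short-time fourier transform:

      fs = 250; t = 0:1/fs:4;
      x = double(t > 2);                             % a step at t = 2 s; nothing oscillates
      spectrogram(x, hamming(64), 60, 128, fs, 'yaxis')
      % the plot shows a transient smear of "power" in every band around t = 2 s,
      % purely because that is how a windowed transform represents a step.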

    3) Power and phase should not logically be separated

    Until recently, most studies examined spectral power. This entails performing a time-windowed fourier transform and discarding phase information. We think we have an intuitive grasp of what this spectral power means. But what does this really mean? Can power in different frequency bands tell us about how the brain is working?

    If you take a sound and discard the phase information, you have an unrecognisable mess. This is true however you window or filter the sound. Similarly, if you take an image and discard the phase of the 2D fourier spectrum, you are left with garbage. Here's a bit of MATLAB to hear de-phased sound:

      load handel; sound(y,Fs)               % normal
      w = 64; h = hamming(w);                % short Hamming window
      y2 = zeros(length(y)-w, 1);            % preallocate the de-phased output
      for t = 1:length(y)-w
        z  = fft(y(t:t+w-1).*h);             % windowed spectrum at this sample
        z2 = ifft(z.*exp(rand*2*pi*t*1j));   % multiply the spectrum by a random phase
        y2(t) = real(z2(w/2));               % keep only the middle sample
      end
      sound(y2,Fs)                           % dephased


    As you can hear, the sound retains some of its rhythmic characteristics, but it really bears no other resemblance to the original! You can try other phase manipulations: if you change 'rand' to '0.1', you can actually keep the phase information, but amplify the phase -- but it still sounds pretty alien.

    Well, it turns out that most early 'oscillation' work discarded phase. I'm not arguing that sound or pictures are a good model for what EEG activity is like. But I would like to say that it's really quite unclear what these 'band powers' mean on their own, in terms of interpreting a signal.

    4) Linear assumptions give artefactual couplings

    Phase information has been used to study phase-amplitude coupling in a single channel. What does that mean in terms of the signal? Aficionados would tell you that PAC occurs when the power of some higher frequency correlates with the phase of an (independent) lower frequency. For example, gamma power might be higher whenever theta is in a trough, compared to when theta is at a peak. In noise, different frequencies are independently-extractable components of the signal, so the presence of any correlation is meaningful. So what causes this?

    The simplest explanation is that the two signals are not independent. There might be some saturation or nonlinearity in your recording electrode / amplifier / skull -- for example making fast oscillations appear smaller in amplitude when you are at the peak of theta, compared to at the trough. Any kind of nonlinearity could generate 'cross-frequency coupling' - because it clearly breaks the independence between the frequencies.
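
    To make this concrete, here is a minimal simulation of my own (the frequencies, clipping level and crude filtering are all arbitrary choices; hilbert needs the Signal Processing Toolbox): two completely independent sinusoids passed through a clipping "amplifier" acquire apparent phase-amplitude coupling.

      fs = 1000; t = (0:10*fs-1)'/fs;
      theta = sin(2*pi*6*t);                    % 6 Hz "theta"
      gamma = 0.3*sin(2*pi*60*t);               % independent 60 Hz "gamma"
      x = min(theta + gamma, 1);                % the "amplifier" clips the positive peaks
      g   = x - movmean(x, 25);                 % crudely remove the slow component
      env = abs(hilbert(g));                    % gamma-band envelope
      ph  = angle(hilbert(theta));              % theta phase (0 = peak, +/-pi = trough)
      fprintf('gamma envelope at theta peak %.2f, at trough %.2f\n', ...
              mean(env(abs(ph) < 0.3)), mean(env(abs(ph) > 2.8)))
      % the envelope is smaller at theta peaks purely because of the clipping.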

    However we now have enough single-cell electrophysiology to convince us there might be some truth to phase-amplitude coupling: a cell is more likely to fire when the LFP is at particular phases. This 'preferred phase' might be different for different brain areas and in different tasks, but is similar for many neurones in a region. In theory, this might cause EEG gamma to increase in amplitude at certain theta phases, but artefactual causes must still be ruled out. This is especially concerning if gamma appears strongest in theta troughs.

    5) Synchrony buys nothing over ERP


    Studies often report cross-frequency coupling in a single channel. What does this mean? The aficionados will tell you that those independent frequency components of the signal have similar phase differences, across trials. For example, on any given trial, the phase of gamma is at a given lag compared to alpha. The consistency across trials gives you CFC.

    A similar conclusion can be drawn by measuring synchrony across all frequencies. For example, "synchrony increases" at key moments during a trial. The usual meaning of synchrony, at a particular frequency, is that across trials, the phase of the Fourier decomposition at a given frequency is similar. A recent paper showed "increased gamma synchrony" around the time of a decision.

    But this is obviously going to be the case! It follows directly from the ERP in fact. Something happens at a moment in time. Therefore, whatever shape the wave is, the fact that the wave has a distinct spectrum will lead to CFC.  The specific phase differences between frequencies are determined purely by the shape of the ERP surrounding the stimulus.
    If the ERP shows a readiness potential, then we might expect to see 'synchrony' (phase coupling) increasing before the event - since this is required to produce an ERP that sums across trials.
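
    To see how easily a non-oscillatory event produces "synchrony", here is a minimal simulation sketch of my own (all numbers are arbitrary): identical transients plus noise yield inter-trial phase locking around the event, with no oscillator anywhere in the generative model.

      fs = 250; t = -1:1/fs:1; nTrials = 100;
      erp = exp(-t.^2/(2*0.02^2));                  % a sharp evoked transient at t = 0
      X = repmat(erp, nTrials, 1) ...
        + 0.5*randn(nTrials, numel(t));             % trials x time: same ERP + noise
      f = 10;                                        % probe frequency (Hz)
      w = exp(2i*pi*f*t).*exp(-t.^2/(2*0.1^2));     % a crude Morlet-like wavelet
      C = complex(zeros(nTrials, numel(t)));
      for k = 1:nTrials
        C(k,:) = conv(X(k,:), w, 'same');           % complex "time-frequency" signal
      end
      itpc = abs(mean(C./abs(C), 1));               % inter-trial phase coherence
      plot(t, itpc); xlabel('time (s)'); ylabel('phase locking at 10 Hz')
      % phase locking rises around t = 0 simply because every trial contains the
      % same transient.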

    6) The usual culprits

    I have deliberately left out the obvious, well-known problems with oscillation analysis, such as arbitrariness in choice of filtering, incomplete eye movement and blink removal, the compromise between time resolution and windowing artifact, etc....

    New uses for complex numbers


    I wish I'd known MATLAB had... complex numbers!
    If you do any geometry in matlab, you'll find complex numbers really handy.
    • The great advantage is, if you have 10x10 points, normally you'd need to store them either as
      • X = 10x10 matrix of X-coordinates, Y = 10x10 matrix of Y-coordinates
        Or as:
      • M = cat(3,X,Y),  which is a 10x10x2 matrix of coordinates; the first slice is the X's and the second slice is Y's.
       The first is irritating because if you want to copy the array, transform or pass it to functions etc... you always have 2 variables.
       The second is irritating because proper matrix operations don't work in 3 dimensions. Also you have to "squeeze" to extract coordinates e.g. X = squeeze(M(:,:,1)). Also, if you are using many dimensions, you have to remember which dimension is the X/Y dimension. 

    • The complex solution is easy and aesthetically pleasing!
          Z=X+1j*Y     which is a 10x10 matrix of complex numbers.
    • To quickly calculate the distances of each of the points in a matrix, to another point P=x+iy,
          distance = abs(Z-P)
    • and similarly for the angles, use the argument function, angle(Z-P)    (in most other languages this function is called "arg").
    • The real beauty is if you wanted to rotate all the points in the matrix by 45 degrees around the point P=x+iy, simply do
          (Z-P)*exp(1j*pi/4)+P - which translates by -P, rotates by pi/4, and adds back P. Now imagine doing that with X and Y coordinates!
      If you are dealing with a trajectory of points, the complex numbers can be fed into standard SMOOTH or CONV to perform smoothing.
    • If you use i and j in for loops, don't worry! For complex numbers always use 1i or 1j.
    • important functions are REAL, IMAG, ABS, ANGLE
    Caveat: NaN behaves in an interesting way. Although ISNAN will be true if either the X or Y coordinate is NaN, the number still retains the X and Y coordinates separately!
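
    As a minimal sketch of the rotation trick above (the grid and centre are made-up example values):

      [X, Y] = meshgrid(1:10, 1:10);      % a 10x10 grid of points
      Z = X + 1j*Y;                       % coordinates stored as complex numbers
      P = 5 + 5j;                         % centre of rotation
      Zrot = (Z - P)*exp(1j*pi/4) + P;    % rotate every point 45 degrees about P
      plot(Z(:), '.'); hold on; plot(Zrot(:), 'o'); axis equal
      % plot() given complex data plots real parts against imaginary parts directly.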

    New uses for Matrix Multiplication


    I wish I'd known MATLAB had...     matrix multiplication
    • Often overlooked, matrix multiplication can do many useful things!
    • Using multiplication instead of repmat. If you have a column vector V, and you want to repeat it horizontally 5 times,
      • REPMAT(V, 1, 5)  is the same as
                V*[1 1 1 1 1].
      • REPMAT(V,1,10) is the same as
               V*ones(1,10)
      • REPMAT(V, 1, length(V)) will make a square, and so will V*(1+0*V)'
      • You can create the times table for 1 to 12 with [1:12]'*[1:12]
    • Using multiplication instead of SUM. If you have a column vector V, and you want to sum it,
      • SUM(V)  is the same as (1+0*V)'*V     note that 0*V is a handy way of creating a zero matrix of the same size as V.
      • if you want a weighted sum, where W is the weights, 
        SUM(V.*W) is the same as V'*W   or W'*V.
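    • A quick way to convince yourself of these equivalences (the values are just my own examples):
        V = [2; 4; 6];  W = [1; 0; 3];
        isequal(repmat(V,1,5), V*[1 1 1 1 1])   % replication via multiplication
        sum(V) == (1+0*V)'*V                    % sum as an inner product with ones
        sum(V.*W) == V'*W                       % weighted sum as an inner product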

    Binary function with singleton expansion

    BSXFUN( function,  A, B )

    This great function really helps when you want to do an "outer product". That is, if you need to combine each item in list A with each item in list B. It generalises to any number of dimensions, too!

    • "Binary function with singleton expansion" is a fancy way of saying automatic dimension replication.
    •  A binary function is simply any function that takes two parameters. For example "plus(A,B)", which is the function that does A+B
    • Often repmat can be replaced  with BSXFUN. I'm not sure whether this improves readability or not, but it works well in some situations.
        BSXFUN(@plus, [1,2,3],[5;6])
            = [ 6 7  8
                7 8 9 ]

        BSXFUN(@gt, [1 2 3 4 5 6 7 8 9],[3; 6]) =
                  = [0 0 0 1 1 1 1 1 1
                     0 0 0 0 0 0 1 1 1 ]

        note the following useful binary operations:
    • gt(a,b) = a>b (greater than)
    • lt(a,b) = a<b ( less than)
    • ge, le (greater-or-equal, less-or-equal)
    • plus, times, rdivide
    • Note that you can usually achieve the same thing with repmat. The disadvantage of REPMAT is that you have to know the sizes explicitly, or calculate them explicitly:
        Y = BSXFUN(@plus, [1 2 3],[5;6])
       is the same as
        Y = plus(repmat([1 2 3],2,1), repmat([5;6],1,3))   or
        Y = repmat([1 2 3],2,1) + repmat([5;6],1,3)
    • As another example, if you wanted to create the times-tables for 1 to 12:
        BSXFUN(@times,[1:12]',[1:12])
    • Note that singleton expansion is automatic for scalars! i.e.
        1+V  == ones(size(V)) + V
        2*V   == times(2*ones(size(V)), V)

       Python's numpy automatically expands dimensions for standard binary operations.
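
       As a small worked example of a typical bsxfun use (the data matrix here is just made up), de-meaning and scaling the columns of a data matrix:

        X  = rand(100, 5);                         % 100 observations x 5 variables
        Xc = bsxfun(@minus, X, mean(X, 1));        % subtract each column's mean
        Z  = bsxfun(@rdivide, Xc, std(X, [], 1));  % divide by each column's s.d.
        % the repmat equivalent needs the sizes spelled out:
        % Z = (X - repmat(mean(X,1),100,1)) ./ repmat(std(X,[],1),100,1);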

     Also try combining these techniques with
        - creating quick functions
        - matrix multiplication expansion

    Tuesday, 17 June 2014

    Contra Networks

    Olaf Sporns, "Contributions and challenges for network models in cognitive neuroscience"

    Nature Neuroscience 17:652-60 (2014)

    This paper might be one of the most stimulating I have read recently - mainly because I agree with very little of it! In summary he reviews studies that have analysed the connections between brain areas (histological, diffusion tensor imaging, and functional connectivity) using network models. Network theory is, in my eyes, just a posh way of getting numbers that characterise a particular way of connecting nodes together. Sporns offers to take a critical look at some of the advantages and limitations of network models in cognitive neuroscience. Unsurprisingly he has a very positive take.

    Comments

    1. You are eating your tail. Typical hollow and meaningless conclusions are:

      • Network models offer a theoretical framework that encompasses both local and global processes and thus resolves the long standing conflict between localised and distributed processing.
      • By addressing how connectivity mediates both segregation and integration, network approaches not only reconcile these seemingly opposing perspectives, they also suggest that their coexistence is fundamental for brain function.
      • Integration in large-scale brain networks is supported by a set of specialised brain regions that transiently orchestrate interactions between functional modules
      • Integrative processes have a distinct anatomical substrate
      • There are robust relationships between network architecture and functional specialization

      Basically, in each case, the conclusion is foregone, already built into the premises. If you start by delineating modules and calculating connectivity and proximity measures, these conclusions say nothing more than "I am using this method". They explain nothing.

      Just as an example, let's take the first one - that networks might allow us to resolve the
      global-local processing problem. The argument appears to be like this:
      • Premise 1: "global network measures capture important characteristics related to overall signaling capacity"
        (note that this is itself a rather fanciful comment for functional networks -- even if I accept the doctrine of networks! In real computer networks, signalling capacity is the rate of transmission - mainly a function of compression at encoding, and how fast a channel can change its signal) 
      • Premise 2: "most RSNs are associated with specific behavioural or cognitive domains... network communities corresponding to specific domains were found to be associated with distinct functional fingerprints" - OK
      • Conclusion: we have found numbers characterising networks at local and global scales, so we have solved the dilemma of whether computation is done at local or global scales.
       Hopefully you can see that all the arguments above are of this form:
       "I can measure new numbers meaning X - so this explains how the brain works by using X"

       In the same vein, functional networks that show changes in such global/local parameters
       say nothing more than: "global/local correlations change over time, so this solves dilemmas we had about how global/local processing coexist".

      (Incidentally, if you believe in these functional changes over time, they invalidate any conclusions you have made about structural networks!)
    2. Is it surprising that functional connectivity parallels structural connectivity? Could it be any other way?
    3. Quantitative is overrated. Yes, it allows the application of quantitative measures of network topology. But to what end? We don't yet have even a qualitative theory of how cognition is done. Perhaps we should start working on that?
    4. Claims that
      • it allows identification of structural network elements that are specialised for carrying out integrative function; and
      • quantitative approaches provide information on regional roles in neural processing within and across communities.
      I.e. betweenness (centrality, degree) is taken as a marker of a node having integrative function. Why? Couldn't we just be looking at a relay station? The thalamus might be a relay station (I don't think it is, of course) yet still be at the centre of network topology - with no integrative function whatsoever.
    5. Areas which are highly functionally connected are unlikely to be computing anything interesting. In fact, if the activity of two areas is highly correlated, it suggests that they represent information in similar ways, and thus no real computation has been performed. If the frontoparietal module correlates with visual and tactile networks, that means that it encodes the two types of information (visual and tactile) in parallel. I.e. frontoparietal activity is a linear combination of visual and tactile information, and thus no computation has been performed, only superposition.

      In fact, shouldn't computation be defined as occurring just when areas are structurally connected but no functional correlation is seen?
    6. what are the fundamental rules and principles of network science? They are either tautologous (trivially true) or not based on the kind of information processing the brain does. No conclusion is going to be relevant to the neural basis of cognition if you neglect:
      1. the direction of connections,
      2. inhibitory vs excitatory,
      3. preservation / distortion / manipulation of representations,
      4. cortical lamination...
      Even if you add these "instantiation properties" to network theory, what does the network theory itself add?
    7. You are going to tell me that, in general, areas that are nearby each other in the brain are more likely to connect together. What is surely more interesting is the cases in which that rule is broken, i.e. where the greatest deviation from standard white-matter-distance occurs?
      In these situations, connectivity might be informative, because there must be a reason behind connections that are are not predictable from anatomical location.
    8. Do you really believe that the kinds of processes that generate human thought, understanding, belief, reasoning etc. can be even coarsely described in terms of the topology of the network? I'd say certainly not, at least with anything resembling the kind of network we're talking about today.
    On the other hand, in favour of networks,
    1. MEG and connectivity might help understand short term plasticity.
    2. It is still possible that network statistics of a "node" might help understand why lesions to some brain areas cause more symptoms than others. One might argue that damage to a node with high centrality would cause more deficits than damage to a node with low centrality. But in drawing this conclusion, you make certain silent assumptions. In particular, you are suggesting that when a lesion fails to disconnect two regions, because there are other (indirect) routes between them, this redundancy of connectivity allows "information flow" "around" the lesion. First, it is not clear that connections in a network represent information flow, even if they were directional. Functionally derived networks, in particular, notoriously connect nodes which are simply driven by a common source. Second, it seems incoherent to suppose that nodes are performing computations, if you then speak of a damaged node being replaced by a chain of nodes that effectively "conduct around" it. Third, if we bite the bullet and suggest that some connections really do pass right through some nodes, e.g. if white matter and grey matter were not fully distinguished (as is likely to be the case in the striatum), then it is entirely unsurprising that a lesion would affect distant areas - this is just a glamorisation of the age-old phenomenon of diaschisis, and needs no network statistics to explain it.
    3. ?
    In conclusion:  

    Somewhere, the vague and heterogeneous metaphors of electrical conduction, internet servers, ethernet hubs, logic gates, connectionist synaptic weights and telephone switchboards have become muddled and confused, to the point where teasing out meaningful conclusions seems futile.