Friday 22 August 2014

Contra Oscillations


I recently wrote a criticism of the use of network theory in neuroscience.
Unfortunately I have similar problems with oscillations.

1) Is anything oscillating? 

In physics, oscillations are defined by the presence of simple harmonic motion, in which a restoring force (proportional to the deviation from a set-point) causes a circular trajectory of the system in the complex plane. The projections onto measurables generally result in sinusoidally varying output. Do such things exist in the brain? It is not clear. Alpha waves, for example, do not really look sinusoidal. Well, this could be due to additional resonances in the system -- for example if there were two oscillating elements A and B, in which A is kept in phase with B, but A also has an exact whole-number multiple of the frequency of B. For an alpha wave, perhaps A has three times the frequency of B. This is the natural explanation for an oscillation theorist, as it explains the two spectral peaks, but will it do for a neurobiologist? A pair of synchronized harmonic oscillators?

A more sophisticated version could be that, like a violin string, there are a large number of coupled oscillators, which can be out of phase in different subpopulations. Due to the coupling, perhaps they can be excited at different frequencies, leading to superimposed harmonic modes that can give the unusual shaped waveform seen in alpha.

But notice also that the oscillations described have varying phases. If you look at alpha, it changes phase very often! In a real oscillator, a phase-change is evidence of a sudden perturbation of the system. But if those phase-changes are basically part-and-parcel of your steady-state signal, as is perhaps the case most of the time in EEG, what does the Fourier phase component represent? I think it is unwise to call this kind of behaviour an 'oscillation': it cannot be described by what we normally call 'oscillation'. Perhaps we need a better model of how this unusual kind of signal can be generated in the brain, for example using biophysical synaptic models or neural mass-models. Anyhow, even if alpha-band power can metaphorically be tied to an oscillation, I think calling gamma an 'oscillation' is probably hugely overstepping the mark, given what we currently know.

It may be more parsimonious to dump the oscillation theory of EEG Fourier components altogether.

2) Masked fallacies

We may be making the same error that the network theorists made: we have found a number to quantify, and this number correlates with lots of behaviour; therefore it must explain how the brain operates. Just because we can examine different frequencies by performing an FFT doesn't mean that these frequency bands are really relevant to brain function. Any signal whatsoever is amenable to spectral analysis, even Morse code or digital signals; but this on its own does not mean that those frequency components bear any relation to what is being signalled.

More importantly, the correlation of frequencies to behaviours does not carry any explanatory weight. For example, let us imagine that an overexcited scientist discovers that occipital alpha waves are found both when the eyes are closed, and when people are not paying attention to a visual scene (but not when attending). What is she to conclude from this? Is it justified to conclude that alpha "does" anything? Syllogism:
   P1: When performing well,  alpha is always absent.
   P2: When performing badly, alpha is always present.
Now which conclusion do you think follows logically from the premises?
   C1: Absent  alpha is necessary  for performing well.
   C2: Absent  alpha is sufficient for performing well.
   C3: Present alpha is necessary  for performing badly.
   C4: Present alpha is sufficient for performing badly.
Well, by simple contraposition (taking 'badly' to mean 'not well'), all of them do. Most scientists want to conclude C1. Some distraction theorists may want to conclude C3. Nobody really glances at C2 and C4. The dangerous consequence is that it encourages us to attribute causality when there is none, or even in the wrong direction.
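That all four conclusions really do follow can be checked mechanically. A small Python sketch (with the assumption, flagged in the comments, that 'badly' just means 'not well'):

```python
# A brute-force check of the syllogism: enumerate the possible "worlds"
# (performing well or not, alpha present or not), keep those consistent
# with P1 and P2, and test each conclusion in every consistent world.
# (Assumption: "badly" simply means "not well".)
worlds = [(well, alpha) for well in (True, False) for alpha in (True, False)]
consistent = [(w, a) for (w, a) in worlds
              if (not w or not a)      # P1: well -> alpha absent
              and (w or a)]            # P2: not well -> alpha present

c1 = all(not a for (w, a) in consistent if w)      # absent alpha necessary for well
c2 = all(w for (w, a) in consistent if not a)      # absent alpha sufficient for well
c3 = all(a for (w, a) in consistent if not w)      # present alpha necessary for badly
c4 = all(not w for (w, a) in consistent if a)      # present alpha sufficient for badly
print(c1, c2, c3, c4)   # True True True True
```

The premises pin the world down so tightly that necessity and sufficiency collapse together -- which is exactly why none of the four is informative on its own.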

We have developed a new language for frequency domain analysis - a language that may be significantly more opaque than we have realised, when attempting to describe events at the neuronal level. We may just be re-describing things that are better seen on standard ERP, i.e. averaged event-locked time-domain traces.

People have started to realise this, and now commonly mix frequency- and time-domain analyses freely. Oscillations tend to be relatively transient -- if that's not an oxymoron! And although Fourier spectra are entirely capable of representing transients, as well as the temporal evolution of envelopes, thought has shifted towards wavelet transforms. But natural physical systems that generate transient wavelet components are much rarer and more contrived -- so drawing interpretations from them is a recipe for disaster. This is clear enough for transients and envelopes. But, I argue, we have no real reason to believe this problem should vanish for non-transient EEG data.

As a minimum, you must be clear in your mind how your signal processing pathway will represent a step or gradual change in signal. Moreover the signal processing pathway must generate a representation that matches some theory of how signals are generated at the source.

3) Power and phase should not logically be separated

Until recently, most studies examined spectral power. This entails performing a time-windowed Fourier transform and discarding phase information. We think we have an intuitive grasp of what this spectral power means. But do we? Can power in different frequency bands tell us about how the brain is working?

If you take a sound and discard the phase information, you have an unrecognisable mess. This is true however you window or filter the sound. Similarly, if you take an image and discard the phase of the 2D Fourier spectrum, you are left with garbage. Here are a few lines of MATLAB to hear de-phased sound:

  load handel; sound(y,Fs)            % normal
  w=64; h=hamming(w);
  y2=zeros(length(y)-w,1);            % preallocate the output
  for t=1:length(y)-w
    z=fft(y(t:t+w-1).*h);             % windowed spectrum
    z2=ifft(z.*exp(2i*pi*rand*t));    % scramble each window's phase, keep its power
    y2(t)=real(z2(w/2));              % one output sample per window
  end
  sound(y2,Fs)                        % dephased


As you can hear, the sound retains some of its rhythmic characteristics, but otherwise bears no resemblance to the original! You can try other phase manipulations: if you change 'rand' to '0.1', the phase information is retained but progressively rotated -- and it still sounds pretty alien.

Well, it turns out that most early 'oscillation' work discarded phase. I'm not arguing that sound or pictures are a good model for what EEG activity is like. But I would like to say that it's really quite unclear what these 'band powers' mean on their own, in terms of interpreting a signal.

4) Linear assumptions give artefactual couplings

Phase information has been used to study phase-amplitude coupling (PAC) in a single channel. What does that mean in terms of the signal? Aficionados will tell you that PAC occurs when the power of some higher frequency correlates with the phase of an (independent) lower frequency. For example, gamma power might be higher whenever theta is in a trough, compared to when theta is at a peak. In a noise signal, different frequencies are independently-extractable components, so the presence of any correlation between them is meaningful. So what causes this?

The simplest explanation is that the two signals are not independent. There might be some saturation or nonlinearity in your recording electrode / amplifier / skull -- for example making fast oscillations appear smaller in amplitude when you are at the peak of theta, compared to at the trough. Any kind of nonlinearity could generate 'cross-frequency coupling' - because it clearly breaks the independence between the frequencies.

However we now have enough single-cell electrophysiology to convince us there might be some truth to phase-amplitude coupling: a cell is more likely to fire when the LFP is at particular phases. This 'preferred phase' might be different for different brain areas and in different tasks, but is similar for many neurones in a region. In theory, this might cause EEG gamma to increase in amplitude at certain theta phases, but artefactual causes must still be ruled out. This is especially concerning if gamma appears strongest in theta troughs.
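The artefact story is easy to simulate. Here is a minimal Python/numpy sketch (the frequencies, amplitudes and the tanh saturation are all arbitrary choices of mine): two genuinely independent components go through a compressive nonlinearity, and a phase-amplitude relationship appears that was never put in at the source.

```python
import numpy as np

# Two genuinely independent components: 6 Hz "theta" and 80 Hz "gamma"
# (random phase). A saturating recording chain, modelled as tanh,
# compresses gamma most near theta's extremes -- so the gamma envelope
# ends up tied to theta phase with no coupling at the source.
rng = np.random.default_rng(0)
fs, dur = 1000, 20.0
t = np.arange(0, dur, 1/fs)
theta_phase = 2*np.pi*6*t
theta = np.sin(theta_phase)
gamma = 0.2*np.sin(2*np.pi*80*t + rng.uniform(0, 2*np.pi))
recorded = np.tanh(1.5*(theta + gamma))      # the "electrode/amplifier/skull"

def band_envelope(x, lo, hi, fs):
    """Band-limit by FFT masking, then take the analytic-signal envelope."""
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), 1/fs)
    X[(np.abs(f) < lo) | (np.abs(f) > hi)] = 0
    X[f < 0] = 0                             # analytic signal
    return np.abs(2*np.fft.ifft(X))

env = band_envelope(recorded, 60, 100, fs)   # gamma-band envelope
# the envelope is largest near theta's zero-crossings, i.e. it follows
# cos(2 * theta phase) -- a purely artefactual phase-amplitude coupling
r = np.corrcoef(env, np.cos(2*theta_phase))[0, 1]
print(r > 0.5)   # True
```

This obviously doesn't show that all reported PAC is artefactual -- only that a single static nonlinearity anywhere in the recording chain is sufficient to produce it.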

5) Synchrony buys nothing over ERP


Studies often report cross-frequency coupling in a single channel. What does this mean? The aficionados will tell you that those independent frequency components of the signal have similar phase differences across trials. For example, on any given trial, the phase of gamma is at a given lag compared to alpha. The consistency across trials gives you CFC.

A similar conclusion can be drawn by measuring synchrony across all frequencies. For example, "synchrony increases" at key moments during a trial. The usual meaning of synchrony, at a particular frequency, is that across trials, the phase of the Fourier decomposition at a given frequency is similar. A recent paper showed "increased gamma synchrony" around the time of a decision.

But this is obviously going to be the case! It follows directly from the ERP, in fact. Something happens at a moment in time; therefore, whatever shape the wave is, the fact that the wave has a distinct spectrum will lead to CFC. The specific phase differences between frequencies are determined purely by the shape of the ERP surrounding the stimulus.
If the ERP shows a readiness potential, then we might expect to see 'synchrony' (phase coupling) increasing before the event -- since this is required to produce an ERP that sums across trials.
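This is also easy to demonstrate: give every trial the same event-locked transient (an 'ERP') plus independent noise, and inter-trial phase consistency appears at the frequencies the transient contains, with no oscillator anywhere. A numpy sketch (the bump shape, noise level and test frequency are arbitrary choices):

```python
import numpy as np

# 200 one-second trials: a fixed Gaussian bump at 0.5 s plus white noise.
rng = np.random.default_rng(1)
fs, n, ntrial = 250, 250, 200
t = np.arange(n)/fs
erp = np.exp(-((t - 0.5)**2)/(2*0.05**2))           # the event-locked transient
trials = erp + 0.2*rng.standard_normal((ntrial, n))

def itc(data, freq, fs):
    """Inter-trial coherence: consistency of Fourier phase across trials."""
    k = int(round(freq*data.shape[1]/fs))           # FFT bin for this frequency
    phases = np.angle(np.fft.fft(data, axis=1)[:, k])
    return np.abs(np.mean(np.exp(1j*phases)))

print(itc(trials, 5, fs) > 0.6)                     # True: "synchrony" from the ERP
noise_only = 0.2*rng.standard_normal((ntrial, n))
print(itc(noise_only, 5, fs) < 0.3)                 # True: no event, no "synchrony"
```

The phase consistency, and its lags across frequencies, is fully determined by the bump's shape -- exactly the point above.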

6) The usual culprits

I have deliberately left out the obvious, well-known problems with oscillation analysis, such as arbitrariness in the choice of filtering, incomplete eye-movement and blink removal, the compromise between time resolution and windowing artefacts, and so on.

New uses for complex numbers


I wish I'd known MATLAB had... complex numbers!
If you do any geometry in MATLAB, you'll find complex numbers really handy.
  • The great advantage is, if you have 10x10 points, normally you'd need to store them either as
    • X = 10x10 matrix of X-coordinates, Y = 10x10 matrix of Y-coordinates
      Or as:
    • M = cat(3,X,Y),  which is a 10x10x2 matrix of coordinates; the first slice is the X's and the second slice is Y's.
   The first is irritating because if you want to copy the array, transform it, or pass it to functions, you always have 2 variables.
   The second is irritating because proper matrix operations don't work in 3 dimensions. Also you have to "squeeze" to extract coordinates, e.g. X=SQUEEZE(M(:,:,1)). And if you are using many dimensions, you have to remember which dimension is the X/Y dimension.

  • The complex solution is easy and aesthetically pleasing!
        Z=X+1j*Y     which is a 10x10 matrix of complex numbers.
  • To quickly calculate the distances of each of the points in a matrix, to another point P=x+iy,
        distance = abs(Z-P)
  • and similarly for the angles, use the argument function, angle(Z-P)    (in most other languages this function is called "arg").
  • The real beauty is if you wanted to rotate all the points in the matrix by 45 degrees around the point P=x+iy, simply do
        (Z-P)*exp(1j*pi/4)+P - which translates by -P, rotates by pi/4, and adds back P. Now imagine doing that with X and Y coordinates!
  If you are dealing with a trajectory of points, the complex numbers can be fed into standard SMOOTH or CONV to perform smoothing.
  • If you use i and j in for loops, don't worry! For complex numbers always use 1i or 1j.
  • important functions are REAL, IMAG, ABS, ANGLE
Caveat: NaN behaves in an interesting way. Although ISNAN will be true if either the X or Y coordinate is NaN, the number still retains the X and Y coordinates separately!
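For comparison, the same tricks carry straight over to Python/numpy (a sketch; the grid and the point P are arbitrary):

```python
import numpy as np

# Store a 10x10 grid of 2-D points as one complex matrix.
X, Y = np.meshgrid(np.arange(10.0), np.arange(10.0))
Z = X + 1j*Y
P = 3 + 4j                                  # an arbitrary reference point

dist = np.abs(Z - P)                        # distance of every point from P
theta = np.angle(Z - P)                     # angle of every point around P
rotated = (Z - P)*np.exp(1j*np.pi/4) + P    # rotate all points 45 deg about P

print(dist[0, 0])                           # point (0,0) is 5.0 away from (3,4)
```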

New uses for Matrix Multiplication


I wish I'd known MATLAB had...     matrix multiplication
  • Often overlooked, matrix multiplication can do many useful things!
  • Using multiplication instead of repmat. If you have a column vector V, and you want to repeat it horizontally 5 times,
    • REPMAT(V, 1, 5)  is the same as
              V*[1 1 1 1 1].
    • REPMAT(V,1,10) is the same as
             V*ones(1,10)
    • REPMAT(V, 1, length(V)) will make a square matrix, and so will V*(1+0*V)'
    • You can create the times table for 1 to 12 with [1:12]'*[1:12]
  • Using multiplication instead of SUM. If you have a column vector V, and you want to sum it,
    • SUM(V)  is the same as (1+0*V)'*V     note that 0*V is a handy way of creating a zero matrix of the same size as V.
    • if you want a weighted sum, where W is the weights, 
      SUM(V.*W) is the same as V'*W   or W'*V.
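The same identities, written in Python/numpy for comparison (the example values are arbitrary):

```python
import numpy as np

V = np.array([[1.0], [2.0], [3.0]])     # a column vector, 3x1

tiled = V @ np.ones((1, 5))             # repmat(V,1,5): repeat V as 5 columns
total = np.ones((1, 3)) @ V             # sum(V), as a 1x1 matrix

W = np.array([[0.5], [0.5], [1.0]])     # weights
wsum = V.T @ W                          # sum(V.*W): the weighted sum

print(tiled.shape, total.item(), wsum.item())   # (3, 5) 6.0 4.5
```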

Binary function with singleton expansion

BSXFUN( function,  A, B )

This great function really helps when you want to do an "outer product". That is, if you need to combine each item in list A with each item in list B. It generalises to any number of dimensions, too!

  • "Binary function with singleton expansion" is a fancy way of saying automatic dimension replication.
  •  A binary function is simply any function that takes two parameters. For example "plus(A,B)", which is the function that does A+B
  • Often repmat can be replaced  with BSXFUN. I'm not sure whether this improves readability or not, but it works well in some situations.
    BSXFUN(@plus, [1,2,3],[5;6])
        = [ 6 7 8
            7 8 9 ]

    BSXFUN(@gt, [1 2 3 4 5 6 7 8 9],[3; 6]) =
              = [0 0 0 1 1 1 1 1 1
                 0 0 0 0 0 0 1 1 1 ]

    note the following useful binary operations:
  • gt(a,b) = a>b (greater than)
  • lt(a,b) = a<b ( less than)
  • ge, le (greater-or-equal, less-or-equal)
  • plus, times, rdivide
* note that you can usually achieve the same thing with REPMAT. The disadvantage of REPMAT is that you have to know the sizes explicitly, or calculate them:
    Y = BSXFUN(@plus, [1 2 3],[5;6])
   is the same as
    Y = plus(repmat([1 2 3],2,1), repmat([5;6],1,3))    or
    Y = repmat([1 2 3],2,1) + repmat([5;6],1,3)
* as another example, if you wanted to create the times-tables for 1 to 12:
    BSXFUN(@times,[1:12]',[1:12])
* Note that singleton expansion is automatic for scalars! i.e.
    1+V  == ones(size(V)) + V
    2*V   == times(2*ones(size(V)), V)

   Python's numpy automatically expands dimensions for standard binary operations.
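For instance (a small numpy sketch mirroring the BSXFUN examples above):

```python
import numpy as np

# Broadcasting: size-1 dimensions are expanded automatically.
a = np.array([1, 2, 3])                 # shape (3,)
b = np.array([[5], [6]])                # shape (2, 1)
outer_sum = a + b                       # like bsxfun(@plus,...), shape (2, 3)
print(outer_sum)                        # [[6 7 8]
                                        #  [7 8 9]]

times_table = np.arange(1, 13)[:, None] * np.arange(1, 13)   # 12x12
print(times_table[11, 11])              # 144
```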

 Also try combining these techniques with
    - creating quick functions
    - matrix multiplication expansion

Tuesday 17 June 2014

Contra Networks


Olaf Sporns, "Contributions and challenges for network models in cognitive neuroscience"

Nature Neuroscience 17:652-60 (2014)

This paper might be one of the most stimulating I have read recently - mainly because I agree with very little of it! In summary, he reviews studies that have analysed the connections between brain areas (histological, diffusion tensor, imaging and functional connectivity) using network models. Network theory is, in my eyes, just a posh way of getting numbers that characterise a particular way of connecting nodes together. Sporns offers to take a critical look at some of the advantages and limitations of network models in cognitive neuroscience. Unsurprisingly, he has a very positive take.

Comments

  1. You are eating your tail. Typical hollow and meaningless conclusions are:

    • Network models offer a theoretical framework that encompasses both local and global processes and thus resolves the long standing conflict between localised and distributed processing.
    • By addressing how connectivity mediates both segregation and integration, network approaches not only reconcile these seemingly opposing perspectives, they also suggest that their coexistence is fundamental for brain function.
    • Integration in large-scale brain networks is supported by a set of specialised brain regions that transiently orchestrate interactions between functional modules
    • Integrative processes have a distinct anatomical substrate
    • There are robust relationships between network architecture and functional specialization

    Basically, in each case, the conclusion is foregone, given the premises. If you start by delineating modules, calculating connectivity and proximity measures, these conclusions say nothing more than "I am using this method". They explain nothing.

    Just as an example, let's take the first one - that networks might allow us to resolve the
    global-local processing problem. The argument appears to be like this:
    • Premise 1: "global network measures capture important characteristics related to overall signaling capacity"
      (note that this is itself a rather fanciful comment for functional networks -- even if I accept the doctrine of networks! In real computer networks, signalling capacity is the rate of transmission - mainly a function of compression at encoding, and how fast a channel can change its signal) 
    • Premise 2: "most RSNs are associated with specific behavioural or cognitive domains... network communities corresponding to specific domains were found to be associated with distinct functional fingerprints" - OK
    • Conclusion: we have found numbers characterising networks at local and global scales, so we have solved the dilemma of whether computation is done at local or global scales.
     Hopefully you can see that all the arguments above are of this form:
     "I can measure new numbers meaning X - so this explains how the brain works by using X"

     In the same vein, functional networks that show changes in such global/local parameters
     say nothing more than: "global/local correlations change over time, so this solves dilemmas we had about how global/local processing coexist".

    (Incidentally, if you believe in these functional changes over time, they invalidate any conclusions you have made about structural networks!)
  2. Is it surprising that functional connectivity parallels structural connectivity? Could it be any other way?
  3. Quantitative is overrated. Yes, it allows application of quantitative measures of network topology. But to what end? We don't yet have even a qualitative theory of how cognition is done. Perhaps we should start working on that?
  4. Claims that
    • it allows identification of structural network elements that are specialised for carrying out integrative function; and
    • quantitative approaches provide information on regional roles in neural processing within and across communities.
    I.e. betweenness centrality (or degree) is taken as a marker of a node having integrative function. Why? Couldn't we just be looking at a relay station? The thalamus might be a relay station (I don't think it is, of course) yet still be at the centre of the network topology - with no integrative function whatsoever.
  5. Areas which are highly functionally connected are unlikely to be computing anything interesting. In fact, if the activity of two areas is highly correlated, it suggests that they represent information in similar ways, and thus no real computation has been performed. If the frontoparietal module correlates with visual and tactile networks, that means that it encodes the two types of information (visual and tactile) in parallel. I.e. frontoparietal activity is a linear combination of visual and tactile information, and thus no computation has been performed, only superposition.

    In fact, shouldn't computation be defined as occurring just when areas are structurally connected but no functional correlation is seen?
  6. what are the fundamental rules and principles of network science? They are either tautologous (trivially true) or not based on the kind of information processing the brain does. No conclusion is going to be relevant to the neural basis of cognition if you neglect:
    1. the direction of connections,
    2. inhibitory vs excitatory,
    3. preservation / distortion / manipulation of representations,
    4. cortical lamination...
    Even if you add these "instantiation properties" to network theory, what does the network theory itself add?
  7. You are going to tell me that, in general, areas that are near each other in the brain are more likely to connect together. What is surely more interesting are the cases in which that rule is broken, i.e. where the greatest deviation from standard white-matter-distance occurs.
    In these situations, connectivity might be informative, because there must be a reason behind connections that are not predictable from anatomical location.
  8. Do you really believe that the kinds of processes that generate human thought, understanding, belief, reasoning etc. can be even coarsely described in terms of the topology of the network? I'd say certainly not, at least with anything resembling the kind of network we're talking about today.
On the other hand, in favour of networks,
  1. MEG and connectivity might help understand short term plasticity.
  2. It is still possible that network statistics of a "node" might help understand why lesions to some brain areas cause more symptoms than others. One might argue that a node with high centrality would cause more deficits than a node with low centrality. But in drawing this conclusion, you make certain silent assumptions. In particular, you are suggesting that when a lesion fails to disconnect two regions, because there are other (indirect) routes between them, this redundancy of connectivity allows "information flow" "around" the lesion. First, it is not clear that connections in a network represent information flow, even if they were directional. Functionally derived networks, in particular, notoriously connect nodes which are simply driven by a common source. Second, it seems incoherent to suppose that nodes are performing computations, if you then speak of a damaged node being replaced by a chain of nodes that effectively "conduct around" it. Third, if we bite the bullet and suggest that some connections really do pass right through some nodes, e.g. if white matter and grey matter were not fully distinguished (as is likely to be the case in the striatum), then it is entirely unsurprising that a lesion would affect distant areas - this is just a glamorisation of the age-old phenomenon of diaschisis, and needs no network statistics to explain it.
  3. ?
In conclusion:  

Somewhere, the vague and heterogeneous metaphors of electrical conduction, internet servers, ethernet hubs, logic gates, connectionist synaptic weights and telephone switchboards have become muddled and confused, to the point where teasing out meaningful conclusions seems futile.

Tuesday 10 June 2014

Discarding function outputs

I wish I'd known MATLAB had... [~,Y]=F()

  • If you don't want the first return value from a function, but you want the second one, you can discard the first one with ~.
  • The main advantage is it doesn't clutter your workspace with dummy variables like temp, z, or whatever you use for unwanted variables.
  • Common examples might include:
    [~,E]=eig(X)      if you want the eigenvalues (as a matrix) but not the eigenvectors
    [~,~,kcode] = KbCheck     in Psychtoolbox, to get just the key codes of currently depressed keys

Unpacking colons to indices


I wish I'd known MATLAB had... X(':')

  • The colon character can be used as an index, to mean a whole row/column/slice etc. But you can use the quoted character too!
  • Why would this ever be useful?  When using cell-expansion indexing.

Example:

Let us say that you have a line that extracts a single vector from a matrix, but sometimes you need the first row, and other times you need the first column.
i.e. sometimes you want Y=X(1,:), and other times you want Y=X(:,1).
  •   Can this be done without an IF statement?
  •   Yes:
         indices = {':',1};
         indices = fliplr(indices);
         Y=X(indices{:})
  • This works because indices{:} expands to a comma-separated list, in this case 1 and ':'.
  • Expansion of cell arrays to these lists can be used as input to functions, or indices of an array.
  • This is also how varargin{:} works.  
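The same trick exists in Python/numpy, where slice(None) plays the role of the quoted colon (a sketch):

```python
import numpy as np

X = np.arange(12).reshape(3, 4)

indices = (slice(None), 1)     # like {':', 1} : selects X[:, 1]
col = X[indices]
row = X[indices[::-1]]         # like fliplr(indices): selects X[1, :]

print(col)                     # [1 5 9]
print(row)                     # [4 5 6 7]
```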

Signed angles

I wish I'd known MATLAB had... ATAN2

  • ATAN2 is the arctangent, but takes Y and X separately. It's not quite the same as ATAN(Y/X).
  • The problem with normal ATAN is that, because you are calculating Y/X, the function can't distinguish between the case when both X and Y are positive and the case when both are negative! So when an angle is returned, its range only spans pi, not 2*pi. ATAN2 solves this problem.
  • ATAN2(Y,X) gives the angle in radians of a point (x,y) from the origin. The angle is measured from the horizontal line going rightwards, i.e. the positive x-axis. Positive angles go anticlockwise, and negative angles go clockwise from this line; pi and -pi both indicate points on the negative x-axis going leftwards, i.e. x negative and y zero.
  • ATAN2 is great because it is signed (+/-)! so you don't lose information.
  • Note that you can also do this easily with complex numbers:
   ATAN2(Y,X) is the same as   angle(X+1j*Y)
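The numpy equivalents behave the same way (note that np.arctan2 also takes y first):

```python
import numpy as np

x, y = -1.0, 1.0                   # a point in the second quadrant
a1 = np.arctan2(y, x)              # 3*pi/4: correct quadrant
a2 = np.angle(x + 1j*y)            # identical to arctan2(y, x)
a3 = np.arctan(y/x)                # -pi/4: plain arctan loses the quadrant
print(a1, a2, a3)
```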

Thursday 5 June 2014

MATLAB Unwanted text output

How to find a line that is producing unwanted text output in the console?


  • If you have Matlab R12, load the .m file in the editor, and look in the right-hand margin.
  • There should be orange markers for parser warnings. 
  • If you hover over them, any statements that produce unsuppressed text output will be marked "Terminate statement with semicolon to suppress output", with a "Fix" button!


Getting the sort ordering


I wish I'd known MATLAB had... [Y,I] = SORT(X)

  • Sort has two outputs! The second output is the indices of the elements, i.e. the re-ordering vector
  • Y is the same as X(I) (for vectors)
  • Very handy if you want to sort a matrix by one column:
        [~,I] = SORT(M(:,1));   % sort M by its first column
        Y = M(I,:);
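In Python/numpy the reordering indices come from argsort, which covers the same use case (a sketch):

```python
import numpy as np

M = np.array([[3, 10],
              [1, 30],
              [2, 20]])
order = np.argsort(M[:, 0])    # indices that sort the first column
sorted_M = M[order]            # rows reordered by that column
print(sorted_M[:, 0])          # [1 2 3]
```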

Pairwise differences

I wish I'd known MATLAB had... DIFF

  • DIFF calculates the sequential differences between numbers in an array. 
  • It can work along any dimension of the array: Y = DIFF(X,1,2) calculates differences along the second dimension, i.e. across the columns of each row.
  • For vectors, you can do a diff in different ways - e.g.
        Y = DIFF(X)   is similar to
        Y = [X nan]-[nan X] (except it has 2 extra columns, with NaN at each end)
  • Warning - DIFF(X) always has one fewer row (or column etc., depending on the dimension) than X! I often pad the diff with an extra row:
       Y = [ nan(1,size(X,2))
             DIFF(X)            ]
  • Also note that there is a function GRADIENT that does something similar. In particular, the gradient at a point is calculated using the DIFF of the points on either side of it. So,
        Y=GRADIENT(X)        if X is a column vector, is similar to
        Y=MEAN([ [nan;DIFF(X)], [DIFF(X);nan] ], 2)
    except at the first and last points (thanks Sean Fallon for this info!).
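The DIFF/GRADIENT relation is easy to verify in Python/numpy (np.nanmean ignores the NaN pads, so there even the endpoints agree):

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])    # squares, so the slope grows
d = np.diff(x)                               # [1, 3, 5, 7]: one element shorter
g = np.gradient(x)                           # central differences inside,
                                             # one-sided at the two ends

# mean of the backward and forward differences, NaN-padded to full length
padded = np.column_stack([np.r_[np.nan, d], np.r_[d, np.nan]])
print(np.nanmean(padded, axis=1))            # [1. 2. 4. 6. 7.] == g
```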