
A simple system for detection of EEG artifacts
in polysomnographic recordings

P. J. Durka, H. Klekowicz, K. J. Blinowska, W. Szelenberger, Sz. Niemcewicz



Abstract
We present an efficient parametric system for the automatic detection of EEG artifacts in polysomnographic recordings. For each of the selected types of artifacts, a relevant parameter was calculated for a given epoch. If any of these parameters exceeded a threshold, the epoch was marked as an artifact. Performance of the system, evaluated on 18 overnight polysomnographic recordings, revealed concordance with decisions of human experts close to the inter-expert agreement and the repeatability of expert's decisions, assessed via a double-blind test. Complete software (Matlab source code) for the presented system is freely available from the Internet at http://brain.fuw.edu.pl/artifacts.

Experimental data
Each of the 20 polysomnographic recordings of overnight sleep of healthy volunteers contained 21 EEG channels from the 10-20 system plus A1/A2 references, two EOG channels, breathing, EMG and ECG. EEG was sampled at 128 Hz and visually examined for the presence of artifacts in fixed 4-sec epochs. The data used in this study was free of ECG artifacts, hence this type was not taken into account.

For the purpose of evaluation of inter-expert concordance, in three of these recordings artifacts were independently marked by two experts. One of the experts marked artifacts in the same recording after three weeks, not knowing that the filename was changed to satisfy the conditions of a double-blind test. Additionally, in two all-night recordings, different types of artifacts were marked separately, in time intervals related to their actual occurrence, not constrained by fixed epoch length. Only these two recordings were used for optimization of the default values of the thresholds.

ROC curves
In a single EEG epoch, the system can detect an artifact (positive detection) or not (negative detection). Depending on the "true" status of this epoch, as indicated by the expert's decision, these detections can be true or false, leading to four possibilities: (1) true positive (TP), when an artifact was detected in an epoch also marked as an artifact by the expert, (2) false positive (FP), when an artifact was detected in an epoch marked by the expert as non-artifact EEG, (3) true negative (TN), when no artifact was detected in an epoch marked as EEG, and (4) false negative (FN), when no artifact was detected in an epoch marked by the expert as an artifact.

We define the true positive proportion as $ \mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$ and the false positive proportion as $ \mathrm{FP}/(\mathrm{FP}+\mathrm{TN})$. These two values are plotted on the axes of the ROC (Receiver Operating Characteristic) curves [1].
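As an illustration of these definitions, the following minimal sketch computes one point in the ROC plane from per-epoch decisions. The function name `roc_point` and the example labels are hypothetical and are not taken from the published software.

```python
def roc_point(expert, system):
    """Return (true positive proportion, false positive proportion).

    expert, system: equal-length sequences of booleans, one per epoch,
    where True means "marked as artifact".
    """
    tp = sum(e and s for e, s in zip(expert, system))
    fp = sum((not e) and s for e, s in zip(expert, system))
    tn = sum((not e) and (not s) for e, s in zip(expert, system))
    fn = sum(e and (not s) for e, s in zip(expert, system))
    tpp = tp / (tp + fn) if (tp + fn) else float("nan")
    fpp = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpp, fpp


# Made-up decisions for six epochs (illustration only).
expert = [True, True, False, False, False, True]
system = [True, False, False, True, False, True]
tpp, fpp = roc_point(expert, system)
print(tpp, fpp)
```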

Types of artifacts and their parametrization

Eye movements
are manifested as negative correlation of the two EOG derivations. We also used the F7-F8 and T3-T4 correlations, which improves the system's robustness when the EOG is disturbed or unavailable. The lowest of these three correlations was compared to the threshold, adjustable between $ -1$ and $ 0.1$ (default $ -0.45$); if it was lower than the threshold, an ocular artifact was detected.
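The rule above can be sketched as follows. This is only an illustration: the helper names `pearson` and `eye_movement_detected` are hypothetical, the synthetic signals are made up, and the correlation is assumed to be the ordinary Pearson coefficient computed over the whole 4-sec epoch.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def eye_movement_detected(pairs, threshold=-0.45):
    """pairs: (x, y) signal pairs, e.g. (EOG1, EOG2), (F7, F8), (T3, T4).

    Returns (detected, lowest correlation among the pairs).
    """
    lowest = min(pearson(x, y) for x, y in pairs)
    return lowest < threshold, lowest


# Synthetic 4-sec epoch at 128 Hz (illustration only).
fs = 128
t = [i / fs for i in range(4 * fs)]
eog1 = [50 * math.sin(2 * math.pi * 0.5 * ti) for ti in t]
eog2 = [-v for v in eog1]                                # mirrored deflection
f7 = [5 * math.sin(2 * math.pi * 10 * ti) for ti in t]
f8 = [5 * math.cos(2 * math.pi * 10 * ti) for ti in t]   # uncorrelated with f7
detected, lowest = eye_movement_detected([(eog1, eog2), (f7, f8), (f7, f7)])
print(detected, lowest)
```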

Eyeblinks
Correlations between the bipolar derivations Fp1-F3 and F3-C3 (for the left hemisphere) and between Fp2-F4 and F4-C4 (for the right hemisphere) were calculated in 1.5-sec epochs, with overlapping windows moved by 1 second. The largest of these correlations, calculated for epochs falling within the 4-sec segment in question, was taken as the parameter representing eyeblinks. The threshold was adjustable between $ 0.75$ and $ 1$ (default $ 0.875$). If any of these correlations exceeded the threshold, an eyeblink artifact was detected.
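A minimal sketch of this sliding-window parameter is given below. The function names and the synthetic "blink" are hypothetical, and the placement of the windows (starting at 0, 1 and 2 s of the 4-sec segment) is one possible reading of the description above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def eyeblink_parameter(pairs, fs=128, win_s=1.5, step_s=1.0):
    """Largest correlation over all 1.5-sec windows (moved by 1 s) and all pairs.

    pairs: (x, y) bipolar derivations, e.g. (Fp1-F3, F3-C3) and (Fp2-F4, F4-C4).
    """
    win, step = int(win_s * fs), int(step_s * fs)
    best = -1.0
    for x, y in pairs:
        for start in range(0, len(x) - win + 1, step):
            r = pearson(x[start:start + win], y[start:start + win])
            best = max(best, r)
    return best


# Synthetic 4-sec segment: a blink-like bump at 2.5 s seen in both left derivations.
fs = 128
t = [i / fs for i in range(4 * fs)]
bump = [100 * math.exp(-((ti - 2.5) ** 2) / (2 * 0.1 ** 2)) for ti in t]
fp1_f3 = [b + 5 * math.sin(2 * math.pi * 10 * ti) for b, ti in zip(bump, t)]
f3_c3 = [0.5 * b + 5 * math.cos(2 * math.pi * 10 * ti) for b, ti in zip(bump, t)]
fp2_f4 = [5 * math.sin(2 * math.pi * 10 * ti) for ti in t]
f4_c4 = [5 * math.cos(2 * math.pi * 10 * ti) for ti in t]

blink_param = eyeblink_parameter([(fp1_f3, f3_c3), (fp2_f4, f4_c4)])
quiet_param = eyeblink_parameter([(fp2_f4, f4_c4)])
print(blink_param > 0.875, quiet_param > 0.875)
```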

Power supply (AC 50/60 Hz)
Spectral power from 48 to 52 Hz (mains frequency in Europe is 50 Hz), calculated in 1-sec epochs, was divided by the total power of the corresponding epoch. If this parameter exceeded the threshold for any of the 1-sec epochs, the whole 4-sec epoch was marked as an artifact. Range for the threshold of this parameter was set between $ 0.005$ and $ 0.7$, with default value $ 0.325$.
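The mains criterion can be sketched as follows. This is an illustration under stated assumptions: a plain (unwindowed) discrete Fourier transform is used, power is summed over the one-sided spectrum including the band edges, and the names `power_spectrum`, `mains_ratio` and `mains_artifact` are hypothetical.

```python
import cmath
import math


def power_spectrum(x):
    """One-sided power spectrum |X_k|^2, k = 0..N/2, via a direct DFT."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))) ** 2
        for k in range(n // 2 + 1)
    ]


def mains_ratio(x, fs=128, lo=48.0, hi=52.0):
    """Power in the band lo..hi Hz divided by the total power of the epoch."""
    p = power_spectrum(x)
    df = fs / len(x)  # frequency resolution: 1 Hz for a 1-sec epoch
    band = sum(pk for k, pk in enumerate(p) if lo <= k * df <= hi)
    total = sum(p)
    return band / total if total > 0 else 0.0


def mains_artifact(epoch, fs=128, threshold=0.325):
    """True if any disjoint 1-sec sub-epoch of the 4-sec epoch exceeds the threshold."""
    return any(
        mains_ratio(epoch[i:i + fs], fs) > threshold
        for i in range(0, len(epoch) - fs + 1, fs)
    )


# Synthetic data: 10 Hz background, 50 Hz interference only during the third second.
fs = 128
t = [i / fs for i in range(4 * fs)]
clean = [10 * math.sin(2 * math.pi * 10 * ti) for ti in t]
noisy = [
    c + (20 * math.sin(2 * math.pi * 50 * ti) if 2.0 <= ti < 3.0 else 0.0)
    for c, ti in zip(clean, t)
]
print(mains_artifact(clean), mains_artifact(noisy))
```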

The above parameters are relatively insensitive to calibration and to other conditions that may change between recordings. The threshold values reported above were optimized on the two overnight recordings in which different types of artifacts were marked separately. The ranges and default values of these thresholds were left unchanged for all the analyzed recordings.

The other parameters (presented below) are directly related to the signal's energy distribution in the frequency or time domain, which for EEG can vary between subjects and recordings. Therefore the actual values of the thresholds for artifact detection were set relative to the statistical properties of the signals in each analyzed recording. To estimate these statistics, the values of a given parameter were calculated for all the EEG epochs in each analyzed recording. All the thresholds used for detection of the following artifacts were related to these statistics.

Breathing (low frequency)
These artifacts were described by the relative power in low frequencies. For a proper estimation of spectral power in low frequencies, each 4-sec segment was extended by 2 seconds before and after, and an 8-sec rectangular window was used. The upper limit of the frequency integral was optimized for best detectability of breathing artifacts marked separately in two overnight recordings. The optimal value of 0.625 Hz resulted in the following equation:

$\displaystyle \dfrac{\displaystyle\int_{0}^{0.625Hz} \hat{s}(f) df} {\displaystyle\int_0^{f_N} \hat{s}(f) df - \int_{48Hz}^{52Hz} \hat{s}(f) df} ,$ (1)

where $ f_N$ stands for half of the sampling frequency, and the spectral power around the AC power frequency is subtracted from the normalization in the denominator.

The distribution of this parameter was estimated separately for each derivation: it was calculated for all the epochs of a given recording, and the median $ m_{0.5}$ of its distribution in each channel was used to set the default threshold (different for each channel) as $ 0.75+m_{0.5}/4$. The allowed range of this threshold was $ (0.5(1+m_{0.5}), 1)$.
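As a small worked example of this rule, the sketch below derives the default threshold and the allowed range from the per-channel median. The function name `breathing_threshold` and the sample values are hypothetical.

```python
from statistics import median


def breathing_threshold(values):
    """Default threshold and allowed range for one channel.

    values: the low-frequency relative power of every epoch in the recording.
    """
    m = median(values)
    default = 0.75 + m / 4
    allowed = (0.5 * (1 + m), 1.0)
    return default, allowed


# Made-up parameter values for one channel (illustration only).
values = [0.1, 0.2, 0.3, 0.2, 0.9]
default, allowed = breathing_threshold(values)
print(default, allowed)
```

Note that the default $ 0.75+m_{0.5}/4$ is exactly the midpoint of the allowed range $ (0.5(1+m_{0.5}), 1)$.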

Estimation of statistical distributions of the remaining parameters (reflecting abrupt slopes, electrode-pop and muscle artifacts) was slightly more complicated. Logarithmic transformation of the values provided distributions close to Gaussian. Based upon the assumption of Gaussian distribution, the variance was calculated only from the data between the first and third quartiles, i.e. neglecting the 25% of tails from each side of the distribution. Such a procedure estimates distribution for the EEG epochs not contaminated by the given type of artifact, if we assume that the parameters related to artifacts fall into the region of outliers.
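The idea of estimating the spread from the central half of the log-transformed values can be sketched as below. The text does not give the exact estimator, so this sketch swaps in a related, standard technique: it takes the interquartile range and rescales it by the Gaussian factor (about 1.349) to obtain a standard deviation. The name `robust_log_stats` and the synthetic data are hypothetical.

```python
import math
from statistics import NormalDist, pstdev, quantiles


def robust_log_stats(values):
    """Median and outlier-resistant spread of log(values).

    Assumption: the spread is taken as IQR / 1.349, which equals the standard
    deviation for Gaussian data; the original system may use a different
    estimator based on the same central 50% of the data.
    """
    logs = [math.log(v) for v in values]  # values must be strictly positive
    q1, med, q3 = quantiles(logs, n=4)
    sigma = (q3 - q1) / (2 * NormalDist().inv_cdf(0.75))
    return med, sigma


# Synthetic parameter values: log-normal "clean EEG" plus 5% extreme outliers.
nd = NormalDist(mu=1.0, sigma=0.5)
clean = [math.exp(nd.inv_cdf((i + 0.5) / 1000)) for i in range(1000)]
contaminated = clean + [1000.0] * 50

med, sigma = robust_log_stats(contaminated)
naive_sigma = pstdev([math.log(v) for v in contaminated])
print(med, sigma, naive_sigma)
```

In this example the quartile-based estimate stays close to the spread of the uncontaminated values, while the ordinary standard deviation is inflated by the outliers.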

Muscle
EEG was divided into disjoint 1-sec epochs. For each of the EEG channels $ s(t)$, spectral power from 40 Hz to the Nyquist frequency $ f_N$ was normalized to the total power, excluding the power around the AC frequency (50 Hz in Europe), i.e.

$\displaystyle \dfrac{\displaystyle\int_{40Hz}^{f_N} \hat{s}(f) df - \int_{48Hz}^{52Hz} \hat{s}(f) df} {\displaystyle\int_0^{f_N} \hat{s}(f) df - \int_{48Hz}^{52Hz} \hat{s}(f) df} ,$ (2)

where $ \hat{s}(f)$ denotes the Fourier transform of $ s(t)$ multiplied by a cosine window. The value of 40 Hz for the lower frequency limit of muscle-related power was found by optimizing the detector's performance on the two recordings with muscle artifacts marked separately. The allowed range for the threshold of this parameter was $ (m_{0.5}+1.5\sigma, m_{0.5}+6\sigma)$, with the default in the middle of this range; $ m_{0.5}$ and $ \sigma$ denote the median and variance estimated separately for each channel. If the value of this parameter exceeded the threshold for any of the 1-sec epochs, the whole 4-sec epoch was marked as an artifact.
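The muscle parameter and its threshold range can be sketched as follows. Several details are assumptions made for illustration: the "cosine window" is taken to be a periodic Hann window, spectral power is taken as the squared magnitude of a direct DFT summed over the one-sided spectrum, and the names `windowed_power`, `muscle_parameter` and `threshold_range` are hypothetical.

```python
import cmath
import math


def windowed_power(x):
    """One-sided power spectrum of x multiplied by a periodic Hann window (assumed)."""
    n = len(x)
    xw = [v * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, v in enumerate(x)]
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(xw))) ** 2
        for k in range(n // 2 + 1)
    ]


def muscle_parameter(x, fs=128, f_low=40.0, ac=(48.0, 52.0)):
    """Power above f_low relative to total power, with the AC band removed from both."""
    p = windowed_power(x)
    df = fs / len(x)
    ac_power = sum(pk for k, pk in enumerate(p) if ac[0] <= k * df <= ac[1])
    high = sum(pk for k, pk in enumerate(p) if k * df >= f_low)
    denom = sum(p) - ac_power
    return (high - ac_power) / denom if denom > 0 else 0.0


def threshold_range(m, sigma, k_lo=1.5, k_hi=6.0):
    """Allowed range (m + k_lo*sigma, m + k_hi*sigma) and its midpoint as default."""
    lo, hi = m + k_lo * sigma, m + k_hi * sigma
    return lo, hi, (lo + hi) / 2


# Synthetic 1-sec epochs at 128 Hz (illustration only).
fs = 128
t = [i / fs for i in range(fs)]
alpha = [10 * math.sin(2 * math.pi * 10 * ti) for ti in t]
with_mains = [a + 50 * math.sin(2 * math.pi * 50 * ti) for a, ti in zip(alpha, t)]
with_muscle = [a + 10 * math.sin(2 * math.pi * 45 * ti) for a, ti in zip(alpha, t)]

p_mains = muscle_parameter(with_mains)    # mains power is excluded from the ratio
p_muscle = muscle_parameter(with_muscle)  # 45 Hz activity counts as high-frequency power
print(p_mains, p_muscle, threshold_range(0.0, 1.0))
```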

Abrupt slopes (electrode-pop)
Maximum change of potential in 31 milliseconds (5 points); thresholds were calculated in the same way as for muscle artifacts.
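A minimal sketch of this parameter follows. It assumes that "change in 5 points" means the absolute difference between samples that are 4 sampling intervals apart (31.25 ms at 128 Hz); the function name `max_slope` is hypothetical.

```python
def max_slope(x, span=5):
    """Largest absolute change of potential across any `span` consecutive samples."""
    lag = span - 1  # 5 points cover 4 sampling intervals
    return max(abs(x[i + lag] - x[i]) for i in range(len(x) - lag))


# Made-up signals: a slow ramp and an electrode-pop-like step of 200 units.
ramp = [float(i) for i in range(200)]
pop = [0.0] * 100 + [200.0] * 100
print(max_slope(ramp), max_slope(pop))
```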

Outlier values
An artifact (e.g. saturation or spikes of external origin) is detected if the maximum absolute value of the potential in a given epoch exceeds the threshold, set between $ m_{0.5}+2\sigma$ and $ m_{0.5}+10\sigma$ (default $ m_{0.5}+6\sigma$).

The values of all these parameters, reflecting the presence of different types of artifacts, must be combined into the final decision regarding the analyzed epoch: artifact or "good" EEG. We chose a simple logical alternative (disjunction, i.e. logical OR): if any of the parameters exceeded the corresponding threshold, the epoch was marked as an artifact.
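The final decision can be sketched as a logical OR over per-type tests. The dictionary keys and the function name `classify_epoch` are hypothetical; the only detail taken from the text is that the eye-movement parameter signals an artifact when it falls below its threshold, while the other parameters do so when they exceed theirs.

```python
def classify_epoch(params, thresholds, lower_is_artifact=("eye_movement",)):
    """Return (is_artifact, names of the parameters that triggered the decision)."""
    triggered = []
    for name, value in params.items():
        thr = thresholds[name]
        hit = value < thr if name in lower_is_artifact else value > thr
        if hit:
            triggered.append(name)
    return bool(triggered), triggered


# Made-up parameter values for one epoch, with the default fixed thresholds.
thresholds = {"eye_movement": -0.45, "eyeblink": 0.875, "mains": 0.325}
params = {"eye_movement": -0.2, "eyeblink": 0.9, "mains": 0.01}
is_artifact, reasons = classify_epoch(params, thresholds)
print(is_artifact, reasons)
```

Returning the list of triggering parameters alongside the decision is a design choice of this sketch; it makes it easier to see which threshold to adjust.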

Setting these threshold values draws the border between EEG and artifacts. The thresholds can be adapted to changing environments by the encephalographers who use the system; this procedure was implemented in a graphical user interface. Each parameter can be changed independently of the others, so the relative sensitivity of the system to different artifacts can be controlled.

Results

Figure 1: Detection of EEG artifacts in the ROC plane. Upper panel: inter-expert concordance (triangles mark repeatability of the same expert's decision after 3 weeks in a double-blind test). Lower panel: concordance of the proposed system with expert's decisions for different settings of parameters (defaults circled) and for the 18 overnight recordings used for evaluation.

Figure 1 presents, in the ROC plane, the performance of the proposed automatic detection (relative to the experts' decisions), evaluated on the 18 overnight polysomnographic recordings not used in the process of designing or tuning the system. For each of these recordings, points on the ROC curve were obtained by changing all the discussed thresholds uniformly, i.e. the thresholds for detection of all the types of artifacts were changed together, in small steps within their allowed ranges. This picture therefore corresponds to an unsupervised, "quick" application to new data that nevertheless come from the same recording environment as the datasets used in the construction of the system. Points describing the inter-expert concordance and repeatability are marked on the upper panel of Figure 1.
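One way to move all thresholds together within their allowed ranges is sketched below. The text does not specify how the steps were taken, so this is an assumption: every range is written as (most sensitive end, least sensitive end) and traversed with a common fraction t between 0 and 1. The name `sweep_thresholds` and the dictionary keys are hypothetical.

```python
def sweep_thresholds(ranges, steps=11):
    """Yield one dict of thresholds per step, all moved by the same fraction.

    ranges: name -> (most sensitive end, least sensitive end) of the allowed range.
    """
    for i in range(steps):
        t = i / (steps - 1)
        yield {name: a + t * (b - a) for name, (a, b) in ranges.items()}


# Allowed ranges of the three fixed thresholds. For eye movements a higher
# threshold is more sensitive, so that range is written from 0.1 down to -1.
ranges = {
    "eye_movement": (0.1, -1.0),
    "eyeblink": (0.75, 1.0),
    "mains": (0.005, 0.7),
}
settings = list(sweep_thresholds(ranges, steps=3))
print(settings[1])
```

Each setting would then be applied to a recording and compared with the expert's marks to give one point in the ROC plane.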

To quantify the consistency of visual detection and its concordance with the automatic procedure, we approximated the expert-expert and expert-system ROC by curves fitted numerically to the available points; the shape of these curves reflects the ROC in the idealized case of classifying items drawn from two overlapping Gaussian distributions. This allowed for calculation of the AUC (area under the curve) parameter proposed in [1]. The resulting values were 0.954 for expert-expert concordance and 0.915 for the average expert-system concordance. When there is a particular priority, e.g. obtaining very "clear" EEG at the cost of tagging more of the "good" EEG as artifacts, the thresholds can easily be set to move the system to another point on the ROC curve, where the proportion of true-positive detections reaches an almost arbitrary value.
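The following sketch illustrates how an AUC can be obtained from a few ROC points under a two-Gaussian model. It is not the fitting procedure of the paper, which is not described in detail: here the two Gaussians are additionally assumed to have equal variances, so that the curve has a single parameter d (the separation of the means in units of the common standard deviation). The names `binormal_auc` and `binormal_tpp` are hypothetical.

```python
import math
from statistics import NormalDist, mean

_STD = NormalDist()  # standard normal distribution


def binormal_auc(points):
    """Fit an equal-variance binormal ROC to (fpp, tpp) points; return (auc, d).

    In this model inv_cdf(tpp) - inv_cdf(fpp) = d for every point, so the
    least-squares estimate of d is the mean of these differences.
    All proportions must lie strictly between 0 and 1.
    """
    d = mean(_STD.inv_cdf(tpp) - _STD.inv_cdf(fpp) for fpp, tpp in points)
    return _STD.cdf(d / math.sqrt(2)), d


def binormal_tpp(fpp, d):
    """True positive proportion predicted by the fitted curve at a given fpp."""
    return _STD.cdf(d + _STD.inv_cdf(fpp))


# Points generated from the model itself with d = 2 (illustration only).
true_d = 2.0
points = [(_STD.cdf(-c), _STD.cdf(true_d - c)) for c in (0.5, 1.0, 1.5)]
auc, d = binormal_auc(points)
print(auc, d)
```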

Discussion
The importance of the problem of EEG artifacts is generally acknowledged (for a review see e.g. [2]). The proposed system operates in the space of simple parameters and gives promising, repeatable results. Easy adjustment of settings depending on priority (very strict rejection of artifacts or availability of more EEG data) allows us to get almost arbitrarily small proportions of either false-negative or false-positive detections. Artifacts occurring in some electrodes can be ignored by restricting the set of derivations taken into account. This feature can be useful when only a subset of EEG derivations will be used in further analysis. After setting these parameters (thresholds and rejected derivations), the procedure is fully repeatable and insensitive to any arbitrary factor, e.g. the context of the current sleep stage (limited repeatability of sleep staging is discussed in [3]).

In its current form, the presented system can be used for pre-screening large and uniform datasets. However, the proposed approach has some limitations:

  1. Marking epochs of fixed length, instead of the actual temporal extent of an artifact. However, it dramatically simplifies comparisons and is usually convenient for choosing epochs for further automatic analysis.
  2. Limited use of spatial context. However, it improves robustness of the system in case of complete failure of some electrodes.

These simplifications were driven by the main goal of portability to different environments (montages and data formats). This goal is crucial, since the system is intended for testing in different laboratories on different types of recordings, including e.g. recordings from patients with epileptic activity and from patients receiving CNS-activating medications. This communication is the first step towards a generally applicable and robust system, as well as an invitation for those interested in improving the software (freely available at http://brain.fuw.edu.pl/artifacts) or in cooperating in its clinical evaluation.

Acknowledgements
This work was supported by a grant from the Committee for Scientific Research (Poland) to the Institute of Experimental Physics, Warsaw University. The EEG was recorded on equipment donated by the AJUS&KAJUS Foundation, in memory of Prof. Andrzej Jus, M.D., the pioneer of Polish Clinical EEG and the first person to introduce polygraphic studies of sleep in Poland.




Piotr J. Durka 2002-11-19