🧹 clean

Clean Eye-Tracking Data

The clean() function helps you clean eye-tracking data by:

  1. Removing rows where the 'trial' column contains NaN values

  2. Removing columns that are entirely empty (all NaN) or filled with zeros

This prepares eye-tracking data for analysis by dropping rows that have no trial label and columns that carry no information, both of which could otherwise distort the results.
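The two rules above can be sketched in plain pandas. This is a minimal illustration, not etformat's actual implementation: the helper name `clean_sketch`, the toy column values, and the choice to evaluate columns after the rows are dropped are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_sketch(df: pd.DataFrame, trial_col: str = "trial") -> pd.DataFrame:
    """Hypothetical stand-in for the two cleaning rules (not etformat code)."""
    # Rule 1: drop rows whose trial value is missing.
    out = df.dropna(subset=[trial_col])
    # Rule 2: flag columns that are entirely NaN or entirely zero.
    all_nan = out.isna().all()
    all_zero = out.eq(0).all()
    # Keep only the columns that are neither all-NaN nor all-zero.
    return out.loc[:, ~(all_nan | all_zero)]


# Toy data: one row without a trial, one all-NaN column, one all-zero column.
toy = pd.DataFrame({
    "trial": [1.0, 1.0, np.nan, 2.0],
    "gxR": [512.3, 514.1, 0.0, 498.7],
    "gxL": [np.nan, np.nan, np.nan, np.nan],
    "buttons": [0, 0, 0, 0],
})

cleaned = clean_sketch(toy)
print(cleaned.shape)
print(list(cleaned.columns))
```

In this sketch a column that mixes NaN and zero values is kept, because neither test is true for the whole column. Whether clean() treats such columns the same way is not stated here.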

Parameters

  • samples (str or DataFrame): Path to samples CSV file or pandas DataFrame with samples data

  • events (str or DataFrame): Path to events CSV file or pandas DataFrame with events data

  • verbose (bool): Print cleaning statistics (default: True)

  • copy (bool): If True and file paths are provided, saves the cleaned data to new files instead of overwriting the originals (default: False)

Returns

  • When file paths are provided and copy=False: (samples_path, events_path) - Paths to the cleaned files

  • When file paths are provided and copy=True: (new_samples_path, new_events_path) - Paths to the new cleaned files

  • When DataFrames are provided: (samples_df, events_df) - Cleaned DataFrames

import etformat as et
import pandas as pd

# Load sample data
samples = pd.read_csv(r"D:\Github_web_page_website\test_samples.csv")
events = pd.read_csv(r"D:\Github_web_page_website\test_events.csv")

# Clean the data using DataFrames
cleaned_samples, cleaned_events = et.clean(samples, events)
📖 etformat 1.1.1 - For Documentation, visit: https://ahsankhodami.github.io/etformat/intro.html
🧹 Starting data cleaning for eye-tracking data
   📊 Samples: DataFrame with shape (1520309, 56)
   📊 Events: DataFrame with shape (18261, 38)
============================================================

πŸ” Processing SAMPLES data...
   Original samples shape: (1520309, 56)
   Columns: 56
   ❌ Removed 2522 rows with NaN trials
   ❌ Removed 15 columns:
      β€’ time_rel (all_nan)
      β€’ pxL (all_nan)
      β€’ pyL (all_nan)
      β€’ hxL (all_nan)
      β€’ hyL (all_nan)
      β€’ paL (all_nan)
      β€’ gxL (all_nan)
      β€’ gyL (all_nan)
      β€’ hdata1 (all_zero)
      β€’ hdata6 (all_zero)
      β€’ hdata7 (all_zero)
      β€’ input (all_zero)
      β€’ buttons (all_zero)
      β€’ htype (all_nan)
      β€’ errors (all_zero)
   βœ… Samples cleaned: (1520309, 56) β†’ (1517787, 41)

πŸ” Processing EVENTS data...
   Original events shape: (18261, 38)
   Columns: 38
   ❌ Removed 83 rows with NaN trials
   ❌ Removed 7 columns:
      β€’ time (all_zero)
      β€’ sttime_rel (all_nan)
      β€’ entime_rel (all_nan)
      β€’ status (all_zero)
      β€’ flags (all_zero)
      β€’ input (all_zero)
      β€’ buttons (all_zero)
   βœ… Events cleaned: (18261, 38) β†’ (18178, 31)
============================================================
πŸŽ‰ Cleaning completed!
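The shapes reported in the log can be checked by hand: each cleaned shape should equal the original shape minus the removed rows and columns. A small sketch of that bookkeeping follows. The numbers are copied from the output above, while the `report` and `computed` dictionaries are only illustrative names.

```python
# Shapes and removal counts copied from the cleaning log.
report = {
    "samples": {
        "before": (1520309, 56),
        "rows_removed": 2522,
        "cols_removed": 15,
        "after": (1517787, 41),
    },
    "events": {
        "before": (18261, 38),
        "rows_removed": 83,
        "cols_removed": 7,
        "after": (18178, 31),
    },
}

computed = {}
for name, r in report.items():
    rows, cols = r["before"]
    # Expected shape after subtracting the removed rows and columns.
    computed[name] = (rows - r["rows_removed"], cols - r["cols_removed"])
    # Fraction of rows dropped because the trial value was NaN.
    share = r["rows_removed"] / rows
    matches = computed[name] == r["after"]
    print(f"{name}: {computed[name]} matches log: {matches}, rows dropped: {share:.2%}")
```

The same comparison can be run on your own data by reading the shapes from the DataFrames before and after cleaning.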