🧹 clean
Clean Eye-Tracking Data
The clean() function cleans eye-tracking data by:
Removing rows where the "trial" column contains NaN values
Removing columns that are entirely empty (all NaN) or filled entirely with zeros
This is useful when preparing eye-tracking data for analysis: rows without a trial label and columns that carry no information are dropped before they can affect the results.
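The two rules above can be sketched in plain pandas. This is an illustrative re-implementation, not etformat's actual code: the helper name `clean_frame_sketch`, the `trial_col` argument, and the demo column values are assumptions made for the example. It assumes rows are filtered first and columns second, which matches the order reported in the verbose output.

```python
import numpy as np
import pandas as pd


def clean_frame_sketch(df: pd.DataFrame, trial_col: str = "trial") -> pd.DataFrame:
    """Hypothetical helper mirroring the two cleaning rules (not etformat's code)."""
    # Rule 1: drop rows whose trial value is missing.
    out = df.dropna(subset=[trial_col])

    # Rule 2: drop columns that are entirely NaN or entirely zero.
    # NaN == 0 evaluates to False, so an all-NaN column is never counted as all-zero.
    all_nan = out.isna().all(axis=0)
    all_zero = out.eq(0).all(axis=0)
    return out.loc[:, ~(all_nan | all_zero)]


# Small synthetic frame: one row without a trial, one all-NaN column, one all-zero column.
demo = pd.DataFrame({
    "trial": [1.0, 1.0, np.nan, 2.0],
    "gxR": [512.3, 514.1, 0.0, 509.8],
    "gxL": [np.nan] * 4,
    "buttons": [0, 0, 0, 0],
})
cleaned_demo = clean_frame_sketch(demo)
print(cleaned_demo.shape)  # (3, 2)
```

How columns that mix NaN and zero values are treated is not stated here; the sketch keeps them.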
Parameters
samples (str or DataFrame): Path to the samples CSV file, or a pandas DataFrame with samples data
events (str or DataFrame): Path to the events CSV file, or a pandas DataFrame with events data
verbose (bool): Print cleaning statistics (default: True)
copy (bool): If True, save to new files instead of overwriting the originals when file paths are provided (default: False)
Returns
When file paths are provided and copy=False: (samples_path, events_path) - Paths to the cleaned files
When file paths are provided and copy=True: (new_samples_path, new_events_path) - Paths to the new cleaned files
When DataFrames are provided: (samples_df, events_df) - Cleaned DataFrames
import etformat as et
import pandas as pd
# Load sample data
samples = pd.read_csv(r"D:\Github_web_page_website\test_samples.csv")
events = pd.read_csv(r"D:\Github_web_page_website\test_events.csv")
# Clean the data using DataFrames
cleaned_samples, cleaned_events = et.clean(samples, events)
etformat 1.1.1 - For Documentation, visit: https://ahsankhodami.github.io/etformat/intro.html
🧹 Starting data cleaning for eye-tracking data
Samples: DataFrame with shape (1520309, 56)
Events: DataFrame with shape (18261, 38)
============================================================
Processing SAMPLES data...
Original samples shape: (1520309, 56)
Columns: 56
Removed 2522 rows with NaN trials
Removed 15 columns:
• time_rel (all_nan)
• pxL (all_nan)
• pyL (all_nan)
• hxL (all_nan)
• hyL (all_nan)
• paL (all_nan)
• gxL (all_nan)
• gyL (all_nan)
• hdata1 (all_zero)
• hdata6 (all_zero)
• hdata7 (all_zero)
• input (all_zero)
• buttons (all_zero)
• htype (all_nan)
• errors (all_zero)
Samples cleaned: (1520309, 56) → (1517787, 41)
Processing EVENTS data...
Original events shape: (18261, 38)
Columns: 38
Removed 83 rows with NaN trials
Removed 7 columns:
• time (all_zero)
• sttime_rel (all_nan)
• entime_rel (all_nan)
• status (all_zero)
• flags (all_zero)
• input (all_zero)
• buttons (all_zero)
Events cleaned: (18261, 38) → (18178, 31)
============================================================
Cleaning completed!
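When verbose=False, a similar summary can be rebuilt by comparing the frames before and after cleaning. The sketch below is one way to do that in plain pandas; `summarize_cleaning` is a hypothetical helper, not part of etformat, and the stand-in frames are synthetic. It assumes the original DataFrame is still available unmodified, so consider passing a copy to clean() if that is uncertain.

```python
import pandas as pd


def summarize_cleaning(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Compare a frame before and after cleaning (hypothetical helper)."""
    return {
        "rows_removed": len(before) - len(after),
        "columns_removed": [c for c in before.columns if c not in after.columns],
        "shape_before": before.shape,
        "shape_after": after.shape,
    }


# Tiny stand-in frames; with real data, pass the original and cleaned DataFrames.
before = pd.DataFrame({
    "trial": [1, 1, None],
    "gxR": [1.5, 2.5, 3.5],
    "input": [0, 0, 0],
})
after = before.dropna(subset=["trial"]).drop(columns=["input"])

summary = summarize_cleaning(before, after)
print(summary["rows_removed"], summary["columns_removed"])
```

Applied to the shapes in the output above, the same arithmetic gives 1520309 - 1517787 = 2522 rows and 56 - 41 = 15 columns for samples, and 18261 - 18178 = 83 rows and 38 - 31 = 7 columns for events, matching the reported counts.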