================================================================================
DESCRIPTIVE STATISTICS AND SUMMARY ANALYSIS
================================================================================

Dataset Shape: 557 rows × 19 columns

DATA QUALITY SUMMARY
--------------------------------------------------------------------------------
Total columns: 19
Columns with >50% missing data: 11
Columns with >90% missing data: 9

Columns with high missing data (>50%):
  Unnamed__4, Unnamed__5, Unnamed__6, Unnamed__7,
  Number_of_animals_arrived_, Unnamed__9, Unnamed__10, Unnamed__11,
  Unnamed__12, Unnamed__13, Unnamed__18

NUMERIC VARIABLES ANALYSIS
--------------------------------------------------------------------------------
Number of numeric columns analyzed: 5

P24003_CAGE_numeric:
  Mean: 2368.17
  Median: 3607.00
  Std Dev: 1500.78
  Range: [326.33, 3881.00]
  Missing: 8 (1.4%)

Unnamed__4_numeric:
  Mean: 328.22
  Median: 329.00
  Std Dev: 31.79
  Range: [248.00, 419.00]
  Missing: 317 (56.9%)

P24003_CAGE_1_numeric:
  Mean: 2391.07
  Median: 3610.00
  Std Dev: 1493.06
  Range: [382.30, 3881.00]
  Missing: 14 (2.5%)

Unnamed__18_numeric:
  Mean: 328.22
  Median: 329.00
  Std Dev: 31.79
  Range: [248.00, 419.00]
  Missing: 317 (56.9%)

CATEGORICAL VARIABLES ANALYSIS
--------------------------------------------------------------------------------
Number of categorical columns analyzed: 6

Trial_code:
  Unique values: 14
  Most frequent: 'P24015-CAGE' (300 occurrences)
  Missing: 5 (0.9%)

Unnamed__2:
  Unique values: 21
  Most frequent: 'T02' (60 occurrences)
  Missing: 6 (1.1%)

Unnamed__3:
  Unique values: 61
  Most frequent: 'T08-4' (10 occurrences)
  Missing: 16 (2.9%)

Trial_code_1:
  Unique values: 14
  Most frequent: 'P24015-CAGE' (300 occurrences)
  Missing: 5 (0.9%)

Unnamed__16:
  Unique values: 15
  Most frequent: 'T02' (60 occurrences)
  Missing: 12 (2.2%)

OVERALL DATASET SUMMARY
--------------------------------------------------------------------------------
Total observations: 557
Total variables: 23
Numeric variables: 5
Categorical variables: 18
Data completeness: 0.0% complete rows
Total missing values: 6336 (49.46%)

KEY FINDINGS
--------------------------------------------------------------------------------
1. High Missing Data Alert:
   9 columns have >90% missing data.
   These columns may not be useful for analysis.

2. Most Complete Columns:
   8 columns have <10% missing data.
   These are the most reliable for analysis.

3. Data Type Distribution:
   Object/String columns: 18
   Numeric columns: 5

RECOMMENDATIONS
--------------------------------------------------------------------------------
1. Consider removing columns with >90% missing data
2. Investigate the cause of missing data patterns
3. Focus analysis on columns with <20% missing data
4. Consider data imputation strategies for important variables
5. Verify data types and convert where appropriate

================================================================================
END OF ANALYSIS
================================================================================
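
Recommendations 1 and 3 can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the procedure that produced this report: the helper names (`missing_fraction`, `split_columns`), the rule that `None` or a blank string counts as missing, and the toy rows are all assumptions. Only the column names and the 90% / 20% thresholds come from the report. The final lines check the reported overall missing rate: 6336 missing cells is 49.46% of 557 × 23 cells, so the percentage appears to be computed over the 23 variables in the overall summary rather than the 19 columns in the dataset shape.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def missing_fraction(rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """Share of missing cells per column (assumption: None or blank = missing)."""
    total = len(rows)
    fractions: Dict[str, float] = {}
    for col in columns:
        missing = sum(
            1 for row in rows
            if row.get(col) is None or str(row.get(col)).strip() == ""
        )
        fractions[col] = missing / total if total else 0.0
    return fractions


def split_columns(
    fractions: Dict[str, float],
    drop_above: float = 0.90,   # recommendation 1: >90% missing
    keep_below: float = 0.20,   # recommendation 3: <20% missing
) -> Tuple[List[str], List[str], List[str]]:
    """Partition columns into (drop, keep, review) by missing share."""
    drop = [c for c, f in fractions.items() if f > drop_above]
    keep = [c for c, f in fractions.items() if f < keep_below]
    review = [c for c in fractions if c not in drop and c not in keep]
    return drop, keep, review


# Toy rows: column names are taken from the report, values are hypothetical.
columns = ["Trial_code", "Unnamed__4", "Unnamed__5"]
rows: List[Row] = [
    {"Trial_code": "P24015-CAGE", "Unnamed__4": "329", "Unnamed__5": None},
    {"Trial_code": "P24015-CAGE", "Unnamed__4": None, "Unnamed__5": None},
    {"Trial_code": "P24015-CAGE", "Unnamed__4": "", "Unnamed__5": None},
    {"Trial_code": "P24015-CAGE", "Unnamed__4": "330", "Unnamed__5": ""},
]

fractions = missing_fraction(rows, columns)
drop, keep, review = split_columns(fractions)
print("drop:", drop)
print("keep:", keep)
print("review:", review)

# Sanity check on the reported overall missing rate, using the report's totals.
overall_pct = 6336 / (557 * 23) * 100
print(f"overall missing: {overall_pct:.2f}%")
```

Columns that fall between the two thresholds land in `review`, which is where recommendations 2 and 4 (investigating missingness patterns and considering imputation) would apply.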