Synthetic Text Data Quality Report

Model: Evaluate Model
Model UID: 66c4b66716e2a054e09083d1
Project:
Generated: 08/20/2024, 15:30
Synthetic Text Data Quality Score: Good
The Synthetic Text Data Quality Score is a weighted combination of two individual quality metrics: Text Semantic Similarity and Text Structure Similarity. The report supports 50+ languages, including English, French, German, Dutch, Italian, Portuguese, Spanish, Russian, Polish, Arabic, Turkish, Chinese, Japanese, Thai, and Korean.
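A weighted combination of this kind can be sketched as below. This is a minimal illustration, not the report's actual formula: the report shows only ratings, so the numeric scores, the equal weights, the 0-100 scale, and the function name `weighted_quality_score` are all assumptions.

```python
def weighted_quality_score(metric_scores, weights):
    """Combine per-metric scores into a single weighted average.

    Both arguments are dicts keyed by metric name. Weights are
    normalized by their sum, so they do not need to add up to 1.
    """
    if set(metric_scores) != set(weights):
        raise ValueError("metric_scores and weights must cover the same metrics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(metric_scores[name] * weights[name] for name in metric_scores)
    return weighted_sum / total_weight


# Hypothetical inputs: the report does not publish numeric scores or weights.
example_scores = {
    "text_semantic_similarity": 60.0,
    "text_structure_similarity": 90.0,
}
example_weights = {
    "text_semantic_similarity": 0.5,
    "text_structure_similarity": 0.5,
}
overall = weighted_quality_score(example_scores, example_weights)
```

Normalizing by the total weight keeps the result on the same scale as the inputs even if the weights are given as raw importances rather than fractions.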

Text Semantic Similarity: Moderate
Text Structure Similarity: Excellent

Data Summary Statistics

                               Training Data   Synthetic Data
Row Count                      5000            5000
Column Count                   1               1
Training Lines Duplicated      -               16
Missing Values                 0               0
Unique Values                  5000            4963
Average Words Per Sentence     9.45            9.23
Average Characters Per Word    4.73            4.93
Average Sentence Count         1.01            1.03
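Statistics like those in the table can be approximated with a few lines of Python. This is a rough sketch under stated assumptions: the sentence splitter and word tokenizer are naive regex stand-ins, the helper names (`split_sentences`, `summarize`, `training_lines_duplicated`) are hypothetical, and "Training Lines Duplicated" is interpreted here as the number of synthetic rows that exactly match a training row. The report's own tokenization and definitions may differ.

```python
import re


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarize(texts):
    """Compute summary statistics for one single-column text dataset."""
    present = [t for t in texts if t is not None and t.strip()]
    sentences = [s for t in present for s in split_sentences(t)]
    # Words are runs of word characters, so punctuation is not counted.
    words = [w for s in sentences for w in re.findall(r"\w+", s)]
    return {
        "row_count": len(texts),
        "missing_values": len(texts) - len(present),
        "unique_values": len(set(present)),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "avg_chars_per_word": sum(len(w) for w in words) / len(words) if words else 0.0,
        "avg_sentence_count": len(sentences) / len(present) if present else 0.0,
    }


def training_lines_duplicated(training, synthetic):
    # Assumed definition: synthetic rows that exactly match a training row.
    seen = set(training)
    return sum(1 for t in synthetic if t in seen)


# Tiny made-up example, unrelated to the 5000-row datasets in the report.
training = ["The cat sat. It slept.", "Hello world"]
synthetic = ["Hello world", "A new line here", "Hello world"]
training_stats = summarize(training)
synthetic_stats = summarize(synthetic)
duplicated = training_lines_duplicated(training, synthetic)
```

A gap between Row Count and Unique Values, as in the synthetic column above, indicates repeated rows within that dataset, which is separate from rows copied from the training data.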

Semantic Similarity Principal Component Analysis 

Text Structure Similarity