Academic Works
Bachelor’s Thesis
Detection of Machine Translation Through Distributional Text Properties
University of Alicante · 2024 · [PDF available upon request]
Abstract:
This work develops automated methods for distinguishing human-translated texts from machine-translated ones by analyzing their distributional linguistic properties. Its key components are:
- Analysis of the limitations of the beam search algorithm in machine translation
- Feature engineering using lexical diversity metrics and TF-IDF statistics
- Comparative evaluation of logistic regression vs. tree-based models (CART, Random Forest)
- Cross-domain testing with parallel corpora of varying lengths and topics
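The lexical diversity metrics mentioned above can be illustrated with a minimal, standard-library-only sketch. The specific metrics (type-token ratio, hapax ratio and the moving-average type-token ratio, MATTR), the regex tokenizer, the 50-token default window and all function names are assumptions chosen for illustration; the thesis may use different metrics or tokenization.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase the text and keep runs of Unicode word characters.
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    # Number of distinct word types divided by the total number of tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_ratio(tokens: list[str]) -> float:
    # Hapax legomena (types occurring exactly once) as a share of all tokens.
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(tokens) if tokens else 0.0


def mattr(tokens: list[str], window: int = 50) -> float:
    # Moving-average TTR: mean TTR over every run of `window` consecutive tokens.
    # Falls back to plain TTR when the text is no longer than one window.
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def lexical_features(text: str, window: int = 50) -> dict[str, float]:
    # Hypothetical per-document feature vector built from the metrics above.
    tokens = tokenize(text)
    return {
        "ttr": type_token_ratio(tokens),
        "hapax_ratio": hapax_ratio(tokens),
        "mattr": mattr(tokens, window),
    }


print(lexical_features("The cat saw the cat.", window=2))
# → {'ttr': 0.6, 'hapax_ratio': 0.2, 'mattr': 1.0}
```

Raw type-token ratio tends to fall as texts get longer, which is one reason a windowed variant such as MATTR is often preferred when a corpus mixes texts of different lengths.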
Key Finding: Logistic regression on TF-IDF features outperformed the tree-based models at detecting machine-translated content, particularly when trained on texts of diverse lengths and domains.
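The key finding pairs TF-IDF features with logistic regression. As a rough, dependency-free sketch of that kind of pipeline (in practice scikit-learn's `TfidfVectorizer` and `LogisticRegression` would be the usual tools), the code below computes smoothed, L2-normalized TF-IDF vectors and fits a logistic regression by full-batch gradient descent. The tokenizer, hyperparameters, helper names and the four training sentences with their labels are invented for illustration; none of it is taken from the thesis.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase runs of Unicode word characters.
    return re.findall(r"\w+", text.lower())


def fit_idf(docs: list[str]) -> dict[str, float]:
    # Smoothed IDF, the same formula as scikit-learn's TfidfVectorizer default:
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # document frequency: each term once per doc
    n = len(docs)
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc: str, idf: dict[str, float]) -> dict[str, float]:
    # Sparse vector: raw term count times IDF, then L2-normalized.
    # Terms not seen when fitting the IDF are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=300, l2=1e-3):
    # Full-batch gradient descent on L2-regularized log loss; the bias is not regularized.
    w = defaultdict(float)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad = defaultdict(float)
        grad_b = 0.0
        for x, label in zip(X, y):
            err = sigmoid(b + sum(w[t] * v for t, v in x.items())) - label
            for t, v in x.items():
                grad[t] += err * v
            grad_b += err
        for t in set(w) | set(grad):
            w[t] -= lr * (grad[t] / n + l2 * w[t])
        b -= lr * grad_b / n
    return dict(w), b


def predict_proba(doc, idf, w, b):
    # Estimated probability that `doc` belongs to class 1.
    x = tfidf(doc, idf)
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


# Invented toy sentences and labels (1 = machine-translated, 0 = human).
train = [
    ("the committee approved the proposal", 1),
    ("the proposal was approved by the committee", 1),
    ("lawmakers finally signed off on it", 0),
    ("after weeks of haggling lawmakers backed it", 0),
]
docs = [text for text, _ in train]
y = [label for _, label in train]
idf = fit_idf(docs)
X = [tfidf(text, idf) for text in docs]
w, b = train_logreg(X, y)

print(predict_proba("the proposal was approved", idf, w, b))
print(predict_proba("lawmakers backed it", idf, w, b))
```

With such a tiny vocabulary the toy classes are trivially separable, so the sketch says nothing about accuracy; it only shows the shape of the pipeline: fit IDF weights on the training texts, vectorize each text, then fit a linear model on the sparse vectors.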