Academic Works

Bachelor’s Thesis

Detection of Machine Translation Through Distributional Text Properties

University of Alicante · 2024 · [PDF available upon request]

Abstract:
This work develops automated methods to distinguish between human and machine-translated texts by analyzing linguistic patterns. The key components include:

  • Analysis of the limitations of beam search decoding in machine translation
  • Feature engineering using lexical diversity metrics and TF-IDF statistics
  • Comparative evaluation of logistic regression vs. tree-based models (CART, Random Forest)
  • Cross-domain testing with parallel corpora of varying lengths and topics
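
As a rough illustration of the lexical diversity features mentioned above, the sketch below computes two common metrics, the type-token ratio and the hapax legomena ratio. The thesis does not specify which metrics it used, so the choice of metrics, the tokenizer, and the function names here are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: a deliberately simple lowercase word tokenizer,
    # not the preprocessing used in the thesis.
    return re.findall(r"\w+", text.lower())


def type_token_ratio(text: str) -> float:
    # Distinct words divided by total words; 0.0 for empty input.
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def hapax_ratio(text: str) -> float:
    # Words that occur exactly once, divided by total words.
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(tokens)


if __name__ == "__main__":
    sample = "the cat saw the dog"
    print(type_token_ratio(sample), hapax_ratio(sample))
```

Both ratios depend on text length, which is one reason a detector of this kind might be evaluated on texts of varying lengths.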

Key Finding: Logistic regression on TF-IDF features outperformed the tree-based models (CART, Random Forest) at detecting machine-translated content, particularly when trained on texts of varying lengths and domains.
