Network Information Security Risks in English Translation Based on Corpus
Abstract
At present, the lack of domain-adapted corpora makes dynamic corpus processing difficult, the risks of privacy leakage and of "black box" tools are prominent, and research at the intersection of information security and linguistics is insufficient. This paper combines corpus technology with the study of network information security risks to fill this gap between translation research and network security. Data are collected with the Scrapy framework, texts are classified with the BERT (Bidirectional Encoder Representations from Transformers) model, MongoDB and Elasticsearch provide efficient data storage and retrieval, and a model combining a dynamic corpus with network security technology is constructed. On this basis, sensitive information identification, encryption, and shielding techniques are introduced to protect sensitive information during translation and to improve the security and interpretability of translation tools. The results show that the integrated model outperforms traditional methods on multi-domain text, with classification accuracy reaching 97% for scientific literature and 98% for the medical and health field. By comparison, the regular expression method classifies poorly, with an accuracy of only 80% on literary works. In the translation quality evaluation, the original data reach a BLEU score of 0.95 but have a leakage rate of 15%. With the character replacement method, the BLEU score drops to 0.85 and the leakage rate falls to 5%. With AES encryption, the leakage rate is reduced to 0.5%, at the cost of significantly higher encryption time and system load. The combined model of dynamic corpus and network security technology thus provides a new solution for sensitive information processing in translation tools and significantly improves their security and reliability.
This method has broad application potential in high-security fields such as scientific research and medical translation. It opens up a new direction for the deep integration of linguistics and network security.
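
As an illustration of the character replacement idea mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. The two patterns (email address and a US-style phone number), the function names, and the definition of leakage rate as the fraction of detected sensitive spans that survive verbatim are all assumptions made for this example; the abstract does not specify any of them.

```python
import re

# Hypothetical patterns for illustration only; the paper does not list its rules.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_sensitive(text):
    """Return (label, start, end) for every pattern match, ordered by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((label, match.start(), match.end()))
    return sorted(spans, key=lambda span: span[1])


def mask(text, fill="*"):
    """Replace each detected span with fill characters of the same length."""
    chars = list(text)
    for _, start, end in find_sensitive(text):
        chars[start:end] = fill * (end - start)
    return "".join(chars)


def leakage_rate(original, processed):
    """Assumed metric: share of sensitive spans still present verbatim."""
    spans = find_sensitive(original)
    if not spans:
        return 0.0
    leaked = sum(1 for _, start, end in spans if original[start:end] in processed)
    return leaked / len(spans)


if __name__ == "__main__":
    source = "Contact alice@example.com or 555-123-4567 for the report."
    masked = mask(source)
    print(masked)
    print(leakage_rate(source, masked))
```

Masking before translation keeps the sentence length and structure intact, which is one plausible reason a replacement approach costs less translation quality than encrypting the text, although the masked tokens themselves can no longer be translated.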