Key Objectives of this Role:
* Learn and implement data generation strategies for NLP applications (e.g., dialogue systems, text summarization).
* Develop tailored datasets for mainstream architectures (BERT, GPT variants) under guided training.
* Design quality evaluation frameworks to optimize synthetic data effectiveness.
* Collaborate with cross-functional teams to explore synergies between data generation and model training.
* Keep up with cutting-edge techniques such as LoRA training and data augmentation through internal workshops.
Qualification, Experience and Skills:
* Graduates in Computer Science, AI, Linguistics, or related fields
* Proficient in Python with a strong grasp of data structures and algorithms
* Foundational NLP knowledge (e.g., word embeddings, attention mechanisms)
* Fast learner with passion for solving technical challenges
* Excellent teamwork and communication skills