Data Curation Intern
Quick answer
Karya is hiring a Data Curation Intern to assist in building high-quality, multilingual AI datasets through data engineering and linguistic analysis.
- Role: Data Curation Intern
- Organization: Karya
- Level: Entry
- Category: Engineering & Technology
The role
Karya is seeking a Data Curation Intern to help build high-quality datasets for AI/ML models with a focus on Indian languages. This role involves auditing, cleaning, and structuring large open-source datasets to ensure they meet the rigorous requirements of modern machine learning pipelines. Interns will progress from text data curation to speech and voice model data preparation while working on real-world projects. This position offers a unique opportunity to gain hands-on experience in computational linguistics and data engineering.
What you'll do
- Audit and profile open-source datasets to assess quality and noise levels.
- Implement data cleaning pipelines including deduplication and noise removal.
- Apply metadata tagging schemas to categorize text data.
- Develop validation checklists and quality scorecards for dataset readiness.
- Prepare text passages and metadata standards for speech and voice model training.
What it takes
- Strong attention to detail.
- Proficiency in Python for data processing (pandas, regex, spaCy, NLTK).
- Familiarity with text data formats such as CSV, JSONL, and Parquet.
- Ability to work independently and document processes clearly.
- Curiosity about AI/ML or computational linguistics.
Frequently asked questions
What is the focus of this internship role?
This role focuses on data curation for AI/ML models, specifically cleaning and preparing multilingual Indian language datasets for both text and voice training.
What technical skills are required for this position?
Applicants should be comfortable using Python for data processing, including pandas, regular expressions (regex), and NLP libraries such as spaCy or NLTK.
Is compensation provided for this internship?
The provided job description does not explicitly specify the compensation for this role.
How to apply
Apply directly on Karya's site. We link straight through — no resume parsing, no profile to fill out.
This listing is aggregated from a third-party source and its summary may be auto-generated, so details can be inaccurate or out of date. ForGood is not the employer and is not liable for the content — please verify everything on Karya's official posting before applying.