Junior Data Analyst

Posted on March 2, 2025 by MEDIA METER

  • Full Time

As a Junior Data Analyst, you will play a crucial role in the data preprocessing phase of our project to fine-tune the Whisper model for Taglish and other languages. Your responsibilities will include collecting, organizing, cleaning, and preparing high-quality multilingual data for model training. You will work closely with the machine learning team to ensure that the data meets the standards required for effective model training.

Key Responsibilities:

Data Collection and Organization:

- Gather raw audio files in various formats (e.g., MP3, WAV, FLAC) from diverse sources such as interviews, podcasts, and YouTube videos.

- Organize files into a structured directory hierarchy, ensuring a clear and consistent file naming convention.

Audio Preprocessing:

- Convert audio files to the required format (16kHz mono, 16-bit signed integer WAV) using tools like FFmpeg.

- Transcribe audio files, either manually or through a transcription service, and store text files with corresponding filenames.
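As an illustration of the conversion step, the following is a minimal sketch, not the team's actual pipeline. It assumes FFmpeg is installed and on the PATH; the function names and the example paths are hypothetical. The flags used (`-ar`, `-ac`, `-c:a pcm_s16le`) are standard FFmpeg options for sample rate, channel count, and 16-bit signed PCM encoding.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an FFmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", str(src),       # input file (MP3, WAV, FLAC, ...)
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to a single (mono) channel
        "-c:a", "pcm_s16le",  # 16-bit signed little-endian PCM
        str(dst),
    ]


def convert_to_wav(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

A call such as `convert_to_wav(Path("raw/interview_001.mp3"), Path("wav/interview_001.wav"))` would then produce the target file, with the transcript stored alongside it under the same base name.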

Data Cleaning and Normalization:

- Clean and normalize text data to address spelling variations, punctuation issues, and formatting inconsistencies.

- Standardize abbreviations and contractions, and remove special characters or unnecessary symbols.
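A rough sketch of what such text normalization could look like in Python is shown here. The abbreviation table and the decision to lowercase everything are assumptions made for illustration; the actual rules would be agreed with the machine learning team. Hyphens and apostrophes are kept because they carry meaning in Tagalog and Taglish words such as "mag-aral".

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real project would maintain its own.
ABBREVIATIONS = {"dr.": "doctor", "pls": "please"}


def normalize_transcript(text: str) -> str:
    """Normalize one transcript line for training."""
    # Unify Unicode forms (e.g. full-width characters) and lowercase.
    text = unicodedata.normalize("NFKC", text).lower()
    # Map curly apostrophes to the plain ASCII apostrophe.
    text = text.replace("\u2019", "'")
    # Expand known abbreviations token by token.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    text = " ".join(tokens)
    # Replace everything except word characters, spaces, apostrophes, hyphens.
    text = re.sub(r"[^\w\s'-]", " ", text)
    # Collapse runs of whitespace left behind by the previous step.
    return re.sub(r"\s+", " ", text).strip()
```

Because `\w` is Unicode-aware in Python 3, letters such as "ñ" survive the special-character filter.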

Data Segmentation and Labeling:

- Split lengthy audio recordings into smaller, manageable segments.

- Create and maintain a metadata file that maps audio files to their corresponding transcriptions and alignment details.
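The segmentation and metadata tasks can be sketched as follows. This is only an illustration under stated assumptions: the 30-second limit reflects the window length Whisper operates on, the fixed-length split is the simplest possible strategy (splitting on silence or on forced-alignment boundaries would avoid cutting words), and the CSV column names are hypothetical.

```python
import csv
import io

# Hypothetical column layout for the metadata (manifest) file.
FIELDNAMES = ["audio_path", "start_sec", "end_sec", "text"]


def plan_segments(duration_sec: float, max_len_sec: float = 30.0):
    """Return (start, end) pairs covering a recording in fixed-length windows."""
    if max_len_sec <= 0:
        raise ValueError("max_len_sec must be positive")
    segments = []
    start = 0.0
    while start < duration_sec:
        end = min(start + max_len_sec, duration_sec)
        segments.append((start, end))
        start = end
    return segments


def write_manifest(rows, file_obj) -> None:
    """Write one CSV row per segment, mapping audio to its transcription."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
```

In use, each planned segment would become one manifest row, with `text` filled in from the matching portion of the transcript.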

Quality Assurance and Validation:

- Conduct thorough quality checks to validate the dataset for accuracy, consistency, and completeness.

- Identify and resolve issues in the audio and text data, such as misalignments or incorrect transcriptions.

Data Analysis and Reporting:

- Use data analysis techniques to evaluate dataset health and completeness.

- Provide regular reports on data collection progress, challenges, and recommendations for improvements.

Collaboration and Communication:

- Work closely with the machine learning team to address any data-related issues.

- Provide regular updates on data collection and preprocessing progress.

Qualifications:

● Strong Proficiency in Python: Experience with data manipulation, cleaning, and preprocessing using Python libraries such as Pandas, NumPy, and TensorFlow.

● Data Cleaning and Preprocessing: Proven ability to clean, organize, and preprocess data for machine learning applications.

● NLP Knowledge: Familiarity with natural language processing techniques, including text normalization and handling multilingual or code-mixed data.

● SQL Skills: Experience with SQL for data querying and management.

● Problem-Solving Skills: Ability to identify and solve complex data-related problems with creativity and efficiency.

● Work Under Pressure: Capable of handling multiple tasks simultaneously and meeting deadlines in a fast-paced environment.

● Adaptability: Willingness to learn new tools and techniques as needed for the project.

● Attention to Detail: Meticulous attention to detail to ensure data accuracy and integrity.

● Communication Skills: Excellent communication skills to collaborate effectively with cross-functional teams.

Desired Skills:

● Familiarity with audio processing tools like FFmpeg.

● Familiarity with transcription tools and alignment software (e.g., Aeneas, Gentle).

● Knowledge of Taglish language nuances and variations.

● Experience with version control systems like Git.

● Familiarity with code-mixing or multilingual NLP techniques.

Job Types: Full-time, Permanent

Pay: Php23,000.00 - Php33,000.00 per month

Benefits:

  • Health insurance

Schedule:

  • 8 hour shift

Supplemental Pay:

  • 13th month salary

Education:

  • Bachelor's (Preferred)

Experience:

  • Web Development: 1 year (Preferred)

Advertised until:
April 1, 2025

