Location: Paris / Partial remote work possible
Contract type: Full-time permanent position (CDI)
Start date: As soon as possible
We are looking for an experienced AI researcher in medical speech analysis to join our team in Paris and help advance life-saving voice-based technologies.
About e-sensia
At e-sensia, we develop cutting-edge medical assistance technology based on voice analysis.
Our mission: save lives by enabling faster and more accurate detection of medical emergencies using artificial intelligence.
Our solution is already used by major emergency response organizations to analyze calls in real time.
It is powered by a unique speech-processing pipeline, enriched with millions of real-world emergency calls and associated medical data.
Responsibilities
As an AI researcher in medical speech analysis, you will play a key role in advancing the scientific capabilities of our product by designing and developing innovative models, including:
- Fine-tuning large pretrained audio models (Whisper, Wav2Vec 2.0, EnCodec, etc.)
- Raw speech or spectrogram processing, optionally combined with video for multimodal approaches
- Automatic detection of critical events (breathing irregularities, distress, emotional states, silence)
- Recognition of subtle vocal patterns associated with medical conditions
- Close collaboration with clinicians to validate AI models in real-world emergency scenarios
- Contributing to the scientific strategy of the company
- Participation in scientific publications, patent filings, or collaborative European research projects
Profile
- PhD or equivalent experience in AI, signal processing, or deep learning for audio/video
- Proven ability to initiate and lead research projects
- Strong expertise with deep learning frameworks, including:
– PyTorch for fine-grained modeling and experimental workflows
– Hugging Face Transformers for rapid deployment and adaptation of pretrained models (Wav2Vec2, Whisper, HuBERT, etc.)
– TensorFlow/Keras or JAX/Flax as complementary tools for prototyping and benchmarking
- In-depth understanding of modern neural architectures for audio and video, including:
– Self-supervised models: Wav2Vec 2.0, HuBERT, Whisper, EnCodec, AudioMAE
– CNN and RNN architectures for long time-series modeling
– Transformer-based models for audio (AST, Conformer, Speech-Transformer)
– Multimodal fusion techniques (audio + video) are a strong plus
- Hands-on experience with raw audio signal processing, including:
– Feature extraction: MFCCs, log-Mel spectrograms, pitch, formants, etc.
– Data augmentation: SpecAugment, noise injection, vocal tract length perturbation
– Time-frequency analysis: FFT, STFT, CQT
– (Bonus) Synchronized video analysis (e.g., lip reading, emotion recognition)
- Ability to optimize models for real-time deployment, including:
– Quantization, pruning, knowledge distillation
– Deployment to GPU environments using ONNX, TorchScript, or Triton
- Strong interest in healthcare and socially impactful technologies
- Professional English required; spoken French appreciated
Why join us?
- A mission-driven environment where every algorithm can help save lives
- A deeptech startup backed by major players in healthcare and innovation
- A scientifically rigorous and supportive work culture
- A passionate and multidisciplinary team working on real-world impact
How to apply
Please send your resume, a short motivation message, and, if possible, an example of a relevant project or publication to:
job@e-sensia.com