Location: Paris / Partial remote work possible
Contract type: Full-time permanent position (CDI)
Start date: As soon as possible
We are looking for an experienced AI researcher in medical speech analysis to join our team in Paris and help advance life-saving voice-based technologies.
About e-sensia
At e-sensia, we develop cutting-edge medical assistance technology based on voice analysis.
Our mission: save lives by enabling faster and more accurate detection of medical emergencies using artificial intelligence.
Our solution is already used by major emergency response organizations to analyze calls in real time.
It is powered by a unique speech-processing pipeline, enriched with millions of real-world emergency calls and associated medical data.
Responsibilities
As an AI researcher in medical speech analysis, you will play a key role in advancing the scientific capabilities of our product by designing and developing innovative models, including:
- Fine-tuning large pretrained audio models (Whisper, Wav2Vec 2.0, EnCodec, etc.)
- Raw speech or spectrogram processing, optionally combined with video for multimodal approaches
- Automatic detection of critical events (breathing irregularities, distress, emotional states, silence)
- Recognition of subtle vocal patterns associated with medical conditions
- Close collaboration with clinicians to validate AI models in real-world emergency scenarios
- Contributing to the scientific strategy of the company
- Participation in scientific publications, patent filings, or collaborative European research projects
Profile
- PhD or equivalent experience in AI, signal processing, or deep learning for audio/video
- Proven ability to initiate and lead research projects
- Strong expertise with deep learning frameworks, including:
– PyTorch for fine-grained modeling and experimental workflows
– Hugging Face Transformers for rapid deployment and adaptation of pretrained models (Wav2Vec2, Whisper, HuBERT, etc.)
– TensorFlow/Keras or JAX/Flax as complementary tools for prototyping and benchmarking
- In-depth understanding of modern neural architectures for audio and video, including:
– Self-supervised models: Wav2Vec 2.0, HuBERT, Whisper, EnCodec, AudioMAE
– CNN and RNN architectures for long time-series modeling
– Transformer-based models for audio (AST, Conformer, Speech-Transformer)
– Multimodal fusion techniques (audio + video) are a strong plus
- Hands-on experience with raw audio signal processing, including:
– Feature extraction: MFCCs, log-Mel spectrograms, pitch, formants, etc.
– Data augmentation: SpecAugment, noise injection, vocal tract length perturbation
– Time-frequency analysis: FFT, STFT, CQT
– (Bonus) Synchronized video analysis (e.g., lip reading, emotion recognition)
- Ability to optimize models for real-time deployment, including:
– Quantization, pruning, knowledge distillation
– Deployment to GPU environments using ONNX, TorchScript, or Triton
- Strong interest in healthcare and socially impactful technologies
- Professional English required; spoken French appreciated
Why join us?
- A mission-driven environment where every algorithm can help save lives
- A deeptech startup backed by major players in healthcare and innovation
- A scientifically rigorous and supportive work culture
- A passionate and multidisciplinary team working on real-world impact
How to apply
Please send your resume, a short motivation message, and, if possible, an example of a relevant project or publication to:
job@e-sensia.com