[Job Opening] AI Researcher – Speech & Video Signal Processing


Location: Paris / Partial remote work possible
Contract type: Full-time permanent position (CDI)
Start date: As soon as possible

We are looking for an experienced AI researcher in medical speech analysis to join our team in Paris and help advance life-saving voice-based technologies.

About e-sensia

At e-sensia, we develop cutting-edge medical assistance technology based on voice analysis.
Our mission: save lives by enabling faster and more accurate detection of medical emergencies using artificial intelligence.

Our solution is already used by major emergency response organizations to analyze calls in real time.
It is powered by a unique speech-processing pipeline, enriched with millions of real-world emergency calls and associated medical data.

Responsibilities

As an AI researcher in medical speech analysis, you will play a key role in advancing the scientific capabilities of our product by designing and developing innovative models, including:

  • Fine-tuning of large audio language models (Whisper, Wav2Vec, EnCodec, etc.)
  • Raw speech or spectrogram processing, optionally combined with video for multimodal approaches
  • Automatic detection of critical events (breathing irregularities, distress, emotional states, silence)
  • Recognition of subtle vocal patterns associated with medical conditions
  • Close collaboration with clinicians to validate AI models in real-world emergency scenarios
  • Contributing to the scientific strategy of the company
  • Participation in scientific publications, patent filings, or collaborative European research projects
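
As a toy illustration of the "automatic detection of critical events" item above, a frame-level silence detector can be sketched in a few lines. This is a minimal sketch assuming NumPy and 16 kHz mono audio; the function name, frame sizes, and the -40 dB threshold are illustrative choices, not part of e-sensia's actual pipeline:

```python
import numpy as np

def detect_silence(signal, sr=16000, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Flag analysis frames whose RMS energy falls below a dB threshold."""
    frame = int(sr * frame_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)       # 160 samples at 16 kHz
    flags = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        rms = np.sqrt(np.mean(chunk ** 2)) + 1e-12  # avoid log(0)
        flags.append(20.0 * np.log10(rms) < threshold_db)
    return np.array(flags)

# 1 s of a 440 Hz tone followed by 1 s of near-silence
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
quiet = 1e-4 * np.random.default_rng(0).standard_normal(sr)
flags = detect_silence(np.concatenate([tone, quiet]), sr)
```

Real detectors for breathing irregularities or distress would build on learned representations rather than a fixed energy threshold; this only shows the framing and energy step such models share.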

Profile
  • PhD or equivalent experience in AI, signal processing, or deep learning for audio/video
  • Proven ability to initiate and lead research projects
  • Strong expertise with deep learning frameworks, including:
    – PyTorch for fine-grained modeling and experimental workflows
    – Hugging Face Transformers for rapid deployment and adaptation of pretrained models (Wav2Vec2, Whisper, HuBERT, etc.)
    – TensorFlow/Keras or JAX/Flax as complementary tools for prototyping and benchmarking
  • In-depth understanding of modern neural architectures for audio and video, including:
    – Self-supervised models: Wav2Vec 2.0, HuBERT, Whisper, EnCodec, AudioMAE
    – CNN and RNN architectures for long time-series modeling
    – Transformer-based models for audio (AST, Conformer, SpeechTransformer)
    – Multimodal fusion techniques (audio + video) as a strong plus
  • Hands-on experience with raw audio signal processing, including:
    – Feature extraction: MFCCs, log-Mel spectrograms, pitch, formants, etc.
    – Data augmentation: SpecAugment, noise injection, vocal tract length perturbation
    – Temporal/frequency analysis: FFT, STFT, CQT
    – (Bonus) Synchronized video analysis (e.g., lip reading, emotion recognition)
  • Ability to optimize models for real-time deployment, including:
    – Quantization, pruning, knowledge distillation
    – Deployment to GPU environments using ONNX, TorchScript, or Triton
  • Strong interest in healthcare and socially impactful technologies
  • Professional English required; spoken French appreciated
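
To make the feature-extraction expectations above concrete, here is a minimal log-Mel spectrogram pipeline (STFT → triangular mel filterbank → log), written from scratch in NumPy. All parameter values (400-sample window, 160-sample hop, 40 mel bands) are illustrative assumptions; in practice, libraries such as torchaudio or librosa provide equivalent, optimized implementations:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=400, hop=160, n_mels=40):
    """STFT power spectrum -> triangular mel filterbank -> log."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (T, n_fft//2 + 1)

    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    mel = power @ fbank.T                                 # (T, n_mels)
    return np.log(mel + 1e-10)

# One second of a 440 Hz tone at 16 kHz -> 98 frames of 40 log-Mel features
sr = 16000
t = np.arange(sr) / sr
feats = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
```

The same frame/hop convention (25 ms windows, 10 ms hop) underlies the inputs expected by models like Wav2Vec2 and Whisper, which is why fluency with these basics matters even when using pretrained front ends.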

Why join us?
  • A mission-driven environment where every algorithm can help save lives
  • A deeptech startup backed by major players in healthcare and innovation
  • A scientifically rigorous and supportive work culture
  • A passionate and multidisciplinary team working on real-world impact

How to apply

Please send your resume, a short motivation message, and if possible, an example of a relevant project or publication to:
job@e-sensia.com
