Full-Time
On-Site

Principal Developer, Unstructured Data

Description

The Data Science and Artificial Intelligence (SDIA) team supports the organization in the responsible and large-scale adoption of AI by establishing robust technological foundations, assisting business teams in transforming their practices, and contributing to the management of risks associated with these technologies. The Unstructured Data team specializes in the acquisition, extraction, and large-scale processing of complex data sources such as documents, texts, web content, and images. This role actively participates in the design, development, and operationalization of automated processes for extracting unstructured data and searching/collecting web data. These processes convert raw data into exploitable structured data, integrating components like generative AI, OCR, and specialized libraries. The individual also acts as an expert and partner to teams requiring extraction and search support.

What We're Looking For

Design, develop, and evolve unstructured data extraction and automated search pipelines within Databricks and AWS cloud environments (e.g., processing PDFs and financial documents, extraction via OCR and LLMs, web data search and collection).
Ensure end-to-end processing of large volumes of documents: ingestion, parsing, cleaning, enrichment, extraction, and structured storage.
Implement and maintain quality and observability mechanisms: pipeline monitoring, error management, evaluation of extraction quality, and traceability of results.
Contribute to the evolution of the team's technical foundations: evaluation frameworks, structured prompt engineering, integration of document AI services, and deployment automation.
Participate in technology watch and in the evaluation of new solutions and approaches in document processing and generative AI.
Act as an advisor to partners on the responsible extraction, processing, and use of unstructured data.

Ideal Candidate

Three (3) to six (6) years of relevant experience in software engineering, data engineering, or applied data science.
Graduate degree (Master's or equivalent) in computer engineering, data science, artificial intelligence, computer science, or a related discipline.
Intellectual curiosity for complex and varied problems.
Strong interpersonal and teamwork skills.
Enthusiasm for new technologies and dynamic environments where AI and data have a concrete and strategic impact.
Creativity and a propensity to think outside the box to propose new ideas.

Minimum Education

Master's Degree or equivalent

Hard Skills

Python
Unstructured (asset)
LlamaIndex (asset)
Scrapy (asset)
Beautiful Soup (asset)
PDF parsing (asset)
HTML parsing (asset)
Generative AI (LLMs via API, prompt engineering, model output validation/structuring) (asset)
Apache Spark (asset)
PySpark (asset)
Databricks (asset)
AWS (asset)
Azure (asset)
CI/CD
Git
Containerization
MLOps

Soft Skills

Intellectual curiosity
Interpersonal skills
Teamwork
Creativity
Problem-solving
Adaptability

Benefits

Competitive total compensation, comprehensive health and dental insurance, defined benefit pension plan, professional development training, fitness center, and flexible work arrangements.

About the Company

Caisse de dépôt et placement du Québec (CDPQ)

CDPQ is a global investment group that manages funds for public pension and insurance plans. It invests in major financial markets, private equity, infrastructure, real estate, and private debt to generate long-term value for its depositors and the Quebec economy.

Professional
Long-term
Global
Collaborative
Impact-driven