The successful candidate for the Data Scientist position will develop and implement data-driven solutions to aid the physiological characterization of cell factories and deliver actionable insights for strain & process optimization workflows. The person in this position will primarily contribute to the qualification and implementation of current data science approaches, integrate these within the data science platform, and make insights available for decision making. Using this platform, they will also perform multi-omics analysis across the R&D portfolio, effectively share results, and iterate on newly identified challenges. The successful candidate will have demonstrated proficiency in applying their technical knowledge and scientific creativity to develop innovative solutions in a team environment.

Typical roles and responsibilities for Data Scientist

  • Utilize bioinformatics expertise for database mining, gene prediction, gene annotation, enzyme discovery, genome assembly, and genomic comparisons.
  • Analyze & generate insights from multi-omics data (e.g. genomics, transcriptomics, proteomics, metabolomics) and work with other members in R&D and the Systems Bioengineering team to identify metabolic engineering and process development targets. Prepare material for scientific presentation and consistently share results, experimental design, and execution plans with project teams and stakeholders.
  • Identify and evaluate new data science algorithms to facilitate the analysis of multi-dimensional biological datasets towards actionable engineering insights.
  • Contribute to the development and deployment of an in-house systems biology knowledge base by synthesizing publicly available data sources, internally curated data sources, and literature to make learnings more accessible to scientists for programmatic and visual analysis.
  • Develop data processing workflows which leverage process models to extract relevant metrics in existing and developing experimental workflows including fermentation (lab and commercial scale) and small-scale phenotyping platforms.
  • Develop software applications, including custom JavaScript visualizations and a data-driven backend, to support capturing experimental metadata, raw data, and analysis outcomes.

Requirements for Data Scientist – Life Sciences

  • Master’s degree with 3-5 years of experience and/or PhD degree in Bioinformatics, Chemical/Biochemical Engineering, Bioengineering, Bioprocess Engineering, Microbiology or related field with 0-2 years of related industrial experience
  • Direct relevant experience in high-dimensionality biological data management, processing, analysis, and communication; may include post-doctoral experience; or equivalent combination of education and experience.
  • Experience with one or more programming languages (e.g. Python, MATLAB, Java, JavaScript)
  • Familiarity with statistical packages (e.g. scikit-learn, SciPy) to prepare data (denoising, feature selection, normalization, etc.) and discover patterns (clustering, dimensionality reduction, regression, neural networks, etc.) within multi-dimensional biological data.
  • Expertise in integrative analysis of multiple biological data types (e.g. genomics, transcriptomics, proteomics, metabolomics, fermentation, small scale, etc.) and providing context-rich interpretations and recommendations
  • Familiarity with database technologies, data storage and retrieval (SQL); specific knowledge of biological data management (functional annotations, gene ontology, etc.) preferred
  • Strong communication skills (both verbal and written) and interpersonal skills to convey objectives and results concisely to a diverse audience including computational biologists, fermentation engineers, microbiologists, molecular biologists, and enzymologists.
  • Ability to thrive in a fast-paced, intellectually stimulating environment, with creativity, independent thinking, analytical excellence, and passion
  • Experience developing automated data processing/wrangling pipelines for high-volume, high-dimensionality data sets
  • Experience with web technologies for custom visualizations and UI (e.g. JavaScript, D3.js, React)
  • Experience with large-scale database-driven visualization software (e.g. Spotfire, Tableau)
  • Experience with next-generation sequencing (NGS) data analysis tools, open-source software, and genomic databases (e.g., Geneious, GenBank)
  • Experience developing applications to support data capture, databasing, visualization & analysis
  • Experience in a chemical engineering, microbiology, fermentation, or metabolic engineering setting
  • Experience with metabolic phenotyping systems including fed-batch fermentation (lab and/or commercial scale), plate-based platforms and chemostat cultures
  • Strong data analysis and visualization skills, attention to detail, and competent computer skills (Microsoft Office, Excel, PowerPoint, etc.)