Data Science

Get help with your data science R pipelines

As modern research becomes more computational, leveraging more diverse forms of data and employing programming for data wrangling, modelling and analysis across more domains, it increasingly fits the definition of data science.

What I offer

  • Project organisation for portability, transparency and orientation.
  • Code modularisation and function development.
  • Code optimisation.
  • Code testing and validation.
  • Exception handling with actionable, human-friendly user messages.
  • Documentation (function, long form).
  • Data cleaning, processing, munging, validation and documentation.
  • Literate programming, report and manuscript writing in R Markdown or Quarto.
  • Setting you up to write your thesis in R using bookdown.
  • Dependency and computational environment management (including containerisation with Docker or Singularity).
  • Version control and transparent code sharing (publicly or privately) through online repositories.
  • Continuous integration (for testing or automated documentation builds).
  • Packaging papers, code and data into Research Compendia.

Why bother?

Modern Research ≅ Data science

The open-source statistical programming language R has a rich history in academia and is increasingly a tool of choice in data science, thanks to its broad statistical capabilities and an active community supporting a rich ecosystem of powerful and continuously evolving packages.

This blend of freely available programming and statistical capabilities makes deploying powerful and sophisticated statistical models far more accessible. With great power, however, comes great responsibility: it can take more time, effort and skill to develop robust, reproducible and portable analysis pipelines.

This is where I come in!

Why me?

Experienced in Data Science

However complex your analysis, from a single-script analysis of tabular data to a full-scale, multi-step analysis pipeline involving data harvesting, interacting with databases or other software, data cleaning, profiling and validation, data munging, data modelling, prediction, visualisation and analysis reporting, I can provide support for all your research data science needs.

While I expect to work closely with my clients on their research questions, I have broad prior experience in modelling, statistical analysis and prediction, including:

  • Frequentist statistics including generalised and mixed modelling, time series analysis, spatial statistics and more.
  • Modern machine learning techniques, including deep learning.
  • Bayesian statistics.

A research software engineering approach

Ensuring transparency, robustness, efficiency and reproducibility is just as important. My experience in Research Software Engineering means such considerations are built in from the start of every project. I can also help with refactoring your own project code according to software engineering best practice.