Installing software packages
Guide to installing Python, R and STATA software packages in the Secure Data Environment (SDE).
Quick reference
| Language | Command |
|---|---|
| Python | pip install --index-url https://packages.sde.digital.nhs.uk/repository/pypi-mirror/simple <package_name> |
| R | install.packages("<package_name>", repos = "https://packages.sde.digital.nhs.uk/repository/cran-mirror/") |
Databricks: Add % to the start of the Python command (for example, %pip install...).
Overview
The Secure Data Environment (SDE) is a highly secure platform with no open access to the internet. You cannot install software packages directly from external websites. Instead, we provide an SDE Package Manager which hosts internal mirrors of approved packages from CRAN (for R) and PyPI (for Python). We also provide STATA packages for users who have an active licence to use STATA within the SDE.
Use the guidance below to manage software packages in your environment.
Checking available packages and versions
Before attempting to install or request a package, you should check which packages are currently available in our repository.
- Go to the desktop of your VDI (virtual desktop).
- Open the folder named 'Package Manager'.
- Inside, you will find two text files:
  - One listing all available Python packages and versions.
  - One listing all available R packages and versions.
Use these lists to verify package availability and version compatibility before beginning your work.
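If you prefer to check a list programmatically, a short script can do it. The sketch below is a minimal example, not an official tool. It assumes each line of the list file holds a package name followed by a version, separated by `==` or whitespace, which may not match the real file layout. The function names and the sample text are hypothetical.

```python
import re


def parse_package_list(text):
    """Map lower-cased package names to the set of versions listed for them."""
    available = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumed line format: "<name>==<version>" or "<name> <version>".
        parts = re.split(r"==|\s+", line, maxsplit=1)
        name = parts[0].lower()
        version = parts[1].strip() if len(parts) > 1 else ""
        available.setdefault(name, set()).add(version)
    return available


def is_available(available, name, version=None):
    """Return True if the package (and, if given, the exact version) is listed."""
    versions = available.get(name.lower())
    if versions is None:
        return False
    return version is None or version in versions


# Hypothetical sample standing in for the contents of a list file.
sample = """\
pandas==1.5.3
pandas==2.0.1
matplotlib 3.2.1
"""
packages = parse_package_list(sample)
print(is_available(packages, "Pandas", "2.0.1"))  # True
print(is_available(packages, "scikit-learn"))     # False
```

To use it on the VDI, read the relevant text file from the Package Manager folder and pass its contents to `parse_package_list`, adjusting the parsing if the real layout differs.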
Installing packages on the VDI
The VDI is your personal computing environment. You can install Python and R packages here, and they will persist in your local user area.
R users: Open RStudio desktop. It is pre-configured to use the internal mirror.
Python users: Open a terminal or your preferred IDE. Use `pip` with the internal index URL to install packages.
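If you would rather install from inside a Python script than type the command in a terminal, one possible approach is to call `pip` through the interpreter you are already running. This is a sketch under assumptions: the index URL is the one given in the quick reference, while the helper names `build_pip_command` and `install` are hypothetical and not part of any SDE tooling.

```python
import subprocess
import sys

# Internal PyPI mirror URL, as given in the quick reference table.
SDE_PYPI_INDEX = "https://packages.sde.digital.nhs.uk/repository/pypi-mirror/simple"


def build_pip_command(package, version=None):
    """Build the pip command line, optionally pinning an exact version."""
    requirement = f"{package}=={version}" if version else package
    # "python -m pip" targets the same interpreter that runs this script.
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", SDE_PYPI_INDEX,
        requirement,
    ]


def install(package, version=None):
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.run(build_pip_command(package, version), check=True)


# Show the command without running it.
print(" ".join(build_pip_command("pandas")))
```

Calling `install("pandas")` would then run the printed command. Using `sys.executable` rather than a bare `pip` helps make sure the package lands in the environment your script is actually using.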
Installing packages on Databricks
Databricks behaves differently from the VDI.
Notebook-scoped libraries
We do not currently support installing packages directly to a Databricks Cluster (Global Installs). Instead, all installations are Notebook-Scoped. This means:
- packages are installed only for the current notebook session
- if you detach the notebook or restart the cluster, installations are lost
- you must include your installation commands (%pip install, for example) at the top of every notebook
Databricks Runtime
Every Databricks cluster runs a 'Databricks Runtime' which is a pre-configured environment that comes with many popular packages (like pandas, numpy, and matplotlib) already installed.
Finding your Databricks Runtime
To check which packages are pre-installed on your Databricks cluster, you need to know which Runtime version your cluster is using. You can find this information by following these steps:
- In Databricks, click the Compute tab in the left sidebar.
- Select the cluster associated with your data sharing agreement.
- Locate the Configuration tab.
- See the field labelled Databricks Runtime Version - for example, 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
Once you know your version (such as '10.4 LTS'), you can look up the full list of pre-installed Python and R packages in the official Databricks release notes.
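As a complement to the release notes, you can also ask the running Python environment what it has installed. The sketch below uses the standard library module `importlib.metadata` (Python 3.8 or later). The function name `installed_packages` is hypothetical, and the result reflects whatever environment the code runs in, which may include packages you have installed yourself.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return a dict of lower-cased package name -> installed version."""
    found = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with missing metadata
            found[name.lower()] = dist.version
    return found


packages = installed_packages()
for name in ("pandas", "numpy", "matplotlib"):
    print(name, packages.get(name, "not installed"))
```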
Conflicting packages
If you try to install a specific version of a package that is already pre-installed in the Databricks Runtime (such as installing an older version of matplotlib), you may encounter conflicts.
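Before pinning a version, it can help to see whether the package is already present and at which version. The following is a minimal sketch using the standard library `importlib.metadata`. The helper name `check_pin` is hypothetical, and the comparison is a plain string match rather than a full version comparison.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pin(package, wanted):
    """Compare the version you want with the one currently installed."""
    try:
        current = version(package)
    except PackageNotFoundError:
        return "not installed"
    if current == wanted:
        return "matches"
    return f"differs (installed: {current})"


print(check_pin("matplotlib", "3.2.1"))
```

A result of "differs" suggests that installing your pinned version would overwrite an existing package, which is the situation where a conflict is most likely.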
How to fix conflicts
Databricks loads the pre-installed packages when the cluster starts. If you overwrite one of these packages using `%pip install`, Python will not 'see' the new version until you restart the Python process.
If you see errors after installing a package, run the following command in a new cell immediately after your install command:
dbutils.library.restartPython()
Databricks example workflow
# Cell 1: install a specific version (using the %pip magic command)
%pip install --index-url https://packages.sde.digital.nhs.uk/repository/pypi-mirror/simple matplotlib==3.2.1
# Cell 2: restart the Python process so the new version is loaded
dbutils.library.restartPython()
# Cell 3: verify the version
import matplotlib
print(matplotlib.__version__)
Problems installing R packages
Our internal R package repository hosts binary packages only.
When installing R packages, you may come across one or more of the following:
'Build from source' pop-up:
If asked to build from source, select 'No'. Sometimes, R will still attempt to install a source package, which will result in a failed installation. If this occurs, then you should explicitly declare the package 'type' when installing:
install.packages("<package_name>", type = "binary")
Rtools warning:
Occasionally, you may see a warning that 'Rtools is not installed'. You can ignore this warning: Rtools is only needed to compile packages from source, and our repository hosts binary packages only.
Requesting new packages
If a package is not available in the SDE Package Manager:
- Confirm that the package is not listed in the relevant file in the Package Manager folder on your VDI desktop.
- Email the National Service Desk at [email protected] with the package name and a link to its CRAN/PyPI page.
We cannot mirror directly from GitHub, only from official CRAN/PyPI repositories.
Last edited: 20 January 2026 3:24 pm