Legacy Quarterly AKI Lab data export Process

Legacy Process

Please do not run this process as is without refactoring to meet the new requirements located here!

Every quarter(normal quarter + 2, March, June, September, December), data from the AKI database (dbname: akireg, server: rr-sql-live) needs to be sent to the CCG (Clinical Commissioning Group).

An extract is created in the form of a CSV file that passes through the Statistics team to process before they send it on wards to the CCG.

In the current extract process there are a few fields that are derived and/or updated:

  • The patients Hashed and Encrypted NHS number and Date of Birth fields in akireg.ALERT_RAW and akireg.CREATININE_RAW.

  • The ctry, ccg and lsoa11 fields in akireg.ALERT_RAW.
  • The patients traced NHS number in akireg.ALERT_RAW and their Date of Death.

All the changes need to happen before the extract is created bar the updating of akireg.ALERT_RAW with traced NHS numbers and date of deaths.

The extract that the Statistics team receive has had the NHS number, Date of Birth and Postcode made anonymous.


What to Run?

There are several Python scripts that need to be run in sequence, they can be found in the AKI-Validation-Repository, the links to each script are below.

The one exception is the adhoc.py script which can be found in the NHS-Tracing-Repository.

They are:

  • encrypt.py(Location: encrypt.py): Updates akireg with Hashed and Encrypted NHS numbers and Dates of birth.
  • lsoa_inserter.py(Location: lsoa_inserter.py): Updates akireg with ctry, ccg and lsoa11 codes.
  • exporter.py(Location: exporter.py): Produces a CSV file to be converted into tracing format, usually named 'akireg_tracing.csv'.
  • adhoc.py(Location: adhoc.py): Creates a tracing file from the output of 'exporter.py', code is in 'nhs_tracing' on the 'RenalRegistry' BitBucket project.
  • tracematcher.py(Location: tracematcher.py): Uses the file sent back from NHS tracing and uses to match patients in akireg to produce actual extract file to pass on to Stats team. 
  • insertdods.py(Location: insertdods.py): Updates Date of Death in akireg with new data from NHS tracing.
  • update_nhsno.py(Location: update_nhsno.py): Updates NHS numbers on akireg with new data from NHS tracing.

Script Running Order


Python Environment

Make sure you set up Devpi client to install libraries from instructions can be found in RenalRegistry-Devpi, the server is already set up so just follow the instructions in the 'Installing Devpi Client' section.

Once you have set one up activate it, you will need to install the dependencies for both repositories by changing into the repositories root directory and typing:

Installing Dependencies
pip install -r dev-requirements.txt

This will make pip scan the dev-requirements.txt file and install

TODO: Expand

Currently Python 3.5.2 is able to run each script.

  • python 3.5.2
  • make sure you have git
  • make sure you have installed
  • dev-requirements
  • remove pymssql
  • pymssql from source wheel
  • make sure you have tkinter installed

They need there requirements text files to be called using PIP this can.

Both repositories can be cloned using git.


How to Run?

Encrypt.py:

To run: 

encrypt
python encrypt.py
Lsoa_inserter.py

To run:

Check LSOA Table

Please make sure on line 16 the SQL query is pointing to the table containing the current LSOA data:

Check LSOA Table
select_lsoa = 'SELECT pcd, pcd2, ctry, ccg, lsoa11 FROM lsoa201911'

If not please change the "FROM" clause to comply with this.

After checking the correct LSOA table is being used simply run:

lsoa_inserter
python lsoa_inserter.py
Exporter.py

To run:

exporter
python exporter.py

Exporter Output

Currently the script creates a CSV file in the current working directory called "akireg_tracing.csv", you can change this by altering the first argument in the io.open function to your desired filename and path on line 25 in the script:

Altering Output File Path
output = io.open('akireg_tracing.csv', 'w', encoding='utf-8', newline='')
Adhoc.py

Checkout https://bitbucket.renalregistry.nhs.uk/projects/RR/repos/nhs_tracing/browse and setup the associated venv. Note you also need to manually install the package itself (via "pip install -e .") and pymssql (which is an optional requirement).

Run  -

python nhs_tracing/adhoc.py --columns id,nhs_no,birth_date,surname,forename,sex,postcode C:\temp\akireg_tracing.csv C:\Temp\akireg_tracing_output.csv

This will produce one or more files depending on how many records there are in the input file. These should be submitted to tracing.


For further information see /wiki/spaces/RR/pages/1295843817
Tracematcher.py

To run:

Ensure Script Variables Are Set

There are three variables that need to be checked and set to the correct values if not:

Set Variables
SRC = "AKIREG_COMBINED.csv"
OUT = "alerts_done-20190603_traced.csv"
CUTOFF_DATE = "2019-06-01"

As you can see from above the variables are pretty self explanatory; "SRC" is the Tracing Output (you may need to reduce the tracing output down to one file if spread over many), "OUT" is the file path you would like the script to output the file it produces to (make sure you include the .csv extension) and finally "CUTOFF_DATE" is the maximum DATE_LOADED date from the ALERT_RAW table you are interested in (usually end of the quarter).

When you have correctly set the variables in the script you can run it:

Tracematcher
python tracematcher.py

Once run the output CSV needs to be passed onto the stats team for further processing in SAS.

Insertdod.py

To run:

In order to run the script you need to pass in the file path of the csv file that was output by the tracing step:

Insertdod
python insertdods.py tracing_output_file.csv
 Update_nhsno.py

To run:

Update Script With Traced File

Make sure you update the script on line 109 with the correct tracing output file:

Update Tracing File
    for line in read_traced('AKIREG_COMBINED.CSV'):

Overwrite the string in read_traced() to the file path of your traced output file.

Once the correct input file has been entered into the script to run:

Update_nhsno
python update_nhsno.py

Once this is done the process is complete. 

The process needs refactoring!

Creatinines

This is extremely  inefficient. If it needs running it would be quicker to re-write as a single linked update statement.

python update_creatinine_nhsno.py
python export_creatinines.py