AKI Extract

Purpose

Requirements

The main requirement is to:

Build out the infrastructure, mechanisms and processes to allow the Statistics team access to the correct data for their reports.

The above needs to be broken down further, and the phrase "correct data" needs defining. It comprises two aspects, a temporal one and a data choice one:

Temporal:

This relates to the period of time that the CCG, and therefore the Statistics team, are interested in for each of their reports. The quarterly report needs to include data from two months before the quarter through to one month after it (or up to the present if less than a month has passed). The bi-annual report needs data from the reporting month plus data from up to a year before.
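The two windows can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the quarter and reporting month are passed as the first day of their first month, "one month" is read as the calendar month following the quarter, and "a year before" is read as twelve calendar months.

```python
from datetime import date, timedelta


def shift_months(d, months):
    """Return the first day of the month that is `months` away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def quarterly_window(quarter_start, today):
    """Two months before the quarter through one month after it, capped at today."""
    start = shift_months(quarter_start, -2)
    # The quarter is three months long, so the extra month ends the day
    # before the first day of the month four months after the quarter start.
    end = shift_months(quarter_start, 4) - timedelta(days=1)
    return start, min(end, today)


def biannual_window(reporting_month, today):
    """The reporting month plus the twelve months before it, capped at today."""
    start = shift_months(reporting_month, -12)
    end = shift_months(reporting_month, 1) - timedelta(days=1)
    return start, min(end, today)
```

For a quarter starting on 1 April 2023, `quarterly_window(date(2023, 4, 1), today)` would cover 1 February 2023 to 31 July 2023, or to `today` if that is earlier.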

Data Choice:

This relates to the "What data do they need?" question. Each row of the data represents a lab order for a specific patient, with the following data points:

Data Point	Notes	Current AKI Database Table (akireg@rr-sql-live)	Source
NHS Number	Due to Information Governance, this has to be anonymous when viewed by the Statistics team for now. Patients still need to be distinguishable, so the resulting cipher text needs to be the same for each instance of the same number.	ALERT_RAW	AKI Database
NHS Number from Patient Tracing	Again, this needs to be anonymous for now. The cipher text should match that of the NHS Number data point above if the number is the same.		Patient Tracing
Lab Code		ALERT_RAW	AKI Database
Date of Birth	Agreed that the month and year of the date can be correct; however, the day must always equal the 15th of the month.	ALERT_RAW	AKI Database
Date of Birth from Patient Tracing	Same as above: the day must always equal the 15th of the month.		Patient Tracing
Gender		ALERT_RAW	AKI Database
Gender from Patient Tracing			Patient Tracing
Specimen Number		ALERT_RAW	AKI Database
Source of Request		ALERT_RAW	AKI Database
Care Field Indicator		ALERT_RAW	AKI Database
Date of Alert		ALERT_RAW	AKI Database
AKI Warning Test Result		ALERT_RAW	AKI Database
Creatinine		ALERT_RAW	AKI Database
eGFR MDRD		ALERT_RAW	AKI Database
eGFR CKD EPI		ALERT_RAW	AKI Database
Date of Death from Patient Tracing	The akireg schema contains a table to store dates of death, which the legacy process used to populate an already known date of death. This may mean that some false positives/negatives are missed, but this is a relatively rare edge case.	DEAD_PATIENT/Patient Tracing	Patient Tracing
Postcode		ALERT_RAW	AKI Database
Postcode from Patient Tracing			Patient Tracing
Patient Tracing Response Code			Patient Tracing

As the above table shows, Patient Tracing data is required. This adds some further requirements:

Patients need to have been submitted to tracing and a response obtained prior to any report being generated by the Statistics team.
Patient tracing should be carried out as close to their data being loaded into the AKI Database as reasonable (legacy process traced every quarter).
Tracing data should be persisted in the AKI Database.
Tracing data should be used to update patient details if the original patient data for a lab order is incorrect (the NHS number being incorrect, for example).
Tracing date of death should be used to update AKI Database for any patients that have been reported as dead in returned tracing results.
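The last two requirements could look something like the sketch below. It is not taken from the existing code: the `Patient` and `TraceResponse` classes, their field names and `apply_trace` are all hypothetical, and the NHS numbers in the usage are made-up placeholders.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Patient:
    # Hypothetical shape of the patient details held for a lab order
    nhs_number: str
    date_of_death: Optional[date] = None


@dataclass(frozen=True)
class TraceResponse:
    # Hypothetical shape of one returned tracing result
    response_code: str
    nhs_number: Optional[str] = None
    date_of_death: Optional[date] = None


def apply_trace(patient, trace):
    """Return a copy of patient corrected with any details the trace supplied."""
    updated = patient
    # Correct the NHS number only when tracing returned a different one
    if trace.nhs_number and trace.nhs_number != patient.nhs_number:
        updated = replace(updated, nhs_number=trace.nhs_number)
    # Record a date of death when tracing reports the patient as dead
    if trace.date_of_death is not None:
        updated = replace(updated, date_of_death=trace.date_of_death)
    return updated


original = Patient(nhs_number="1111111111")
traced = TraceResponse("OK", nhs_number="2222222222", date_of_death=date(2020, 1, 31))
corrected = apply_trace(original, traced)
```

Returning a corrected copy rather than changing the record in place keeps the original lab order data available for comparison, which may or may not be wanted in the real design.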

Anonymous Data

Currently the AKI Database contains a few extra fields because the NHS number and date of birth need to be anonymous to some extent. For the NHS number there are two: a field containing an encrypted version of the number (using the RSA algorithm) and another containing a hashed version of the number (using PBKDF2 with SHA-256). Both are computed in Python, and the functions are in the utils.py file of the aki_validation BitBucket repository:

The Encrypt Function:

Encrypt

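import binascii

import rsa  # third-party "rsa" package


def encrypt(value, pubkey=None):
    # load_pubkey() is defined outside this snippet
    if pubkey is None:
        pubkey = load_pubkey()
    if value is None:
        return None
    # RSA-encrypt the UTF-8 bytes, then hex-encode them for storage as text
    encrypted = rsa.encrypt(str(value).encode('utf-8'), pubkey)
    hexlified = binascii.hexlify(encrypted)
    return hexlified.decode('utf-8')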

The Hash Function:

Hash Function

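import binascii
import functools
import hashlib

# SALT and ITERATIONS are constants defined outside this snippet

@functools.lru_cache(maxsize=4096)
def hash_value(value):
    if value is None or value == '':
        return value
    encoded = '{}'.format(value).encode('ascii')
    salt = SALT.encode('ascii')
    # PBKDF2-HMAC-SHA256: the same input always gives the same hex digest
    hashed = hashlib.pbkdf2_hmac('sha256', encoded, salt, ITERATIONS)
    return binascii.hexlify(hashed).decode('ascii')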

As it stands, encrypting the NHS number is not needed: there is no requirement to read the plain text NHS number after it has been encrypted, and the encrypted value was not included in the data set passed on by the legacy processes. The hashed field was used for both the NHS Number field and the NHS Number From Tracing field. However, hashing a large number of NHS numbers at once is computationally expensive, because each value goes through many PBKDF2 iterations, and it will noticeably increase the run time of any code that carries it out, so it should be avoided where possible.

Dates of birth are also currently semi-anonymous, as the day is set to the 15th of the month. If the plain text date of birth is "18/05/1960", the transformed version would be "15/05/1960".
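The date of birth transformation is small enough to show in full. The function name `anonymise_dob` is hypothetical and this is a sketch rather than the existing implementation:

```python
from datetime import date, datetime


def anonymise_dob(dob):
    """Keep the month and year but force the day to the 15th."""
    if dob is None:
        return None
    # Every month has a 15th, so replace() cannot produce an invalid date
    return dob.replace(day=15)


plain = datetime.strptime("18/05/1960", "%d/%m/%Y").date()
print(anonymise_dob(plain).strftime("%d/%m/%Y"))  # prints 15/05/1960
```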

There is some discussion as to whether the data needs to be anonymous when accessed by the Statistics team, as this sometimes hinders their work; this issue is still unresolved. However, the patient postcode is now a required data point. In the legacy process the Lower Layer Super Output Area (LSOA) was supplied instead of the postcode, but because the Statistics team needs to link the patient to different LSOA and other area data, the patient postcode is now provided instead.

Legacy process

Please read the Legacy Quarterly AKI Lab data export Process, which documents how the process worked. However, the requirements have changed, and the process needs to be refactored because some of the scripts were failing due to the size of the data being processed.

Technical Notes on Meeting the Requirements

There seem to be two parts to this:

Getting the data to the Statistics Team.
Updating and Transforming data in the AKI Database.

As the Systems team is under-resourced, there is a need to automate this completely and make it as easy to maintain as possible. The simplest way to get the data to the Statistics team would be to give them read access to the AKI database. This needs some thinking through, as there may be Information Governance (IG) constraints to take into account. These could be addressed by creating database views or a reporting layer that the team can access instead of viewing the entire database. Anna is also going to have an extra function as a "Data Scientist", which will allow her to view the plain text AKI Database in line with IG. This approach would mean we do not have to maintain or develop an extract process, and it would meet the requirement for ad hoc access when the Statistics team needs it.
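To illustrate the view idea, here is a minimal sketch using an in-memory SQLite database as a stand-in for the real AKI database. Only the ALERT_RAW table name comes from this page; the column names, the STATS_ALERT view name and the sample values are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real AKI database
conn.executescript("""
    CREATE TABLE ALERT_RAW (
        nhs_number      TEXT,   -- hypothetical column names throughout
        nhs_number_hash TEXT,
        date_of_birth   TEXT,   -- ISO format, e.g. 1960-05-18
        lab_code        TEXT,
        creatinine      REAL
    );

    -- The view leaves out the plain text NHS number and
    -- forces the day of the date of birth to the 15th.
    CREATE VIEW STATS_ALERT AS
    SELECT nhs_number_hash,
           strftime('%Y-%m-15', date_of_birth) AS date_of_birth,
           lab_code,
           creatinine
    FROM ALERT_RAW;
""")
conn.execute(
    "INSERT INTO ALERT_RAW VALUES (?, ?, ?, ?, ?)",
    ("1111111111", "ab12cd", "1960-05-18", "LAB01", 142.0),
)
rows = conn.execute("SELECT * FROM STATS_ALERT").fetchall()
print(rows)
```

SQLite has no user permissions, so the sketch only shows the shape of the view. In the real database the Statistics team would presumably be granted read access to the view alone and not to the underlying table.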

The main piece of work would be creating the tracing/transformation part. Tracing files need to be created, and the responses then either loaded into the database or used to update existing records. Be aware that UPDATE is generally less efficient than INSERT, which is something to think about when designing the mechanism to deal with trace responses. Some transformations, such as hashing the NHS number (if it actually needs to be done), could be done when the data is loaded, which would reduce the amount of hashing required. Alternatively, a separate table could store known hashes alongside their plain text counterparts so that they can be reused.
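The known-hashes table could work roughly as sketched here, again with in-memory SQLite standing in for the real database. The KNOWN_HASH table, the `hash_with_lookup` function and the salt and iteration values are placeholders invented for the example, not the real ones.

```python
import binascii
import hashlib
import sqlite3

SALT = b"example-salt"  # placeholder, not the real salt
ITERATIONS = 1000       # placeholder, not the real iteration count

conn = sqlite3.connect(":memory:")  # stand-in for the real AKI database
conn.execute(
    "CREATE TABLE KNOWN_HASH (plain_text TEXT PRIMARY KEY, hashed TEXT NOT NULL)"
)


def hash_with_lookup(conn, value):
    """Return the stored hash for value, computing and inserting it only once."""
    row = conn.execute(
        "SELECT hashed FROM KNOWN_HASH WHERE plain_text = ?", (value,)
    ).fetchone()
    if row is not None:
        return row[0]  # already known, so no hashing is needed
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("ascii"), SALT, ITERATIONS)
    hashed = binascii.hexlify(digest).decode("ascii")
    # New values are only ever inserted, never updated
    conn.execute("INSERT INTO KNOWN_HASH VALUES (?, ?)", (value, hashed))
    return hashed


first = hash_with_lookup(conn, "1111111111")
second = hash_with_lookup(conn, "1111111111")  # served from the table
```

Because each NHS number is hashed once and then looked up, the same cipher text is returned for the NHS Number and NHS Number from Patient Tracing data points whenever the number is the same. A table holding plain text next to its hash would itself need to be kept out of anything the Statistics team can read.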

A JIRA Epic has been created to track this.
