API documentation

ukbiobank

Created on Fri Mar 27 12:38:10 2020

@author: jnecus

class ukbiobank.ukbio.ukbio(ukb_csv=None)
Parameters

ukb_csv (String, mandatory) – Path to ukbiobank csv file.

Example usage:

import ukbiobank
ukb = ukbiobank.ukbio(ukb_csv='path/to/ukbiobank_data.csv')
Returns

ukbio objects are required as an input when using ukbiobank-tools functions. ukbio objects contain important information such as:

  • variable codings

  • path to ukbiobank csv file

Return type

ukbio object.

ukbiobank.utils.utils

Created on Wed Mar 25 13:33:29 2020

@author: Joe

UKBiobank data loading utilities

ukbiobank.utils.utils.addFields(ukbio=None, df=None, fields=None, instances=None)
Parameters
  • ukbio (ukbio object) – ukbiobank.ukbio object

  • df (pandas dataframe) – df containing ukbiobank data

  • fields (List, Mandatory) – Accepts UKB field ID or text string (or mixed), e.g. ‘31-0.0’ or ‘Sex’.

  • instances (integer or list of integers, Optional) – If present, fields will be filtered for the chosen instance(s) before being added.

Example usage:

df = ukbiobank.utils.addFields(ukbio=ukb, df=df, fields=['Sex'])

Returns

df – df containing all instances of chosen fields.

Return type

Pandas dataframe

ukbiobank.utils.utils.calculateChangeInCognitiveScores(ukbio=None, df=None)
Parameters
  • ukbio (ukbio object. Mandatory.) –

  • df (pandas df loaded using ukbiobank-tools. Mandatory.) –

Returns

  • out_df (pandas df containing cognitive decline score)
    • Currently subtracts the cognitive test score at instance 3 from that at instance 2 for the tests listed below. Output variables are labelled ‘Change in x’, where x is the name of the test:

  • ‘Mean time to correctly identify matches’ -> ‘Change in Mean time to correctly identify matches’

  • ‘Maximum digits remembered correctly’ (Field ID: 4282) -> ‘Change in x’

  • ‘Fluid intelligence score’ (Field ID: 20016) -> ‘Change in x’

  • ‘Number of incorrect matches in round’ (Field ID: 399) -> ‘Change in x’

  • ‘Number of puzzles correctly solved’ -> ‘Change in x’

  • ‘Number of puzzles correct’ -> ‘Change in x’

  • ‘Number of word pairs correctly associated’ -> ‘Change in x’

  • ‘Duration to complete alphanumeric path (trail #2)’ (Field ID: 6350) -> ‘Change in x’

  • ‘Number of symbol digit matches made correctly’ (Field ID: 23324) -> ‘Change in x’

ukbiobank.utils.utils.calculateCognitiveDeclineScore(ukbio=None, df=None, percentage_thresh=0, missing_tolerance=0)

Currently generates a composite ‘cognitive_decline_score-3.0’ between instances 2 & 3 (imaging visits). +1 point is added if the score on a test changed by a given percentage between instances 2 & 3. The default percentage is ‘0’, i.e. any decline in a score will count. If set to ‘+20’, improvements of 19% and any decrease in performance will count.

Future to-dos:

  • include additional tests, instances etc.

  • account for age.

Parameters
  • ukbio (ukbio object. Mandatory.) –

  • df (pandas df loaded using ukbiobank-tools. Mandatory.) –

  • percentage_thresh (int. Optional. Default: 0.) – This parameter determines how large a percentage change in scoring is required before a ‘+1’ cognitive decline score is added for that test. E.g. if set to ‘-50’, then the reaction time (RT) must have worsened by 50%, or the number of items remembered must have dropped by 50%.

  • missing_tolerance (int. Optional. Default: 0.) – This parameter determines how many missing tests the calculation will tolerate. E.g. if scores are missing for all tests, then a subject will by default receive an overall score of 0. However, if the missing tolerance is 0, then any missing test will assign a NaN score to this subject.

Returns

out_df

Return type

pandas df containing cognitive decline score
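The scoring rule described above can be sketched in plain Python. This is a minimal illustration and not the ukbiobank-tools implementation: decline_score and percent_change are hypothetical names, the strict "less than" comparison is an assumption, and the sketch assumes that higher scores are better and that the instance 2 score is non-zero (timed tests, where lower is better, would need the sign reversed).

```python
import math


def percent_change(before, after):
    """Percentage change from the instance 2 score to the instance 3 score."""
    # Assumes a non-zero instance 2 score.
    return (after - before) / before * 100.0


def decline_score(scores, percentage_thresh=0, missing_tolerance=0):
    """Count tests whose percentage change falls below percentage_thresh.

    scores maps a test name to (instance 2 score, instance 3 score); either
    value may be None when missing. Assumes higher scores are better.
    """
    missing = 0
    points = 0
    for before, after in scores.values():
        if before is None or after is None:
            missing += 1
            continue
        # Assumption: a strict comparison, so "no change" is not a decline.
        if percent_change(before, after) < percentage_thresh:
            points += 1
    # More missing tests than tolerated: no score is assigned.
    if missing > missing_tolerance:
        return math.nan
    return points


scores = {
    "Fluid intelligence score": (10, 8),             # -20%
    "Maximum digits remembered correctly": (8, 9),   # +12.5%
    "Number of puzzles correctly solved": (5, None), # missing at instance 3
}
print(decline_score(scores, missing_tolerance=1))
print(decline_score(scores, percentage_thresh=20, missing_tolerance=1))
print(decline_score(scores))
```

With one missing test tolerated, only the 20% drop counts at the default threshold, while a threshold of 20 also counts the 12.5% improvement. With a tolerance of 0 the missing test makes the result NaN.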

ukbiobank.utils.utils.calculateHealthySleepScore(ukbio=None, df=None, instances=[2, 3])
Generates a composite healthy sleep score (0-5) according to the methods used in:

https://academic.oup.com/eurheartj/article/41/11/1182/5678714#200787415

Parameters
  • ukbio (ukbio object. Mandatory.) –

  • instances (List of int. Default = [2, 3].) –

  • df (pandas df loaded using ukbiobank-tools. Mandatory.) –

Returns

out_df

Return type

pandas df containing healthy sleep score (0-5)

ukbiobank.utils.utils.fieldIdsToNames(ukbio=None, df=None, ids=None)
Parameters
  • ukbio (ukbio object, mandatory) –

  • df (pandas dataframe (generated using ukbio loadCsv), optional) –

  • ids (a list of ids (can be mixed text & id) to be converted to text, optional) –

Returns

  • Pandas dataframe (column names converted to text names), or

  • List of field names

ukbiobank.utils.utils.fieldNamesToIds(ukbio=None, df=None)
Parameters
  • ukbio (ukbio object, mandatory) –

  • df (pandas dataframe with column headers containing text names (e.g. ‘Illness Code-2.0’)) –

Returns

Dataframe with column text names converted to ukbio IDs (e.g. ‘Illness Code-2.0’ -> ‘20002-2.0’).

Return type

Pandas dataframe
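The header conversion described above (text name to field ID and back) can be illustrated with a small lookup. This is a sketch under stated assumptions, not the library's code: NAME_TO_FIELD_ID is a hypothetical two-entry table built from the examples in this documentation, and both helper functions are hypothetical names.

```python
# Hypothetical lookup table, limited to the examples used in this documentation.
NAME_TO_FIELD_ID = {"Sex": "31", "Illness Code": "20002"}


def name_header_to_id_header(header, name_to_id=NAME_TO_FIELD_ID):
    """Convert 'Illness Code-2.0' to '20002-2.0' using a name -> ID lookup."""
    # Split on the last hyphen so that names containing hyphens stay intact.
    name, _, suffix = header.rpartition("-")
    return f"{name_to_id[name]}-{suffix}"


def id_header_to_name_header(header, name_to_id=NAME_TO_FIELD_ID):
    """Convert '20002-2.0' back to 'Illness Code-2.0'."""
    id_to_name = {field_id: name for name, field_id in name_to_id.items()}
    field_id, _, suffix = header.rpartition("-")
    return f"{id_to_name[field_id]}-{suffix}"


print(name_header_to_id_header("Illness Code-2.0"))
print(id_header_to_name_header("31-0.0"))
```

In both directions only the part before the last hyphen is translated; the ‘instance.array’ suffix is carried over unchanged.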

ukbiobank.utils.utils.getFieldIdsFromNames(ukbio, field_names=None)
Parameters
  • ukbio (ukbio object) –

  • field_names (List of strings, mandatory) – Ukbiobank field names.

Returns

All ukbiobank fieldId_instance_arrays associated with the given field names.

Return type

List

ukbiobank.utils.utils.getFieldIdsInstancesFromCategoryId(ukbio, field_ids=None)
Parameters
  • ukbio (ukbio object) –

  • field_ids (List of integers, mandatory) – Ukbiobank field category ids.

Returns

All ukbiobank fieldId_instance_arrays associated with the given field category ids.

Return type

List

ukbiobank.utils.utils.getFieldIdsInstancesFromNamesInstances(ukbio, field_names=None)
Parameters
  • ukbio (ukbio object) –

  • field_names (List of strings e.g. 'Sex-0.0') –

Returns

All ukbiobank fieldId_instance_arrays associated with the given ‘name-instance.array’ strings.

Return type

List

ukbiobank.utils.utils.getFieldnames(ukbio)
Parameters

ukbio (ukbio object) –

Returns

List of available ukbiobank fieldnames.

Return type

List

ukbiobank.utils.utils.getFieldsInstancesArrays(ukb_csv=None, data_dict=None)
Parameters
  • ukb_csv (String path to ukbiobank csv file) – The default is None.

  • data_dict (Pandas dataframe.) – UK Biobank data dictionary

Returns

  • field_instance_array pandas dataframe.

  • This is used as a reference for decoding column headers & filtering based on column, instance etc.

ukbiobank.utils.utils.illnessCodesToText(ukbio=None, df=None)
Parameters
  • ukbio (ukbio object, mandatory) –

  • df (pandas dataframe (generated using ukbio loadCsv)) –

Returns

Dataframe with column names converted to text names.

Return type

Pandas dataframe

ukbiobank.utils.utils.loadCsv(ukbio=None, fields=None, n_rows=None, instance=None)
Parameters
  • ukbio (ukbio object) –

  • fields (List of strings, Mandatory) – Accepts UKB field category, ID or text string (or mixed), e.g. 21, ‘21-0.0’ or ‘Sex’.

  • instance (Integer (either 0,1,2,3), optional.) – Performs filtering of columns by instance

Returns

df – df containing all instances of chosen fields.

Return type

Pandas dataframe

ukbiobank.utils.utils.meltByInstance(ukbio=None, df=None)
Parameters
  • ukbio (ukbio object. Mandatory.) –

  • df (pandas df loaded using ukbiobank-tools. Mandatory.) –

Returns

out_df

Return type

pandas df which has been re-formatted to include the following columns: “Variable | Instance | Value”
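The wide-to-long reshaping into “Variable | Instance | Value” can be sketched without pandas. This is an illustrative assumption about the shape of the output and not the library's implementation: melt_by_instance is a hypothetical helper, rows are plain dictionaries keyed by ‘variable-instance.array’ headers, and participant identifiers are left out for brevity.

```python
def melt_by_instance(rows):
    """Reshape wide rows into long 'Variable | Instance | Value' records.

    rows is a list of dicts keyed by 'variable-instance.array' headers.
    """
    long_rows = []
    for row in rows:
        for header, value in row.items():
            # 'Fluid intelligence score-2.0' -> ('Fluid intelligence score', '2.0')
            variable, _, suffix = header.rpartition("-")
            instance = int(suffix.split(".")[0])
            long_rows.append(
                {"Variable": variable, "Instance": instance, "Value": value}
            )
    return long_rows


wide = [{"Fluid intelligence score-2.0": 7, "Fluid intelligence score-3.0": 6}]
for record in melt_by_instance(wide):
    print(record)
```

Each wide column becomes one long record, so a single row with two instances of the same variable yields two records.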

ukbiobank.utils.utils.removeOutliers(df=None, std=3, cols=None)
Parameters
  • df (pandas dataframe) –

  • std (int, default 3) – Number of standard deviations used as the threshold for excluding outliers.

  • cols – Columns in the pandas df from which to exclude outliers.

Returns

df

Return type

Pandas dataframe
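Standard-deviation thresholding of the kind described by the std parameter can be sketched as follows. This is a minimal sketch, not the library's code: remove_outliers is a hypothetical helper operating on a single list of values, and replacing outliers with None (rather than dropping rows) and using the sample standard deviation are both assumptions.

```python
import statistics


def remove_outliers(values, std=3):
    """Replace values more than `std` standard deviations from the mean with None."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    spread = statistics.stdev(present)  # sample standard deviation
    return [
        v if v is None or abs(v - mean) <= std * spread else None
        for v in values
    ]


values = [10] * 19 + [500]
cleaned = remove_outliers(values, std=3)
print(cleaned.count(None))
```

Note that a single extreme value inflates both the mean and the standard deviation, so in very small samples no value can exceed a threshold of 3 standard deviations.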

ukbiobank.filtering.filtering

Created on Wed Mar 25 13:33:29 2020

@author: jnecus

UKBiobank data filtering utilities

ukbiobank.filtering.filtering.filterByField(ukbio=None, df=None, fields_to_include=None, instances=[0, 1, 2, 3], arrays=None)
Parameters
  • ukbio (ukbio object, mandatory) –

  • df (pandas dataframe (currently only accepts FieldID headers as column headers)) –

  • fields_to_include (Dictionary whereby keys: 'fields to include', values: 'values to include') – *FIELDS IN FIELDS_TO_INCLUDE MUST BE IN FIELD_ID FORM*, e.g. '20002' (not 'Self-reported Illness'). *VALUES IN FIELDS_TO_INCLUDE MUST BE IN CODED FORM*, e.g. '1074' (not 'angina').

  • instances (list of integers, Default is [0,1,2,3] (include all instances)) –

  • arrays (list of integers) –

Returns

  • Pandas dataframe with data-fields filtered for selected fields, values, instances, arrays.

  • *This function uses ‘OR’ logic, i.e. if any of the values/fields included are present then they will be included*
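One possible reading of the ‘OR’ logic above is sketched below in plain Python. It is an assumption about the behaviour and not the library's implementation: filter_rows is a hypothetical helper, rows are dictionaries keyed by ‘fieldID-instance.array’ headers, ‘eid’ is assumed to be the participant identifier column, and ‘1065’ simply stands in for any other coded value.

```python
def filter_rows(rows, fields_to_include):
    """Keep rows where ANY listed field holds ANY of its listed coded values."""
    kept = []
    for row in rows:
        for header, value in row.items():
            # '20002-0.1' -> '20002'; headers without a hyphen give ''.
            field_id = header.rpartition("-")[0]
            if value in fields_to_include.get(field_id, ()):
                kept.append(row)
                break  # one match is enough under 'OR' logic
    return kept


rows = [
    {"eid": 1, "20002-0.0": "1074", "20002-0.1": None},
    {"eid": 2, "20002-0.0": "1065", "20002-0.1": "1074"},
    {"eid": 3, "20002-0.0": "1065", "20002-0.1": None},
]
kept = filter_rows(rows, {"20002": ["1074"]})
print([row["eid"] for row in kept])
```

A row is kept as soon as one column of a requested field holds one of the requested codes, regardless of which array position the code appears in.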

ukbiobank.filtering.filtering.filterInstancesArrays(ukbio=None, df=None, instances=None, arrays=None)
Parameters
  • ukbio (ukbio object, mandatory) –

  • df (pandas dataframe (generated using ukbio loadCsv)) –

  • instances (List of integers. Default is none (include all instances)) –

  • arrays (List of integers. Default is none (include all arrays)) –

Returns

Dataframe with datafields filtered for selected instances/arrays

Return type

Pandas dataframe
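Column headers in this documentation follow the pattern ‘field-instance.array’, e.g. ‘31-0.0’. The sketch below shows how headers of that shape could be parsed and filtered by instance and array. It is illustrative only and not the ukbiobank-tools implementation; parse_header and filter_headers are hypothetical names.

```python
import re

# Matches headers such as '20002-2.0' or 'Illness Code-2.0'.
HEADER_PATTERN = re.compile(r"^(?P<field>.+)-(?P<instance>\d+)\.(?P<array>\d+)$")


def parse_header(header):
    """Return (field, instance, array) for a 'field-instance.array' header."""
    match = HEADER_PATTERN.match(header)
    if match is None:
        raise ValueError(f"Unexpected header format: {header!r}")
    return match["field"], int(match["instance"]), int(match["array"])


def filter_headers(headers, instances=None, arrays=None):
    """Keep headers whose instance/array is in the given lists (None keeps all)."""
    kept = []
    for header in headers:
        _, instance, array = parse_header(header)
        if instances is not None and instance not in instances:
            continue
        if arrays is not None and array not in arrays:
            continue
        kept.append(header)
    return kept


headers = ["31-0.0", "20002-0.0", "20002-2.0", "20002-2.1"]
print(filter_headers(headers, instances=[2]))
print(filter_headers(headers, instances=[2], arrays=[0]))
```

Passing None for instances or arrays keeps every value for that dimension, mirroring the documented defaults.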

ukbiobank.filtering.illness

Created on Wed Mar 25 13:33:29 2020

@author: jnecus

UKBiobank data filtering utilities

ukbiobank.filtering.illness.healthy_unhealthy_split(ukbio=None, df=None, instances=[0, 1, 2, 3], return_filter_fields=False)

Splits dataframe into healthy_df and unhealthy_df based upon exclusion criteria used in Cole 2020 https://www.sciencedirect.com/science/article/pii/S0197458020301056

Exclusion criteria were: an ICD-10 diagnosis (field #41270), self-reported long-standing illness, disability or infirmity (UK Biobank data field #2188), self-reported diabetes (field #2443), stroke history (field #4056), and not having good or excellent self-reported health (field #2178).

Note: According to these criteria, around 20% of the data are ‘healthy’, with 80% deemed ‘unhealthy’.

Parameters
  • ukbio (ukbio object, mandatory) –

  • df (pandas dataframe (generated using ukbio loadCsv), mandatory) –

  • instances (list of integers, Default is [0,1,2,3] (include all instances)) –

  • return_filter_fields (Boolean, default False) – If True, the fields used to filter according to healthy/unhealthy criteria are included in the returned dataframes (this can be useful for validation and for investigating the cause of a healthy/unhealthy classification). If False, the returned dataframes contain the same fields as the input dataframe.

Returns

  • healthy_df (Pandas dataframe) – dataframe with individuals not matching the exclusion criteria

  • unhealthy_df (Pandas dataframe) – dataframe with individuals matching one or more exclusion criteria

Generated Index

Part of the Sphinx build process is to generate an index file: Index.