API documentation

`ukbiobank`

Created on Fri Mar 27 12:38:10 2020

@author: jnecus

class ukbiobank.ukbio.ukbio(ukb_csv=None)

Parameters: ukb_csv (String, mandatory) – Path to ukbiobank csv file.

Example usage:

import ukbiobank
ukb = ukbiobank.ukbio(ukb_csv='path/to/ukbiobank_data.csv')

Returns

ukbio objects are required as an input when using ukbiobank-tools functions. ukbio objects contain important information such as:

variable codings

path to ukbiobank csv file

Return type

ukbio object.

`ukbiobank.utils.utils`

Created on Wed Mar 25 13:33:29 2020

@author: Joe

UKBiobank data loading utilities

ukbiobank.utils.utils.addFields(ukbio=None, df=None, fields=None, instances=None)

Parameters

ukbio (ukbio object) – ukbiobank.ukbio object
df (pandas dataframe) – df containing ukbiobank data
fields (List, Mandatory) – Accepts UKB field ID or text string (or mixed), e.g. ‘31-0.0’ or ‘Sex’.
instances (integer or list of integers, Optional) – If present, fields will be filtered for the chosen instance(s) before being added.

Example usage:

df = ukbiobank.utils.addFields(ukbio=ukb, df=df, fields=['Sex'])

Returns

df – df containing all instances of chosen fields.

Return type

Pandas dataframe

ukbiobank.utils.utils.calculateChangeInCognitiveScores(ukbio=None, df=None)

Parameters

ukbio (ukbio object. Mandatory.) –
df (pandas df loaded using ukbiobank-tools. Mandatory.) –

Returns

out_df (pandas df containing cognitive decline score)
- Currently subtracts the cognitive test score at instance 3 from the score at instance 2 for the tests listed below. Output variables are labelled ‘Change in x’, where x is the name of the test.

‘Mean time to correctly identify matches’ -> ‘Change in Mean time to correctly identify matches’
‘Maximum digits remembered correctly’ (Field ID: 4282) -> ‘Change in x’
‘Fluid intelligence score’ (Field ID: 20016) -> ‘Change in x’
‘Number of incorrect matches in round’ (Field ID: 399) -> ‘Change in x’
‘Number of puzzles correctly solved’ -> ‘Change in x’
‘Number of puzzles correct’ -> ‘Change in x’
‘Number of word pairs correctly associated’ -> ‘Change in x’
‘Duration to complete alphanumeric path (trail #2)’ (Field ID: 6350) -> ‘Change in x’
‘Number of symbol digit matches made correctly’ (Field ID: 23324) -> ‘Change in x’

ukbiobank.utils.utils.calculateCognitiveDeclineScore(ukbio=None, df=None, percentage_thresh=0, missing_tolerance=0)

Currently generates a composite ‘cognitive_decline_score-3.0’ between instances 2 and 3 (imaging visits). One point (+1) is added for each test whose score changed by the given percentage between instances 2 and 3. The default percentage is 0, i.e. any decline in a score counts. If set to +20, improvements of up to 19% and any decrease in performance count.

Future to-dos:

- include additional tests, instances, etc.
- account for age.

Parameters

ukbio (ukbio object. Mandatory.) –
df (pandas df loaded using ukbiobank-tools. Mandatory.) –
percentage_thresh (int. Optional. Default: 0.) – This parameter determines how large a percentage change in score is required before a ‘+1’ cognitive decline score is added for that test. E.g. if set to ‘-50’, the reaction time (RT) must have gotten worse by 50%, or the number of items remembered must have dropped by 50%.
missing_tolerance (int. Optional. Default: 0.) – This parameter determines how many missing tests the calculation will tolerate. E.g. if scores are missing for all tests, a subject would otherwise receive an overall score of 0. However, if the missing tolerance is 0, any missing test assigns a NaN score to that subject.

Returns

out_df

Return type

pandas df containing cognitive decline score
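The scoring rule described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name, the `higher_is_better` flag, the strict `<` comparison at the threshold, the use of `None` for a missing score, and treating a baseline of 0 as missing are all assumptions made for the example.

```python
import math


def decline_score(tests, percentage_thresh=0, missing_tolerance=0):
    """Count the tests whose percentage change falls below the threshold.

    tests: list of (score_instance_2, score_instance_3, higher_is_better).
           A score of None marks a missing test (assumed convention).
    """
    score = 0
    missing = 0
    for before, after, higher_is_better in tests:
        # A zero baseline makes the percentage undefined; treated as missing here.
        if before is None or after is None or before == 0:
            missing += 1
            continue
        change = (after - before) * 100 / abs(before)
        if not higher_is_better:
            change = -change  # e.g. reaction times: an increase is a decline
        if change < percentage_thresh:
            score += 1
    # More missing tests than tolerated: no score is assigned.
    if missing > missing_tolerance:
        return math.nan
    return score
```

With `percentage_thresh=0`, a drop from 10 to 8 items remembered and a reaction time rising from 500 to 600 each add one point, while an improvement from 5 to 6 adds none.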

ukbiobank.utils.utils.calculateHealthySleepScore(ukbio=None, df=None, instances=[2, 3])

Generates a composite healthy sleep score (0-5) according to the methods used in: https://academic.oup.com/eurheartj/article/41/11/1182/5678714#200787415

Parameters

ukbio (ukbio object. Mandatory.) –
instances (List of int. Default = [2, 3].) –
df (pandas df loaded using ukbiobank-tools. Mandatory.) –

Returns

out_df

Return type

pandas df containing healthy sleep score (0-5)
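A minimal sketch of a 0-5 composite, assuming the five low-risk sleep factors described in the cited paper (early chronotype, 7 to 8 hours of sleep per day, never or rarely insomnia, no snoring, no frequent daytime sleepiness), with one point per factor. The function and argument names are hypothetical, and the mapping from UK Biobank field codings to these inputs is not shown; how ukbiobank-tools performs either step is not documented here.

```python
def healthy_sleep_score(early_chronotype, sleep_hours, rare_insomnia,
                        no_snoring, no_frequent_daytime_sleepiness):
    """Return a 0-5 score: one point per low-risk sleep factor."""
    factors = [
        bool(early_chronotype),
        7 <= sleep_hours <= 8,  # sleep duration within the low-risk range
        bool(rare_insomnia),
        bool(no_snoring),
        bool(no_frequent_daytime_sleepiness),
    ]
    return sum(factors)
```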

ukbiobank.utils.utils.fieldIdsToNames(ukbio=None, df=None, ids=None)

Parameters

ukbio (ukbio object, mandatory) –
df (pandas dataframe (generated using ukbio loadCsv), optional) –
ids (a list of ids (can be mixed text & id) to be converted to text, optional) –

Returns

Pandas dataframe (column names converted to text names)
or
List of fieldnames

ukbiobank.utils.utils.fieldNamesToIds(ukbio=None, df=None)

Parameters

ukbio (ukbio object, mandatory) –
df (pandas dataframe with column headers containing text name (e.g. 'Illness Code-2.0')) –

Returns

Return type

Pandas dataframe (column text names converted to ukbio IDs, e.g. ‘Illness Code-2.0’ -> ‘20002-2.0’)
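The header conversion performed by fieldNamesToIds and fieldIdsToNames can be illustrated with a small plain-Python sketch. It assumes headers have the form ‘name-instance.array’ and that only the part before the last hyphen is translated. The helper name and the two-entry lookup table are hypothetical; the library builds its own lookup from the UK Biobank data dictionary.

```python
# Hypothetical lookup table for illustration only.
NAME_TO_ID = {"Illness Code": "20002", "Sex": "31"}
ID_TO_NAME = {field_id: name for name, field_id in NAME_TO_ID.items()}


def rename_headers(headers, mapping):
    """Translate the part of each header before the last hyphen."""
    renamed = []
    for header in headers:
        if "-" not in header:
            renamed.append(header)  # no instance suffix: leave unchanged
            continue
        prefix, _, suffix = header.rpartition("-")
        # Prefixes missing from the mapping are kept as they are.
        renamed.append(f"{mapping.get(prefix, prefix)}-{suffix}")
    return renamed
```

Passing `NAME_TO_ID` converts text names to IDs, and passing `ID_TO_NAME` converts them back.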

ukbiobank.utils.utils.getFieldIdsFromNames(ukbio, field_names=None)

Parameters

ukbio (ukbio object) –
field_names (List of strings, mandatory) – Ukbiobank field names.

Returns

Return type

List of all ukbiobank fieldId_instance_array values associated with the given field names.

ukbiobank.utils.utils.getFieldIdsInstancesFromCategoryId(ukbio, field_ids=None)

Parameters

ukbio (ukbio object) –
field_ids (List of integers, mandatory) – Ukbiobank field category IDs.

Returns

Return type

List of all ukbiobank fieldId_instance_array values associated with the given field category IDs.

ukbiobank.utils.utils.getFieldIdsInstancesFromNamesInstances(ukbio, field_names=None)

Parameters

ukbio (ukbio object) –
field_names (List of strings, e.g. 'Sex-0.0') –

Returns

Return type

List of all ukbiobank fieldId_instance_array values associated with the given ‘name-instance.array’ strings, in ‘id-instance.array’ form.

ukbiobank.utils.utils.getFieldnames(ukbio)

Parameters: ukbio (ukbio object) –
Returns: List of available ukbiobank fieldnames.
Return type: List

ukbiobank.utils.utils.getFieldsInstancesArrays(ukb_csv=None, data_dict=None)

Parameters

ukb_csv (String path to ukbiobank csv file) – The default is None.
data_dict (Pandas dataframe.) – UK Biobank data dictionary

Returns

field_instance_array pandas dataframe.
This is used as a reference for decoding column headers and for filtering based on column, instance, etc.

ukbiobank.utils.utils.illnessCodesToText(ukbio=None, df=None)

Parameters

ukbio (ukbio object, mandatory) –
df (pandas dataframe (generated using ukbio loadCsv)) –

Returns

Return type

Pandas dataframe (column names converted to text names)

ukbiobank.utils.utils.loadCsv(ukbio=None, fields=None, n_rows=None, instance=None)

Parameters

ukbio (ukbio object) –
fields (List of strings, Mandatory) – Accepts UKB field category, ID or text string (or mixed), e.g. 21, ‘21-0.0’ or ‘Sex’.
instance (Integer (either 0,1,2,3), optional.) – Performs filtering of columns by instance

Returns

df – df containing all instances of chosen fields.

Return type

Pandas dataframe

ukbiobank.utils.utils.meltByInstance(ukbio=None, df=None)

Parameters

ukbio (ukbio object. Mandatory.) –
df (pandas df loaded using ukbiobank-tools. Mandatory.) –

Returns

out_df

Return type

pandas df which has been reformatted to include the following columns: “Variable | Instance | Value”
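The wide-to-long reshaping into “Variable | Instance | Value” can be sketched in plain Python as follows. This is an illustration of the idea rather than the library's code: it represents rows as dictionaries instead of a dataframe, assumes headers of the form ‘name-instance.array’, and drops the array index and any column without an instance suffix, which may differ from what meltByInstance actually does.

```python
def melt_by_instance(rows):
    """Turn wide rows such as {'Sex-0.0': 1} into long records."""
    long_rows = []
    for row in rows:
        for header, value in row.items():
            if "-" not in header:
                continue  # no instance suffix, e.g. a participant ID column
            variable, _, suffix = header.rpartition("-")
            instance = int(suffix.split(".")[0])  # '2.0' -> instance 2
            long_rows.append(
                {"Variable": variable, "Instance": instance, "Value": value}
            )
    return long_rows
```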

ukbiobank.utils.utils.removeOutliers(df=None, std=3, cols=None)

Parameters

df (pandas dataframe) –
std (int, default 3) – Number of standard deviations used as the threshold to exclude outliers.
cols (columns in pandas df) – Columns from which to exclude outliers.

Returns

df

Return type

Pandas dataframe
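A standard-deviation threshold of this kind can be sketched on a plain list of numbers. The sketch assumes the sample standard deviation and simply drops values lying more than `std` standard deviations from the mean; whether removeOutliers drops rows or masks values, and which standard deviation it uses, is not stated in this documentation.

```python
import statistics


def remove_outliers(values, std=3):
    """Keep values within `std` sample standard deviations of the mean.

    Requires at least two values, as statistics.stdev does.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        return list(values)  # all values identical: nothing to exclude
    return [v for v in values if abs(v - mean) <= std * sd]
```

Note that in very small samples a single extreme value inflates the standard deviation enough that it may never exceed a threshold of 3.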

`ukbiobank.filtering.filtering`

Created on Wed Mar 25 13:33:29 2020

@author: jnecus

UKBiobank data filtering utilities

ukbiobank.filtering.filtering.filterByField(ukbio=None, df=None, fields_to_include=None, instances=[0, 1, 2, 3], arrays=None)

Parameters

ukbio (ukbio object, mandatory) –
df (pandas dataframe (currently only accepts FieldID headers as column headers)) –
fields_to_include (Dictionary whereby keys: 'fields to include', values:'values to include') –
*FIELDS IN FIELDS_TO_INCLUDE MUST BE IN FIELD_ID FORM* e.g. '20002' (not 'Self-reported Illness')
*VALUES IN FIELDS_TO_INCLUDE MUST BE IN CODED FORM* e.g. '1074' (not 'angina')
instances (list of integers, Default is [0,1,2,3] (include all instances)) –
arrays (list of integers) –

Returns

Pandas dataframe with data-fields filtered for selected fields, values, instances, arrays.
*This function uses ‘OR’ logic, i.e. if any of the values/fields included are present then they will be included*
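The ‘OR’ logic can be illustrated with a plain-Python sketch. It assumes that the filter selects rows, that each dictionary value is a list of coded values, and that headers have the form ‘field-instance.array’; the helper name and the second code in the sample data are placeholders, and instance/array filtering is left out.

```python
def matches_any(row, fields_to_include):
    """Return True if any listed field holds any of its listed values.

    row: dict mapping headers such as '20002-0.0' to coded values.
    fields_to_include: dict mapping field IDs to lists of coded values.
    """
    for header, value in row.items():
        field_id = header.split("-")[0]
        if value in fields_to_include.get(field_id, []):
            return True
    return False


rows = [
    {"20002-0.0": "1074", "31-0.0": "0"},  # matches on field 20002
    {"20002-0.0": "1111", "31-0.0": "1"},  # placeholder code; matches on field 31
]
selected = [r for r in rows if matches_any(r, {"20002": ["1074"], "31": ["1"]})]
```

Both rows are selected because each satisfies at least one of the two conditions.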

ukbiobank.filtering.filtering.filterInstancesArrays(ukbio=None, df=None, instances=None, arrays=None)

Parameters

ukbio (ukbio object, mandatory) –
df (pandas dataframe (generated using ukbio loadCsv)) –
instances (List of integers. Default is none (include all instances)) –
arrays (List of integers. Default is none (include all arrays)) –

Returns

Dataframe with datafields filtered for selected instances/arrays

Return type

Pandas dataframe
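Filtering by instance and array relies on the ‘field-instance.array’ layout of the column headers (e.g. ‘31-0.0’). A minimal sketch of that selection on a list of headers, with hypothetical helper names and `None` meaning "keep all" as in the defaults above; headers that do not follow the layout would raise a `ValueError` here.

```python
def parse_header(header):
    """Split 'field-instance.array' (e.g. '31-0.0') into its three parts."""
    field, _, suffix = header.rpartition("-")
    instance, _, array = suffix.partition(".")
    return field, int(instance), int(array)


def filter_headers(headers, instances=None, arrays=None):
    """Keep headers whose instance and array are listed; None keeps all."""
    kept = []
    for header in headers:
        _, instance, array = parse_header(header)
        if instances is not None and instance not in instances:
            continue
        if arrays is not None and array not in arrays:
            continue
        kept.append(header)
    return kept
```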

`ukbiobank.filtering.illness`

Created on Wed Mar 25 13:33:29 2020

@author: jnecus

UKBiobank data filtering utilities

ukbiobank.filtering.illness.healthy_unhealthy_split(ukbio=None, df=None, instances=[0, 1, 2, 3], return_filter_fields=False)

Splits dataframe into healthy_df and unhealthy_df based upon exclusion criteria used in Cole 2020 https://www.sciencedirect.com/science/article/pii/S0197458020301056

Exclusion criteria were: an ICD-10 diagnosis (field #41270), self-reported long-standing illness, disability or infirmity (field #2188), self-reported diabetes (field #2443), stroke history (field #4056), and not having good or excellent self-reported health (field #2178).

Note: According to these criteria, around 20% of the data are ‘healthy’, with 80% deemed ‘unhealthy’.

ukbio : ukbio object, mandatory

df : pandas dataframe (generated using ukbio loadCsv). Mandatory

instances : list of integers, Default is [0,1,2,3] (include all instances)

return_filter_fields : Boolean, default False.
If True, the fields used to filter according to healthy/unhealthy criteria are included in the returned dataframes (this can be useful for validation and for investigating the cause of a healthy/unhealthy classification). If False, the returned dataframes contain the same fields as the input dataframe.

healthy_df, dataframe with individuals not matching exclusion criteria : Pandas dataframe

unhealthy_df, dataframe with individuals containing one or more matching exclusion criteria : Pandas dataframe
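The split can be pictured as "any exclusion criterion met means unhealthy". The sketch below assumes each participant has already been reduced to one boolean flag per criterion; the flag names are hypothetical, a missing flag is treated as not met, and deriving the flags from the listed UK Biobank fields is not shown.

```python
# Hypothetical flag names, one per exclusion criterion listed above.
EXCLUSION_CRITERIA = (
    "icd10_diagnosis",            # field 41270
    "long_standing_illness",      # field 2188
    "diabetes",                   # field 2443
    "stroke_history",             # field 4056
    "not_good_or_excellent_health",  # field 2178
)


def split_healthy(participants):
    """Return (healthy, unhealthy) lists of participant dicts."""
    healthy, unhealthy = [], []
    for person in participants:
        # One or more matching criteria puts the participant in 'unhealthy'.
        if any(person.get(flag, False) for flag in EXCLUSION_CRITERIA):
            unhealthy.append(person)
        else:
            healthy.append(person)
    return healthy, unhealthy
```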

Generated Index

Part of the Sphinx build process is to generate an index file: Index.
