API documentation
ukbiobank
Created on Fri Mar 27 12:38:10 2020
@author: jnecus
-
class ukbiobank.ukbio.ukbio(ukb_csv=None)
- Parameters
ukb_csv (String, mandatory) – Path to ukbiobank csv file.
Example usage:
import ukbiobank
ukb = ukbiobank.ukbio(ukb_csv='path/to/ukbiobank_data.csv')
- Returns
ukbio objects are required as an input when using ukbiobank-tools functions. ukbio objects contain important information such as:
variable codings
path to ukbiobank csv file
- Return type
ukbio object.
ukbiobank.utils.utils
Created on Wed Mar 25 13:33:29 2020
@author: Joe
UKBiobank data loading utilities
-
ukbiobank.utils.utils.addFields(ukbio=None, df=None, fields=None, instances=None)
- Parameters
ukbio (ukbio object) – ukbiobank.ukbio object
df (pandas dataframe) – df containing ukbiobank data
fields (List, Mandatory) – Accepts UKB field ID or text string (or mixed), e.g. ‘31-0.0’ or ‘Sex’.
instances (integer or list of integers, Optional) – If present, fields will be filtered for the chosen instance(s) before being added.
Example usage: df = ukbiobank.utils.addFields(ukbio=ukb, df=df, fields=['Sex'])
- Returns
df – df containing all instances of chosen fields.
- Return type
Pandas dataframe
-
ukbiobank.utils.utils.calculateChangeInCognitiveScores(ukbio=None, df=None)
- Parameters
ukbio (ukbio object. Mandatory.) –
df (pandas df loaded using ukbiobank-tools. Mandatory.) –
- Returns
- out_df (pandas df containing cognitive decline score)
Currently subtracts the cognitive test score at instance 3 from the score at instance 2 for the tests listed below. Output variables are labelled 'Change in x', where x is the original variable name:
'Mean time to correctly identify matches' -> 'Change in Mean time to correctly identify matches'
'Maximum digits remembered correctly' (Field ID: 4282) -> 'Change in x'
'Fluid intelligence score' (Field ID: 20016) -> 'Change in x'
'Number of incorrect matches in round' (Field ID: 399) -> 'Change in x'
'Number of puzzles correctly solved' -> 'Change in x'
'Number of puzzles correct' -> 'Change in x'
'Number of word pairs correctly associated' -> 'Change in x'
'Duration to complete alphanumeric path (trail #2)' (Field ID: 6350) -> 'Change in x'
'Number of symbol digit matches made correctly' (Field ID: 23324) -> 'Change in x'
-
ukbiobank.utils.utils.calculateCognitiveDeclineScore(ukbio=None, df=None, percentage_thresh=0, missing_tolerance=0)
Currently generates a composite ‘cognitive_decline_score-3.0’ between instances 2 & 3 (imaging visits). +1 point is added for each test on which the score changed by a given percentage between instances 2 & 3. The default percentage is 0, i.e. any decline in a score will count; if set to +20, improvements of up to 19% and any decrease in performance will count.
Future to-dos:
- include additional tests, instances etc.
- account for age.
- Parameters
ukbio (ukbio object. Mandatory.) –
df (pandas df loaded using ukbiobank-tools. Mandatory.) –
percentage_thresh (int. Optional. Default: 0.) – This parameter determines how much of a percentage change in scoring is required before a ‘+1’ cognitive decline score is added for that test. E.g. if set to ‘-50’, then the reaction time (RT) must have gotten worse by 50%, or the number of items remembered must have dropped by 50%.
missing_tolerance (int. Optional. Default: 0.) – This parameter determines how many missing tests the calculation will tolerate. E.g. if scores are missing for all tests, then a subject will by default receive an overall score of 0. However, if the missing tolerance is 0, then any missing test will assign a NaN score to this subject.
- Returns
out_df
- Return type
pandas df containing cognitive decline score
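The scoring rule described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the function names, the (before, after, higher_is_better) tuple layout, the strict "below threshold" comparison and the treatment of a zero baseline as missing are all assumptions made for the example.

```python
import math

def improvement_percent(before, after, higher_is_better):
    """Signed percentage change, oriented so that positive means 'got better'."""
    change = (after - before) / abs(before) * 100.0
    return change if higher_is_better else -change

def decline_score(tests, percentage_thresh=0, missing_tolerance=0):
    """tests: list of (before, after, higher_is_better); None marks a missing score."""
    # Assumption: a zero baseline cannot yield a percentage, so it counts as missing.
    usable = [t for t in tests
              if t[0] is not None and t[1] is not None and t[0] != 0]
    if len(tests) - len(usable) > missing_tolerance:
        return math.nan  # more missing tests than the tolerance allows
    # Assumption: a test adds +1 when its improvement is strictly below the threshold.
    return sum(1 for before, after, better in usable
               if improvement_percent(before, after, better) < percentage_thresh)

# Made-up scores for one subject at instances 2 and 3.
subject = [(600.0, 660.0, False),  # reaction time in ms: slower, so worse
           (7, 6, True),           # digits remembered: fewer, so worse
           (5, 5, True)]           # unchanged
print(decline_score(subject))
print(decline_score(subject, percentage_thresh=20))
```

With a threshold of 0 only the two worsened tests count; raising the threshold to 20 also counts the unchanged test, because an improvement of 0% is below 20%.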
-
ukbiobank.utils.utils.calculateHealthySleepScore(ukbio=None, df=None, instances=[2, 3])
Generates a composite healthy sleep score (0-5) according to the methods used in:
https://academic.oup.com/eurheartj/article/41/11/1182/5678714#200787415
- Parameters
ukbio (ukbio object. Mandatory.) –
instances (List of int. Default = [2, 3].) –
df (pandas df loaded using ukbiobank-tools. Mandatory.) –
- Returns
out_df
- Return type
pandas df containing healthy sleep score (0-5)
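As a rough illustration of a 0-5 composite score: the cited paper awards one point for each of five low-risk sleep factors (early chronotype, 7-8 hours of sleep per day, never or rarely having insomnia, no snoring, and no frequent daytime sleepiness). The sketch below assumes each factor has already been reduced to a boolean. The key names are hypothetical, and how the library derives each factor from the coded UK Biobank answers is not shown here.

```python
# One point per low-risk sleep factor; the key names are hypothetical.
LOW_RISK_FACTORS = (
    "early_chronotype",
    "sleeps_7_to_8_hours",
    "never_or_rarely_insomnia",
    "no_snoring",
    "no_frequent_daytime_sleepiness",
)

def healthy_sleep_score(answers):
    """answers: dict mapping every factor name to a bool. Returns an int from 0 to 5."""
    return sum(1 for factor in LOW_RISK_FACTORS if answers[factor])

# Made-up participant meeting three of the five criteria.
participant = {
    "early_chronotype": True,
    "sleeps_7_to_8_hours": True,
    "never_or_rarely_insomnia": False,
    "no_snoring": True,
    "no_frequent_daytime_sleepiness": False,
}
print(healthy_sleep_score(participant))
```

Missing answers are deliberately not handled in this sketch; a real dataset would need a rule for them.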
-
ukbiobank.utils.utils.fieldIdsToNames(ukbio=None, df=None, ids=None)
- Parameters
ukbio (ukbio object, mandatory) –
df (pandas dataframe (generated using ukbio loadCsv), optional) –
ids (a list of ids (can be mixed text & id) to be converted to text, optional) –
- Returns
Pandas dataframe (column names converted to text names)
or
List of fieldnames
-
ukbiobank.utils.utils.fieldNamesToIds(ukbio=None, df=None)
- Parameters
ukbio (ukbio object, mandatory) –
df (pandas dataframe with column headers containing text name (e.g. 'Illness Code-2.0')) –
- Returns
- Return type
Pandas dataframe (column text names converted to ukbio id, e.g. ‘Illness Code-2.0’ -> ‘20002-2.0’)
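The header conversion performed by fieldIdsToNames and fieldNamesToIds can be pictured as a lookup on the part of the header before the instance/array suffix. The sketch below is only an illustration: the two-entry FIELD_NAMES table and both helper functions are hypothetical, whereas the library takes its mapping from the ukbio object.

```python
# Hypothetical lookup table; the real mapping lives in the ukbio object.
FIELD_NAMES = {"31": "Sex", "20002": "Illness Code"}
FIELD_IDS = {name: field_id for field_id, name in FIELD_NAMES.items()}

def id_header_to_name(header):
    """'20002-2.0' -> 'Illness Code-2.0'; unknown headers are returned unchanged."""
    field_id, sep, suffix = header.partition("-")
    name = FIELD_NAMES.get(field_id)
    return header if name is None else name + sep + suffix

def name_header_to_id(header):
    """'Illness Code-2.0' -> '20002-2.0'; unknown headers are returned unchanged."""
    # Split on the LAST hyphen so that names containing hyphens still work.
    name, sep, suffix = header.rpartition("-")
    field_id = FIELD_IDS.get(name)
    return header if field_id is None else field_id + sep + suffix

headers = ["eid", "31-0.0", "20002-2.0"]
named = [id_header_to_name(h) for h in headers]
print(named)
print([name_header_to_id(h) for h in named])
```

Headers without a known field, such as the participant identifier column, pass through untouched in both directions.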
-
ukbiobank.utils.utils.getFieldIdsFromNames(ukbio, field_names=None)
- Parameters
ukbio (ukbio object) –
field_names (List of strings, mandatory) – Ukbiobank field names.
- Returns
- Return type
List of ALL ukbiobank fieldId_instance_array’s associated with given fieldname.
-
ukbiobank.utils.utils.getFieldIdsInstancesFromCategoryId(ukbio, field_ids=None)
- Parameters
ukbio (ukbio object) –
field_ids (List of integers, mandatory) – Ukbiobank field category ids.
- Returns
- Return type
List of ALL ukbiobank fieldId_instance_array’s associated with given field category id.
-
ukbiobank.utils.utils.getFieldIdsInstancesFromNamesInstances(ukbio, field_names=None)
- Parameters
ukbio (ukbio object) –
field_names (List of strings e.g. 'Sex-0.0') –
- Returns
- Return type
List of ALL ukbiobank fieldId_instance_array’s associated with given field category ‘id-instance.array’.
-
ukbiobank.utils.utils.getFieldnames(ukbio)
- Parameters
ukbio (ukbio object) –
- Returns
List of available ukbiobank fieldnames.
- Return type
List
-
ukbiobank.utils.utils.getFieldsInstancesArrays(ukb_csv=None, data_dict=None)
- Parameters
ukb_csv (String path to ukbiobank csv file) – The default is None.
data_dict (Pandas dataframe.) – UK Biobank data dictionary
- Returns
field_instance_array pandas dataframe.
This is used as a reference for decoding column headers & filtering based on column, instance etc.
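To make the idea of a field/instance/array reference concrete, here is a minimal sketch that splits UK Biobank column headers of the form 'field-instance.array' (e.g. '31-0.0') into their parts. It uses only the standard library and a made-up two-line CSV; parse_header is a hypothetical helper, not a function of ukbiobank-tools, and the real function returns a pandas dataframe rather than a list of dicts.

```python
import csv
import io
import re

# UK Biobank data columns are named "<field id>-<instance>.<array>", e.g. "31-0.0".
COLUMN_PATTERN = re.compile(r"^(\d+)-(\d+)\.(\d+)$")

def parse_header(csv_text):
    """Return one dict per data column: column name, field id, instance, array."""
    header = next(csv.reader(io.StringIO(csv_text)))
    table = []
    for column in header:
        match = COLUMN_PATTERN.match(column)
        if match is None:
            continue  # skip non-data columns such as the participant id "eid"
        field_id, instance, array = (int(part) for part in match.groups())
        table.append({"column": column, "field_id": field_id,
                      "instance": instance, "array": array})
    return table

# Made-up CSV standing in for a real UK Biobank export.
sample = "eid,31-0.0,20002-2.0,20002-2.1\n1000001,1,1074,\n"
reference = parse_header(sample)
for entry in reference:
    print(entry)
```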
-
ukbiobank.utils.utils.illnessCodesToText(ukbio=None, df=None)
- Parameters
ukbio (ukbio object, mandatory) –
df (pandas dataframe (generated using ukbio loadCsv)) –
- Returns
- Return type
Pandas dataframe (column names converted to text names)
-
ukbiobank.utils.utils.loadCsv(ukbio=None, fields=None, n_rows=None, instance=None)
- Parameters
ukbio (ukbio object) –
fields (List of strings, Mandatory) – Accepts UKB field category, ID or text string (or mixed), e.g. 21, ‘21-0.0’ or ‘Sex’.
instance (Integer (either 0,1,2,3), optional.) – Performs filtering of columns by instance
- Returns
df – df containing all instances of chosen fields.
- Return type
Pandas dataframe
ukbiobank.filtering.filtering
Created on Wed Mar 25 13:33:29 2020
@author: jnecus
UKBiobank data filtering utilities
-
ukbiobank.filtering.filtering.filterByField(ukbio=None, df=None, fields_to_include=None, instances=[0, 1, 2, 3], arrays=None)
- Parameters
ukbio (ukbio object, mandatory) –
df (pandas dataframe (currently only accepts FieldID headers as column headers)) –
fields_to_include (Dictionary whereby keys: 'fields to include', values: 'values to include') –
*FIELDS IN FIELDS_TO_INCLUDE MUST BE IN FIELD_ID FORM* e.g. '20002' (not 'Self-reported Illness')
*VALUES IN FIELDS_TO_INCLUDE MUST BE IN CODED FORM* e.g. '1074' (not 'angina')
instances (list of integers, Default is [0,1,2,3] (include all instances)) –
arrays (list of integers) –
- Returns
Pandas dataframe with data-fields filtered for selected fields, values, instances, arrays.
*This function uses ‘OR’ logic, i.e. if any of the values/fields included are present then they will be included*
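The 'OR' logic can be illustrated with plain dictionaries standing in for dataframe rows. This is a sketch under assumptions: filter_rows is a hypothetical helper, the rows are made up, and it reads the description as selecting rows in which any listed field holds any of its listed coded values at one of the chosen instances. The library itself works on pandas dataframes and may differ in detail.

```python
def filter_rows(rows, fields_to_include, instances=(0, 1, 2, 3)):
    """Keep rows where ANY listed field holds ANY of its listed coded values."""
    kept = []
    for row in rows:
        for column, value in row.items():
            field_id, _, suffix = column.partition("-")
            if field_id not in fields_to_include or not suffix:
                continue  # column does not belong to a requested field
            instance = int(suffix.split(".")[0])
            if instance in instances and value in fields_to_include[field_id]:
                kept.append(row)
                break  # one match is enough ('OR' logic)
    return kept

# Made-up rows; '1074' is the coded value used as an example above.
rows = [
    {"eid": "1", "20002-0.0": "1074", "20002-0.1": ""},
    {"eid": "2", "20002-0.0": "1065", "20002-0.1": ""},
    {"eid": "3", "20002-0.0": "", "20002-2.0": "1074"},
]
matches = filter_rows(rows, {"20002": ["1074"]})
print([row["eid"] for row in matches])
```

Restricting instances to [0] would drop the third row, whose match sits at instance 2.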
-
ukbiobank.filtering.filtering.filterInstancesArrays(ukbio=None, df=None, instances=None, arrays=None)
- Parameters
ukbio (ukbio object, mandatory) –
df (pandas dataframe (generated using ukbio loadCsv)) –
instances (List of integers. Default is none (include all instances)) –
arrays (List of integers. Default is none (include all arrays)) –
- Returns
Dataframe with datafields filtered for selected instances/arrays
- Return type
Pandas dataframe
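Instance/array filtering amounts to selecting columns by the suffix of their 'field-instance.array' header. The following sketch operates on a list of column names rather than a dataframe; filter_columns is a hypothetical helper, and keeping non-data columns such as 'eid' is an assumption of the example.

```python
def filter_columns(columns, instances=None, arrays=None):
    """Keep columns whose instance/array match; None means 'keep all'."""
    kept = []
    for column in columns:
        _, sep, suffix = column.rpartition("-")
        instance_text, dot, array_text = suffix.partition(".")
        is_data_column = bool(sep and dot
                              and instance_text.isdigit() and array_text.isdigit())
        if not is_data_column:
            kept.append(column)  # assumption: non-data columns (e.g. 'eid') are kept
            continue
        if instances is not None and int(instance_text) not in instances:
            continue
        if arrays is not None and int(array_text) not in arrays:
            continue
        kept.append(column)
    return kept

columns = ["eid", "31-0.0", "20002-2.0", "20002-2.1", "20002-3.0"]
print(filter_columns(columns, instances=[2]))
print(filter_columns(columns, arrays=[0]))
```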
ukbiobank.filtering.illness
Created on Wed Mar 25 13:33:29 2020
@author: jnecus
UKBiobank data filtering utilities
-
ukbiobank.filtering.illness.healthy_unhealthy_split(ukbio=None, df=None, instances=[0, 1, 2, 3], return_filter_fields=False)
Splits dataframe into healthy_df and unhealthy_df based upon the exclusion criteria used in Cole 2020: https://www.sciencedirect.com/science/article/pii/S0197458020301056
Exclusion criteria were: an ICD-10 diagnosis (field #41270), self-reported long-standing illness, disability or infirmity (UK Biobank data field #2188), self-reported diabetes (field #2443), stroke history (field #4056), and not having good or excellent self-reported health (field #2178).
Note: According to these criteria, around 20% of the data are ‘healthy’, with around 80% deemed ‘unhealthy’.
- Parameters
ukbio (ukbio object, mandatory) –
df (pandas dataframe (generated using ukbio loadCsv), mandatory) –
instances (list of integers, Default is [0,1,2,3] (include all instances)) –
return_filter_fields (Boolean, default False) – If True, the fields used to filter according to healthy/unhealthy criteria are included in the returned dataframes (this can be useful for validation and to investigate the cause of a healthy/unhealthy classification). If False, the returned dataframes contain the same fields as the input dataframe.
- Returns
healthy_df – dataframe with individuals not matching any exclusion criteria.
unhealthy_df – dataframe with individuals matching one or more exclusion criteria.
- Return type
Pandas dataframes
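The split itself is a partition: a participant is 'unhealthy' as soon as any single exclusion criterion matches. The sketch below shows that partition on made-up rows. The column names and their already-decoded values are hypothetical stand-ins for the coded UK Biobank fields listed above, and split_healthy_unhealthy is not the library's function.

```python
def split_healthy_unhealthy(rows, exclusion_checks):
    """Return (healthy, unhealthy); a row is unhealthy if ANY check matches."""
    healthy, unhealthy = [], []
    for row in rows:
        matched = any(check(row) for check in exclusion_checks)
        (unhealthy if matched else healthy).append(row)
    return healthy, unhealthy

# Hypothetical decoded columns; the real fields are stored as coded values.
checks = [
    lambda row: row["icd10_diagnosis"],
    lambda row: row["long_standing_illness"],
    lambda row: row["diabetes"],
    lambda row: row["stroke"],
    lambda row: row["overall_health"] not in ("Excellent", "Good"),
]

participants = [
    {"eid": 1, "icd10_diagnosis": False, "long_standing_illness": False,
     "diabetes": False, "stroke": False, "overall_health": "Good"},
    {"eid": 2, "icd10_diagnosis": False, "long_standing_illness": False,
     "diabetes": True, "stroke": False, "overall_health": "Excellent"},
    {"eid": 3, "icd10_diagnosis": False, "long_standing_illness": False,
     "diabetes": False, "stroke": False, "overall_health": "Fair"},
]
healthy, unhealthy = split_healthy_unhealthy(participants, checks)
print([p["eid"] for p in healthy], [p["eid"] for p in unhealthy])
```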
Generated Index
Part of the sphinx build process is to generate an index file: Index.