API Description¶

Study class¶

class tmtk.Study(study_params_path=None, minimal=False)[source]¶

Bases: tmtk.utils.validate.ValidateMixin

Describes an entire TranSMART study. This is the main object used in tmtk. Studies can be initialized by pointing to a study.params file. The study folder has to be structured according to the transmart-batch specification.

>>> import tmtk
>>> study = tmtk.Study('./studies/valid_study/study.params')

This will create the study object which can be used as a starting point for custom curation or directly in The Arborist.

To use the more limited 16.2 data model with transmart-batch, set this option before creating the study object.

>>> tmtk.options.transmart_batch_mode = True

all_files¶: All file objects in this study.

annotation_files¶: All annotation file objects in this study.

apply_blueprint(blueprint, omit_missing=False)[source]¶

Apply a blueprint to current study.

Parameters:	blueprint – blueprint object (e.g. dictionary) or link to blueprint json on disk. omit_missing – if True, then variables that are not present in the blueprint will be set to OMIT.

call_boris(height=650)[source]¶

Launch The Arborist GUI editor for the concept tree. This starts a Flask webserver in an IFrame when running in a Jupyter Notebook.

While The Arborist is open, the GIL prevents any other actions.

Parameters:	height – set the height of the output cell.

clinical_files¶: All clinical file objects in this study.

concept_tree¶: ConceptTree object for this study.

concept_tree_json¶: Stringified JSON that is used by JSTree in The Arborist.

concept_tree_to_clipboard()[source]¶: Send stringified JSON that is used by JSTree in The Arborist to clipboard.

create_clinical()[source]¶: Add clinical data to a study object by creating empty params.

ensure_metadata()[source]¶: Create the Tags object for this study. Does nothing if it is already present.

files_with_changes()[source]¶: Find dataframes that have changed since they have been loaded.

find_annotation(platform=None)[source]¶

Search for annotation data within this study and return it.

Parameters:	platform – platform id to look for in this study.
Returns:	an Annotations object or nothing.

find_params_for_datatype(datatypes=None)[source]¶

Search for parameter files within this study object and return them as list.

Parameters:	datatypes – single string datatype or list of strings
Returns:	a list of parameter objects for specific datatype in this study

get_dimensions()[source]¶: Returns a list of dimensions applicable to study

get_object_from_params_path(path)[source]¶: Returns object that belongs to the params path given

get_objects(of_type)[source]¶

Search for objects that have inherited from a certain type.

Parameters:	of_type – type to match against.
Returns:	generator for the found objects.

high_dim_files¶: All high dimensional file objects in this study.

load_to¶

publish_to_baas(url, study_name=None, username=None)[source]¶

Publishes a tree on a Boris as a Service instance.

Parameters:	url – url to a BaaS instance (e.g. http://transmart-arborist.thehyve.nl/). study_name – a nice name. username – if no username is given, you will be prompted for one.
Returns:	the url that points to the study you’ve just uploaded.

sample_mapping_files¶: All subject sample mapping file objects in this study.

security_required¶

study_blob¶: JSON data that can be loaded in the study blob. This will be added as a separate file next to the study.params. The STUDY_JSON_BLOB parameter will be set to point to this file.

study_id¶: The study ID as it is set in study params.

study_name¶: The study name, extracted from study param TOP_NODE.

tag_files¶

top_node¶

update_from_baas(url, username=None)[source]¶

Update the current study from a tree in BaaS, given its url.

Parameters:	url – url that has both the study and version of a tree in BaaS (e.g. http://transmart-arborist.thehyve.nl/trees/study-name/1/~edit/). username – if no username is given, you will be prompted for one.

update_from_treefile(treefile)[source]¶

Give the path to a treefile (from Boris as a Service or otherwise) and update the current study to match the changes made.

Parameters:	treefile – path to a treefile (stringified JSON).

validate_all(verbosity='WARNING')[source]¶

Validate all items in this study.

Parameters:	verbosity – only display output of this level and above. Levels: ‘debug’, ‘info’, ‘okay’, ‘warning’, ‘error’, ‘critical’. Default is ‘WARNING’.
Returns:	True if no error or critical message is encountered.
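The verbosity threshold can be pictured as a position in an ordered list of levels. The sketch below is not tmtk's implementation: `should_display` and `passed` are hypothetical helpers, and it assumes the levels are ordered as listed above and compared case-insensitively.

```python
# Levels in the order the documentation lists them, least to most severe.
LEVELS = ['debug', 'info', 'okay', 'warning', 'error', 'critical']


def should_display(message_level: str, verbosity: str = 'WARNING') -> bool:
    """Hypothetical helper: True if a message at message_level passes the threshold."""
    # Compare case-insensitively, since the default is written 'WARNING'
    # while the listed levels are lower case.
    return LEVELS.index(message_level.lower()) >= LEVELS.index(verbosity.lower())


def passed(message_levels) -> bool:
    """Mirror of the documented return value: True if no error or critical occurred."""
    return not any(level.lower() in ('error', 'critical') for level in message_levels)
```

Under these assumptions, a 'warning' message is shown at the default verbosity while an 'info' message is not.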

write_to(root_dir, overwrite=False, return_new=True)[source]¶

Write this study to a new directory on file system.

Parameters:	root_dir – the base directory to write the study to. overwrite – set this to True to overwrite existing files. return_new – if True load the study object from the new location and return it.
Returns:	new study object if return_new == True.

Params classes¶

Params Container¶

class tmtk.params.Params.Params(study_folder=None)[source]¶

Bases: tmtk.utils.validate.ValidateMixin

Container class for all params files, called by Study to locate all params files.

add_params(path, parameters=None)[source]¶

Add a new parameter file to the Params object.

Parameters:	path – a path to a parameter file. new – if new, create parameter object. parameters – add dict here with parameters if you want to create a new parameter file.

static create_params(path, parameters=None, subdir=None)[source]¶

Create a new parameter file object.

Parameters:	path – a path to a parameter file. parameters – add dict here with parameters if you want to create a new parameter file. subdir – subdir is used as string representation.
Returns:	parameter file object.

Base class: ParamsBase¶

AnnotationParams¶

class tmtk.params.AnnotationParams.AnnotationParams(path=None, parameters=None, subdir=None, parent=None)[source]¶

Bases: tmtk.params.base.ParamsBase

is_viable()[source]¶

Returns:	True if both the platform is set and the annotations file is located, else returns False.

mandatory¶

optional¶

ClinicalParams¶

class tmtk.params.ClinicalParams.ClinicalParams(path=None, parameters=None, subdir=None, parent=None)[source]¶

Bases: tmtk.params.base.ParamsBase

docslink = 'https://github.com/thehyve/transmart-batch/blob/master/docs/clinical.md'¶

mandatory¶

optional¶

HighDimParams¶

class tmtk.params.HighDimParams.HighDimParams(path=None, parameters=None, subdir=None, parent=None)[source]¶

Bases: tmtk.params.base.ParamsBase

docslink = 'https://github.com/thehyve/transmart-batch/blob/master/docs/hd-params.md'¶

is_viable()[source]¶

Returns:	True if both the datafile and map file are located, else returns False.

mandatory¶

optional¶

StudyParams¶

class tmtk.params.StudyParams.StudyParams(*args, **kwargs)[source]¶

Bases: tmtk.params.base.ParamsBase

docslink = 'https://github.com/thehyve/transmart-batch/blob/master/docs/study-params.md'¶

mandatory¶

optional¶

write_to(path, *args, **kwargs)[source]¶

Writes parameters in object to file in path. Does not overwrite existing files unless specifically told.

Parameters:	path – path to store parameters to. overwrite – allow overwriting existing files.

TagsParams¶

class tmtk.params.TagsParams.TagsParams(path=None, parameters=None, subdir=None, parent=None)[source]¶

Bases: tmtk.params.base.ParamsBase

is_viable()[source]¶

Returns:	True if the column mapping file is located, else returns False.

mandatory¶

optional¶

Clinical classes¶

Clinical Container¶

class tmtk.clinical.Clinical(clinical_params=None)[source]¶

Bases: tmtk.utils.validate.ValidateMixin

Container class for all clinical data related objects, i.e. the column mapping, word mapping, and clinical data files.

This object has methods that add data files, and for lookups of clinical files and variables.

ColumnMapping¶

add_datafile(filename, dataframe=None)[source]¶

Add a clinical data file to study.

Parameters:	filename – path to file or filename of file in clinical directory. dataframe – if given, add pd.DataFrame to study.

all_variables¶: Dictionary where {tmtk.VarID: tmtk.Variable} for all variables in the column mapping file.

apply_blueprint(blueprint, omit_missing=False)[source]¶

Update the column mapping by applying a template.

Parameters:	blueprint – expected input is a dictionary where keys are column names as found in clinical datafiles. Each column header name has a dictionary describing the path, data label and other information. For example:

{
    "GENDER": {
        "path": "CharacteristicsDemographics",
        "label": "Gender",
        "concept_code": "SNOMEDCT/263495000",
        "metadata_tags": {
            "Info": "As measured when born."
        },
        "force_categorical": "Y",
        "word_map": {
            "goo": "values",
            "pile": "list"
        },
        "expected_categorical": [
            "pile",
            "of",
            "goo"
        ]
    },
    "BPBASE": {
        "path": "Lab resultsBlood",
        "label": "Blood pressure (baseline)",
        "expected_numerical": {
            "min": 1,
            "max": 9
        }
    }
}

omit_missing – if True, then variables that are not present in the blueprint will be set to OMIT.
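A blueprint of this shape can be prepared in Python and, if wanted, stored on disk as JSON. The following is a minimal sketch using only the standard library. The path and label values are illustrative, and the commented-out calls at the end assume an already loaded `study` object, which this snippet does not create.

```python
import json
import os
import tempfile

# A blueprint keyed by column header, using keys from the example above.
# The path and label values are illustrative only.
blueprint = {
    "BPBASE": {
        "path": "Lab results",
        "label": "Blood pressure (baseline)",
        "expected_numerical": {"min": 1, "max": 9},
    }
}

# Write the blueprint to disk as JSON, then read it back to confirm it round-trips.
blueprint_path = os.path.join(tempfile.mkdtemp(), "blueprint.json")
with open(blueprint_path, "w", encoding="utf-8") as handle:
    json.dump(blueprint, handle, indent=2)

with open(blueprint_path, encoding="utf-8") as handle:
    loaded = json.load(handle)

# With tmtk installed and a study loaded, either form could then be passed on:
# study.apply_blueprint(blueprint)
# study.apply_blueprint(blueprint_path)
```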

clinical_files¶

filtered_variables¶: Dictionary where {tmtk.VarID: tmtk.Variable} for all variables in the column mapping file that do not have a data label in the RESERVED_KEYWORDS list

find_variables_by_label(label: str, in_file: str = None) → list[source]¶

Search for variables based on data label. All labels are converted to lower case.

Parameters:	label – in_file –
Returns:

get_datafile(name: str)[source]¶

Find datafile object by filename.

Parameters:	name – name of file.
Returns:	tmtk.DataFile object.

get_patients()[source]¶

Creates a dictionary that has subject identifiers as keys. Each value is a dictionary that is either empty or contains an ‘age’ and/or ‘gender’ key mapped to the corresponding value.

Returns:	patients dict.

get_trial_visits()[source]¶

Returns a list of all trial visits present in this study. Visits are identified by the TRIAL_VISIT_LABEL keyword in column mapping and can be annotated with a value and unit using the TrialVisits object.

Returns:	list of dicts.

get_variable(var_id: tuple)[source]¶

Return a Variable object based on variable id.

Parameters:	var_id – tuple of filename and column number.
Returns:	tmtk.Variable.

load_to¶

params¶

show_changes()[source]¶: Print changes made to the column mapping and word mapping file.

validate_all(verbosity=3)[source]¶

ColumnMapping¶

class tmtk.clinical.ColumnMapping(params=None)[source]¶

Bases: tmtk.utils.filebase.FileBase, tmtk.utils.validate.ValidateMixin

Class with utilities for the column mapping file for clinical data. Can be initialized by giving a clinical params file object.

RESERVED_KEYWORDS = ('SUBJ_ID', 'START_DATE', 'END_DATE', 'MODIFIER', 'TRIAL_VISIT_LABEL', 'INSTANCE_NUM', 'DATA_LABEL', 'VISIT_NAME', 'SITE_ID', '\\', 'OMIT', 'PATIENT_VISIT')¶

append_from_datafile(datafile)[source]¶

Appends the column mapping file with rows based on datafile column names.

Parameters:	datafile – tmtk.DataFile object.

build_index(df=None)[source]¶

Build index for the column mapping dataframe. If pd.DataFrame (optional) is given, modify and return that.

Parameters:	df – pd.DataFrame.
Returns:	pd.DataFrame.

create_df()[source]¶

Create pd.DataFrame with a correct header.

Returns:	pd.DataFrame.

get_concept_path(var_id: tuple)[source]¶

Return concept path for given variable identifier tuple.

Parameters:	var_id – tuple of filename and column number.
Return str:	concept path for this variable.

ids¶: A list of variable identifier tuples.

included_datafiles¶: List of datafiles included in column mapping file.

path_changes(silent=False)[source]¶

Determine changes made to column mapping file.

Parameters:	silent – if True, only print output.
Returns:	if silent=False return dictionary with changes since load.

path_id_dict¶: Dictionary with all variable ids as keys and paths as value.

select_row(var_id: tuple)[source]¶

Select row based on variable identifier tuple. Raises exception if variable is not in this column mapping.

Parameters:	var_id – tuple of filename and column number.
Returns:	list of items in selected row.

set_column_type(var_id: tuple, value: str)[source]¶

Set variable to a given data type.

Parameters:	var_id – tuple of filename and column number. value – value to set column type to.

set_concept_code(var_id: tuple, value)[source]¶

Set the concept code for a variable.

Parameters:	var_id – tuple of filename and column number. value – value to set concept code to.

set_concept_path(var_id: tuple, path=None, label=None)[source]¶

Set the concept path or data label for given variable identifier tuple.

Parameters:	var_id – tuple of filename and column number. path – new value for path. label – new value for data label.

set_reference_column(var_id: tuple, value)[source]¶

Set the reference column for a variable, this is used for modifiers to specify which columns are affected by this modifier variable.

Parameters:	var_id – tuple of filename and column number. value – value to set reference column to.

subj_id_columns¶: A list of tuples with datafile and column index for SUBJ_ID, e.g. (‘cell-line.txt’, 1).

DataFile¶

class tmtk.clinical.DataFile(path=None)[source]¶

Bases: tmtk.utils.filebase.FileBase

Class for clinical data files, does not do much more than tmtk.FileBase.

Variable¶

class tmtk.clinical.Variable(datafile, column: int = None, clinical_parent=None)[source]¶

Bases: object

Base class for clinical variables

VIS_CATEGORICAL = 'LAC'¶

VIS_DATE = 'LAD'¶

VIS_NUMERIC = 'LAN'¶

VIS_TEXT = 'LAT'¶

category_code¶

The second column of the column mapping file for this variable. This combines with self.data_label to create self.concept_path.

Returns:	str.

column_map_data¶

Column mapping row as dictionary where keys are short descriptors.

Returns:	dict.
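The short descriptors are the ones listed as `column_mapping_s` under tmtk.utils.mappings.Mappings. A minimal sketch of how such a dictionary could be assembled from one row follows. The row values are hypothetical, and pairing by position is an assumption rather than a statement about tmtk's internals.

```python
# Header and short descriptors as listed under tmtk.utils.mappings.Mappings.
column_mapping_header = ['Filename', 'Category Code', 'Column Number',
                         'Data Label', 'Magic 5th', 'Ontology Code', 'Data Type']
column_mapping_s = ['fn', 'ccd', 'col', 'dl', 'm5', 'm6', 'cty']

# Hypothetical column mapping row, in header order.
row = ['clinical_data.txt', 'Characteristics', 3, 'Gender', '', '', 'CATEGORICAL']

# Pair each short descriptor with the value in the same position.
column_map_data = dict(zip(column_mapping_s, row))
```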

column_type¶: Column data type setting can be found in modifiers file for MODIFIER vars, else it is in the DataType column of column mapping. If it is not found, it will be either numerical or categorical based on the datafile values.

concept_code¶

concept_path¶

Concept path after conversions by transmart-batch. Combination of self.category_code and self.data_label. Cannot be set.

Returns:	str.

data_label¶

Variable data label.

Returns:	str.

end_date¶

forced_categorical¶

Check if forced categorical by entering ‘CATEGORICAL’ in data type column. Can be changed by setting this to True or False.

Returns:	bool.

header¶

is_empty¶

Check if variable is fully empty.

Returns:	bool.

is_in_wordmap¶

Check if variable is represented in word mapping file.

Returns:	bool.

is_numeric¶

True if transmart-batch will load this concept as numerical. This includes information from word mapping and column mapping.

Returns:	bool.

is_numeric_in_datafile¶

True if the datafile contains only numerical items.

Returns:	bool.

mapped_values¶

Data items after word mapping.

Returns:	list.

max¶

min¶

modifier_code¶: Requires implementation, always returns ‘@’.

modifiers¶

Returns a list of all modifier variables that apply to this variable. The data label for these variables has to be ‘MODIFIER’ and the fifth column (reference column) has to either be empty or the column this variable has.

Returns:	list of modifier variables.

reference_column¶

start_date¶

subj_id¶

trial_visit¶

unique_values¶

Returns:	Unique set of values in the datafile.

values¶

Returns:	All values as found in the datafile.

var_id¶

Returns:	Variable identifier tuple (datafile.name, column).

visual_attributes¶

word_map_dict¶

A dictionary with word mapped categoricals. Keys are items in the datafile, values are what they will be mapped to through the word mapping file. Unmapped items are also added as key, value pair.

Returns:	dict.

word_mapped_not_present()[source]¶

Gets the values that are in the word map but not in the data file.

Returns:	set.
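The relationship between word_map_dict and word_mapped_not_present() can be sketched with plain dictionaries and sets. The helper names and the sample values below are hypothetical. The logic follows the descriptions above: unmapped datafile values map to themselves, and the "not present" set holds word map keys that never occur in the data.

```python
def build_word_map_dict(values, word_map):
    """Map every datafile value to its word-mapped value, or to itself if unmapped."""
    return {value: word_map.get(value, value) for value in set(values)}


def mapped_but_absent(values, word_map):
    """Return word map keys that never occur in the datafile values."""
    return set(word_map) - set(values)


# Hypothetical datafile column and word mapping for it.
values = ['M', 'F', 'M', 'unknown']
word_map = {'M': 'Male', 'F': 'Female', 'X': 'Other'}

full_map = build_word_map_dict(values, word_map)
unused = mapped_but_absent(values, word_map)
```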

WordMapping¶

class tmtk.clinical.WordMapping(params=None)[source]¶

Bases: tmtk.utils.filebase.FileBase, tmtk.utils.validate.ValidateMixin

Class representing the word mapping file.

build_index(df=None)[source]¶

Build and sort multi-index for dataframe based on filename and column number columns. If the df parameter is not set, build index for self.df.

Parameters:	df – pd.DataFrame.
Returns:	pd.DataFrame.

create_df()[source]¶

Create pd.DataFrame with a correct header.

Returns:	pd.DataFrame.

get_word_map(var_id)[source]¶

Return dict with the value in the data file and the mapped value as key-value pairs.

Parameters:	var_id – tuple of filename and column number.
Returns:	dict.

included_datafiles¶: List of datafiles included in word mapping file.

set_word_map(var_id, d)[source]¶

Set the word mapping for specific variable based on its filename and column number.

Parameters:	var_id – variable identifier tuple. d – dictionary that contains the value map.

word_map_changes(silent=False)[source]¶

Determine changes made to word mapping file.

Parameters:	silent – if True, only print output.
Returns:	if silent=False return dictionary with changes since load.

word_map_dicts¶: Dictionary with all variable ids as keys and word map dicts as value.

Annotations¶

Annotations Container¶

class tmtk.annotation.Annotations.Annotations(params_list=None, parent=None)[source]¶

Bases: object

Class containing all AnnotationFile objects.

annotation_files¶

validate_all(verbosity=3)[source]¶

Base class: AnnotationBase¶

class tmtk.annotation.AnnotationBase.AnnotationBase(params=None, path=None)[source]¶

Bases: tmtk.utils.filebase.FileBase, tmtk.utils.validate.ValidateMixin

Base class for annotation files.

load_to¶

marker_type¶

ChromosomalRegions¶

class tmtk.annotation.ChromosomalRegions.ChromosomalRegions(params=None, path=None)[source]¶

Bases: tmtk.annotation.AnnotationBase.AnnotationBase

Subclass for CNV (aCGH, qDNAseq) annotation

biomarkers¶

MicroarrayAnnotation¶

class tmtk.annotation.MicroarrayAnnotation.MicroarrayAnnotation(params=None, path=None)[source]¶

Bases: tmtk.annotation.AnnotationBase.AnnotationBase

Subclass for microarray (mRNA) expression annotation files.

biomarkers¶

MirnaAnnotation¶

class tmtk.annotation.MirnaAnnotation.MirnaAnnotation(params=None, path=None)[source]¶

Bases: tmtk.annotation.AnnotationBase.AnnotationBase

Subclass for micro RNA (miRNA) expression annotation files.

biomarkers¶

ProteomicsAnnotation¶

class tmtk.annotation.ProteomicsAnnotation.ProteomicsAnnotation(params=None, path=None)[source]¶

Bases: tmtk.annotation.AnnotationBase.AnnotationBase

Subclass for proteomics annotation

biomarkers¶

High Dimensional data¶

HighDim¶

class tmtk.highdim.HighDim.HighDim(params_list=None, parent=None)[source]¶

Bases: tmtk.utils.validate.ValidateMixin

Container class for all High Dimensional data types.

Parameters:	params_list – contains a list with Params objects.

high_dim_files¶

sample_mapping_files¶

update_high_dim_paths(high_dim_paths)[source]¶

Update sample mapping if path has been changed.

Parameters:	high_dim_paths – dictionary with paths and old concept paths.

validate_all(verbosity='INFO')[source]¶

HighDimBase¶

class tmtk.highdim.HighDimBase.HighDimBase(params=None, path=None, parent=None)[source]¶

Bases: tmtk.utils.filebase.FileBase, tmtk.utils.validate.ValidateMixin

Base class for high dimensional data structures.

load_to¶

CopyNumberVariation¶

class tmtk.highdim.CopyNumberVariation.CopyNumberVariation(params=None, path=None, parent=None)[source]¶

Bases: tmtk.highdim.HighDimBase.HighDimBase

Base class for copy number variation datatypes (aCGH, qDNAseq)

allowed_header¶

remap_to(destination=None)[source]¶

Parameters:	destination –
Returns:

samples¶

Expression¶

class tmtk.highdim.Expression.Expression(params=None, path=None, parent=None)[source]¶

Bases: tmtk.highdim.HighDimBase.HighDimBase

Base class for microarray mRNA expression data.

samples¶

Mirna¶

class tmtk.highdim.Mirna.Mirna(params=None, path=None, parent=None)[source]¶

Bases: tmtk.highdim.HighDimBase.HighDimBase

Base class for micro RNA (miRNA) expression data.

samples¶

Proteomics¶

class tmtk.highdim.Proteomics.Proteomics(params=None, path=None, parent=None)[source]¶

Bases: tmtk.highdim.HighDimBase.HighDimBase

Base class for proteomics data.

samples¶

ReadCounts¶

class tmtk.highdim.ReadCounts.ReadCounts(params=None, path=None, parent=None)[source]¶

Bases: tmtk.highdim.HighDimBase.HighDimBase

Subclass for ReadCounts.

allowed_header¶

remap_to(destination=None)[source]¶

Parameters:	destination –
Returns:

samples¶

SampleMapping¶

class tmtk.highdim.SampleMapping.SampleMapping(path=None)[source]¶

Bases: tmtk.utils.filebase.FileBase, tmtk.utils.validate.ValidateMixin

Base class for subject sample mapping

get_concept_paths¶

Get all concept paths from file, replaces ATTR1 and ATTR2.

Returns:	dictionary with md5 hash values as key and paths as value

platform¶

Returns:	the platform id in this sample mapping file.

samples¶

slice_path(path)[source]¶

Give slice of the dataframe where the paths are equal to given path.

Parameters:	path – path (will be converted using global logic).
Returns:	slice of dataframe.

study_id¶

Returns:	study_id in sample mapping file

update_concept_paths(path_dict)[source]¶

Metadata Tags¶

Tags¶

class tmtk.tags.Tags.MetaDataTags(params=None, parent=None)[source]¶

Bases: tmtk.utils.filebase.FileBase, tmtk.utils.validate.ValidateMixin

apply_blueprint(blueprint)[source]¶

Add metadata tags from a blueprint object.

Parameters:	blueprint – blueprint object.

static create_df()[source]¶

get_tags()[source]¶

Generator that gets tags from the tags file.

Returns:	tuples (<path>, <title>, <description>)

invalid_paths¶

load_to¶

tag_paths¶: Return tag paths delimited by the path_converter.

Utilities¶

FileBase¶

class tmtk.utils.filebase.FileBase[source]¶

Bases: object

Super class with shared utilities for file objects.

df¶: The pd.DataFrame for this file object.

df_has_changed¶

header¶

name¶

save()[source]¶: Overwrite the original file with the current dataframe.

tabs_in_first_line()[source]¶: Check if file is tab delimited.

write_to(path, overwrite=False)[source]¶

Wrapper for tmtk.utils.df2file().

Parameters:	path – path to write file to. overwrite – write over existing files in the filesystem.

Generic module¶

tmtk.utils.Generic.clean_for_namespace(path) → str[source]¶

Converts a path and returns a namespace safe variant. Converts characters that give errors to underscore.

Parameters:	path – usually a descriptive subdirectory
Returns:	string
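One way to picture this conversion is a regular expression that replaces anything outside letters, digits and underscore. This is a sketch only: which characters tmtk actually replaces is not stated here, and the leading-digit handling is an added assumption.

```python
import re


def clean_for_namespace_sketch(path: str) -> str:
    """Replace every character that is not a letter, digit or underscore with '_'."""
    cleaned = re.sub(r'\W', '_', path)
    # Identifiers cannot start with a digit, so prefix an underscore in that case.
    if cleaned and cleaned[0].isdigit():
        cleaned = '_' + cleaned
    return cleaned
```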

tmtk.utils.Generic.column_map_diff(a_column, b_column)[source]¶

tmtk.utils.Generic.df2file(df=None, path=None, overwrite=False)[source]¶

Write a dataframe to file safely. Does not overwrite existing files automatically. This function converts concept path delimiters.

Parameters:	df – pd.DataFrame. path – path to write to. overwrite – False (default) or True.

tmtk.utils.Generic.file2df(path=None)[source]¶

Load a file specified by path into a Pandas dataframe. If hashed is True, return a (dataframe, hash) value tuple.

Parameters:	path – to file to load
Returns:	pd.DataFrame

tmtk.utils.Generic.find_fully_unique_columns(df)[source]¶

Check if a dataframe contains a fully unique column (SUBJ_ID candidate).

Parameters:	df – pd.DataFrame
Returns:	list of names of unique columns
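The check itself is simple: a column is a SUBJ_ID candidate when none of its values repeat. The sketch below shows the idea on a plain dictionary of columns instead of a pd.DataFrame, so it runs without pandas. The function name and the sample table are hypothetical.

```python
def fully_unique_columns(columns: dict) -> list:
    """Return names of columns whose values never repeat.

    columns maps a column name to the list of values in that column.
    """
    return [name for name, values in columns.items()
            if len(set(values)) == len(values)]


# Hypothetical clinical table, column by column.
table = {
    'SUBJ_ID': ['S1', 'S2', 'S3'],
    'GENDER': ['F', 'M', 'F'],
    'AGE': [34, 51, 47],
}
unique = fully_unique_columns(table)
```

Note that AGE also qualifies in this small table, which is why the result is a list of candidates and not a single answer.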

tmtk.utils.Generic.fix_everything()[source]¶

Scans over all the data and indicates which errors have been fixed. This function is great for stress relief.

Returns:	All your problems fixed by Rick

tmtk.utils.Generic.is_not_a_value(value)[source]¶: Returns whether value is None, pd.np.nan, or an empty string
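A standard-library sketch of the same three checks is shown below. It is an assumption about behaviour, not tmtk's code, and it treats NaN as any float for which `math.isnan` is true.

```python
import math


def is_not_a_value_sketch(value) -> bool:
    """True for None, a float NaN, or an empty string."""
    if value is None or value == '':
        return True
    return isinstance(value, float) and math.isnan(value)
```

Values such as 0 or the string 'NA' are still values under this definition.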

tmtk.utils.Generic.md5(s: str)[source]¶

utf-8 encoded md5 hash string of input s.

Parameters:	s – string
Returns:	md5 hash string
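With the standard library, hashing a UTF-8 encoded string looks like the sketch below. It assumes the returned "md5 hash string" is the hexadecimal digest, which the description does not state explicitly.

```python
import hashlib


def md5_sketch(s: str) -> str:
    """Return the hexadecimal MD5 digest of s after encoding it as UTF-8."""
    return hashlib.md5(s.encode('utf-8')).hexdigest()
```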

tmtk.utils.Generic.merge_two_dicts(x, y)[source]¶: Given two dicts, merge them into a new dict as a shallow copy.
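A common way to write such a merge is to copy the first dict and update it with the second. The sketch below assumes that values from `y` win when a key occurs in both, which the description does not state.

```python
def merge_two_dicts_sketch(x: dict, y: dict) -> dict:
    """Return a new dict with the items of x, overridden by those of y."""
    merged = x.copy()   # shallow copy, x itself is left untouched
    merged.update(y)    # keys present in both take the value from y
    return merged
```

Because the copy is shallow, nested objects are shared between the inputs and the result.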

tmtk.utils.Generic.path_converter(path, to_internal=False, from_internal=False)[source]¶

Convert paths by creating delimiters of backslash “\” and “+” sign, additionally converting underscores “_” to a single space.

Parameters:	path – concept path. to_internal – if path is for internal use, delimit with Mappings.PATH_DELIM. from_internal – replace + and _ with escaped versions.
Returns:	delimited path
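The basic conversion can be sketched as a split on either delimiter followed by a join. This is a simplified reading of the description, not tmtk's implementation: it ignores the from_internal escaping entirely, and the choice of output delimiter is an assumption based on the two constants documented under Mappings.

```python
import re

PATH_DELIM = '∕'        # Mappings.PATH_DELIM as documented
EXT_PATH_DELIM = '\\'   # Mappings.EXT_PATH_DELIM as documented


def convert_path_sketch(path: str, to_internal: bool = False) -> str:
    """Split on backslash or plus, turn underscores into spaces, and rejoin."""
    delim = PATH_DELIM if to_internal else EXT_PATH_DELIM
    # Drop empty parts that come from leading, trailing or doubled delimiters.
    parts = [part for part in re.split(r'[\\+]', path) if part]
    return delim.join(part.replace('_', ' ') for part in parts)
```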

tmtk.utils.Generic.path_join(*args)[source]¶

Join items with the used path delimiter.

Parameters:	args – path items
Returns:	path as string

tmtk.utils.Generic.summarise(list_or_dict=None, max_items: int = 7) → str[source]¶

Takes an iterable and returns a summarized string statement. Picks a random sample if number of items > max_items.

Parameters:	list_or_dict – list or dict to summarise. max_items – maximum number of items to keep.
Returns:	the items joined as string with end statement.
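A sketch of this kind of summary is given below. The wording of the end statement ("and N more") and the separator are hypothetical, since the actual output format is not documented here.

```python
import random


def summarise_sketch(items, max_items: int = 7) -> str:
    """Join items into one string, sampling at random when there are too many."""
    items = list(items)  # iterating a dict yields its keys
    if len(items) > max_items:
        shown = random.sample(items, max_items)
        hidden = len(items) - max_items
        return ', '.join(map(str, shown)) + ' and {} more'.format(hidden)
    return ', '.join(map(str, items))
```

Because of the random sample, two calls on the same long input can show different items.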

tmtk.utils.Generic.word_map_diff(a_word_map, b_word_map)[source]¶

utils.CPrint module¶

utils.Exceptions module¶

exception tmtk.utils.Exceptions.ArboristException[source]¶: Bases: Exception

exception tmtk.utils.Exceptions.BlueprintException[source]¶: Bases: Exception

exception tmtk.utils.Exceptions.ClassError(found=None, expected=None)[source]¶

Bases: Exception

Error raised when unexpected class is found.

Parameters:	found – the class of the object found. expected – the required class.

exception tmtk.utils.Exceptions.DatatypeError(found=None, expected=None)[source]¶

Bases: Exception

Error raised when incorrect datatype is found.

Parameters:	found – the datatype of the object found. expected – the required datatype.

exception tmtk.utils.Exceptions.PathError(found=None)[source]¶

Bases: Exception

Error raised when an incorrect path is given.

exception tmtk.utils.Exceptions.ReservedKeywordException[source]¶: Bases: Exception

exception tmtk.utils.Exceptions.TooManyValues(found=None, expected=None, id_=None)[source]¶

Bases: Exception

Error raised when too many values are found.

utils.HighDimUtils module¶

utils.mappings module¶

class tmtk.utils.mappings.Mappings[source]¶

Bases: object

Collection of statics used in various parts of the code.

EXT_PATH_DELIM = '\\'¶

PATH_DELIM = '∕'¶

ancestors = 'Ancestors'¶

annotation_data_types = {'cnv': 'ACGH data', 'expression': 'Messenger RNA data (microarray)', 'mirna': 'micro RNA data (PCR)', 'proteomics': 'Proteomics data (mass spec)', 'rnaseq': 'Messenger RNA data (sequencing)', 'vcf': 'Genomic variant data'}¶

annotation_marker_types = {'cnv_annotation': 'Chromosomal', 'mirna_annotation': 'MIRNA_QPCR', 'mrna_annotation': 'Gene expression', 'proteomics_annotation': 'PROTEOMICS', 'rnaseq_annotation': 'RNASEQ_RCNT', 'vcf_annotation': 'VCF'}¶

blob = 'blob'¶

cat_cd = 'Category Code'¶

cat_cd_s = 'ccd'¶

col_num = 'Column Number'¶

col_num_s = 'col'¶

column_mapping_header = ['Filename', 'Category Code', 'Column Number', 'Data Label', 'Magic 5th', 'Ontology Code', 'Data Type']¶

column_mapping_s = ['fn', 'ccd', 'col', 'dl', 'm5', 'm6', 'cty']¶

column_type = 'Data Type'¶

concept_type = 'Data Type'¶

concept_type_s = 'cty'¶

data_label = 'Data Label'¶

data_label_s = 'dl'¶

df_value = 'Datafile Value'¶

df_value_s = 'dfv'¶

filename = 'Filename'¶

filename_s = 'fn'¶

static get_annotations(dtype=None)[source]¶

Return mapping for annotations classes. Return only for datatype if dtype is set. Else return full map.

Parameters:	dtype – optional datatype (e.g. cnv_annotation)
Returns:	dict with mapping, or class.

static get_highdim(dtype=None)[source]¶

Return mapping for high dimensional classes. Return only for datatype if dtype is set. Else return full map.

Parameters:	dtype – optional datatype (e.g. cnv)
Returns:	dict with mapping, or class.

static get_params(dtype=None)[source]¶

Return mapping for params classes. Return only for datatype if dtype is set. Else return full map.

Parameters:	dtype – optional datatype (e.g. cnv)
Returns:	dict with mapping, or class.

magic_5 = 'Magic 5th'¶

magic_5_s = 'm5'¶

magic_6_s = 'm6'¶

map_value = 'Mapping Value'¶

map_value_s = 'map'¶

modifier_cd = 'modifier_cd'¶

modifier_path = 'modifier_path'¶

modifiers_header = ['modifier_path', 'modifier_cd', 'name_char', 'Data Type']¶

name_char = 'name_char'¶

ontology_code = 'Ontology Code'¶

ontology_header = ['Ontology Code', 'Label', 'Ancestors', 'blob']¶

tags_description = 'Description'¶

tags_header = ['Concept Path', 'Title', 'Description', 'Weight']¶

tags_node_name = 'Tags'¶

tags_path = 'Concept Path'¶

tags_title = 'Title'¶

tags_weight = 'Weight'¶

term_label = 'Label'¶

trial_visits_header = ['name', 'relative_time', 'time_unit']¶

tv_label = 'name'¶

tv_unit = 'time_unit'¶

tv_value = 'relative_time'¶

word_mapping_header = ['Filename', 'Column Number', 'Datafile Value', 'Mapping Value']¶

Toolbox package¶

Generate chromosomal regions file¶

tmtk.toolbox.generate_chromosomal_regions_file.generate_chromosomal_regions_file(platform_id=None, reference_build='hg19', **kwargs)[source]¶

This creates a new chromosomal regions annotation file.

Parameters:	platform_id – give the new platform a name to fill the first column. reference_build – choose either hg18, hg19 or hg38.
Returns:	a pandas dataframe with the new platform

Remap chromosomal regions data¶

tmtk.toolbox.remap_chromosomal_regions.map_index_to_region_ids(gene, origin_platform, region_origin)[source]¶

tmtk.toolbox.remap_chromosomal_regions.remap_chromosomal_regions(origin_platform=None, destination_platform=None, datafile=None, flag_indicator='.flag', to_dest=2, start_dest=3, end_dest=4, region_dest=1, chr_origin=2, start_origin=3, end_origin=4, region_origin=1, region_data=0)[source]¶

tmtk.toolbox.remap_chromosomal_regions.return_mean(datafile, mapping, flag_columns=None)[source]¶

Study Wizard¶

tmtk.toolbox.wizard.create_study(path)[source]¶

Start a study object by pointing to a folder that contains clinical data files only.

Parameters:	path – path to folder with files.
Returns:	study object.

Create study from templates¶

tmtk.toolbox.create_study_from_templates(ID, source_dir, output_dir=None, sec_req='Y')[source]¶

Create tranSMART files in designated output_dir for all data provided in templates in the source_dir.

Parameters:	ID – study ID. source_dir – directory containing all the templates. output_dir – directory where the output should be written. sec_req – security required? “Y” or “N”, default “Y”.
Returns:	None

The Arborist¶

tmtk.arborist.common module¶

tmtk.arborist.common.call_boris(study=None, **kwargs)[source]¶

This function loads the Arborist if it has been properly installed in your environment.

Parameters:	study – a <tmtk.Study> object.

tmtk.arborist.common.launch_arborist_gui(json_data: str, height=650)[source]¶

Parameters:	json_data – json data to launch the Arborist with. height – IFrame height for output cell.

tmtk.arborist.common.update_study_from_json(study, json_data)[source]¶

Update an existing tmtk.Study object with the JSON response from the Arborist.

Parameters:	study – tmtk.Study object. json_data – json response from Arborist.

tmtk.arborist.connect_to_baas module¶

tmtk.arborist.connect_to_baas.get_instance_url(url)[source]¶

tmtk.arborist.connect_to_baas.get_json_from_baas(url, username=None)[source]¶

Get a json file from a Boris as a Service instance.

Parameters:	url – url should contain the study name and version (e.g. http://transmart-arborist.thehyve.nl/trees/study-name/1/~edit/). username – if no username is given, you will be prompted for one.
Returns:	the JSON string from BaaS.

tmtk.arborist.connect_to_baas.json_url(url)[source]¶

tmtk.arborist.connect_to_baas.login_url(url)[source]¶

tmtk.arborist.connect_to_baas.publish_to_baas(url, json, study_name, username=None)[source]¶

Publishes a tree on a Boris as a Service instance.

Parameters:	url – url to a BaaS instance. json – the stringified json you want to publish. study_name – a nice name. username – if no username is given, you will be prompted for one.
Returns:	the url that points to the study you’ve just uploaded.

tmtk.arborist.connect_to_baas.start_session(url, username)[source]¶

tmtk.arborist.jstreecontrol module¶

class tmtk.arborist.jstreecontrol.ConceptNode(path, var_id=None, node_type='numeric', data_args=None)[source]¶: Bases: object

class tmtk.arborist.jstreecontrol.ConceptTree(json_data=None)[source]¶

Bases: object

Build a ConceptTree to be used in the graphical tree editor.

add_node(path, var_id=None, node_type=None, data_args=None)[source]¶

Add a ConceptNode object to the nodes list.

Parameters:	path – concept path for this node. var_id – unique ID that allows to keep track of a node. node_type – explicitly set node type (highdim, numerical, categorical). data_args – any additional parameters are put in a ‘data’ dictionary.

column_mapping_file¶

Returns:	Column Mapping file based on ConceptTree object.

high_dim_paths¶: All high dimensional nodes in concept tree as dict

jstree¶

tags_file¶

word_mapping¶

class tmtk.arborist.jstreecontrol.JSNode(path, oid=None, **kwargs)[source]¶

Bases: object

This class exists as a helper to the JSTree. Its “json_data” method can generate sub-tree JSON without putting the logic directly into the JSTree.

get_child(var_id, text)[source]¶

json_data()[source]¶

class tmtk.arborist.jstreecontrol.JSTree(concept_nodes)[source]¶

Bases: object

A JSON-like object that converts a list of nodes into something that jQuery jstree can use.

json_data¶: Convert this object to json ready to be consumed by jstree.

json_data_string¶

Returns:	Returns the json_data properly formatted as string.

pretty(root=None, depth=0, spacing=2)[source]¶: Create a pretty representation of tree.

to_clipboard()[source]¶
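The kind of conversion JSTree performs, from a flat list of concept paths to nested nodes, can be sketched as follows. The function `paths_to_jstree` and the sample paths are hypothetical. The only thing taken from jstree itself is its JSON node format with 'text' and 'children' keys. The real class also carries node types and data, which are left out here.

```python
import json


def paths_to_jstree(paths, delim='\\'):
    """Turn flat concept paths into nested nodes with 'text' and 'children' keys."""
    roots = []
    for path in paths:
        siblings = roots
        for part in path.split(delim):
            # Reuse an existing node with this text, else create it.
            node = next((n for n in siblings if n['text'] == part), None)
            if node is None:
                node = {'text': part, 'children': []}
                siblings.append(node)
            siblings = node['children']
    return roots


tree = paths_to_jstree(['Study\\Demographics\\Age',
                        'Study\\Demographics\\Gender'])
tree_json = json.dumps(tree)
```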

class tmtk.arborist.jstreecontrol.MyEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]¶

Bases: json.encoder.JSONEncoder

Overwriting the standard JSON Encoder to treat numpy ints as native ints.

default(obj)[source]¶

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
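Applied to the purpose stated for MyEncoder, a `default` override that treats integer-like objects as native ints could look like the sketch below. It is not tmtk's code: `IntFriendlyEncoder` is a hypothetical name, the approach relies on `operator.index` (which numpy integers support through `__index__`), and `FakeNumpyInt` is a stand-in so the example runs without numpy.

```python
import json
import operator


class IntFriendlyEncoder(json.JSONEncoder):
    """Encode integer-like objects (anything with __index__) as plain ints."""

    def default(self, obj):
        try:
            return operator.index(obj)
        except TypeError:
            # Not integer-like: let the base class raise the TypeError.
            return super().default(obj)


class FakeNumpyInt:
    """Hypothetical stand-in for a numpy integer, so the sketch needs no numpy."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


encoded = json.dumps({'column': FakeNumpyInt(3)}, cls=IntFriendlyEncoder)
```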

tmtk.arborist.jstreecontrol.create_concept_tree(column_object)[source]¶

Parameters:	column_object – tmtk.Study object, tmtk.Clinical object, or ColumnMapping dataframe
Returns:	json string to be interpreted by the JSTree

tmtk.arborist.jstreecontrol.create_tree_from_clinical(clinical_object, concept_tree=None)[source]¶

Parameters:	clinical_object – concept_tree –
Returns:

tmtk.arborist.jstreecontrol.create_tree_from_study(study, concept_tree=None)[source]¶

Parameters:	study – concept_tree –
Returns: