
NLP Lab Module

Spark NLP for Healthcare provides easy-to-use functions for interacting with NLP Lab, a tool for multi-modal data annotation. NLP Lab allows annotation teams to collaborate efficiently to generate training data for ML models and/or to validate automatic annotations generated by those models.

The NLP Lab interaction module provides programmatic access to an NLP Lab instance. Detailed usage examples can be found in Complete NLP Lab Module SparkNLP JSL and in the Python API documentation. The module supports the following functionalities:

  • Generating a CoNLL formatted file from the annotation JSON for training an NER model.
  • Generating a csv/excel formatted file from the annotation JSON for training classification, assertion, and relation extraction models.
  • Building a preannotation JSON file using Spark NLP pipelines, saving it as JSON, and uploading preannotations to a project.
  • Interacting with the NLP Lab instance and setting up projects for NLP Lab.
  • Getting the list of all projects in the NLP Lab instance.
  • Creating new projects.
  • Deleting projects.
  • Setting and editing the configuration of projects.
  • Accessing the configuration of any existing project.
  • Uploading tasks to a project.
  • Deleting tasks from a project.
# import the module
from sparknlp_jsl.alab import AnnotationLab
alab = AnnotationLab()

Generate Data for Training a Classification Model
alab.get_classification_data(
    # required: path to NLP Lab JSON export
    input_json_path='alab_demo.json',
    # optional: set to True to select ground truth completions, False to select latest completions;
    # defaults to False
    # ground_truth=False,
)
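As a rough illustration of what get_classification_data converts, the snippet below flattens a simplified, assumed shape of an NLP Lab classification JSON export (real exports carry more metadata per task and completion) into the (text, label) rows that end up in the generated csv/excel file:

```python
import json

# Simplified, assumed structure of an NLP Lab classification export;
# real exports include additional fields per task and per completion.
export = json.loads("""
[
  {
    "data": {"text": "Patient denies chest pain."},
    "completions": [
      {"result": [{"value": {"choices": ["Negated"]}}]}
    ]
  }
]
""")

# Flatten to (text, label) rows of the kind the generated file contains.
rows = [
    (task["data"]["text"], choice)
    for task in export
    for completion in task["completions"]
    for item in completion["result"]
    for choice in item["value"]["choices"]
]
print(rows)
```

This is only a sketch of the export's overall shape; the module handles the full format, including multiple completions and ground-truth selection, for you.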

Converting the JSON Export into CoNLL Format Suitable for Training an NER Model
alab.get_conll_data(
    # required: Spark session with spark-nlp-jsl jar
    spark=spark,
    # required: path to NLP Lab JSON export
    input_json_path="alab_demo.json",
    # required: name of the CoNLL file to save
    output_name="conll_demo",
    # optional: path of the directory to save the CoNLL file to, defaults to 'exported_conll'
    # save_dir="exported_conll",
    # optional: set to True to select ground truth completions, False to select latest completions;
    # defaults to False
    # ground_truth=False,
    # optional: labels to exclude from the CoNLL file, e.g. assertion labels and irrelevant NER labels;
    # defaults to an empty list
    # excluded_labels=['ABSENT'],
    # optional: pattern for a regex tokenizer; defaults to the regular tokenizer if not defined
    # regex_pattern="\\s+|(?=[-.:;*+,$&%\\[\\]])|(?<=[-.:;*+,$&%\\[\\]])",
    # optional: list of NLP Lab task IDs to exclude from the CoNLL file, defaults to an empty list
    # excluded_task_ids=[2, 3],
    # optional: list of NLP Lab task titles to exclude from the CoNLL file, defaults to None
    # excluded_task_titles=['Note 1'],
)
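For orientation, here is a sketch of the four-column CoNLL layout (token, POS tag, chunk tag, NER label) that NER training expects, together with the effect of the optional regex_pattern shown above. The sample sentence and its labels are invented for illustration:

```python
import re

# Illustrative only: a minimal CoNLL-style fragment of the kind
# get_conll_data writes -- token, POS tag, chunk tag, NER label per line.
conll_text = """\
-DOCSTART- -X- -X- O

Patient NNP NNP O
reports VBZ VBZ O
severe JJ JJ B-Problem
headache NN NN I-Problem
"""

# Parse (token, label) pairs, skipping the document marker and blank lines.
pairs = [
    (cols[0], cols[3])
    for line in conll_text.splitlines()
    if line and not line.startswith("-DOCSTART-")
    for cols in [line.split()]
]
print(pairs)

# The regex_pattern above tokenizes on whitespace and additionally splits
# the listed punctuation characters into their own tokens:
pattern = r"\s+|(?=[-.:;*+,$&%\[\]])|(?<=[-.:;*+,$&%\[\]])"
print(re.split(pattern, "2.5 mg"))
```

The regex tokenizer can be useful for clinical text, where dosages and abbreviations mix digits and punctuation that the regular tokenizer would keep as a single token.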