NLP Lab (Annotation Lab) Interface Module
Spark NLP for Healthcare provides functionality for interacting with NLP Lab through easy-to-use functions. NLP Lab is a tool for multi-modal data annotation. It allows annotation teams to collaborate efficiently to generate training data for ML models and/or to validate automatic annotations generated by those models.
The NLP Lab interface module provides programmatic interaction with the NLP Lab. Detailed usage examples can be found in Complete NLP Lab Module SparkNLP JSL, and reference documentation in the Python API. The module supports the following functionality (a sketch of a typical project-management workflow follows the list):
- Generating a CoNLL-formatted file from the annotation JSON for training an NER model.
- Generating a CSV/Excel-formatted file from the annotation JSON for training classification, assertion, and relation extraction models.
- Building a preannotation JSON file using Spark NLP pipelines, saving it as JSON, and uploading preannotations to a project.
- Interacting with the NLP Lab instance and setting up projects for NLP Lab.
- Getting the list of all projects in the NLP Lab instance.
- Creating new projects.
- Deleting projects.
- Setting and editing project configurations.
- Getting the configuration of any existing project.
- Uploading tasks to a project.
- Deleting tasks from a project.
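Several of the items above (connecting to an instance, creating and configuring projects, uploading tasks) are not covered by the snippets below, so here is a brief sketch of that workflow. The method names and parameters shown are assumptions for illustration only; check the Python API reference for the exact signatures available in your version.

```python
from sparknlp_jsl.alab import AnnotationLab

alab = AnnotationLab()

# assumption: connect to a running NLP Lab instance with its URL and credentials
alab.set_credentials(
    username='admin',
    password='my_password',
    client_secret='client_secret_of_instance',
    annotationlab_url='https://my-nlp-lab-instance.com'
)

# assumption: list all projects visible to this user
all_projects = alab.get_all_projects()

# assumption: create a project and define its labels
alab.create_project(project_name='alab_demo')
alab.set_project_config(
    project_name='alab_demo',
    ner_labels=['PROBLEM', 'TREATMENT', 'TEST'],
    assertion_labels=['ABSENT', 'PRESENT']
)

# assumption: upload plain-text tasks for annotation
alab.upload_tasks(
    project_name='alab_demo',
    task_list=['Patient has a headache.', 'No sign of infection.']
)
```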
Start Module
```python
# import the module
from sparknlp_jsl.alab import AnnotationLab

alab = AnnotationLab()
```

Generate Data for Training a Classification Model
```python
alab.get_classification_data(
    # required: path to NLP Lab JSON export
    input_json_path='alab_demo.json',

    # optional: set to True to select ground truth completions, False to select latest completions,
    # defaults to False
    # ground_truth=False
)
```
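The call above is expected to return the annotations in tabular form. Below is a minimal sketch of persisting that output for a downstream classification training pipeline, assuming a pandas DataFrame is returned and that its text and label columns are named 'task' and 'class' (these names are illustrative, not confirmed by the source):

```python
# assumption: get_classification_data returns a pandas DataFrame of texts and labels
classification_df = alab.get_classification_data(input_json_path='alab_demo.json')

# save to CSV so the data can be loaded into Spark for model training
classification_df.to_csv('classification_demo.csv', index=False)

# assumption: the exported columns are named 'task' (text) and 'class' (label)
spark_df = spark.read.option('header', True).csv('classification_demo.csv')
training_data = spark_df.selectExpr('task as text', '`class` as label')
```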
Converting the JSON Export into a CoNLL Format Suitable for Training an NER Model

```python
alab.get_conll_data(
    # required: Spark session with spark-nlp-jsl jar
    spark=spark,

    # required: path to NLP Lab JSON export
    input_json_path="alab_demo.json",

    # required: name of the CoNLL file to save
    output_name="conll_demo",

    # optional: path for CoNLL file saving directory, defaults to 'exported_conll'
    # save_dir="exported_conll",

    # optional: set to True to select ground truth completions, False to select latest completions,
    # defaults to False
    # ground_truth=False,

    # optional: labels to exclude from CoNLL; these are all assertion labels and irrelevant NER labels,
    # defaults to empty list
    # excluded_labels=['ABSENT'],

    # optional: set a pattern to use regex tokenizer, defaults to regular tokenizer if pattern not defined
    # regex_pattern="\\s+|(?=[-.:;*+,$&%\\[\\]])|(?<=[-.:;*+,$&%\\[\\]])",

    # optional: list of NLP Lab task IDs to exclude from CoNLL, defaults to empty list
    # excluded_task_ids=[2, 3],

    # optional: list of NLP Lab task titles to exclude from CoNLL, defaults to None
    # excluded_task_titles=['Note 1']
)
```
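Once the CoNLL file has been generated, it can be read back with Spark NLP's CoNLL reader to build a training DataFrame for an NER model. The path below assumes the default save_dir ('exported_conll') and the output name used above; adjust it to wherever the file was actually written.

```python
from sparknlp.training import CoNLL

# read the exported CoNLL file into a Spark DataFrame ready for NER training
# (path is an assumption based on the default save_dir and the output_name above)
training_data = CoNLL().readDataset(spark, 'exported_conll/conll_demo.conll')

# quick sanity check of the sentences and their token-level labels
training_data.selectExpr('text', 'label.result').show(3, truncate=80)
```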