
I spent most of the summer of ’22 as a Google Summer of Code (GSoC) contributor, working on the KerasNLP library$^{[1]}$ (part of the TensorFlow organisation). This blog will walk the reader through my experience and contributions.

KerasNLP

Before we dive into the details, let's talk a bit more about KerasNLP. KerasNLP is an API which exposes NLP building blocks (layers, models, metrics) in Keras. Its focus is similar to core Keras - simplicity and ease of use. My take is that KerasNLP has something for every user - easy customisability for the academic, ready-to-use, out-of-the-box models and pipelines for industry, and readable, modular code for the student. Therein lies the strength of KerasNLP - it caters to a wide array of users.

Mentors

I was mentored by Matthew Watson and Chen Qian, two very cool people who work on the Keras team at Google. Beyond helping me with the technical nitty-gritty, they were accommodating and supportive throughout my journey!

My Role

KerasNLP is a relatively new library and is still in pre-release. This gave me a plethora of opportunities to contribute. I principally worked on adding metrics, tokenisers and models, improving the performance of text generation functions, and fixing bugs.

Metrics

Pre-GSoC, I had worked on adding Perplexity as a Keras metric$^{[2]}$. Keras metrics follow a standard design — any Keras metric is a class inherited from the abstract base Keras metric class$^{[3]}$, with four methods which have to be overridden: update_state(), result(), reset_state() and get_config(). These methods are pretty self-explanatory — update_state() takes the current batch as input and “updates” the metric value, result() gives you the metric value, reset_state() allows you to forget the current metric value and start computation all over again, and get_config() returns the configuration needed to serialise and re-instantiate the metric. Keras metrics can be passed to model.compile() before calling model.fit() for live evaluation during training, testing, etc.
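
For concreteness, here is a minimal sketch of this design (assuming TF 2.x; SumOfSquares is a made-up metric purely for illustration, not one of the KerasNLP metrics):

```python
import tensorflow as tf

class SumOfSquares(tf.keras.metrics.Metric):
    """Hypothetical metric accumulating the sum of squared errors."""

    def __init__(self, name="sum_of_squares", **kwargs):
        super().__init__(name=name, **kwargs)
        # Metric state lives in weights so it works under graph execution.
        self.total = self.add_weight(name="total", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # "Updates" the metric value with the current batch.
        error = tf.cast(y_true, self.dtype) - tf.cast(y_pred, self.dtype)
        self.total.assign_add(tf.reduce_sum(tf.square(error)))

    def result(self):
        # Returns the current metric value.
        return self.total

    def reset_state(self):
        # Forgets the accumulated value so computation can start over.
        self.total.assign(0.0)

    def get_config(self):
        # Returns the configuration needed to re-instantiate the metric.
        return super().get_config()
```

An instance of such a class can then be passed straight to model.compile(..., metrics=[SumOfSquares()]).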

The aim was to expose NLP metrics which are not a part of core Keras, as APIs. For instance, metrics such as accuracy, precision, etc., which are extensively used for NLP tasks like text classification, Named Entity Recognition (NER) and so on, are already exposed as APIs in core Keras. We shortlisted ROUGE score, BLEU score and Edit Distance. ROUGE and BLEU are used in text generation tasks such as summarisation and neural machine translation, whereas edit distance is used in tasks like automatic OCR and speech-to-text.

| Metric | Pull Request Link | Status |
| --- | --- | --- |
| ROUGE-N Score | | Merged |
| ROUGE-L Score | | Merged |
| BLEU Score | | Merged |
| Edit Distance | | Merged |

The main challenge we faced with the text generation metrics mentioned above was implementing them using TensorFlow graph ops$^{[4]}$. TensorFlow graph execution results in speedups and, in general, better performance. However, implementing complex operations like n-gram matching with TF ops was a major obstacle. As a result, we decided to opt for alternatives.

For ROUGE-N and ROUGE-L, we decided to wrap the official Python implementation$^{[5]}$ of ROUGE score with tf.py_function()$^{[6]}$. While this allows graph execution, it might not offer the same speedup benefits which a TF ops implementation may provide.
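
The wrapping mechanics look roughly like this. Note that unigram_f1 below is a toy stand-in for the official ROUGE implementation, not the actual KerasNLP code; it just shows how tf.py_function() lets plain Python run inside a graph:

```python
import tensorflow as tf

def unigram_f1(y_true, y_pred):
    # Runs eagerly inside tf.py_function, so .numpy() is available.
    ref = y_true.numpy().decode("utf-8").split()
    hyp = y_pred.numpy().decode("utf-8").split()
    overlap = len(set(ref) & set(hyp))
    precision = overlap / max(len(hyp), 1)
    recall = overlap / max(len(ref), 1)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

@tf.function  # the wrapper itself can still be traced into a graph
def rouge_like(y_true, y_pred):
    return tf.py_function(func=unigram_f1, inp=[y_true, y_pred], Tout=tf.float32)

print(rouge_like(tf.constant("the cat sat"), tf.constant("the cat ran")))
```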

I experimented with TF ops for BLEU score here, but the graph ops implementation was considerably slower than the tf.py_function() approach.

Hence, we again proceeded with the same approach we followed for ROUGE. We used a modified version of TensorFlow’s Python implementation of BLEU$^{[7]}$. We implemented the corpus BLEU variant of BLEU score, since sentence BLEU isn’t very useful at the corpus level, and it is easy to calculate sentence BLEU given a corpus BLEU implementation.
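
To illustrate that last point: sentence BLEU is just corpus BLEU computed over a corpus of size one. A sketch, where corpus_bleu is a hypothetical stand-in for any corpus-level BLEU implementation (not the KerasNLP API):

```python
def sentence_bleu(references, candidate, corpus_bleu):
    # references: list of reference strings for this one sentence
    # candidate: the single generated sentence
    # The "corpus" passed down contains exactly one example.
    return corpus_bleu(references=[references], candidates=[candidate])
```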

For Edit Distance, luckily, TensorFlow already has a graph implementation of edit distance$^{[8]}$, which we could leverage directly, although it required dealing with sparse tensors.
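
The sparse tensor handling is visible in a minimal usage sketch of tf.edit_distance, where each row is one tokenised sequence:

```python
import tensorflow as tf

# tf.edit_distance expects tf.SparseTensor inputs, hence the conversion.
hypothesis = tf.sparse.from_dense(
    tf.constant([["the", "cat", "sat"]])
)
truth = tf.sparse.from_dense(
    tf.constant([["the", "cat", "sat", "down"]])
)

# normalize=True divides by the length of `truth`, giving an error rate:
# one insertion over a reference of length four -> [0.25].
print(tf.edit_distance(hypothesis, truth, normalize=True))
```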

Tokenisers