https://github.com/amartyaamp/CodeComb/blob/ecc5a7b9642309d1817f56f91d080d0d83c6c13b/CodeComb_Core/embeddings.py#L15 — `word_tokenize` is a much more sophisticated tokenizer and is the recommended choice for NLP tasks.
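
A quick sketch of the difference, assuming NLTK's `word_tokenize` is meant here (the sample sentence is just an illustration, not taken from the repo):

```python
# Minimal comparison of a naive whitespace split vs NLTK's word_tokenize.
import nltk
from nltk.tokenize import word_tokenize

# One-time download of tokenizer data; newer NLTK releases use "punkt_tab",
# older ones only need "punkt".
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

text = "Don't split on whitespace, use word_tokenize!"

# Plain split keeps punctuation glued to tokens and can't handle contractions:
print(text.split())
# ["Don't", 'split', 'on', 'whitespace,', 'use', 'word_tokenize!']

# word_tokenize separates punctuation and splits contractions (Treebank-style):
print(word_tokenize(text))
# ['Do', "n't", 'split', 'on', 'whitespace', ',', 'use', 'word_tokenize', '!']
```

Keeping punctuation out of the tokens means "whitespace" and "whitespace," don't end up as two different vocabulary entries, which should give cleaner embeddings downstream.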