https://github.com/amartyaamp/CodeComb/blob/ecc5a7b9642309d1817f56f91d080d0d83c6c13b/CodeComb_Core/embeddings.py#L15 — `word_tokenize` is a much more sophisticated tokenizer and is the recommended choice for NLP tasks.
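
A quick sketch of the difference, assuming NLTK's `word_tokenize` is meant here (the sample sentence is just an illustration, not taken from the repo):

```python
# Minimal comparison of a naive whitespace split vs NLTK's word_tokenize.
import nltk
from nltk.tokenize import word_tokenize

# One-time download of tokenizer data; newer NLTK releases use "punkt_tab",
# older ones only need "punkt".
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

text = "Don't split on whitespace, use word_tokenize!"

# Plain split keeps punctuation glued to tokens and can't handle contractions:
print(text.split())
# ["Don't", 'split', 'on', 'whitespace,', 'use', 'word_tokenize!']

# word_tokenize separates punctuation and splits contractions (Treebank-style):
print(word_tokenize(text))
# ['Do', "n't", 'split', 'on', 'whitespace', ',', 'use', 'word_tokenize', '!']
```

Keeping punctuation out of the tokens means "whitespace" and "whitespace," don't end up as two different vocabulary entries, which should give cleaner embeddings downstream.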