I think the reason is that pickled Punkt tokenizer available in nltk_data was trained on byte strings, and implicit byte strings fail under Python 3.x. Other pickled data installable with ...