I have a Python script which counts how many times each character occurs in a text file:
from __future__ import unicode_literals
import string
from collections import Counter

freqs = {}
# Read the whole file as a sorted list of characters
text = sorted(open("rabi2.txt", "r", encoding='utf-8').read())
# Punctuation and whitespace to skip
bad_chars = [')', '(', '-', '?', '?', ',', '!', '—', ' ', '!', '.', '\n']
text1 = ''.join(i for i in text if i not in bad_chars)
texts = [[words for words in sentences.lower().split()] for sentences in text1]
# Count each remaining character
for line in texts:
    for char in line:
        if char in freqs:
            freqs[char] += 1
        else:
            freqs[char] = 1
print(freqs)
I need to split the text into chunks of 2 characters, spaces included (and, in a separate program, into chunks of 3 characters), and count how many times each chunk occurs. For example:
input: hello world hello everybody
output: he, ll, o(space), wo, rl, d(space), he, ll, o(space), ev, er, yb, od, y(space), along with how many times each one occurs,
e.g.: he - 2 times
ll - 2 times
wo - 1 time, and so on
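The splitting and counting described above could be sketched roughly as follows. This is only a minimal sketch: it assumes the chunks are non-overlapping and taken from the start of the text, and the names `count_chunks` and `sample` are hypothetical, not part of the script above. It does not filter out punctuation or lowercase the text; that step would have to be applied first if it is wanted.

```python
from collections import Counter


def count_chunks(text, n=2):
    """Count consecutive, non-overlapping chunks of n characters (hypothetical helper)."""
    # Step through the text n characters at a time; spaces are kept as-is.
    # If len(text) is not a multiple of n, the last chunk is shorter than n.
    chunks = [text[i:i + n] for i in range(0, len(text), n)]
    return Counter(chunks)


sample = "hello world hello everybody"

# Chunks of 2 characters
counts = count_chunks(sample, 2)
for chunk, times in counts.most_common():
    # repr() makes the spaces inside a chunk visible
    print(repr(chunk), times)

# Chunks of 3 characters use the same function with n=3
counts3 = count_chunks(sample, 3)
```

Because the sample string has an odd number of characters, the final 2-character chunk here is just `'y'`; with a trailing space or newline in the file it would be a full chunk instead.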