Word Cloud Analysis


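1 Problem description

Copy the text of the 2021 Government Work Report (政府工作报告) from the web into a text file named report.txt, and place the file in the Python working directory.

2 Basic word cloud analysis

2.1 Import the required modules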

import jieba
from wordcloud import WordCloud
import numpy as np
from PIL import Image

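2.2 Read the text file

with open('report.txt', 'r', encoding='gbk') as f:   # use 'utf-8' if the file was saved as UTF-8
    txt = f.read()                                   # the with block closes the file automatically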
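The gbk encoding suits a file saved by a Chinese-locale Windows editor, but text copied from the web is often saved as UTF-8 instead. A minimal sketch, assuming a hypothetical helper named read_text (not part of the original code), that tries several encodings in turn:

```python
def read_text(path, encodings=("utf-8", "gbk")):
    """Return the file's text, trying each encoding until one decodes cleanly.

    read_text is a hypothetical helper used for illustration only.
    """
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError(f"could not decode {path!r} with any of {encodings}")
```

With this helper, the read step would become txt = read_text('report.txt'). UTF-8 is tried first because it is the stricter of the two: GBK bytes usually fail to decode as UTF-8, whereas UTF-8 bytes can sometimes decode as GBK into garbled text.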

2.3 Segment the text with jieba

words = jieba.lcut(txt)     # accurate-mode segmentation
newtxt = ' '.join(words)    # join the words with spaces
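Word size in the cloud follows word frequency, so it can help to look at the counts directly before drawing. A minimal sketch using collections.Counter from the standard library; the short token list stands in for the output of jieba.lcut and is made up for illustration:

```python
from collections import Counter

# Stand-in for `words = jieba.lcut(txt)`; these sample tokens are illustrative only.
sample_words = ["发展", "的", "经济", "发展", "，", "改革", "的", "发展"]

# Keep tokens of two or more characters, which drops punctuation and most particles.
counts = Counter(w for w in sample_words if len(w) >= 2)

# Show the most frequent tokens, highest count first.
for word, n in counts.most_common(3):
    print(word, n)
```

The length filter is a rough heuristic, not a substitute for a proper stop-word list.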

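2.4 Draw the word cloud from the segmented words

Here, "msyh.ttc" is a Chinese font file from the system (Microsoft YaHei); it must be placed in the Python working directory.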

wordcloud = WordCloud(
    font_path = "msyh.ttc",
    background_color='white',
    max_font_size = 50
).generate(newtxt)
wordcloud.to_file('词云.jpg')

Once the word cloud has been drawn, the image 词云.jpg can be found in the Python working directory.
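The font requirement is the step most likely to fail on another machine, since a missing font file makes WordCloud raise an error and a font without Chinese glyphs would likely render the words as empty boxes. A minimal sketch, assuming a hypothetical helper find_font and candidate locations that are guesses to adjust for your own system:

```python
from pathlib import Path

def find_font(candidates):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None

# Candidate locations are assumptions; adjust them for your own system.
font_path = find_font([
    "msyh.ttc",                   # a copy in the working directory
    "C:/Windows/Fonts/msyh.ttc",  # usual location on Windows
])
if font_path is None:
    print("No Chinese font found; copy one into the working directory first.")
```

The returned path can then be passed as font_path when constructing WordCloud.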

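3 Custom word cloud shape

The word cloud drawn above is rectangular, but the shape can be customized. For example, a picture of a panda can be used to set the shape of the cloud.

Note that the image must be a PNG that keeps only the shape, on a pure white background, because WordCloud treats white pixels as masked out. Such an image can be prepared in Photoshop or similar software. Once it is ready, place the file in the Python working directory.

3.1 Set the shape from an image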
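When no suitable image is at hand, a simple mask can also be built directly as an array. A minimal sketch, assuming a hypothetical helper circle_mask and arbitrary sizes; it relies on WordCloud's documented convention that white (255) entries are masked out and other entries are free to draw on:

```python
import numpy as np

def circle_mask(size=300, radius=130):
    """Return a size x size uint8 array: 0 inside the circle, 255 (white) outside."""
    center = size // 2
    # Open grids of row (y) and column (x) indices that broadcast to size x size.
    y, x = np.ogrid[:size, :size]
    inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
    return np.where(inside, 0, 255).astype(np.uint8)

shape = circle_mask()
```

The resulting array can be passed as the mask argument in place of one loaded from a PNG file.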

shape = np.array(Image.open("panda.png"))

3.2 Draw the word cloud

wordcloud = WordCloud(
    font_path = "msyh.ttc",
    background_color="white",
    width = 800,     # ignored when mask is set; the mask's shape is used instead
    height = 600,    # ignored when mask is set
    max_words = 200,
    max_font_size = 80,
    mask = shape,
    contour_width = 3,
    contour_color = 'steelblue'
).generate(newtxt)
wordcloud.to_file('词云-自定义形状.png')

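4 Word cloud after removing stop words

The word clouds above contain many words that carry no meaning, such as "以上" ("above"). To refine the cloud, these words need to be removed. Here we use Baidu's stop-word list.

4.1 Load the stop words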

with open('百度停词表.txt', 'r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}   # a set keeps membership tests fast
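Stop-word lists downloaded from the web often contain blank lines and duplicates. A minimal sketch, assuming a hypothetical helper load_stopwords, that reads one word per line into a set so that later lookups take constant time on average instead of scanning a list:

```python
def load_stopwords(path, encoding="utf-8"):
    """Read one stop word per line into a set, skipping blank lines."""
    with open(path, "r", encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}
```

It would be called as stopwords = load_stopwords('百度停词表.txt').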

4.2 Remove the stop words

# Filter the segmented words (not the characters of newtxt) and drop whitespace tokens
kept = [word for word in words if word not in stopwords and word.strip()]
outwords = ' '.join(kept)
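Stop-word removal has to operate on whole tokens, since iterating over a string yields single characters. A minimal sketch, assuming a hypothetical helper remove_stopwords and made-up sample data:

```python
def remove_stopwords(tokens, stopwords):
    """Join the tokens that are neither stop words nor whitespace with spaces."""
    stopword_set = set(stopwords)
    kept = [t for t in tokens if t.strip() and t not in stopword_set]
    return " ".join(kept)

# Sample tokens and stop words are made up for illustration.
print(remove_stopwords(["以上", "经济", "\n", "发展", "的"], ["以上", "的"]))
# prints: 经济 发展
```

Checking t.strip() drops tabs, newlines and other whitespace-only tokens in one test.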

4.3 Draw the word cloud

shape = np.array(Image.open("panda.png"))
wordcloud = WordCloud(
    font_path = "msyh.ttc",
    background_color="white",
    width = 800,     # ignored when mask is set; the mask's shape is used instead
    height = 600,    # ignored when mask is set
    max_words = 200,
    max_font_size = 80,
    mask = shape,
    contour_width = 3,
    contour_color = 'steelblue'
).generate(outwords)
wordcloud.to_file('词云-去除停用词.png')

To get the case data and source code, follow the WeChat official account and reply: `Python_dt32`
