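Let's use the Twitter sentiment analysis data to count the number of words in each tweet. We will compare three approaches: the DataFrame iterrows method, NumPy arrays, and the apply method. You can download the dataset here (https://datahack.analyticsvidhya.com/contest/practice-problem-twitter-sentiment-analysis/?utm_source=blog&utm_medium=4-methods-optimize-python-code-data-science).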

    '''
    Optimization method: the apply method
    '''
    # Import libraries
    import pandas as pd
    import time

    data = pd.read_csv('train_E6oV3lV.csv')
    # Print the first few rows
    print(data.head())

    # Count the words in each tweet using DataFrame.iterrows
    print('\n\nUsing Iterrows\n\n')
    start_time = time.time()
    data_1 = data.copy()
    n_words = []
    for i, row in data_1.iterrows():
        # Split on whitespace and count the resulting tokens
        n_words.append(len(row['tweet'].split()))
    data_1['n_words'] = n_words
    print(data_1[['id','n_words']].head())
    end_time = time.time()
    print('\nTime taken to calculate No. of Words by iterrows :',
          (end_time-start_time),'seconds')

    # Count the words in each tweet using the underlying NumPy array
    print('\n\nUsing Numpy Arrays\n\n')
    start_time = time.time()
    data_2 = data.copy()
    n_words_2 = []
    for row in data_2.values:
        # Position 2 is assumed to be the 'tweet' column
        n_words_2.append(len(row[2].split()))
    data_2['n_words'] = n_words_2
    print(data_2[['id','n_words']].head())
    end_time = time.time()
    print('\nTime taken to calculate No. of Words by numpy array : ',
          (end_time-start_time),'seconds')

    # Count the words in each tweet using the apply method
    print('\n\nUsing Apply Method\n\n')
    start_time = time.time()
    data_3 = data.copy()
    data_3['n_words'] = data_3['tweet'].apply(lambda x : len(x.split()))
    print(data_3[['id','n_words']].head())
    end_time = time.time()
    print('\nTime taken to calculate No. of Words by Apply Method : ',
          (end_time-start_time),'seconds')
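
If you do not have the CSV file at hand, the comparison can be tried on a small in-memory DataFrame. The following is only a minimal sketch: the three sample tweets, the helper names (`count_iterrows`, `count_values`, `count_apply`, `timed`), and the use of `time.perf_counter` are assumptions made for illustration and are not part of the original script. It checks that all three approaches return the same word counts; on a table this small the timings mean very little.

```python
import time
import pandas as pd

# Hypothetical stand-in for the real dataset (assumed columns: id, label, tweet)
data = pd.DataFrame({
    "id": [1, 2, 3],
    "label": [0, 0, 1],
    "tweet": ["good morning everyone", "so happy today", "this is a bad day"],
})

def count_iterrows(df):
    # Row-by-row iteration; each row is materialized as a pandas Series
    return [len(row["tweet"].split()) for _, row in df.iterrows()]

def count_values(df):
    # Iterate over the underlying NumPy array; look up the column position by name
    tweet_idx = df.columns.get_loc("tweet")
    return [len(row[tweet_idx].split()) for row in df.values]

def count_apply(df):
    # Apply a function to every element of the 'tweet' column
    return df["tweet"].apply(lambda x: len(x.split())).tolist()

def timed(func, df):
    # Return the function's result together with the elapsed wall-clock time
    start = time.perf_counter()
    result = func(df)
    return result, time.perf_counter() - start

results = {}
for name, func in [("iterrows", count_iterrows),
                   ("numpy", count_values),
                   ("apply", count_apply)]:
    counts, elapsed = timed(func, data)
    results[name] = counts
    print(f"{name}: {counts} ({elapsed:.6f} s)")
```

Looking up the column position with `get_loc` avoids hard-coding the index 2, which would silently count the wrong column if the column order ever changed.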