Using text mining to find fresh new words ("小鲜词") on the web
Published: 2021-01-20 19:27:41 Category: Big Data Source: compiled from the web
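Summary: Before we start, take a look at the words that post-90s users on Renren like to use. Fun, isn't it? Haha. This article shows a simple, automatic way to find new words in a body of text, so you can tell what young people are into these days (for an old-timer like this blogger, that is really useful, sob). Project structure: of course, text.dat and common.d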
Text selector: picks out the strings that might be new words

CnTextSelector.java

    package grid.text.selector;

    import grid.common.TextUtils;

    public class CnTextSelector extends CommonTextSelector {

        public CnTextSelector(String document, int minSelectLen, int maxSelectLen) {
            super(document, minSelectLen, maxSelectLen);
        }

        @Override
        protected void adjustCurLen() {
            // Skip characters that are not Chinese letters.
            while (pos < docLen && !TextUtils.isCnLetter(document.charAt(pos))) {
                pos++;
            }
            // Cut the candidate at the first non-Chinese character inside the window.
            for (int i = 0; i < maxSelectLen && pos + i < docLen; i++) {
                if (!TextUtils.isCnLetter(document.charAt(pos + i))) {
                    curLen = i;
                    if (curLen < minSelectLen) {
                        // The run is too short to hold a candidate: move on and retry.
                        pos++;
                        adjustCurLen();
                    }
                    return;
                }
            }
            curLen = pos + maxSelectLen > docLen ? docLen - pos : maxSelectLen;
        }
    }

CommonTextSelector.java

    package grid.text.selector;

    public class CommonTextSelector implements TextSelector {

        protected String document;
        protected int pos = 0;
        protected int maxSelectLen = 5;
        protected int minSelectLen = 2;
        protected int curLen;
        protected final int docLen;

        public CommonTextSelector(String document, int minSelectLen, int maxSelectLen) {
            this.document = document;
            this.minSelectLen = minSelectLen;
            this.maxSelectLen = maxSelectLen;
            docLen = document.length();
            adjustCurLen();
        }

        // Accept the candidate last returned by next() and jump past it.
        // next() already decremented curLen, so ++curLen restores its length.
        @Override
        public void select() {
            pos += ++curLen;
            adjustCurLen();
        }

        // Reset the window to the longest candidate available at pos.
        protected void adjustCurLen() {
            curLen = pos + maxSelectLen > docLen ? docLen - pos : maxSelectLen;
        }

        // Return the current candidate, then shrink the window by one character.
        // Once the window drops below minSelectLen, advance to the next position.
        @Override
        public String next() {
            if (curLen < minSelectLen) {
                pos++;
                adjustCurLen();
            }
            if (pos + curLen <= docLen && curLen >= minSelectLen) {
                return document.substring(pos, pos + curLen--);
            } else {
                curLen--;
                return "";
            }
        }

        @Override
        public boolean end() {
            return curLen < minSelectLen && curLen + pos >= docLen - 1;
        }

        @Override
        public int getCurPos() {
            return pos;
        }
    }

TextSelector.java

    package grid.text.selector;

    public interface TextSelector {
        boolean end();
        void select();
        String next();
        int getCurPos();
    }

Test code
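The article's own test code is not part of this excerpt. As a stand-in, here is a minimal, self-contained sketch of which candidates the selectors would hand out when nothing is ever accepted via select(). It is a simplified re-implementation for illustration, not the author's code: the class name CandidateSketch and the method candidates are hypothetical, and isCnLetter is an assumption standing in for TextUtils.isCnLetter, which the article does not show (here it simply checks for the Han script).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the article's project.
class CandidateSketch {

    // Assumption: a stand-in for TextUtils.isCnLetter, treating any Han character as a Chinese letter.
    static boolean isCnLetter(char c) {
        return Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN;
    }

    // Lists every substring of length minLen..maxLen made up only of Han characters,
    // longest first at each start position.
    static List<String> candidates(String doc, int minLen, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int pos = 0; pos < doc.length(); pos++) {
            // Length of the Han run starting at pos, capped at maxLen.
            int run = 0;
            while (run < maxLen && pos + run < doc.length() && isCnLetter(doc.charAt(pos + run))) {
                run++;
            }
            // Shrink the window one character at a time, as next() does.
            for (int len = run; len >= minLen; len--) {
                out.add(doc.substring(pos, pos + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample text written with Unicode escapes so the file compiles under any source encoding.
        String sample = "\u5c0f\u9c9c\u8bcd\uff0c\u597d\u73a9";
        for (String candidate : candidates(sample, 2, 3)) {
            System.out.println(candidate);
        }
    }
}
```

The sample string is 小鲜词,好玩. With a minimum length of 2 and a maximum of 3, the sketch prints 小鲜词, 小鲜, 鲜词 and 好玩, one per line. The full-width comma splits the text into two runs, so no candidate crosses it. Deciding which of these candidates are actually new words is a separate step that this excerpt does not cover.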