MongoDB read test
Published: 2019-06-13


1. HBase already holds a large amount of data (row count = 3849920). All data was read from MongoDB again; the log is below:

[azureuser@h3crd-wlan31 ~]$ kubectl logs histclientsdataetl-20161028141055-doize |grep @ETL@
@ETL@ getLatestOnlineTime: 2016-10-27 23:59:59.815
@ETL@ get uplineDate from hbase: 2016-10-27 23:59:59.815
@ETL@ getCurrentPosFromDB: 2016-10-27 23:59:59.815
@ETL@ Start finding at time: 2016-10-28 15:00:17.991, startFindDayIndex=17101
@ETL@ Finding finished at time: 2016-10-28 15:00:21.551
@ETL@ Wrting to Hbase has been finished !
@ETL@readCount: 3849920,putCount: 0,time: 2016-10-28 15:11:48.784
@ETL@ child file name: 1.1.1.1-histclientsdataetl
@ETL@ Change logfile name has finished: 2016-10-28 15:11:48.792
@ETL@ cmd: sh /home/mongo-hive-hbase/Hive_HistClientsInfoAnalysis.sh
@ETL@: exec has finished!
@ETL@ new Executing remote hive shell has finished: 2016-10-28 15:12:49.273
@ETL@ result:

 

Result after changing the code to fetch in batches (10695026 records, about 4 GB in total):

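// Uses com.mongodb.client.FindIterable, com.mongodb.client.MongoCursor and org.bson.conversions.Bson.
// The generic type parameters were lost in the original listing; Document (org.bson.Document) is assumed.
public MongoCursor<Document> find(Bson filter) {
    // Request up to 50000 documents per batch from the server.
    FindIterable<Document> findIterable = this.collection.find(filter).batchSize(50000);
    MongoCursor<Document> mongoCursor = findIterable.iterator();
    return mongoCursor;
}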

 

[azureuser@h3crd-wlan31 ~]$ kubectl logs histclientsdataetl-xubd1 |grep @ETL@
@ETL@ getLatestOnlineTime: 2016-10-28 23:59:59.982
@ETL@ get uplineDate from hbase: 2016-10-28 23:59:59.982
@ETL@ getCurrentPosFromDB: 2016-10-28 23:59:59.982
@ETL@ Start finding at time: 2016-10-31 00:05:02.858, startFindDayIndex=17102
@ETL@ Finding finished at time: 2016-10-31 00:05:04.154
@ETL@ Wrting to Hbase has been finished !
@ETL@ upDateLatestOnlineTime: 2016-10-30 23:59:59.999
@ETL@readCount: 10695026,putCount: 10695026,time: 2016-10-31 00:48:41.78

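The data volume is about 2.7 times that of the previous run, and the pull took about 4 times as long. The data was read from a MongoDB secondary server.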
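As a rough sanity check on the batch size, the figures quoted for this run (10695026 records, about 4 GB) give an average document size and a lower bound on the number of batches. The sketch below is illustrative only: the class and method names are hypothetical, "4 GB" is taken as 4 GiB, and the 16 MiB figure assumes the server caps each reply batch at the maximum BSON document size, as MongoDB's cursor documentation describes.

```java
/** Hypothetical back-of-the-envelope estimate; the inputs are figures quoted in this post. */
final class BatchEstimate {
    /** Minimum number of batches needed to deliver docCount documents. */
    static long minBatches(long docCount, long batchSize) {
        return (docCount + batchSize - 1) / batchSize; // ceiling division
    }

    /** Average document size in bytes for a given total payload (integer division). */
    static long avgDocBytes(long totalBytes, long docCount) {
        return totalBytes / docCount;
    }

    public static void main(String[] args) {
        long docs = 10_695_026L;                     // readCount from the log
        long totalBytes = 4L * 1024 * 1024 * 1024;   // "about 4 GB", taken as 4 GiB (assumption)
        long capBytes = 16L * 1024 * 1024;           // assumed 16 MiB cap per batch

        long avg = avgDocBytes(totalBytes, docs);
        System.out.println("average document size (bytes): " + avg);
        System.out.println("documents fitting in 16 MiB:   " + capBytes / avg);
        System.out.println("minimum batches at 50000:      " + minBatches(docs, 50_000));
    }
}
```

If the cap assumption holds, a batch of documents this size would fill up by bytes before reaching 50000 documents, which may be one reason a larger batchSize brings little further gain. That was not measured in the tests here.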

 

After modifying the code again to double the batch size:

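// Uses com.mongodb.client.FindIterable, com.mongodb.client.MongoCursor and org.bson.conversions.Bson.
// The generic type parameters were lost in the original listing; Document (org.bson.Document) is assumed.
public MongoCursor<Document> find(Bson filter) {
    // Request up to 100000 documents per batch from the server.
    FindIterable<Document> findIterable = this.collection.find(filter).batchSize(100000);
    MongoCursor<Document> mongoCursor = findIterable.iterator();
    return mongoCursor;
}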

 

[azureuser@h3crd-wlan31 ~]$ kubectl logs histclientsdataetl-20161031095007-qv4fd |grep @ETL@
@ETL@ getLatestOnlineTime: 2016-10-30 23:59:59.999
@ETL@ get uplineDate from hbase: 2016-10-30 23:59:59.999
@ETL@ getCurrentPosFromDB: 2016-10-30 23:59:59.999
@ETL@ Start finding at time: 2016-10-31 10:26:55.071, startFindDayIndex=17104
@ETL@ Finding finished at time: 2016-10-31 10:26:56.406
@ETL@ Wrting to Hbase has been finished !
@ETL@readCount: 7936993,putCount: 0,time: 2016-10-31 11:04:02.219

Compared with the first run, the data volume is 2.06 times as large and the time taken is about 3.08 times as long.
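The comparisons above can be rechecked from the log timestamps. The following is a minimal sketch, not the ETL job's own code: the class and method names are hypothetical, and it assumes that the interval from "Start finding at time" to the time on the readCount line represents one run's total pull time (which also includes any HBase writes).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

/** Hypothetical helper that recomputes run durations from the @ETL@ log timestamps. */
final class EtlRunStats {
    // Timestamps in the logs carry 1 to 3 fractional digits, e.g. "00:48:41.78".
    private static final DateTimeFormatter LOG_TIME = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 3, true)
            .toFormatter();

    /** Seconds elapsed between two log timestamps. */
    static double elapsedSeconds(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TIME);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TIME);
        return Duration.between(from, to).toMillis() / 1000.0;
    }

    public static void main(String[] args) {
        // Each row: label, readCount, "Start finding at time", time on the readCount line.
        String[][] runs = {
            {"before batching", "3849920", "2016-10-28 15:00:17.991", "2016-10-28 15:11:48.784"},
            {"batchSize 50000", "10695026", "2016-10-31 00:05:02.858", "2016-10-31 00:48:41.78"},
            {"batchSize 100000", "7936993", "2016-10-31 10:26:55.071", "2016-10-31 11:04:02.219"},
        };
        long baseCount = Long.parseLong(runs[0][1]);
        double baseSecs = elapsedSeconds(runs[0][2], runs[0][3]);
        for (String[] run : runs) {
            long count = Long.parseLong(run[1]);
            double secs = elapsedSeconds(run[2], run[3]);
            // Print the read rate and the ratios relative to the first run.
            System.out.printf("%-17s %9d records %9.1f s %7.0f records/s  data x%.2f  time x%.2f%n",
                    run[0], count, secs, count / secs, (double) count / baseCount, secs / baseSecs);
        }
    }
}
```

Comparing records per second rather than raw durations makes runs with different record counts easier to set side by side. Note that only the second run reports a non-zero putCount, so its interval also covers actual writes to HBase.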

Reposted from: https://www.cnblogs.com/zhengchunhao/p/6008829.html
