您现在的位置是:首页 >技术杂谈 >使用datax 把数据从oracle导出到elasticsearch中网站首页技术杂谈

使用datax 把数据从oracle导出到elasticsearch中

小瑞 2023-06-06 20:00:03
简介使用datax 把数据从oracle导出到elasticsearch中

https://blog.csdn.net/mynameisjinxiaokai/article/details/128716467

一、datax需要python环境,需要先安装python
打开官网 https://www.python.org/downloads/
此处下载2.6.5版本安装
添加环境变量 path   C:Python27
安装完成后 cmd  使用   python -V 查看是否已安装成功

datax 下载后解压至本地某个目录,进入bin目录,即可运行同步作业
E: uanjianJAVAdataxdatax     , YOUR_JOB.json 因为datax会自动从github拉取对应插件的json的模版给我们
例子
$ cd  {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}    
本地
cd E: uanjianJAVAdataxdataxin
python E: uanjianJAVAdataxdataxindatax.py  E: uanjianJAVAdataxdataxinlocal-oracle-test-json.txt
备注,1   json文件需要检查 json格式是否正确
2  执行输出乱码,临时解决,  输入   chcp 65001   
长久解决,  https://cloud.tencent.com/developer/article/2084809   修改window编码
3   写入模式(writeMode)配置错误. 因为Oracle不支持配置项 writeMode: xxx, Oracle只能使用insert sql 插入数据  ,所以删除 writeMode 这行配置
4   datax :java.sql.SQLException: No suitable driver found for ["jdbc:oracle:thin:@localhost:1521:orcl"] 
jobjson的writer中的jdbcUrl 不能使用[]扩起来,因为写入的数据库只能有一个

执行转出到 es 命令
python E: uanjianJAVAdataxdataxindatax.py  E: uanjianJAVAdataxdataxinoracleToES-json.txt

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "test1",
                        "password": "Xia_12345",
                        "column": ["NAME","SEX","SCHOOL"],
                        "connection": [
                            {
                                "table": [
                                    "DATAX1"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@localhost:1521:orcl"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                        "name": "oraclewriter",
                        "parameter": {
                        "username": "test1",
                        "password": "Xia_12345",
                        "column": [
                            "NAME",
                            "SEX",
                            "SCHOOL"
                        ],
                        "preSql": [
                            "delete from DATAX2"
                        ],
                        "connection": [
                            {
		                       "table": [
                                    "DATAX2"
                                ],
                                "jdbcUrl": "jdbc:oracle:thin:@localhost:1521:orcl",

                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            },
         "errorLimit": {
             "record": 10,
             "percentage": 0.05
            }
        }
    }
}
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "test1",
                        "password": "Xia_12345",
                        "column": ["NAME","SEX","SCHOOL"],
                        "connection": [
                            {
                                "table": [
                                    "DATAX1"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@localhost:1521:orcl"
                                ]
                            }
                        ]
                    }
                },
                 "writer": {
					  "name": "elasticsearchwriter",
					  "parameter": {
						"endpoint": "http://127.0.0.1:9200",
						"index": "myindex",
						"type": "data",
						"cleanup": true, #true表示插入前清空,即覆盖同步;false则追加同步
						"dynamic": true,  #这里一定要指定为true,否则使用的是datax的模板(就是下面定义的字段类型),而不会使用es的模板
						"settings": {"index" :{"number_of_shards": 2, "number_of_replicas": 1}},            
						"batchSize": 10000,
						"splitter": ",",
						"column": [
						  {"name": "NAME", "type": "text"},
						  { "name": "SEX", "type": "text"},
						  { "name": "SCHOOL", "type": "text"}                 
						]
					  }
					}
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            },
         "errorLimit": {
             "record": 10,
             "percentage": 0.05
            }
        }
    }
}

风语者!平时喜欢研究各种技术,目前在从事后端开发工作,热爱生活、热爱工作。