Deliver superior customer experiences with an AI-driven platform for creating and deploying cognitive chatbots
Deliver Awesome UI with the most complete toolboxes for .NET, Web and Mobile development
Automate UI, load and performance testing for web, desktop and mobile
A complete cloud platform for an app or your entire digital business
Detect and predict anomalies by automating machine learning to achieve higher asset uptime and maximized yield
Automate decision processes with a no-code business rules engine
Optimize data integration with high-performance connectivity
Connect to any cloud or on-premises data source using a standard interface
Build engaging multi-channel web and digital experiences with intuitive web content management
Personalize and optimize the customer experience across digital touchpoints
Build, protect and deploy apps across any platform and mobile device
Rapidly develop, manage and deploy business apps, delivered as SaaS in the cloud
In this tutorial, you'll learn how to export data out of Hadoop to boost throughput using the DataDirect SQL Server JDBC driver and Apache Sqoop.
We've gotten everyone connected to SQL Server using Progress DataDirect's exclusive support for both NTLM and Kerberos authentication from Linux with Sqoop. Now, we plan to blow your minds with high flying bulk insert performance into SQL Server using Sqoop's Generic JDBC Connector. Linux clients will get similar throughput to the Microsoft BCP tool.
So far, Cloudera and HortonWorks have been pointing shops to the high performance DataDirect SQL Server JDBC driver to help load data volumes anywhere from 10GB to 1TB into SQL Server Data Marts and
Warehouses. It's common for the DataDirect SQL Server JDBC driver to speed up load times by 15-20X; and Sqoop will see similar improvement since it leverages JDBC batches that we transparently convert into SQL Server's native bulk load protocol. Moving data out of Hadoop and into external JDBC sources are exciting projects that represent the democratization of big data for downstream application consumers. You're definitely doing something right if you are ready to read on!
SELECT name, recovery_model_desc
WHERE name = ‘database_name’ ;
Note the recovery_model_desc returned by this query (expect to return,’BULK_LOGGED’).
sqoop export --connect 'jdbc:datadirect:sqlserver://nc-sqlserver:1433;database=test;user=test01;password=test01;EnableBulkLoad=true;BulkLoadBatchSize=1024;BulkLoadOptions=0' --driver com.ddtek.jdbc.sqlserver.SQLServerDriver --table 'blah_1024MB' --export-dir /user/hdfs/blah_1024MB/ --input-lines-terminated-by "n" --input-fields-terminated-by ',' --batch -m 10
INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');
View the Sqoop user's guide for complete reference.
Special thanks to Mike Spinak, Principal Software Engineer and Danh Huynh, Systems Administrator with their help with setup and testing in the 6 node Cloudera CDH5.2.0-1.cdh5.2.0.p0.36 cluster to export data into SQL Server 2008 R2.
Results are still coming in from several shops and the best to date is a load of 40 GB into SQL Server within 10 minutes. In the above system, we were loading 37 GB in 18.5 minutes. There are several properties across Hadoop, Sqoop, JDBC driver and SQL Server you can tune to improve performance even further.
Download your trial driver today and tweet your load times to @DataDirect_News. If you have questions, please contact us; or call us at 1-800-876-3101.
Download Free Trial
Saikrishna is a DataDirect Developer Evangelist at Progress. Prior to working at Progress, he worked as Software Engineer for 3 years after getting his undergraduate degree, and recently graduated from NC State University with Masters in Computer Science. His interests are in the areas of Data Connectivity, SaaS and Mobile App Development.
Copyright © 2018 Progress Software Corporation and/or its subsidiaries or affiliates.
All Rights Reserved.
Progress, Telerik, and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks for appropriate markings.