BULK INSERT

What’s a BULK INSERT?

If you want to get larger amounts of data into a database, the wonder weapon in the SQL Server environment is of course SQL Server Integration Services. If you want to do this for a SQL database in the cloud, there is no SSIS engine available for the time being; you need an Azure Data Factory with an Integration Services Runtime instead. But if it’s just a matter of loading a formatted file into a table, you can take a slightly “easier” route and load the data into the table using a BULK INSERT statement.

How can you use BULK INSERT?

The BULK INSERT command writes the data from a file into a table, and it automatically works in batches whose size can be configured. To have data that we can import, we first export the result of a query on the WideWorldImporters database to a CSV file. To do this, we execute the following query in Azure Data Studio:

SELECT 
     StockItemID
    ,StockItemName
    ,ColorName
    ,RecommendedRetailPrice
    ,TaxRate
    ,UnitPrice
    ,SupplierName
    ,SupplierCategoryName
FROM [Warehouse].[StockItems] si
LEFT JOIN [Warehouse].[Colors] c
    ON si.ColorID = c.ColorID
LEFT JOIN [Purchasing].[Suppliers] sup
    ON si.SupplierID = sup.SupplierID
LEFT JOIN [Purchasing].[SupplierCategories] supc
    ON sup.SupplierCategoryID = supc.SupplierCategoryID

We now export this data as CSV. Both SQL Server Management Studio and Azure Data Studio let us do this by selecting the whole result set and copying and pasting it into Excel, where we save the result as CSV. It would be easier to right-click the result in Azure Data Studio and select “Save as CSV”, but then we have no control over thousands separators and column separators, which we may need to adjust, because even in 2020 computers are not yet ready to paste data you have exported from a database directly back into it.
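
If you prefer a scriptable export over the Excel detour, the bcp utility that ships with SQL Server can write the same query result to a file. This is only a sketch, assuming a local instance (-S localhost) and Windows authentication (-T); -c exports character data and -t, sets the comma as column separator. Note that bcp does not quote fields, so this only stays clean as long as no value contains a comma:

bcp "SELECT StockItemID, StockItemName, ColorName, RecommendedRetailPrice, TaxRate, UnitPrice, SupplierName, SupplierCategoryName FROM Warehouse.StockItems si LEFT JOIN Warehouse.Colors c ON si.ColorID = c.ColorID LEFT JOIN Purchasing.Suppliers sup ON si.SupplierID = sup.SupplierID LEFT JOIN Purchasing.SupplierCategories supc ON sup.SupplierCategoryID = supc.SupplierCategoryID" queryout WWI_Items.csv -c -t, -d WideWorldImporters -S localhost -T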

Now we want to load this CSV file into a new table via BULK INSERT. For this we first have to create the table. A few weeks ago you already got to know the SELECT TOP 0 ... INTO pattern, which we now use to create it:

SELECT TOP 0 
     StockItemID
    ,StockItemName
    ,ColorName
    ,RecommendedRetailPrice
    ,TaxRate
    ,UnitPrice
    ,SupplierName
    ,SupplierCategoryName
INTO Sales.SpecialItems
FROM [Warehouse].[StockItems] si
LEFT JOIN [Warehouse].[Colors] c
    ON si.ColorID = c.ColorID
LEFT JOIN [Purchasing].[Suppliers] sup
    ON si.SupplierID = sup.SupplierID
LEFT JOIN [Purchasing].[SupplierCategories] supc
    ON sup.SupplierCategoryID = supc.SupplierCategoryID
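
Incidentally, if you want to check which data types the SELECT TOP 0 ... INTO pattern derived for the new table, a quick metadata query is enough (nothing here is specific to this example beyond the table name):

SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'Sales'
  AND TABLE_NAME = 'SpecialItems'
ORDER BY ORDINAL_POSITION;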

Now you can use BULK INSERT to write to this empty table, but you must make sure that the file exists on your database server, not on your client. This means you must either copy it there, put it on a network share, or, if you are using a Docker container like we are, copy it into that container:

docker cp WWI_Items.csv container_name:/tmp/
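
To make sure the file really is where BULK INSERT will look for it, you can list it from inside the container (container_name again stands for the name of your container):

docker exec container_name ls -l /tmp/WWI_Items.csv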

If you have provided the file cleanly (comma as column separator, no thousands separators, and a dot as decimal separator) and removed all “NULL” strings (the string “NULL” is not a NULL value during import, it is just text), you can load the file into your table:

BULK INSERT Sales.SpecialItems
FROM '/tmp/WWI_Items.csv'
WITH (FORMAT = 'CSV');
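
If a literal “NULL” string slips through anyway, it lands in the table as text rather than as a real NULL. A small after-the-fact cleanup, sketched here for a single column (ColorName is just an example; repeat it for the other affected columns):

UPDATE Sales.SpecialItems
SET ColorName = NULL       -- replace the imported text with a real NULL
WHERE ColorName = 'NULL';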

In the WITH block you have further options to control the import; for example, you can set the column separator or the batch size mentioned at the beginning.
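
As a sketch of what that can look like: FIELDTERMINATOR, ROWTERMINATOR, FIRSTROW, and BATCHSIZE are documented BULK INSERT options; the concrete values below are only plausible examples for a semicolon-separated file with a header row:

BULK INSERT Sales.SpecialItems
FROM '/tmp/WWI_Items.csv'
WITH (
    FORMAT = 'CSV',
    FIELDTERMINATOR = ';',  -- column separator
    ROWTERMINATOR = '\n',   -- row separator
    FIRSTROW = 2,           -- skip the header row
    BATCHSIZE = 10000       -- commit every 10,000 rows
);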

Provided your CSV files are clean and correspond to the desired input format, you now have a technically simple way to quickly load data from a CSV file into a table in your database.

