To summarize the specifics: We need to stage approximately 5 million rows into a vendor (Oracle) database. Everything goes great for batches of 500k rows using
OracleBulkCopy (ODP.NET), but when we try to scale up to 5M, the performance starts slowing to a crawl once it hits the 1M mark, gets progressively slower as more rows are loaded, and eventually times out after 3 hours or so.
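For context, the load itself is a thin wrapper around OracleBulkCopy. A minimal sketch of that kind of loop, assuming the unmanaged ODP.NET driver; the table name, connection string, and reader (STAGE_ROWS, connStr, sourceReader) are placeholders, not details from the original setup:

```csharp
using System;
using System.Data;
using Oracle.DataAccess.Client; // unmanaged ODP.NET; recent managed drivers also expose OracleBulkCopy

static class Loader
{
    public static void BulkLoad(string connStr, IDataReader sourceReader)
    {
        using (var conn = new OracleConnection(connStr))
        {
            conn.Open();
            using (var bulk = new OracleBulkCopy(conn))
            {
                bulk.DestinationTableName = "STAGE_ROWS"; // hypothetical target table
                bulk.BatchSize = 500_000;                 // the batch size that performed well
                bulk.BulkCopyTimeout = 0;                 // 0 = no timeout, so large loads don't abort
                bulk.NotifyAfter = 100_000;               // progress callback every 100k rows
                bulk.OracleRowsCopied += (s, e) =>
                    Console.WriteLine($"{e.RowsCopied} rows copied");
                bulk.WriteToServer(sourceReader);
            }
        }
    }
}
```

The NotifyAfter/OracleRowsCopied callback is useful here mainly for diagnosis: it shows exactly where in the row count the slowdown begins.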
I suspect it’s related to a primary key on the table, but I’ve been trawling the Oracle forums and Stack Overflow for information and a lot of what I’m reading contradicts that (also, a lot of posts seem to contradict each other). I’m hoping that somebody can set the record straight on some closely-related questions about the process:
1. Does the OracleBulkCopy class use conventional or direct-path loading? Is there some way I can confirm this, one way or another?
2. Assuming it does use direct-path loading: is it true that Oracle automatically sets all indexes to unusable during the load and puts them back online afterward? I’ve read several statements to this effect but, again, cannot confirm it.
3. If #2 is true, then should it make any difference which indexes are on the table before I initiate a bulk copy operation? If so, why?
4. Related to #3, is there any practical difference, in general, between bulk loading with an unusable index vs. actually dropping the index before the load and recreating it afterward?
5. If #2 is not correct, or if there are caveats I’m not understanding, then would it make any difference to explicitly make the index unusable before the bulk load and then explicitly rebuild it afterward?
6. Is there anything else, other than index builds, that could cause a bulk copy operation to grow progressively slower as more and more records are added? (Maybe something to do with logging, although I would expect that bulk operations are not logged?)
7. If there really is no other way to get the performance up to snuff aside from dropping the PK/index first, what steps can I take to make sure that the index doesn’t completely disappear, e.g. if the connection to the database is lost in the middle of the process?
How to solve:
A few more days of reading and experimentation and I was able to (mostly) answer a lot of these:
Regarding #1: I found this buried in the ODP.NET documentation (ironically not in the documentation for OracleBulkCopy itself): “The ODP.NET Bulk Copy feature uses a direct path load approach, which is similar to, but not the same as, Oracle SQL*Loader. Using direct path load is faster than conventional loading (using conventional SQL INSERT statements).” So it appears that it does use direct path.
Regarding #2: this I was able to verify by doing a large bulk copy operation and getting the index properties from SQL Developer. The index did appear as UNUSABLE while the bulk copy was in progress. However, I’ve also discovered that OracleBulkCopy.WriteToServer will refuse to run if the index starts in an UNUSABLE state, so clearly there’s more going on here, because if it were as simple as disabling and rebuilding the index then it shouldn’t care about the initial state.
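The same check can be done outside SQL Developer by querying the data dictionary directly. A sketch, with MY_TABLE and MY_TABLE_PK as hypothetical names:

```sql
-- Check index state; UNUSABLE means the index must be rebuilt before it can be used
SELECT index_name, status
FROM   user_indexes
WHERE  table_name = 'MY_TABLE';

-- If a failed load left the index UNUSABLE, rebuild it before the next WriteToServer call
ALTER INDEX my_table_pk REBUILD;
```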
Regarding #3: it does make a difference, specifically if the index is also a constraint. Found this little gem in the documentation linked above: “During an Oracle bulk copy, the following constraints are automatically enabled by default: NOT NULL, UNIQUE, and PRIMARY KEY (unique constraints on not-null columns). NOT NULL constraints are checked at column array build time. Any row that violates the NOT NULL constraint is rejected. UNIQUE constraints are verified when indexes are rebuilt at the end of the load. The index is left in an Index Unusable state if it violates a UNIQUE constraint.”
The documentation is a little hazy on what happens during the load, especially with primary keys, but one thing is absolutely certain: it behaves differently with a primary key vs. without one. Since OracleBulkCopy will happily allow you to violate index constraints (and punt the index into an UNUSABLE state when it’s done), my hunch is that it’s building the PK index during the bulk copy but simply not validating it until afterward.
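One way to probe that hunch after a load is to check the constraint’s state alongside the index’s; if the PK is enabled but the index came out unusable, validation clearly happened at the end. A sketch, again with MY_TABLE as a hypothetical name:

```sql
-- STATUS shows ENABLED/DISABLED; VALIDATED shows whether existing rows were checked
SELECT constraint_name, constraint_type, status, validated
FROM   user_constraints
WHERE  table_name = 'MY_TABLE';
```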
Regarding #4: I’m not sure whether the difference observed is within Oracle itself or just a quirk of OracleBulkCopy. The jury’s still out on this one.
Regarding #5: OracleBulkCopy will throw an exception if an index is initially in an UNUSABLE state, so it’s really a moot point.
Regarding #6: even if there are other factors, indexes (and especially PK indexes) are still the most important, as I found out by:
Creating a global temporary table with the same schema (using CREATE TABLE ... AS), then bulk copying into the temporary table, and finally doing a plain old INSERT from the temp table into the real table. Since the temp table has no index, the bulk copy happens very fast, and the final INSERT is also fast because the data is already in a table (I haven’t tried the APPEND hint yet, since a 5M-row table-to-table copy already takes less than a minute).
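The staging workflow above can be sketched in SQL as follows; STAGE_ROWS (the indexed target) and STAGE_ROWS_TMP (the staging table) are hypothetical names:

```sql
-- 1. Index-free staging table with the same shape as the target
CREATE GLOBAL TEMPORARY TABLE stage_rows_tmp
    ON COMMIT PRESERVE ROWS
    AS SELECT * FROM stage_rows WHERE 1 = 0;

-- 2. Point OracleBulkCopy.DestinationTableName at STAGE_ROWS_TMP and run the load.

-- 3. Copy into the real table in one statement (optionally with the APPEND hint)
INSERT /*+ APPEND */ INTO stage_rows
SELECT * FROM stage_rows_tmp;
COMMIT;
```

ON COMMIT PRESERVE ROWS keeps the staged rows across the commit boundary between the bulk copy and the final INSERT; the temp table’s contents disappear when the session ends, which is exactly the safety property described below.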
I’m not yet sure of the potential ramifications of (ab)using the temporary tablespace this way, but so far it hasn’t given me any trouble, and it’s much safer than the alternative in that it prevents corruption of either the rows or the indexes.
The success of this also pretty clearly demonstrates that the PK index is the problem, as that is the only practical difference between the temp table and the permanent table – both started with zero rows during the performance tests.
Conclusion: Don’t bother trying to bulk copy more than around 100k rows into an indexed Oracle table using ODP.NET. Either drop the index (if you don’t actually need it) or “preload” the data into a different (non-indexed) table.
Here is an article from Oracle that explains when it is beneficial to use a bulk insert and when it is not. It also has some insight into what happens at the database level.