
Load Huge CSV File Into Oracle Database Table Using PySpark

Currently I am using Python to connect to a REST API and extract a huge volume of data into a CSV file. The number of rows is almost 80 million. Now I want to load this huge dataset into an Oracle database table.

Solution 1:

Let me show you an example of a control file I use to load a very big file (120 million records each day):

OPTIONS (SKIP=0, ERRORS=500, PARALLEL=TRUE, MULTITHREADING=TRUE, DIRECT=TRUE, SILENT=(ALL))
UNRECOVERABLE
LOAD DATA
CHARACTERSET WE8ISO8859P1
INFILE '/path_to_your_file/name_of_the_file.txt'
BADFILE '/path_to_your_file/name_of_the_file.bad'
DISCARDFILE '/path_to_your_file/name_of_the_file.dsc'
APPEND
INTO TABLE yourtablename
TRAILING NULLCOLS
(
COLUMN1 POSITION(1:4) CHAR
,COLUMN2 POSITION(5:8)  CHAR
,COLUMN3 POSITION(9:11) CHAR
,COLUMN4 POSITION(12:18) CHAR
....
....)

Some considerations

  • Loading by position is always faster than using delimiters.
  • Use the PARALLEL, MULTITHREADING and DIRECT options to optimize loading performance.
  • UNRECOVERABLE is also good advice as long as you keep the source file: the load generates no redo, so if you ever need to recover the database you would have to load the data again.
  • Use the appropriate character set.
  • The TRAILING NULLCOLS clause tells SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns.
  • POSITION means that each row contains fixed-width data without any delimiter, so each field is located by its start and end positions, for example:

AAAAABBBBBBCCCCC19828733UUUU

  • If your txt or csv file has a field separator, say a semicolon, then you need to use the FIELDS TERMINATED BY clause instead (see the sketch after this list).
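
For example, a delimited variant of the control file body might look like the following. This is a minimal sketch, not taken from the original answer: the table and column names are placeholders, and OPTIONALLY ENCLOSED BY handles double-quoted fields.

LOAD DATA
CHARACTERSET WE8ISO8859P1
INFILE '/path_to_your_file/name_of_the_file.csv'
BADFILE '/path_to_your_file/name_of_the_file.bad'
DISCARDFILE '/path_to_your_file/name_of_the_file.dsc'
APPEND
INTO TABLE yourtablename
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COLUMN1 CHAR
,COLUMN2 CHAR
,COLUMN3 CHAR
)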

This is stored in a control file, normally a text file with the .ctl extension. Then you invoke SQL*Loader from the command line:

sqlldr userid=youruser/pwd@tns_string control=/path_to_control_file/control_file.ctl
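
Since the extraction is already driven from Python, you can invoke sqlldr from the same script once the CSV is written. This is a minimal sketch, not part of the original answer: the paths, credentials and control file name are hypothetical, and it assumes sqlldr is on the PATH.

import subprocess

# Hypothetical paths; adjust to your environment.
CONTROL_FILE = "/path_to_control_file/control_file.ctl"
LOG_FILE = "/path_to_your_file/load.log"

def run_sqlldr(userid: str) -> None:
    """Run SQL*Loader with the control file above."""
    cmd = [
        "sqlldr",
        f"userid={userid}",          # e.g. "youruser/pwd@tns_string"
        f"control={CONTROL_FILE}",
        f"log={LOG_FILE}",
    ]
    # check=True raises CalledProcessError on a non-zero exit;
    # sqlldr exits non-zero on failures and also when rows are rejected.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_sqlldr("youruser/pwd@tns_string")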
