
Load Huge CSV File Into Oracle Database Table Using PySpark

Currently I am using Python to connect to a REST API and extract a huge volume of data into a CSV file. The number of rows is almost 80 million. Now I want to load this huge dataset into an Oracle database table.

Solution 1:

Let me show you an example of a control file I use to load a very big file (120 million records each day):

OPTIONS (SKIP=0, ERRORS=500, PARALLEL=TRUE, MULTITHREADING=TRUE, DIRECT=TRUE, SILENT=(ALL))
UNRECOVERABLE
LOAD DATA
CHARACTERSET WE8ISO8859P1
INFILE '/path_to_your_file/name_of_the_file.txt'
BADFILE '/path_to_your_file/name_of_the_file.bad'
DISCARDFILE '/path_to_your_file/name_of_the_file.dsc'
APPEND
INTO TABLE yourtablename
TRAILING NULLCOLS
(
COLUMN1 POSITION(1:4) CHAR
,COLUMN2 POSITION(5:8)  CHAR
,COLUMN3 POSITION(9:11) CHAR
,COLUMN4 POSITION(12:18) CHAR
....
....)

Some considerations

  • Loading by position is always faster than using delimiters.
  • Use the PARALLEL, MULTITHREADING and DIRECT options to optimize loading performance.
  • UNRECOVERABLE is also good advice as long as you keep the source file: the load generates no redo, so if you ever need to recover the database you would have to load the data again.
  • Use the appropriate character set.
  • The TRAILING NULLCOLS clause tells SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns.
  • POSITION means that each row contains fixed-width data without any delimiter, so each field is located by its start and end positions, for example:

AAAAABBBBBBCCCCC19828733UUUU

  • If your txt or csv file has a field separator, say a semicolon, then you need to use the FIELDS TERMINATED BY clause instead (see the sketch after this list).
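
For example, a delimited variant of the control file body might look like the following. This is a minimal sketch, not taken from the original answer: the table and column names are placeholders, and OPTIONALLY ENCLOSED BY handles double-quoted fields.

LOAD DATA
CHARACTERSET WE8ISO8859P1
INFILE '/path_to_your_file/name_of_the_file.csv'
BADFILE '/path_to_your_file/name_of_the_file.bad'
DISCARDFILE '/path_to_your_file/name_of_the_file.dsc'
APPEND
INTO TABLE yourtablename
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COLUMN1 CHAR
,COLUMN2 CHAR
,COLUMN3 CHAR
)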

This is stored in a control file, normally a text file with the .ctl extension. Then you invoke SQL*Loader from the command line:

sqlldr userid=youruser/pwd@tns_string control=/path_to_control_file/control_file.ctl
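
Since the extraction is already driven from Python, you can invoke sqlldr from the same script once the CSV is written. This is a minimal sketch, not part of the original answer: the paths, credentials and control file name are hypothetical, and it assumes sqlldr is on the PATH.

import subprocess

# Hypothetical paths; adjust to your environment.
CONTROL_FILE = "/path_to_control_file/control_file.ctl"
LOG_FILE = "/path_to_your_file/load.log"

def run_sqlldr(userid: str) -> None:
    """Run SQL*Loader with the control file above."""
    cmd = [
        "sqlldr",
        f"userid={userid}",          # e.g. "youruser/pwd@tns_string"
        f"control={CONTROL_FILE}",
        f"log={LOG_FILE}",
    ]
    # check=True raises CalledProcessError on a non-zero exit;
    # sqlldr exits non-zero on failures and also when rows are rejected.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_sqlldr("youruser/pwd@tns_string")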
