Load Huge Csv File Into Oracle Database Table Using Pyspark
Currently I am using Python to connect to a REST API and extract a huge volume of data into a CSV file. The number of rows is almost 80 million. Now I want to load this huge data into an Oracle database table.
Solution 1:
Let me show you an example of a control file I use to load a very big file (120 million records each day):
OPTIONS (SKIP=0, ERRORS=500, PARALLEL=TRUE, MULTITHREADING=TRUE, DIRECT=TRUE, SILENT=(ALL))
UNRECOVERABLE
LOAD DATA
CHARACTERSET WE8ISO8859P1
INFILE '/path_to_your_file/name_of_the_file.txt'
BADFILE '/path_to_your_file/name_of_the_file.bad'
DISCARDFILE '/path_to_your_file/name_of_the_file.dsc'
APPEND
INTO TABLE yourtablename
TRAILING NULLCOLS
(
COLUMN1 POSITION(1:4) CHAR
,COLUMN2 POSITION(5:8) CHAR
,COLUMN3 POSITION(9:11) CHAR
,COLUMN4 POSITION(12:18) CHAR
....
....)
Some considerations
- It is always faster loading by positions than using delimiters
- Use the PARALLEL, MULTITHREADING and DIRECT options to optimize loading performance.
- UNRECOVERABLE is also good advice as long as you keep the source file: if you ever needed to recover the database, you would simply load the data again.
- Use the appropriate character set.
- The TRAILING NULLCOLS clause tells SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns.
- Loading by position means that each record contains data without any delimiters, so each field is located by its fixed start and end position, for example:
AAAAABBBBBBCCCCC19828733UUUU
- If your txt or csv file has a field separator, say a semicolon, then you need to use the FIELDS TERMINATED BY clause instead of fixed positions, as shown in the sketch after this list.
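For a delimited CSV such as the one described in the question, the control file would look roughly like the sketch below. Treat it as a minimal, untested sketch: the table name, column names, the semicolon separator and the SKIP=1 header assumption are placeholders you would adapt to your actual file.

-- SKIP=1 assumes the first line of the CSV is a header row; use SKIP=0 if it is not
OPTIONS (SKIP=1, ERRORS=500, PARALLEL=TRUE, DIRECT=TRUE)
UNRECOVERABLE
LOAD DATA
CHARACTERSET WE8ISO8859P1
INFILE '/path_to_your_file/name_of_the_file.csv'
BADFILE '/path_to_your_file/name_of_the_file.bad'
DISCARDFILE '/path_to_your_file/name_of_the_file.dsc'
APPEND
INTO TABLE yourtablename
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COLUMN1
,COLUMN2
,COLUMN3
,COLUMN4
)

With delimited fields you only list the column names; SQL*Loader splits each record on the separator instead of cutting it at fixed positions, which is why fixed-position loading is usually faster.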
All of this is stored in a control file, normally a text file with the extension .ctl. Then you invoke SQL*Loader from the command line:
sqlldr userid=your_user/your_password@tns_string control=/path_to_control_file/control_file.ctl