How to specify colClasses when reading a very big csv file into R using read.table.ffdf?


I am trying to read a very large CSV file, about 20 GB in size, using the read.table.ffdf function in the "ff" package, but I am having trouble specifying the colClasses option that is passed on to read.csv().

I have to specify the colClasses option because some columns in the file contain very long integers, for example with 11 digits. For instance, the file has these two rows:

  86246,205,17,1719,104116343,8435,2013-03-13,12,oz,1,2.59
  86246,205,17,1719,10800749282,8435,2013-03-13,12,oz,1,2.59

The integer 10800749282 is too large for the "integer" type, so the column should be read as either "numeric" or "character". However, the value 104116343 in the row above it is not large enough to reveal this, so by default R treats the column as "integer".
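The size limit described above can be checked in base R without touching the big file. The following is a minimal sketch using only the two ID values from the sample rows; the variable names are illustrative, and `type.convert` stands in for the type guessing that `read.csv` performs on each column.

```r
# Largest value an R "integer" can hold (32-bit signed).
int_max <- .Machine$integer.max            # 2147483647

small_id <- "104116343"    # fits in an integer
large_id <- "10800749282"  # 11 digits, exceeds int_max

# Type guessing on the small value alone yields "integer" ...
class_small <- class(type.convert(small_id, as.is = TRUE))
# ... but once the large value is present the guess becomes "numeric".
class_both <- class(type.convert(c(small_id, large_id), as.is = TRUE))

# A double stores whole numbers exactly up to 2^53, so the ID survives intact.
as_num <- as.numeric(large_id)
exact <- format(as_num, scientific = FALSE) == large_id

print(c(class_small, class_both))
print(exact)
```

Because a chunked reader may only see small values in its first chunk, declaring the column as "numeric" up front avoids relying on this guess.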

I tried the following but got an error. Does anyone know the solution to this problem? Highly appreciated!

  dat <- read.table.ffdf(file = "file.csv", FUN = "read.csv", na.strings = "", colClasses = "character")
  Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, : vmode 'character' not implemented

Your error indicates that there is no 'character' data type implemented within the ff environment. All character columns should be treated as factors when reading in your file. Assuming x is the number of columns, the following should work:

  dat <- read.csv.ffdf(NULL, file = "file.csv", na.strings = "", colClasses = rep("factor", x))

That said, you probably do not want to import all of the data as factors, because that is very inefficient. Simply import all of your numerical data as 'numeric'. Assuming your first 5 columns are numeric and the remaining 3 are character:

  dat <- read.csv.ffdf(NULL, file = "file.csv", na.strings = "", colClasses = c(rep("numeric", 5), rep("factor", 3)))
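Before running a 20 GB import, it can help to try a colClasses vector on a few rows with plain read.csv. The sketch below applies the same idea to the two sample rows from the question. It assumes the file has no header row and that its 11 columns are laid out as in those rows; the names `sample_rows`, `col_classes` and `preview` are illustrative only.

```r
# The two sample rows from the question, used as a stand-in for the big file.
sample_rows <- c(
  "86246,205,17,1719,104116343,8435,2013-03-13,12,oz,1,2.59",
  "86246,205,17,1719,10800749282,8435,2013-03-13,12,oz,1,2.59"
)

# One class per column: "numeric" for numbers (including the 11-digit IDs),
# "factor" for text such as the date and the unit.
col_classes <- c(rep("numeric", 6), "factor", "numeric", "factor",
                 rep("numeric", 2))

# Assumption: no header row, so columns are named V1..V11.
preview <- read.csv(text = sample_rows, header = FALSE,
                    colClasses = col_classes, na.strings = "")

# Confirm each column got the intended class and the large ID is intact.
classes <- vapply(preview, class, character(1))
print(classes)
print(format(preview$V5, scientific = FALSE))
```

If the preview looks right, the same `col_classes` vector can be supplied as the colClasses argument of the read.csv.ffdf call, since it contains only "numeric" and "factor" entries.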
