Royal Genes


Safe For Kids





Converting large text files to excel



26 Feb 2006 20:50:29 -0800 soc.genealogy.computing
previous


fari...
I have several text files having more than 65000 rows. The character
separating the rows is | (piped character). Can someone help me as to
how this can be done. Its all right if the data runs across several
sheets for each file.

Dennis Lee Bieber...
Well, the | is no problem... Presuming it never appears within the
data itself.

Any decent text editor (probably not Notepad) should be able to
slurp in the file, and do a find&replace on | with \n (or \n\r if using
raw MS-DOS convention). Or, if one has Python available, something on
the likes of:

fin = open("big_ugly_file", "r")
data = fin.read().split("|")
fout = open("real_lines_file", "wt")
for ln in data:
fout.write(ln + "\n")

fin.close()
fout.close()

As for import? Is the data comma separated, tab separated, fixed
width? String data quoted?

Excel's CSV import function (also available in Access) allows for
defining the columns...

Too many rows? Well, split the file using some editor...


Denis Beauregard...
With Excel, you can select the line where you begin to read
the file.


Steve Hayes...
I'd call it into my word processor and do a search and replace -- replacing |
with the LF character (for Unix) or CR-LF for other formats.

T.M. Sommers...
If it's on Unix, then use tr.


I will really appreciate the help.

Warm regards

Robert G. Eldridge...
The following is one option provided the content of the longest row
will fit into an Excel cell.

1. Divide the text files so that there are only 65,000 rows or less in
each file.

2. Copy and paste each text file into Excel. All rows should now be in
the A column.

3. Select the A column and then use Excels Data/Text to Columns...
menu function first selecting Delimited then Next then remoce the tick
from the default Tab selection and select Other and enter the |
character in the field to the right of Other then Next and Finish.

The data will now be divided into columns and all | characters gone.

If some of the data is dates then all bets are off - use a database
application instead . Dates and Excel don't mix easily.

Note
Excels text to columns facility is great for removing the leading
space often found when data is copy/pasted from Web pages.
Use the fixed width option, set the divider where the data starts and
then don't import the first column to instantly remove the leading
space content.
next