implemented skipcomments
alimanfoo committed Feb 16, 2012
1 parent b8455ae commit b4107cd
Showing 8 changed files with 64 additions and 10 deletions.
Empty file removed CHANGES.txt
Empty file.
2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -1,4 +1,4 @@
Copyright (c) 2011 Alistair Miles
Copyright (c) 2012 Alistair Miles

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -41,7 +41,7 @@

# General information about the project.
project = u'petl'
copyright = u'2011, Alistair Miles'
copyright = u'2012, Alistair Miles'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
3 changes: 3 additions & 0 deletions docs/index.txt
@@ -11,6 +11,9 @@ and loading tables of data.
- Download: http://pypi.python.org/pypi/petl
- Mailing List: http://groups.google.com/group/python-etl

For an overview of all functions in the package, see the
:ref:`genindex`.

E.g., given the following data in a file at 'example.csv' in the current working directory::

foo,bar,baz
16 changes: 10 additions & 6 deletions docs/todo.txt
@@ -4,25 +4,30 @@ Version 0.4
- [done] bug in aggregate where key is string?
- [done] head and tail default n arg to 10

- [done] skipcomments - handling of comments in text files

- fromtext, strip trailing newline character
- alternative signature to rename when you want to rename a single field only
- convertall - convenience function to convert all fields the same way
- convertnumbers - attempt to convert string values to ints, floats etc.

- exclude - convenience functions as complement of select...
- selectout - select complement

- implement fromjson somehow?
- optimised implementation of merge() where input files are presorted? I.e., avoid need to concatenate then sort the concatenation, rather just do the merge sort then reduce
- implement fromxml using etree find and patterns

- docs - merge() example has an error, uses list instead of tuple for conflict
- docs - add section about dependencies and petlx
- docs - add section about chaining and pipelining
- docs - add python-pipeline and continuum.io to related work
- docs - fix link to pandas?

- ignore some fields when finding conflicts
- selectout - select complement
- regex replace (re.sub)
- convertall - convenience function to convert all fields the same way
- replace, replaceall - convenience function to convert replacing a single value
- categorise/classify - convenient way to extend table with a classification field? or just use extend with function? example, convert continuous variable to categorical variable using custom ranges
- sampling for pivot, avoid two complete passes
- implement fromxml using etree find and patterns
- to and from compressed files
- alpha facet?
- partitioning of alpha fields?
@@ -40,8 +45,7 @@ Version 0.4
- implement rowgroupby? generate sequence of key/row iterator pairs?
- implement recordgroupby?
- show rows/values causing conversion or translation errors
- autoconvert multiple fields to numeric types - convenience function
- handling of comments in text files
- optimised implementation of merge() where input files are presorted? I.e., avoid need to concatenate then sort the concatenation, rather just do the merge sort then reduce


Version 0.3
2 changes: 1 addition & 1 deletion src/petl/__init__.py
@@ -25,7 +25,7 @@
selectrangeopenright, selectrangeopen, selectrangeclosed, rangerowreduce, \
rangerecordreduce, selectin, selectnotin, selectre, rowselect, recordselect, \
fieldselect, rowlenselect, selectis, selectisnot, selectisinstance, transpose, \
intersection, pivot, recordcomplement, recorddiff, cutout
intersection, pivot, recordcomplement, recorddiff, cutout, skipcomments



18 changes: 17 additions & 1 deletion src/petl/test/test_transform.py
@@ -16,7 +16,7 @@
crossjoin, antijoin, rangeaggregate, rangecounts, rangefacet, \
rangerowreduce, rangerecordreduce, selectre, rowselect, recordselect, \
rowlenselect, strjoin, transpose, intersection, pivot, recorddiff, \
recordcomplement, cutout
recordcomplement, cutout, skipcomments


def test_rename():
@@ -1852,6 +1852,22 @@ def test_skip():
    iassertequal(expect2, table2) # can iterate twice?


def test_skipcomments():

    table1 = (('##aaa', 'bbb', 'ccc'),
              ('##mmm',),
              ('#foo', 'bar'),
              ('##nnn', 1),
              ('a', 1),
              ('b', 2))
    table2 = skipcomments(table1, '##')
    expect2 = (('#foo', 'bar'),
               ('a', 1),
               ('b', 2))
    iassertequal(expect2, table2)
    iassertequal(expect2, table2) # can iterate twice?


def test_unpack():

    table1 = (('foo', 'bar'),
31 changes: 31 additions & 0 deletions src/petl/transform.py
@@ -4418,6 +4418,37 @@ def iterskip(source, n):
    return islice(source, n, None)


def skipcomments(table, prefix):
    """
    Skip any row where the first value is a string and starts with
    `prefix`. E.g.::
    TODO
    """

    return SkipCommentsView(table, prefix)


class SkipCommentsView(object):

    def __init__(self, source, prefix):
        self.source = source
        self.prefix = prefix

    def __iter__(self):
        return iterskipcomments(self.source, self.prefix)

    def cachetag(self):
        try:
            return hash((self.source.cachetag(), self.prefix))
        except Exception as e:
            raise Uncacheable(e)


def iterskipcomments(source, prefix):
    return (row for row in source if len(row) > 0 and not(isinstance(row[0], basestring) and row[0].startswith(prefix)))


def unpack(table, field, newfields=None, maxunpack=None, include_original=False):
    """
    Unpack data values that are lists or tuples. E.g.::
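
Note: the `skipcomments` docstring in this commit still has a `TODO` where its example should be. As a rough illustration of the behaviour the new test exercises, here is a standalone sketch that does not import petl. It is an approximation under stated assumptions, not the code from the commit: it targets Python 3, so `str` stands in for `basestring`; it omits `cachetag()` and `Uncacheable`; and it filters inside `__iter__` rather than in a separate `iterskipcomments` function.

```python
class SkipCommentsView:
    """Re-iterable view that drops rows whose first value starts with prefix.

    Standalone sketch only: Python 3 `str` replaces `basestring`, and the
    caching hook (`cachetag`) from the commit is left out.
    """

    def __init__(self, source, prefix):
        self.source = source
        self.prefix = prefix

    def __iter__(self):
        for row in self.source:
            # Zero-length rows are dropped, mirroring the len(row) > 0 check.
            if len(row) == 0:
                continue
            first = row[0]
            # Only string first values are tested against the prefix.
            if isinstance(first, str) and first.startswith(self.prefix):
                continue
            yield row


def skipcomments(table, prefix):
    return SkipCommentsView(table, prefix)


# Same data as test_skipcomments in this commit.
table1 = (('##aaa', 'bbb', 'ccc'),
          ('##mmm',),
          ('#foo', 'bar'),
          ('##nnn', 1),
          ('a', 1),
          ('b', 2))
table2 = skipcomments(table1, '##')
print(list(table2))  # [('#foo', 'bar'), ('a', 1), ('b', 2)]
```

The row `('#foo', 'bar')` survives because its first value starts with a single `#`, not with the `##` prefix. Because the view builds a fresh iterator on each call to `__iter__`, it can be iterated more than once, which is what the repeated `iassertequal` in the test checks.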
