refactor: Added a generic FileStream (still in active development!) #2654
CodSpeed Performance Report: Merging #2654 will not alter performance.
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff, main vs. #2654:

| | main | #2654 | +/- |
|---|---:|---:|---:|
| Coverage | 90.21% | 90.50% | +0.29% |
| Files | 58 | 62 | +4 |
| Lines | 4895 | 4994 | +99 |
| Branches | 964 | 974 | +10 |
| Hits | 4416 | 4520 | +104 |
| Misses | 331 | 328 | -3 |
| Partials | 148 | 146 | -2 |
Updated from d3d86fe to 4dfdd17.
Updated from 4dfdd17 to bd5c138.
This is ready for review. Everything here is subject to change, including naming conventions, the implementation, and the abstractions, so feel free to comment on any of those.
samples/sample_tap_csv/client.py (outdated)
I suggest reviewing this module in split view. All that's left after the refactor are the `get_schema` and `read_file` implementations.
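A minimal sketch of the split described above, assuming a base class that owns the generic file iteration and leaves only `get_schema` and `read_file` to format-specific subclasses. `FileStreamSketch`, `CSVStreamSketch`, `get_records`, and the in-memory `files` mapping are hypothetical stand-ins, not the API added in this PR; the real `FileStream` reads from a filesystem and its signatures may differ.

```python
from __future__ import annotations

import abc
import csv
import io
import typing as t


class FileStreamSketch(abc.ABC):
    """Hypothetical stand-in for a generic file stream base class."""

    def __init__(self, files: dict[str, str]) -> None:
        # Map of file path -> file contents; a real stream would read from a filesystem.
        self.files = files

    @abc.abstractmethod
    def get_schema(self, path: str) -> dict[str, t.Any]:
        """Return a JSON schema describing the records in one file."""

    @abc.abstractmethod
    def read_file(self, path: str) -> t.Iterable[dict[str, t.Any]]:
        """Yield the records contained in one file."""

    def get_records(self) -> t.Iterator[dict[str, t.Any]]:
        # Generic logic stays in the base class: iterate files, delegate parsing.
        for path in sorted(self.files):
            yield from self.read_file(path)


class CSVStreamSketch(FileStreamSketch):
    """Format-specific subclass: only the two hooks are implemented."""

    def get_schema(self, path: str) -> dict[str, t.Any]:
        header = next(csv.reader(io.StringIO(self.files[path])))
        # Every column is typed as a string in this simplified sketch.
        return {
            "type": "object",
            "properties": {name: {"type": "string"} for name in header},
        }

    def read_file(self, path: str) -> t.Iterable[dict[str, t.Any]]:
        yield from csv.DictReader(io.StringIO(self.files[path]))


if __name__ == "__main__":
    stream = CSVStreamSketch({"a.csv": "id,name\n1,foo\n2,bar\n"})
    print(list(stream.get_records()))
    # [{'id': '1', 'name': 'foo'}, {'id': '2', 'name': 'bar'}]
```

The design point is that adding another format (JSONL, Parquet, etc.) would only mean writing another small subclass with these two methods.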
I suggest reviewing this module in split view.
CSV-specific settings were added and custom discovery was removed in favor of the default implementation.
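To illustrate how a CSV-specific setting such as `delimiter` (which appears in the example configs in this PR) might flow into parsing, here is a sketch using Python's standard `csv` module. The `quotechar` setting, the default values, and the helper names `csv_reader_options` and `read_csv_text` are hypothetical.

```python
from __future__ import annotations

import csv
import io
import typing as t

# Hypothetical defaults; only "delimiter" appears in the example configs in this PR.
DEFAULT_CSV_SETTINGS: dict[str, str] = {"delimiter": ",", "quotechar": '"'}


def csv_reader_options(config: t.Mapping[str, t.Any]) -> dict[str, str]:
    """Overlay user-supplied CSV settings on the defaults, ignoring unrelated keys."""
    options = dict(DEFAULT_CSV_SETTINGS)
    for key in DEFAULT_CSV_SETTINGS:
        if key in config:
            options[key] = config[key]
    return options


def read_csv_text(text: str, config: t.Mapping[str, t.Any]) -> list[dict[str, str]]:
    """Parse CSV text with dialect options derived from the tap settings."""
    options = csv_reader_options(config)
    # newline="" is what the csv module recommends for its input streams.
    reader = csv.DictReader(io.StringIO(text, newline=""), **options)
    return list(reader)
```

With `{"delimiter": "\t"}` in the config, tab-separated fixtures parse without any format-specific discovery code.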
@property
def partitions(self) -> list[dict[str, t.Any]]:
    """Return the list of partitions for this stream."""
    return self._partitions
Using partitions allows us to track state for each individual file in merge mode.
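A simplified sketch of per-file partitioning: one context dict per file, with a bookmark stored under each partition's context. The layout loosely follows the Singer SDK's per-partition state (`partitions` entries holding `context` and `replication_key_value`), but `build_partitions`, `update_partition_state`, the `file_path` key, and the use of something like a file modification timestamp as the bookmark are assumptions for illustration, not this PR's code.

```python
from __future__ import annotations

import typing as t


def build_partitions(paths: t.Iterable[str]) -> list[dict[str, t.Any]]:
    """One partition (context dict) per file; the "file_path" key is hypothetical."""
    return [{"file_path": path} for path in sorted(paths)]


def update_partition_state(
    state: dict[str, t.Any],
    context: dict[str, t.Any],
    bookmark: t.Any,
) -> None:
    """Record a bookmark for one partition, creating its state entry if needed."""
    partitions = state.setdefault("partitions", [])
    for entry in partitions:
        if entry["context"] == context:
            # Existing file: advance its bookmark independently of other files.
            entry["replication_key_value"] = bookmark
            return
    partitions.append({"context": context, "replication_key_value": bookmark})
```

Because each file has its own entry, a file that was already synced keeps its bookmark even when other files in the same merged stream change.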
These are working now:

FTP:
{
  "filesystem": "ftp",
  "path": "fixtures/csv",
  "read_mode": "one_stream_per_file",
  "delimiter": "\t",
  "ftp": {
    "host": "127.0.0.1",
    "port": 21,
    "username": "my_ftp_user",
    "password": "my_ftp_password"
  }
}

SFTP:
{
  "filesystem": "sftp",
  "path": "fixtures/csv",
  "read_mode": "one_stream_per_file",
  "delimiter": "\t",
  "sftp": {
    "host": "127.0.0.1",
    "port": 2022,
    "username": "my_ftp_user",
    "password": "my_ftp_password"
  }
}
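One way to read configs like the FTP and SFTP examples is to treat the nested block named after `filesystem` as the connection options. The helper `storage_options`, the `"local"` fallback, and the restriction to `ftp`/`sftp` are hypothetical; the PR text does not say how these options are consumed.

```python
from __future__ import annotations

import typing as t

# Protocols whose connection settings live under a nested key of the same name,
# as in the FTP and SFTP examples. Other protocols are deliberately left out here.
REMOTE_FILESYSTEMS = {"ftp", "sftp"}


def storage_options(config: t.Mapping[str, t.Any]) -> tuple[str, dict[str, t.Any]]:
    """Return (protocol, connection kwargs) for a tap config (hypothetical helper)."""
    protocol = config.get("filesystem", "local")
    if protocol in REMOTE_FILESYSTEMS:
        if protocol not in config:
            raise ValueError(f"missing {protocol!r} connection settings")
        # Copy so callers can't mutate the original config.
        return protocol, dict(config[protocol])
    return protocol, {}
```

If the tap were built on fsspec (an assumption), the result could be passed on as `fsspec.filesystem(protocol, **kwargs)`, since fsspec's FTP and SFTP filesystems accept `host`, `port`, `username`, and `password`.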
Updated from 886444a to e527cee.
References:
📚 Documentation preview 📚: https://meltano-sdk--2654.org.readthedocs.build/en/2654/