DOLfYN issue reading large AD2CP files #366

Open
akeeste opened this issue Dec 4, 2024 · 5 comments

Comments

@akeeste
Contributor

akeeste commented Dec 4, 2024

Describe the bug:

When using DOLfYN to read a large (12 GB) Nortek Signature 1000 .ad2cp file, the initialization of _Ad2cpReader fails due to the file size. In this case, _Ad2cpReader._check_header() calls _reopen() with bufsize = self._eof, which is equal to the file size in bytes (~12e9). This value is too large for Python to convert to a C int behind the scenes. The nens argument is not used here, and I cannot see any other user-facing arguments that can get around this issue.

I can manually decrease _eof during debugging and the rest of the read function continues as expected, though of course I don't know whether the headers are being determined successfully by _check_header. @jmcvey3 do you have a workaround for files this size?

To Reproduce:

@akeeste can send a link to the file in question if needed

Minimal working example:
Using the develop branch:

import mhkit.dolfyn as dolfyn
ds_adp = dolfyn.io.api.read("S100604A032_5_beam_8Hz.ad2cp", nens=[0,100])

Expected behavior:

Expected large file size to not be an issue when nens is used.

Screenshots:

Error messages:

  Traceback (most recent call last):
    File "c:\Users\akeeste\anaconda3\envs\mhkit\Lib\runpy.py", line 198, in _run_module_as_main
      return _run_code(code, main_globals, None,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "c:\Users\akeeste\anaconda3\envs\mhkit\Lib\runpy.py", line 88, in _run_code
      exec(code, run_globals)
    File "c:\Users\akeeste\.vscode\extensions\ms-python.debugpy-2024.0.0-win32-x64\bundled\libs\debugpy\adapter/../..\debugpy\launcher/../..\debugpy\__main__.py", line 39, in <module>
      cli.main()
    File "c:\Users\akeeste\.vscode\extensions\ms-python.debugpy-2024.0.0-win32-x64\bundled\libs\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 430, in main
      run()
    File "c:\Users\akeeste\.vscode\extensions\ms-python.debugpy-2024.0.0-win32-x64\bundled\libs\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 284, in run_file
      runpy.run_path(target, run_name="__main__")
    File "c:\Users\akeeste\.vscode\extensions\ms-python.debugpy-2024.0.0-win32-x64\bundled\libs\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
      return _run_module_code(code, init_globals, run_name,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "c:\Users\akeeste\.vscode\extensions\ms-python.debugpy-2024.0.0-win32-x64\bundled\libs\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
      _run_code(code, mod_globals, init_globals,
    File "c:\Users\akeeste\.vscode\extensions\ms-python.debugpy-2024.0.0-win32-x64\bundled\libs\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
      exec(code, run_globals)
    File "C:\Users\akeeste\Documents\Software\mhkit_test\dolfyn_test.py", line 3, in <module>
      ds_adp = dolfyn.io.api.read("S100604A032_5_beam_8Hz.ad2cp")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "c:\users\akeeste\documents\software\github\mhkit-python\mhkit\dolfyn\io\api.py", line 114, in read
      return func(fname, userdata=userdata, nens=nens, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "c:\users\akeeste\documents\software\github\mhkit-python\mhkit\dolfyn\io\nortek2.py", line 79, in read_signature
      rdr = _Ad2cpReader(
            ^^^^^^^^^^^^^
    File "c:\users\akeeste\documents\software\github\mhkit-python\mhkit\dolfyn\io\nortek2.py", line 157, in __init__
      self.start_pos = self._check_header()
                       ^^^^^^^^^^^^^^^^^^^^
    File "c:\users\akeeste\documents\software\github\mhkit-python\mhkit\dolfyn\io\nortek2.py", line 209, in _check_header
      self._reopen(self._eof)
    File "c:\users\akeeste\documents\software\github\mhkit-python\mhkit\dolfyn\io\nortek2.py", line 226, in _reopen
      self.f = open(_abspath(self.fname), "rb", bufsize)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  OverflowError: Python int too large to convert to C int
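
The last frame of the traceback can be reproduced without a 12 GB file, because the failure depends only on the value passed as the third (buffering) argument to the built-in open(). The following is a minimal sketch, assuming a platform where a C int is 32 bits wide; the helper name try_open and the placeholder file are hypothetical and not part of DOLfYN.

```python
import os
import tempfile

# Assumption: C int is 32-bit signed, as on mainstream Windows/Linux builds.
INT_MAX = 2**31 - 1


def try_open(path, bufsize):
    """Open `path` for binary reading with the given buffer size.

    Returns None on success, or the exception class name if open()
    cannot convert `bufsize` to a C int.
    """
    try:
        with open(path, "rb", bufsize):
            return None
    except OverflowError as err:
        return type(err).__name__


# A tiny placeholder file stands in for the large .ad2cp file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 16)
    tmp_path = tmp.name

small_result = try_open(tmp_path, 4096)         # ordinary buffer size
large_result = try_open(tmp_path, INT_MAX + 1)  # one past the 32-bit limit
huge_result = try_open(tmp_path, 12 * 10**9)    # roughly the reported file size
os.remove(tmp_path)

print(small_result, large_result, huge_result)
```

The file on disk is only 16 bytes, so the error here comes from the buffer size argument rather than from the file contents.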

Desktop (please complete the following information):

  • OS: Windows 11
  • MHKiT Version: 0.8.2 dev
@ssolson
Contributor

ssolson commented Dec 4, 2024

@akeeste I think this is related to #314

@akeeste
Contributor Author

akeeste commented Dec 5, 2024

> @akeeste I think this is related to #314

Agreed that the file size is an issue, but the errors in that case seem different. The nens flag is not helping here since the number of bytes in the file is ultimately too large for a C int. I did see another thread where @jmcvey3 mentions using the Linux truncate command, though. I'll test some more with different types, truncating, etc.
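
For anyone without access to the truncate command (e.g. on Windows), a rough Python equivalent is to copy the first N bytes of the file into a new, smaller file. This is only a sketch: copy_head is a hypothetical helper, not a DOLfYN function, and cutting at an arbitrary byte offset may leave a partial record at the end of the copy, which may or may not be tolerated by the reader.

```python
import os
import tempfile


def copy_head(src, dst, nbytes, chunk=1024 * 1024):
    """Copy the first `nbytes` bytes of `src` into `dst`, reading in chunks.

    Returns the number of bytes actually copied, which is smaller than
    `nbytes` if the source file is shorter.
    """
    remaining = nbytes
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while remaining > 0:
            block = fin.read(min(chunk, remaining))
            if not block:  # reached end of source early
                break
            fout.write(block)
            remaining -= len(block)
    return nbytes - remaining


# Demonstration on a small placeholder file instead of a real .ad2cp file.
with tempfile.TemporaryDirectory() as tmpdir:
    src_path = os.path.join(tmpdir, "full.bin")
    dst_path = os.path.join(tmpdir, "head.bin")
    with open(src_path, "wb") as f:
        f.write(b"\x01" * 10000)

    copied = copy_head(src_path, dst_path, 4096, chunk=1000)
    dst_size = os.path.getsize(dst_path)

print(copied, dst_size)
```

Unlike truncate, this leaves the original file untouched and writes the shortened copy elsewhere.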

@jmcvey3
Contributor

jmcvey3 commented Dec 5, 2024

Huh, that size shouldn't be an issue for a 64-bit integer. self._eof is determined by seeking to the end of the file with self.f.seek(0, 2) and then calling self.f.tell(), so maybe check what type that value is?
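
One way to run that check is sketched below: measure a file's size by seeking to its end and calling tell(), inspect the type of the result, and test the reported ~12e9 byte size against 32-bit and 64-bit signed ranges. The helper names file_size_via_seek and fits_signed are hypothetical, and a 1 KiB placeholder file stands in for the real data.

```python
import os
import tempfile


def file_size_via_seek(path):
    """Return the file size in bytes by seeking to the end of the file."""
    with open(path, "rb") as f:
        f.seek(0, 2)     # whence=2: offset is relative to end of file
        return f.tell()  # tell() itself takes no arguments


def fits_signed(value, bits):
    """True if `value` fits in a signed integer of the given bit width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


# Placeholder file in place of the real .ad2cp file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 1024)
    tmp_path = tmp.name

eof = file_size_via_seek(tmp_path)
os.remove(tmp_path)

reported_size = 12 * 10**9  # approximate size from the issue description
print(type(eof).__name__, eof)
print(fits_signed(reported_size, 64), fits_signed(reported_size, 32))
```

tell() returns an ordinary Python int, which has arbitrary precision, so any overflow has to come from a later conversion to a fixed-width C type.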

For the future, it's usually a good idea to configure the instrument to save multiple files instead of just one 12 GB file. One, so that if the file gets corrupted you don't lose everything, and two, because it's easier to download a bunch of <1 GB files.

@akeeste
Contributor Author

akeeste commented Dec 5, 2024

Definitely agreed on the second point, I'll certainly convey that to this user.

On the first point, IMO the error message is a bit ambiguous about the specific C integer type that the Python int is converted to. Since it throws an error, I assumed it must be a C int or long int (at least 16-bit and 32-bit respectively). I'll post more updates as I test things out.

@akeeste
Contributor Author

akeeste commented Dec 12, 2024

In testing this file, the _check_header function fails if _eof is larger than 2**31-1, indicating that bufsize is being converted to a 32-bit C integer. Casting bufsize to different numerical types (float, np.int64, etc.) does not work. This limit is also much smaller than sys.maxsize. Similar errors occur on Linux.
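
Given the 2**31-1 threshold, one possible workaround is to clamp the buffer size before it reaches open(). The buffering argument only sets the size of the read buffer and does not limit how much of the file can be read, so a smaller value should not change what is read, only how it is buffered. This is a sketch under that assumption, not a tested DOLfYN patch: MAX_BUFSIZE, capped_bufsize and reopen_capped are hypothetical names, and the 1 MiB cap is an arbitrary choice.

```python
import os
import tempfile

# Hypothetical cap, well below the 2**31 - 1 limit found in testing.
MAX_BUFSIZE = 2**20  # 1 MiB


def capped_bufsize(requested, cap=MAX_BUFSIZE):
    """Clamp a requested buffer size so open() can convert it to a C int."""
    return min(int(requested), cap)


def reopen_capped(path, requested_bufsize):
    """Open `path` for binary reading with a clamped buffer size."""
    return open(path, "rb", capped_bufsize(requested_bufsize))


# Placeholder file; the requested buffer size mimics a ~12 GB _eof value.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xa5" * 5000)
    tmp_path = tmp.name

with reopen_capped(tmp_path, 12 * 10**9) as f:
    data = f.read()
os.remove(tmp_path)

print(capped_bufsize(12 * 10**9), capped_bufsize(4096), len(data))
```

Requests below the cap pass through unchanged, so small files would behave as before.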
