Unable to get data when Deye inverter get offline #203

Open
bobybob69 opened this issue Oct 8, 2024 · 35 comments
Labels
invalid This doesn't seem right stale

Comments

@bobybob69

Describe the bug
When the Deye M200G4 inverter goes offline, live data can no longer be retrieved and all entities become unavailable.
The integration has to be reloaded to recover.

Attach the debug log
Will be uploaded soon

To Reproduce
In the morning, when the first sunlight appears and the panels start producing energy, nothing happens in the HA app, although everything is live in the SOLARMAN app.

Expected behavior
Data should come back online after the inverter has been offline overnight.

Screenshots

image

Energy dashboard not showing anything for today

image

Production from the SOLARMAN app for the same day as the energy dashboard

image

Entities offline except one "Total Production 4", don't know why...

Metadata:
Version: v24.10.04

@bobybob69 bobybob69 added the bug Something isn't working label Oct 8, 2024
@CrazyUs3r
Contributor

Duplicate of #87

@davidrapan
Owner

@bobybob69 described a situation where the device in HA doesn't come back up in the morning, so it's not a duplicate.

I'm waiting for the logs. 😉

@bobybob69
Author

Hey @davidrapan and @CrazyUs3r, thanks for checking, hope you're good!

Yesterday I opened my HA dashboard to check energy production and I didn't get any data (as shown in the screenshot). I tried rebooting the instance but it didn't change anything. I didn't get the opportunity to download the log as I was on my iPhone.

Today I opened it again and I can see my inverter is still offline. I'll check around 12 whether it's still offline, but I'm sure it will be.

I was not sure about the similarity with #87, but if you mention it, perhaps it is?

Here are the logs

home-assistant_solarman_2024-10-09T03-53-16.385Z.log

@davidrapan
Owner

Is this the debug log, enabled from the previous day until the morning?

@bobybob69
Author

Hi @davidrapan
It should be, but since you ask, does that mean it's not the case?

Here are the pictures of my HA ENERGY dashboard just now

image

We can see the entities are offline

image

What would be the best way to help you troubleshoot?

@davidrapan
Owner

I really have no idea what is happening... I will need a moment to think about it. 😉

@davidrapan
Owner

Hi @bobybob69, did you, for example, try hitting the Reload button under the three-dot menu in the list of Solarman devices?

@bobybob69
Author

Hi @davidrapan, yes, I tried, and below are the logs.

As it's night, the inverters aren't producing anything, but they also can't be reached... it's strange, and that's the issue that occurred. Is it expected for the inverters to be unreachable when there is no production?

I'll try again tomorrow morning so the log records the changes.

Thanks for helping, mate

@davidrapan
Owner

Yes, microinverters turn off when there is no sunlight.

@bobybob69
Author

bobybob69 commented Oct 13, 2024

Hi @davidrapan, you good?

Below are the logs and what I'm seeing on the entities... I don't know why they all turn unavailable.
Can renaming the entities cause an issue?
Otherwise, as for the inverters turning off and on but not coming back in HA... no idea why it happens.

Any thoughts?

Thanks mate!
home-assistant_solarman_2024-10-13T10-12-41.204Z.log

Capture d’écran 2024-10-13 à 12 14 09

@davidrapan
Owner

Did you try the Reload button when it gets into this state?

@bobybob69
Author

Hi @davidrapan, yes, I clicked the Reload button but it stayed unavailable.
I pressed the button again just now and it came back online with the data.
Also, I'll check tonight, when they go offline, whether the data comes back normally once they're back on the next day.
I've activated the logs and will share them with you tomorrow.

@bobybob69
Author

Hey @davidrapan

Here are the logs for 2 days of operation, and right now here's what I'm seeing: error messages everywhere.

If I click Reload, it works again as expected.

home-assistant_solarman_2024-10-14T16-01-30.348Z.log

Capture d’écran 2024-10-14 à 18 02 43

after clicking the reload button

Capture d’écran 2024-10-14 à 18 03 38 Capture d’écran 2024-10-14 à 18 04 27

@davidrapan
Owner

This behavior is honestly really weird and I can't think of anything we could try to reveal what's going on... :-/

@bobybob69
Author

Hi @davidrapan, just to let you know, it happened again this morning... the inverters went offline in the integration.
It's strange because I was using Stephane Joubert's integration and I didn't get these issues... what could cause it to happen?
I've re-enabled the logs and will share them later.
Yesterday they were offline due to no production, and when they started producing again, the integration showed the error.

Capture d’écran 2024-10-17 à 08 57 16

What info would you need to have better context to understand what can cause the issue? Are the logs alone enough?

Thanks and have a great day

@davidrapan
Owner

Hello @githubDante, do you maybe have any idea (cause I'm out of them) of what could be wrong here?

@githubDante

This is really bad:

OSError: [Errno 24] No file descriptors available

It's an indication of an FD leak somewhere. The question is what is causing it. It can be this integration, but it could also be something HA related (e.g. other modules).

What happens at night when these micro inverters are offline ?!? Retries until successful connection or something else ?
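For anyone following along: Errno 24 means the process has reached its limit of open file descriptors. A rough sketch of how to compare the two numbers for a given PID, assuming a Linux system with /proc mounted (the function name fd_usage is made up for this example):

```shell
# fd_usage PID
# Prints how many descriptors the process holds and its soft limit.
# Assumes Linux /proc; fd_usage is a hypothetical helper name.
fd_usage() {
    pid="$1"
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # The "Max open files" row lists the soft limit in its 4th column.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits" 2>/dev/null)
    echo "open=$open limit=$limit"
}
```

Calling `fd_usage $$` reports on the current shell; pass the Home Assistant PID instead. If `open` keeps approaching `limit`, something in that process is not releasing descriptors.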

@davidrapan
Owner

davidrapan commented Oct 17, 2024

Oh, I did not notice that OSError... That truly is bad.

What happens at night when these micro inverters are offline ?!? Retries until successful connection or something else ?

Yes. Retries.

There are quite a few users with microinverters that also go offline during the night but who do not experience this issue.

@githubDante

Maybe they don't have so many inverters. There are at least 3 here.

@bobybob69 can you provide a log for the interval between e.g. 18:00 and 07:00, or an extended log covering 24 hours or more?

@davidrapan
Owner

Maybe they don't have so many inverters. There are at least 3 here.

Yeah that's true though.

@davidrapan
Owner

Isn't there any way we could easily reuse sockets?

@githubDante

No, they must be released. The good news is that the issue is not caused by the integration/pysolarmanV5; it must be something else in the @bobybob69 installation that leaks FDs (not necessarily network related).

How I know that the issue is elsewhere: I ran HA in a container with several fake inverters configured with the addresses of different running hosts (one is connected to a web server on port 80 😄 and it's very noisy), and then monitored the connections and their states.

@bobybob69 what's the output of this command:

ls /proc/`ps xalf | grep hass | grep -v grep | awk '{print $3}'`/fd | wc -l 

How many are network connections ?!?

lsof -i -a -np `ps xalf | grep hass | grep -v grep | awk '{print $3}'` | grep TCP

or with ss:

ss -ntp | grep `ps xalf | grep hass | grep -v grep | awk '{print $3}'`
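If lsof or ss is not available (as on small busybox systems), a rough equivalent can be sketched with readlink alone. This assumes Linux /proc and that the PID is already known; the function name count_fds is made up for this example. Note that it counts all sockets (TCP, UDP and Unix), not only TCP connections:

```shell
# count_fds PID
# Prints the total number of open descriptors and how many are sockets.
# Assumes Linux /proc; count_fds is a hypothetical helper name.
count_fds() {
    pid="$1"
    total=0
    sockets=0
    for fd in /proc/"$pid"/fd/*; do
        # Each fd entry is a symlink; sockets resolve to "socket:[inode]".
        target=$(readlink "$fd" 2>/dev/null) || continue
        total=$((total + 1))
        case "$target" in
            socket:*) sockets=$((sockets + 1)) ;;
        esac
    done
    echo "total=$total sockets=$sockets"
}
```

For example, `count_fds 1234` for a process with PID 1234. A large `total` with few `sockets` would point to leaked files or pipes rather than network connections.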

@davidrapan
Owner

No, they must be released. The good news is that the issue is not caused by the integration/pysolarmanV5; it must be something else in the @bobybob69 installation that leaks FDs (not necessarily network related).

I also ran a test with one real and three fake inverters and came to the same conclusion...

@davidrapan davidrapan removed the bug Something isn't working label Oct 18, 2024
@davidrapan davidrapan added the invalid This doesn't seem right label Nov 1, 2024
@bobybob69
Author

bobybob69 commented Nov 2, 2024

This is really bad:

OSError: [Errno 24] No file descriptors available

It's an indication of an FD leak somewhere. The question is what is causing it. It can be this integration, but it could also be something HA related (e.g. other modules).

What happens at night when these micro inverters are offline ?!? Retries until successful connection or something else ?

Hi @githubDante, tonight I was trying to integrate my smart meter and I had to restart the HA instance. When it came back up, I went to Solarman and the inverters were offline (as there is no production).

Below are the logs after reloading the Solarman instance for each device.
Sorry I didn't notice your message earlier.

home-assistant_solarman_2024-11-02T21-31-55.556Z.log

Here's how the Solarman integration currently looks for my inverters and the smart meter I'm trying to integrate in #187 with @davidrapan

solarman error status inverter

@bobybob69
Author

No, they must be released. The good news is that the issue is not caused by the integration/pysolarmanV5; it must be something else in the @bobybob69 installation that leaks FDs (not necessarily network related).

How I know that the issue is elsewhere: I ran HA in a container with several fake inverters configured with the addresses of different running hosts (one is connected to a web server on port 80 😄 and it's very noisy), and then monitored the connections and their states.

@bobybob69 what's the output of this command:

ls /proc/`ps xalf | grep hass | grep -v grep | awk '{print $3}'`/fd | wc -l 

How many are network connections ?!?

lsof -i -a -np `ps xalf | grep hass | grep -v grep | awk '{print $3}'` | grep TCP

or with ss:

ss -ntp | grep `ps xalf | grep hass | grep -v grep | awk '{print $3}'`

@githubDante, I tried from the terminal menu of Home Assistant, and below are the results (all seem to fail except the second one, where I pressed Enter but nothing happened...)

line comand result

Is there anything I could help with to troubleshoot?
Thanks

@githubDante

Hi,

The name of the main process is not hass in your installation. Try to identify it and use it in the grep command

ls /proc/`ps xalf | grep <process name> | grep -v grep | awk '{print $3}'`/fd | wc -l 

and

lsof -i -a -np `ps xalf | grep <process name> | grep -v grep | awk '{print $3}'` | grep TCP

If you know the PID you can use it directly:

ls /proc/<PID>/fd | wc -l

and

lsof -i -a -np <PID> | grep TCP
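Since a leak only shows up as growth over time, a single count says little. A minimal sketch for sampling the count periodically, assuming Linux /proc (the function name watch_fds and its default interval and sample count are made up for this example):

```shell
# watch_fds PID [INTERVAL_SECONDS] [SAMPLES]
# Prints a timestamped descriptor count for the process at each sample.
# Assumes Linux /proc; watch_fds and its defaults are hypothetical.
watch_fds() {
    pid="$1"
    interval="${2:-60}"
    samples="${3:-10}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        count=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
        echo "$(date '+%Y-%m-%d %H:%M:%S') fds=$count"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `watch_fds <PID> 600 72` would take a sample every 10 minutes for 12 hours, which covers the night when the inverters are offline. A count that climbs steadily would confirm a leak in that process.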

@davidrapan
Owner

davidrapan commented Nov 3, 2024

How is your HA installed?

BTW, are you from the future? Cause your latest post says "bobybob69 commented in 30 minutes"! 😆

@bobybob69
Author

Hey @githubDante, any tips on how to find out which process I should look at? And the same for the PID?
Sorry, I'm not familiar with this, but I'm ready to learn how!

Please find below the logs after solar production started and all the inverters came back online. But I had to manually reload the integration configuration for each inverter.

Thanks for your help mate!

home-assistant_solarman_2024-11-03T08-43-57.388Z.log

@githubDante

You can use ps xalf to list all processes, or manually scan /proc/*/comm and /proc/*/cmdline with ls -l and cat in order to find it. Considering that this is a tiny system (using busybox), you should not have many processes, especially python3.12-related ones.
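The manual /proc scan can be sketched as a small loop, assuming Linux /proc and a POSIX shell (the function name find_procs is made up for this example). It prints the PID and command line of every process whose command line contains the given text:

```shell
# find_procs PATTERN
# Prints "PID command line" for each process whose cmdline contains PATTERN.
# Assumes Linux /proc; find_procs is a hypothetical helper name.
find_procs() {
    pattern="$1"
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # cmdline is NUL-separated; turn the NULs into spaces for display.
        cmd=$(cat "$dir/cmdline" 2>/dev/null | tr '\0' ' ')
        case "$cmd" in
            *"$pattern"*) echo "$pid $cmd" ;;
        esac
    done
}
```

For example, `find_procs python3` should list the Python processes; the first column is the PID to use in the commands above.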

The last log shows something which is definitely related to the FD leak issue. The ics_calendar extension is behaving rather funky. It starts here:

2024-11-02 22:24:02.142 ERROR (SyncWorker_4) [custom_components.ics_calendar.calendar] Schedule Apple Loris: Failed to open url...

continues with:

(error count: 4 - this error is ratelimited)

and then its connection limit is getting exhausted:

2024-11-02 22:24:08.916 WARNING (SyncWorker_4) [urllib3.connectionpool] Connection pool is full, discarding connection: p139-caldav.icloud.com. Connection pool size: 10

Another limit is reached here:

2024-11-02 22:24:53.112 WARNING (MainThread) [homeassistant.components.homekit] Cannot add climate.clim_mael as this would exceed the 150 device limit. Consider using the filter option

The OS errors OSError: [Errno 24] No file descriptors available start 20-30 minutes later while the ics_calendar still tries to open that URL.

The tests performed by me & @davidrapan on a clean install show no issues with this integration, so the root of the issue must be in another extension. Try to disable them one by one and you should find the culprit.
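To see which component is the noisiest before disabling things one by one, the log can be summarized per logger. A rough sketch, assuming the log line layout shown in the excerpts above (date, time, level, thread, then the logger name in square brackets); the function name noisy_loggers is made up for this example:

```shell
# noisy_loggers LOGFILE
# Counts ERROR and WARNING lines per logger name, most frequent first.
# Assumes the "date time LEVEL (thread) [logger] message" layout;
# noisy_loggers is a hypothetical helper name.
noisy_loggers() {
    awk '
        $3 == "ERROR" || $3 == "WARNING" {
            # The first [...] group on the line is the logger name.
            if (match($0, /\[[^]]+\]/)) {
                count[substr($0, RSTART + 1, RLENGTH - 2)]++
            }
        }
        END { for (name in count) print count[name], name }
    ' "$1" | sort -rn
}
```

Run it as `noisy_loggers <path to the downloaded log>`. The loggers at the top of the list are reasonable first candidates to disable.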

@bobybob69
Author

Hey @githubDante, I just checked in the terminal. What I did a few months ago was replace the RPi, and at the same time I did a clean install plus a backup restore from my previous HA installation. Can this cause an issue?

Please find below the results for ps xalf

ps xalf results

Below is the output of the command busybox --list

busybox 1 busybox 2 busybox 3 busybox 4 busybox 5 busybox 6

does this help ?

Are you suggesting to clean install HA and re-install module one by one ? maybe that could solve the issue ?

thanks for your help

@githubDante

does this help ?

No.

Are you suggesting to clean install HA and re-install module one by one ? maybe that could solve the issue ?

I'm not familiar with HA/HA OS at all, but yes, start from scratch or disable/uninstall the modules/integrations which you do not use. Does this ics_calendar even work for you?!?

@davidrapan
Owner

davidrapan commented Nov 3, 2024

I don't understand why you don't just try removing devices from these other integrations which are running there (or even removing the integrations completely). It does not look like they even work, so... it's really a no-brainer.

@bobybob69
Author

bobybob69 commented Nov 3, 2024

Hey guys. @githubDante, yes, the ics calendar works, but if it needs to be deleted, that won't be a problem.

What should I do to help, then? Where should the commands be executed from?
I'm sorry I'm not as efficient as you would like 😅

@davidrapan, which devices do you suppose are set up incorrectly and should be removed? I'm not sure I understand.

@davidrapan
Owner

davidrapan commented Nov 3, 2024

According to the log, ics_calendar, for example, has or is causing some issues.

As we told you, try disabling some integrations (from HACS) one by one until the problem with solarman disappears...

Something in your HA is exhausting resources and thus causing issues, which results in solarman not working... 😉

github-actions bot commented Dec 4, 2024

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Dec 4, 2024