Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for GoogleDriveLoader #71

Open
davidmigloz opened this issue Jul 12, 2023 · 9 comments
Open

Add support for GoogleDriveLoader #71

davidmigloz opened this issue Jul 12, 2023 · 9 comments
Assignees
Labels
c:doc-loaders Document loaders. f:good first issue Good for newcomers t:enhancement New feature or request

Comments

@davidmigloz davidmigloz converted this from a draft issue Jul 12, 2023
@davidmigloz davidmigloz added t:enhancement New feature or request c:doc-loaders Document loaders. f:good first issue Good for newcomers labels Jul 12, 2023
@Sahil-kachhap
Copy link

Sahil-kachhap commented Aug 7, 2023

@davidmigloz I am interested in working on adding this support so can you assign this to me. Also, Can you share any relevant reference other than the official documentation (above python codebase reference link is broken). And yes, Is there any specific time period under which I have to complete this functionality ?

I saw @faithoflifedev has worked on some of the document loader implementations and this is my first interaction with this repo. Will you be interested in sharing any piece of advice or suggestions on this.

@davidmigloz
Copy link
Owner Author

davidmigloz commented Aug 7, 2023

hey @Sahil-kachhap ,

Happy to hear that you are willing to collaborate with the project! 🙂

I've just fixed the links that point to the LangChain Python documentation and implementation. I would recommend that you take a look to it, then get familiar with the googleapis package and finally look at some of the other document loaders implementations to have a reference. There is no specific deadlines for this feature, so no pressure.

If you have any question or need more guidance don't hesitate to ask in the #contributing channel of our Discord Server, (or in this issue).

@faithoflifedev
Copy link
Contributor

Hi @Sahil-kachhap, I'll help where I can.

I've worked with the googleapis_auth package in my yt package that wraps a lot of the Youtube APIs.

I've been hesitant to start on any cloud file system loaders, since I really haven't had the time to research how they deal with different file types (CSV, JSON, etc.) so I figured my time was better spent on the individual file types thinking that these might need to be integrated into a cloud loader.

@Sahil-kachhap
Copy link

Sahil-kachhap commented Aug 11, 2023

[Need Review of below code snippet]: Dart Implementation of loadCredential Method

@faithoflifedev @davidmigloz here is a snippet of loadCredentials method that returns an authenticated HTTP client which can then be used by all other methods to interact with Google APIs:-

  /// Loads credentials from serviceAccountKey, tokenPath or credentialsPath (whichever exists)
  /// and returns an auto refreshing authenticated HTTP client which can then be utlisied to interact with google APIs.
  Future<AutoRefreshingAuthClient> loadCredentials() async {
    AutoRefreshingAuthClient? client;
    try {
      // If serviceAccountKey Path is not null, load credentials from serviceAccountKey.
      // By default it is set to this path '~/.credentials/credentials.json'
      // If serviceAccountKey file does not exist, then it loads credentials from tokenPath.
      if (serviceAccountKey != null) {
        final File file = File(serviceAccountKey!);
        if (file.existsSync()) {
          final String contents = await file.readAsString();
          final Map<String, dynamic> credentials = jsonDecode(contents) as Map<String, dynamic>;
          client = await clientViaServiceAccount(ServiceAccountCredentials.fromJson(credentials), scopes);
        }
      } else if (tokenPath != null) {
        /// clientViaApplicationDefaultCredentials method: Looks for credentials in the following order of preference:
        //1. A JSON file whose path is specified by GOOGLE_APPLICATION_CREDENTIALS, this file typically contains exported service account keys.
        //2. A JSON file created by gcloud auth application-default login in a well-known location (%APPDATA%/gcloud/application_default_credentials.json on Windows and $HOME/.config/gcloud/application_default_credentials.json on Linux/Mac).
        //3. On Google Compute Engine and App Engine Flex we fetch credentials from GCE metadata service.
        client = await clientViaApplicationDefaultCredentials(scopes: scopes);
      } else {
        /// if credentialsPath is valid, load credentials from credentialsPath.
        final File credentialFile = File(credentialsPath!);
        if (credentialFile.existsSync()) {
          final String contents = await credentialFile.readAsString();
          final Map<String, dynamic> credentials = jsonDecode(contents) as Map<String, dynamic>;
          // Obtains oauth2 credentials and returns an authenticated HTTP client.
          client = await clientViaUserConsent(ClientId.fromJson(credentials),scopes, (final String url) async {});
        }
      }

      if (client == null) {
        throw Exception('Error occured while loading credentials');
      }
      return client;
    } catch (e) {
      //print(e);
      throw Exception('Error occured while loading credentials');
    } finally {
      client?.close();
    }
  }
  • Can you please review and let me know the changes required in this method. I have implemented other methods for this class just need to test things and review if this method works as expected
  • One thing found about clientViaApplicationDefaultCredentials method - it by default checks for GOOGLE_APPLICATION_CREDENTIALS environment variable to get the service account key, but user needs to manually create this environment variable to work with this.
  • There are 3 types of credential path we are accounting - service account key path, token path (it gets auto generated I think), can you tell me what does the credential path points to, in the langchain documentation (Is it the credentials generated by Oauth 2.0 client).

@davidmigloz
Copy link
Owner Author

davidmigloz commented Aug 11, 2023

@Sahil-kachhap thanks for your work.

I think it's better if LangChain.dart is not responsible for the authentication/authorization part, but delegates that to googleapis_auth package. As there are many ways you may want to authenticate depending on your use case.

I would make the loader require a AuthClient in the constructor and then it's up to the user of the loader how it creates the authenticated HTTP client.

That's the strategy I've followed for implementing the Vertex AI API client.

Let me know if that makes sense to you.

By the way, I forgot to mention that the GoogleDriveLoader should go in the langchain_google package (instead of the main langchain package where the basic loaders are). I've created a directory where you can add it.

@faithoflifedev
Copy link
Contributor

HI @Sahil-kachhap , no comments on the code specifically. More generally, my approach to this type of scenario was to use different static methods for the varied authentication types:
static Future<AutoRefreshingAuthClient>withEnvironmentVar()
static Future<AutoRefreshingAuthClient>withOAuth(ClientId? oAuthClientId)
static Future<AutoRefreshingAuthClient>withFile(String fileNameAndPath)
static Future<AutoRefreshingAuthClient>withGenerator(RefreshTokenGenerator refreshTokenGenerator)

@Sahil-kachhap
Copy link

Oh yes that really makes sense @davidmigloz . Got it, I will now add the AuthClient in the constructor and yes will move the current progress in the code to that relevant package (langchain_google).

@Sahil-kachhap
Copy link

Sahil-kachhap commented Aug 11, 2023

HI @Sahil-kachhap , no comments on the code specifically. More generally, my approach to this type of scenario was to use different static methods for the varied authentication types: static Future<AutoRefreshingAuthClient>withEnvironmentVar() static Future<AutoRefreshingAuthClient>withOAuth(ClientId? oAuthClientId) static Future<AutoRefreshingAuthClient>withFile(String fileNameAndPath) static Future<AutoRefreshingAuthClient>withGenerator(RefreshTokenGenerator refreshTokenGenerator)

@faithoflifedev Oh Yeah that's a good approach too but since I created the class for googledriveloader functionality, I doubted, whether to place all these auth methods in the same place as other loader methods. Like I wanted to keep all stuffs focused towards loader functionality (Single Responsibility Principle)

@davidmigloz
Copy link
Owner Author

We can always add convenient factory constructors later on if needed, but I would keep it simple for now.

@davidmigloz davidmigloz moved this from 📋 Backlog to 🏗 In Progress in LangChain.dart Aug 16, 2023
@davidmigloz davidmigloz moved this from 🏗 In Progress to 📋 Backlog in LangChain.dart Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c:doc-loaders Document loaders. f:good first issue Good for newcomers t:enhancement New feature or request
Projects
Status: 📋 Backlog
Development

No branches or pull requests

3 participants