Merge pull request #6 from Microsoft/csharp-update
Csharp update
Mmdixon authored Oct 5, 2018
2 parents 64c1cbd + 14bfe61 commit 04c64cd
Showing 125 changed files with 7,440 additions and 2,500 deletions.
63 changes: 63 additions & 0 deletions .gitattributes
@@ -0,0 +1,63 @@
###############################################################################
# Set default behavior to automatically normalize line endings.
###############################################################################
* text=auto

###############################################################################
# Set default behavior for command prompt diff.
#
# This is needed for earlier builds of msysgit that do not have it on by
# default for csharp files.
# Note: This is only used by command line
###############################################################################
#*.cs diff=csharp

###############################################################################
# Set the merge driver for project and solution files
#
# Merging from the command prompt will add diff markers to the files if there
# are conflicts (Merging from VS is not affected by the settings below, in VS
# the diff markers are never inserted). Diff markers may cause the following
# file extensions to fail to load in VS. An alternative would be to treat
# these files as binary and thus will always conflict and require user
# intervention with every merge. To do so, just uncomment the entries below
###############################################################################
#*.sln merge=binary
#*.csproj merge=binary
#*.vbproj merge=binary
#*.vcxproj merge=binary
#*.vcproj merge=binary
#*.dbproj merge=binary
#*.fsproj merge=binary
#*.lsproj merge=binary
#*.wixproj merge=binary
#*.modelproj merge=binary
#*.sqlproj merge=binary
#*.wwaproj merge=binary

###############################################################################
# behavior for image files
#
# image files are treated as binary by default.
###############################################################################
#*.jpg binary
#*.png binary
#*.gif binary

###############################################################################
# diff behavior for common document formats
#
# Convert binary document formats to text before diffing them. This feature
# is only available from the command line. Turn it on by uncommenting the
# entries below.
###############################################################################
#*.doc diff=astextplain
#*.DOC diff=astextplain
#*.docx diff=astextplain
#*.DOCX diff=astextplain
#*.dot diff=astextplain
#*.DOT diff=astextplain
#*.pdf diff=astextplain
#*.PDF diff=astextplain
#*.rtf diff=astextplain
#*.RTF diff=astextplain
8 changes: 8 additions & 0 deletions .gitignore
@@ -70,3 +70,11 @@ typings/

# dotenv environment variables file
.env

# .net
bin/
obj/
packages/
*.Cache
*.nupkg
*.csproj.user
92 changes: 79 additions & 13 deletions README.md
@@ -9,6 +9,7 @@ Docs can be found at: https://microsoft.github.io/PhoneticMatching/
Supported API:
* C++
* Node.js (>=8.11.2)
* C# .NET Core (>=2.1)

Supported Languages
* English
@@ -34,32 +35,51 @@ npm install phoneticmatching
## Usage
See the typings for more details. <br> Classes prefixed with `En` make certain assumptions that are specific to the English language.
```ts
import maluuba, { EnPronouncer, EnPhoneticDistance, FuzzyMatcher, AcceleratedFuzzyMatcher, EnHybridDistance, StringDistance } from "phoneticmatching";
import { EnPronouncer, EnPhoneticDistance, FuzzyMatcher, AcceleratedFuzzyMatcher, EnHybridDistance, StringDistance } from "phoneticmatching";
```
__maluuba__ Default export, contains everything below.

__Speech__ The namespace containing the type interfaces of the library objects.

__FuzzyMatcher__ Main use case for this library. Returns matches against a list of targets for a given query. The comparisons are not remembered, so it is better suited to one-off use cases.
__EnPronouncer__ Pronounces a string, as a General English speaker, into its IPA string or array of Phones format.

__AcceleratedFuzzyMatcher__ Same interface as `FuzzyMatcher`, but the list of targets is precomputed, which benefits multiple queries at the cost of a higher initialization time.
__matchers__ module:

__EnPronouncer__ Pronounces a string, as a General English speaker, into its IPA string or array of Phones format.
* __FuzzyMatcher__ Main use case for this library. Returns matches against a list of targets for a given query. The comparisons are not remembered, so it is better suited to one-off use cases.

* __AcceleratedFuzzyMatcher__ Same interface as `FuzzyMatcher`, but the list of targets is precomputed, which benefits multiple queries at the cost of a higher initialization time.

* __EnContactMatcher__ A domain specialization of the `AcceleratedFuzzyMatcher` for English speakers searching over a list of names. Does additional preprocessing and sets up the distance function for you.

* __EnPlaceMatcher__ A domain specialization of the `AcceleratedFuzzyMatcher` for English speakers searching over a list of places. Does additional preprocessing and sets up the distance function for you.

__distance__ module:

* __EnPhoneticDistance__ Returns a metric distance score between two English pronunciations.

* __StringDistance__ Returns a metric distance score between two strings (edit distance).

* __EnHybridDistance__ Returns a metric distance score based on a combination of the two above distance metrics (English pronunciations and strings).

__EnPhoneticDistance__ Returns a metric distance score between two English pronunciations.
* __DistanceInput__ Input object for `EnHybridDistance`. Holds the text and the pronunciation of that text.

__StringDistance__ Returns a metric distance score between two strings (edit distance).
__nlp__ module:

__EnHybridDistance__ Returns a metric distance score based on a combination of the two above distance metrics (English pronunciations and strings).
* __EnPreProcessor__ English Pre-processor.

* __EnPlacesPreProcessor__ English Pre-processor with specific rules for places.

* __SplittingTokenizer__ Tokenizing base-class that will split on the given RegExp.

Here are some examples of how to import modules and classes:

```ts
import { EnContactMatcher, EnPlaceMatcher } from "phoneticmatching/lib/matchers";
import { EnContactMatcher, EnPlaceMatcher } from "phoneticmatching";
```
```ts
import * as Matchers from "phoneticmatching/lib/matchers";
```
__EnContactMatcher__ A domain specialization of the `AcceleratedFuzzyMatcher` for English speakers searching over a list of names. Does additional preprocessing and sets up the distance function for you.

__EnPlaceMatcher__ A domain specialization of the `AcceleratedFuzzyMatcher` for English speakers searching over a list of places. Does additional preprocessing and sets up the distance function for you.
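
To make the ideas of an edit-distance metric and a one-off nearest-match search more concrete, here is a minimal, self-contained sketch. It does not use or reproduce the library's implementation: `editDistance`, `nearestByEditDistance`, and `SketchMatch` are hypothetical names introduced only for illustration, and the search is a plain brute-force scan with a textbook Levenshtein distance on lowercased strings (no phonetic comparison).

```ts
// Illustrative sketch only: NOT the phoneticmatching implementation.
// All names below are hypothetical helpers for this example.

// Classic Levenshtein edit distance using two rolling rows.
function editDistance(a: string, b: string): number {
  // previous[j] = distance between the first (i - 1) chars of a and the first j chars of b.
  let previous: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      const deletion = previous[j] + 1;
      const insertion = current[j - 1] + 1;
      current.push(Math.min(substitution, deletion, insertion));
    }
    previous = current;
  }
  return previous[b.length];
}

// Shape loosely mirroring the { element, distance } result shown in the README examples.
interface SketchMatch<T> {
  element: T;
  distance: number;
}

// Brute-force nearest match: every target is compared against the query on each call,
// and nothing is remembered between calls.
function nearestByEditDistance(
  targets: readonly string[],
  query: string
): SketchMatch<string> | undefined {
  const normalizedQuery = query.toLowerCase();
  let best: SketchMatch<string> | undefined;
  for (const target of targets) {
    const distance = editDistance(normalizedQuery, target.toLowerCase());
    if (best === undefined || distance < best.distance) {
      best = { element: target, distance };
    }
  }
  return best;
}

const fruits = [
  "Apple", "Banana", "Blackberry", "Blueberry",
  "Grapefruit", "Pineapple", "Raspberry", "Strawberry",
];

// Logs { element: 'Blueberry', distance: 4 } for this target list.
console.log(nearestByEditDistance(fruits, "blu airy"));
```

Because every query rescans all targets, this style of search fits one-off lookups; the trade-off described for `AcceleratedFuzzyMatcher` is that it spends more time up front on the target list so that repeated queries are cheaper.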

## Example
JavaScript
```js
// Import core functionality from the library.
const { EnPhoneticDistance, FuzzyMatcher } = require("phoneticmatching");
@@ -93,6 +113,48 @@ const result = matcher.nearest("blu airy");
*/
console.log(result);
```
C#
```csharp
using System;

// Import core functionality from the library.
using Microsoft.PhoneticMatching.Matchers.FuzzyMatcher.Normalized;

public class Program
{
    public static void Main(string[] args)
    {
        // The target list to match against.
        string[] targets =
        {
            "Apple",
            "Banana",
            "Blackberry",
            "Blueberry",
            "Grapefruit",
            "Pineapple",
            "Raspberry",
            "Strawberry",
        };

        // Create the fuzzy matcher.
        var matcher = new EnPhoneticFuzzyMatcher<string>(targets);

        // Find the nearest match.
        var result = matcher.FindNearest("blu airy");

        /* The result should be:
         * {
         *     // The object from the targets list.
         *     element: 'Blueberry',
         *     // The distance score from the distance function.
         *     distance: 0.0416666666666667
         * }
         */
        Console.WriteLine("element : [{0}] - distance : [{1}]", result.Element, result.Distance);
    }
}
```

## Build
### TypeScript Transpiling
@@ -138,6 +200,10 @@ npm publish
npm run package
```

## NuGet Publish
A .NET Core NuGet package is published for this project. Because the package is published by Microsoft, it must follow the guidance at https://aka.ms/nuget, and both the package contents and the package itself must be signed with an official Microsoft certificate. To ease the signing and publishing process, ESRP signing is integrated into the Azure DevOps build tasks.
To publish a new version of the package, create a release for the latest build (Pipelines->Releases->PublishNuget->Create a release).

# Contributors
This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to,
22 changes: 22 additions & 0 deletions binding.gyp
@@ -56,6 +56,28 @@
        },
    },
    "targets": [
        {
            "target_name": "maluubaspeech-csharp",
            "dependencies": [
                "maluubaspeech-source",
            ],
            "sources": [
                "src/maluuba/speech/csharp/bindings.cpp"
            ],
            "xcode_settings": {
                "CLANG_CXX_LANGUAGE_STANDARD": "c++17",
                "GCC_ENABLE_CPP_EXCEPTIONS": "YES", # remove -fno-exceptions
                "GCC_ENABLE_CPP_RTTI": "YES", # remove -fno-rtti
                "OTHER_CFLAGS+": [
                    "-Wall",
                    "-pedantic",
                ],
                "WARNING_CFLAGS!": [
                    "-Wall",
                    "-Wextra",
                ],
            },
        },
        {
            "target_name": "maluubaspeech",
            "dependencies": [
2 changes: 1 addition & 1 deletion docs/assets/js/search.js

Large diffs are not rendered by default.

9 changes: 3 additions & 6 deletions docs/classes/chainedrulebasedpreprocessor.html
@@ -236,6 +236,9 @@ <h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</s
<li class=" tsd-kind-class">
<a href="interval.html" class="tsd-kind-icon">Interval</a>
</li>
<li class=" tsd-kind-class">
<a href="matcherconfig.html" class="tsd-kind-icon">Matcher<wbr>Config</a>
</li>
<li class=" tsd-kind-class">
<a href="placematcherconfig.html" class="tsd-kind-icon">Place<wbr>Matcher<wbr>Config</a>
</li>
@@ -305,12 +308,6 @@ <h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</s
<li class=" tsd-kind-variable">
<a href="../globals.html#stringdistance" class="tsd-kind-icon">String<wbr>Distance</a>
</li>
<li class=" tsd-kind-object-literal">
<a href="../globals.html#matchers" class="tsd-kind-icon">matchers</a>
</li>
<li class=" tsd-kind-object-literal">
<a href="../globals.html#nlp" class="tsd-kind-icon">nlp</a>
</li>
</ul>
</nav>
</div>