About the developer

taki0112
237 Stars 39 Forks MIT License 50 Commits 2 Opened issues

Description

Python and Java implementations of TS-SS, the similarity measure proposed in "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering"

Vector_Similarity

  • Python and Java implementations of TS-SS, the similarity measure proposed in "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering"
  • I have also summarized the paper "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering"
  • I recommend TS-SS instead of cosine similarity or Euclidean distance.

The reasons are...

Cosine drawbacks

[Figure: cosine drawbacks]
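One commonly cited limitation, and my reading of what the figure illustrates, is that cosine similarity only looks at the angle between vectors and ignores their magnitudes. The sketch below uses hypothetical two-dimensional term-count vectors (`short_doc`, `long_doc`) that are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors: same direction, magnitudes differ by a factor of 10.
short_doc = [1.0, 2.0]
long_doc = [10.0, 20.0]

# Cosine reports approximately 1.0 ("identical") despite the scale gap.
print(cosine_similarity(short_doc, long_doc))
```

Because the score depends only on direction, a short document and a much longer one with the same term proportions are treated as the same.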

Euclidean drawbacks

[Figure: Euclidean drawbacks]
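Euclidean distance has the opposite blind spot: it measures how far apart two points are but not the angle between the vectors. The sketch below, again with hypothetical vectors that are not from the paper, builds two pairs with the same distance and very different angles.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle_degrees(a, b):
    """Angle between two vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos))


# Hypothetical pairs: both are sqrt(2) apart, but the angles differ widely.
pair_wide = ([1.0, 0.0], [0.0, 1.0])      # perpendicular vectors
pair_narrow = ([10.0, 0.0], [11.0, 1.0])  # nearly parallel vectors

for a, b in (pair_wide, pair_narrow):
    print(euclidean_distance(a, b), angle_degrees(a, b))
```

Both pairs get the same Euclidean distance even though one pair points in almost the same direction and the other does not.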

Triangle's Area Similarity (TS)

[Figure: Triangle's Area Similarity (TS)]
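A minimal sketch of TS, assuming the definition as I read it in the cited paper: the area of the triangle spanned by the two vectors, TS = |A| · |B| · sin(θ') / 2, where θ' = arccos(cosine similarity) + 10 degrees. The function names are my own, and the 10-degree offset is an assumption taken from that reading; it keeps the area non-zero when the vectors are parallel.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def theta_degrees(a, b):
    """Angle between a and b in degrees, plus the assumed 10-degree offset."""
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, dot / (norm(a) * norm(b))))
    return math.degrees(math.acos(cos)) + 10.0


def triangle_area_similarity(a, b):
    """TS: area of the triangle with sides |a|, |b| and included angle theta'."""
    theta = math.radians(theta_degrees(a, b))
    return norm(a) * norm(b) * math.sin(theta) / 2.0


print(triangle_area_similarity([1.0, 0.0], [0.0, 1.0]))
print(triangle_area_similarity([1.0, 0.0], [2.0, 0.0]))  # parallel, still > 0
```

The angle is converted to radians only at the point where `math.sin` needs it, so the offset is applied in degrees as stated.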

Sector's Area Similarity (SS)

[Figure: Sector's Area Similarity (SS)]
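A minimal sketch of SS, assuming the definition as I read it in the cited paper: the area of a circular sector with radius ED + MD and angle θ', that is SS = π · (ED + MD)² · θ' / 360, where ED is the Euclidean distance, MD is the absolute difference of the vector magnitudes, and θ' is the angle in degrees plus 10. The function names are my own.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def euclidean_distance(a, b):
    """ED: straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def magnitude_difference(a, b):
    """MD: absolute difference between the two vector lengths."""
    return abs(norm(a) - norm(b))


def theta_degrees(a, b):
    """Angle between a and b in degrees, plus the assumed 10-degree offset."""
    dot = sum(x * y for x, y in zip(a, b))
    cos = max(-1.0, min(1.0, dot / (norm(a) * norm(b))))
    return math.degrees(math.acos(cos)) + 10.0


def sector_area_similarity(a, b):
    """SS: area of the sector with radius ED + MD and angle theta'."""
    radius = euclidean_distance(a, b) + magnitude_difference(a, b)
    return math.pi * radius ** 2 * theta_degrees(a, b) / 360.0


print(sector_area_similarity([1.0, 0.0], [0.0, 1.0]))
```

Unlike cosine similarity, this term grows when the vectors differ in length, because MD enters the radius.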

TS-SS

[Figure: TS-SS]
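A minimal, self-contained sketch of the combined measure, assuming (as I read the cited paper) that TS-SS is simply the product TS · SS. Smaller values mean more similar vectors. The function names and example vectors are my own and are not taken from the repository.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def theta_degrees(a, b):
    """Angle between a and b in degrees, plus the assumed 10-degree offset."""
    dot = sum(x * y for x, y in zip(a, b))
    cos = max(-1.0, min(1.0, dot / (norm(a) * norm(b))))
    return math.degrees(math.acos(cos)) + 10.0


def triangle_area_similarity(a, b):
    """TS = |a| * |b| * sin(theta') / 2."""
    return norm(a) * norm(b) * math.sin(math.radians(theta_degrees(a, b))) / 2.0


def sector_area_similarity(a, b):
    """SS = pi * (ED + MD)^2 * theta' / 360."""
    ed = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    md = abs(norm(a) - norm(b))
    return math.pi * (ed + md) ** 2 * theta_degrees(a, b) / 360.0


def ts_ss(a, b):
    """TS-SS: product of the triangle and sector terms (lower = more similar)."""
    return triangle_area_similarity(a, b) * sector_area_similarity(a, b)


# Hypothetical vectors: same direction, different magnitudes.
short_doc = [1.0, 2.0]
long_doc = [10.0, 20.0]

print(ts_ss(short_doc, short_doc))  # identical vectors give 0.0
print(ts_ss(short_doc, long_doc))   # non-zero, unlike cosine similarity
```

In this sketch, identical vectors score exactly zero because ED and MD are both zero, while vectors that share a direction but differ in length get a positive score, which is the case cosine similarity cannot distinguish.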

Results

[Figure: results]

Conclusion

  • On the largest dataset, TS-SS outperforms cosine similarity by a significant margin, while on the other datasets it outperforms cosine similarity only slightly.

  • The significantly better result on the largest dataset therefore supports the robustness and reliability of the model for big data and real-world data, where the variety of documents and texts is high.

Reference

[1] A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering [link1] [link2] [View Article]
