[NUTCH-2856] Implement a protocol-smb plugin based on hierynomus/smbj#826
[NUTCH-2856] Implement a protocol-smb plugin based on hierynomus/smbj#826HiranChaudhuri wants to merge 10 commits intoapache:masterfrom
Conversation
Draft version of a protocol-smb plugin. Lots of todo comments still, but it seems to work.
|
Moving this to DRAFT status and acknowledging the PR @HiranChaudhuri thank you. |
lewismc
left a comment
There was a problem hiding this comment.
Hi @HiranChaudhuri I added quite a few comments for your consideration. Thanks for submitting this PR 👍
Please ping me once your ready and we can go for round # 2 of peer review.
Further out, I think we could implement some testing for this protocol plugin. We could use testcontainers and essentially spin up a local Samba server using @nddipiazza 's smbj-inttest image. We can come back to this one the PR has evolved a bit.
| filePattern="${hadoop.log.dir}/$${date:yyyy-MM}/nutch-%d{yyyy-MM-dd}.log.gz"> | ||
| <PatternLayout pattern="%d %p %c{1.} [%t] %m%n" /> | ||
| <!--<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />--> | ||
| <PatternLayout pattern="%d %p %c [%t] %m%n" /> |
There was a problem hiding this comment.
What's the reason for this change? Does this print the logger name in full?
There was a problem hiding this comment.
It does print in full as I am otherwise not entirely clear where a message is coming from.
We can revert the change before merge - for the time being on my side I need it.
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/Smb.java
Outdated
Show resolved
Hide resolved
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/Smb.java
Outdated
Show resolved
Hide resolved
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/Smb.java
Outdated
Show resolved
Hide resolved
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/Smb.java
Outdated
Show resolved
Hide resolved
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/SmbURLConnection.java
Show resolved
Hide resolved
Improve error handling Rename class as requested Added license header Improve url parsing added robots.txt
lewismc
left a comment
There was a problem hiding this comment.
Thanks for all of the responses @HiranChaudhuri please see my updated suggestions.
| filePattern="${hadoop.log.dir}/$${date:yyyy-MM}/nutch-%d{yyyy-MM-dd}.log.gz"> | ||
| <PatternLayout pattern="%d %p %c{1.} [%t] %m%n" /> | ||
| <!--<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />--> | ||
| <PatternLayout pattern="%d %p %c [%t] %m%n" /> |
|
Hi @HiranChaudhuri now that you've activated the plugin Please let me know your thoughts on this. |
|
The container looks good. I have no clue about the @rule annotation and am interested to see how this gets combined together. |
lewismc
left a comment
There was a problem hiding this comment.
Trivial header requests. Thanks
src/plugin/protocol-smb/src/java/org/apache/nutch/protocol/smb/URLAuthentication.java
Show resolved
Hide resolved
lewismc
left a comment
There was a problem hiding this comment.
@HiranChaudhuri
Thanks for coming back to this PR, and thanks for the contribution.
Please see my comments, I am happy to work with you in a bid to get this PR into the Nutch codebase.
| filePattern="${hadoop.log.dir}/$${date:yyyy-MM}/nutch-%d{yyyy-MM-dd}.log.gz"> | ||
| <PatternLayout pattern="%d %p %c{1.} [%t] %m%n" /> | ||
| <!--<PatternLayout pattern="%d %p %c{1.} [%t] %m%n" />--> | ||
| <PatternLayout pattern="%d %p %c [%t] %m%n" /> |
| <dependencies> | ||
| <dependency org="com.hierynomus" name="smbj" rev="0.13.0"/> | ||
| <!-- | ||
| These dependencies are either contained in smbj (transitive) or |
There was a problem hiding this comment.
We can just remove these comments.
| @Before | ||
| public void setUp() { | ||
| LOG.warn("setUp()"); | ||
| Assert.fail(); |
There was a problem hiding this comment.
These tests cannot be committed in the failing state. Thy will destabilize the CI builds.
Previously I suggested using test containers.
If docker is not available on the host machine when tests are being run then we could use the @Testcontainers(disabledWithoutDocker = true) syntax.
[NUTCH-2856] Implement a protocol-smb plugin based on hierynomus/smbj
Draft version of a protocol-smb plugin. Lots of todo comments still,
but it seems to work.